Every double-precision floating-point number can be specified with 17 significant decimal digits or less. A simple way to generate this 17-digit number is to round the full-precision decimal value of the double to 17 digits. For example, the double-precision value 0x1.6d4c11d09ffa1p-3, which in decimal is 1.783677474777478899614635565740172751247882843017578125 x 10-1, can be recovered from the decimal floating-point literal 1.7836774747774789e-1. The extra digits are unnecessary, since they will only take you to the same double.
On the other hand, an arbitrary, arbitrarily long decimal literal rounded or truncated to 17 digits may not convert to the double-precision value it’s supposed to. This is a subtle point, one that has even tripped up implementers of widely used decimal to floating-point conversion routines (glibc strtod() and Visual C++ strtod(), for example).
Deciding Which Floating-Point Number Is Closest
The job of a decimal to floating-point conversion routine is to pick the floating-point number that is closest to the decimal input. The hardest inputs to convert correctly are those that are halfway or near halfway between consecutive floating-point numbers. These are the very inputs for which more than 17 digits — for doubles, up to 768 of them! — may need to be parsed. The extra digits, despite how little they contribute, can put you just at or beyond the halfway point.
Here’s the interesting thing: we know every double has a 17-digit representative that maps to it (actually there are multiple, but for our purposes we’ll consider only one, the nearest), but it may take more than 17 digits of a decimal input to figure out which 17-digit representative to choose! But once we’ve processed our long decimal input, we could replace it with its simpler, 17-digit representative. We’ve still got the correct double, but now our “handle” to it is unambiguously close. It’s like we are using a more precise number to decide which less precise number to pick, although that extra precision is just an illusion.
Example 1: Just Above Halfway
3.08984926168550152811e-32, an example near-halfway case from Vern Paxson’s paper “A Program for Testing IEEE Decimal–Binary Conversion”, is a 21-digit number that converts to 0x1.40de48676653bp-105. Rounded (same as truncated in this case) to 17 digits it is 3.0898492616855015e-32. However, that converts to 0x1.40de48676653ap-105, which is one binary ULP below the correct answer. Even rounded (truncated) to 20 digits — 3.0898492616855015281e-32 — we still come up one ULP short. So we need all 21 digits.
The problem is that this input converts to a number very nearly halfway between two double-precision numbers. (It is slightly above halfway). Here are its first 130 digits in binary to illustrate this (bits 54 through 130 are highlighted):
1.010000001101111001001000011001110110011001010011101010000000000000000000000000000000000000000000000000000000000000000000000000001…p-105
Being so close to halfway, those extra decimal digits are needed to decide which way to go. Ignore those digits, and the conversion will land below the halfway point.
A 17-digit stand-in
The full decimal value of the desired floating-point number is
3.08984926168550180180110631344083416478369964008307296403646973245891422971698357636226306421889375997125171124935150146484375e-32
Rounded to 17 digits, it’s 3.0898492616855018e-32; that converts to the desired floating-point number.
Example 2: Halfway
1.00000000000000033306690738754696212708950042724609375 is an example I concocted using my decimal/binary converter; it is halfway between consecutive doubles:
1.00000000000000000000000000000000000000000000000000011
It is 54 digits, and all 54 are needed to decide that the correct conversion is 0x1.0000000000002p0 (round half to even) and not 0x1.0000000000001p0. Rounding (truncating) to 17 digits — 1.0000000000000003 — does not work.
A 17-digit stand-in
The full decimal value of the desired floating-point number is
1.000000000000000444089209850062616169452667236328125
Rounded to 17 digits, it’s 1.0000000000000004, which converts to the desired floating-point number.
Example 3: Just Below Halfway
8.36168422905420598437e-214, another 21-digit example from Vern Paxson’s paper, is a little less than halfway between two doubles. Here it is in binary:
1.00100000010000000011101001100010100010101001110010100111111111111111111111111111111111111111111111111111111111111111111111111110…p-708
It converts to 0x1.20403a628a9cap-708. But rounded to 17 digits — 8.361684229054206e-214 — it converts up to 0x1.20403a628a9cbp-708. (If you truncated instead of rounded, you would get the correct answer.) Again, assuming you are rounding and not truncating, you need to round it to 18 digits to convert it correctly: 8.36168422905420598e-214.
A 17-digit stand-in
The full decimal value of the desired floating-point number is
8.36168422905420515990295749156693…(509 digits not shown)…0625e-214
Rounded to 17 digits, it’s 8.3616842290542052e-214.
Example 4: sqrt(2)
In case you were thinking I could come up with only contrived examples, take a look at sqrt(2) ≈ 1.414213562373095048801688… . In binary, it’s
1.01101010000010011110011001100111111100111011110011001001… ,
which you can see is somewhat close to halfway. If you test it, you will find that you need 18 digits (rounded to 1.41421356237309505 or truncated to 1.41421356237309504) to get the correction conversion; 1.414213562373095 (rounded/truncated to 17 digits) does not work.
A 17-digit stand-in
The full decimal value of the desired floating-point number is
1.4142135623730951454746218587388284504413604736328125
Rounded to 17 digits, it’s 1.4142135623730951.
Example 5: pi/3
As another realistic example, consider pi/3 ≈ 1.047197551196597746154214… . In binary it’s
1.0000110000010101001000111000001011010111001101100101100001…
which is even closer to halfway than sqrt(2).
19 digits — 1.047197551196597746 — are required to convert it correctly; 1.0471975511965977 (rounded/truncated to 17 digits) does not work.
A 17-digit stand-in
The full decimal value of the desired floating-point number is
1.047197551196597853362391106202267110347747802734375
Rounded to 17 digits, it’s 1.0471975511965979.
Single-Precision
The same issue applies to floats of course. Whereas you can always find a 9-digit decimal stand-in for a float, you may need more digits than that to convert it correctly.
Summary
When entering decimal literals into a computer program, you need to be aware that you may need more than 17 (9) digits to get the correct conversion. Once you know what a given decimal input converts to, it’s easy to find its 17 (9) digit stand-in. But until you convert it, you must assume you need all the digits you have — or even more if your value represents an infinite decimal. Unless you are willing to do an analysis like mine, you won’t know how many digits you need.
(Please let me know if you know of any “underspecified” literals in real code.)
Further Reading
Please check out the related article by “carolomeetsbarolo”: “Mathematical Constants in Program Code”.
In the first line “Every double-precision floating-point number can be specified with 17 significant decimal digits or less.”
For binary-decimal-binary conversions, shouldn’t it be 17 significant decimal digits or more.
In the link on that line, in the table titled “Digit Ranges For Round-Trips To and From IEEE Binary Floating-Point”, it gives the minimum digits for
binary-decimal-binary round-trip for a double type as 17.
Thanks
If it is decimal-binary-decimal, then shouldn’t it be 15 significant decimal digits or less.
@Sean,
The article you refer to shows that any decimal representation of a double that is rounded to 17 digits or more will round-trip; in that context, 17 digits is the minimum. In other articles I show that some doubles can be round-tripped with less than 17 digits. So in the big picture 17 digits is the maximum you need, and thus I am correct in saying “Every double-precision floating-point number can be specified with 17 significant decimal digits or less.”
For the other direction you’d say “Any decimal number of 15 significant decimal digits or less will round-trip through a double.”
I think part of what may make this confusing is that the two directions are asymmetric: binary-decimal-binary is from a fixed number of bits to an unlimited number of digits; decimal-binary-decimal is from an unlimited number of digits to a fixed number of bits.