In my previous exploration of double rounding errors in decimal to float conversions I showed two decimal numbers that experienced a double rounding error when converted to float (single-precision) through an intermediate double (double-precision). I generated the examples indirectly by setting bit combinations that forced the error, using their corresponding exact decimal representations. As a result, the decimal numbers were long (55 digits each). Mark Dickinson derived a much shorter 17-digit example, but I hadn’t contemplated how to generate even shorter numbers, or whether they existed at all, until Per Vognsen wrote me recently to ask.
The easiest way for me to approach Per’s question was to search for examples, rather than try to find a way to construct them. As such, I wrote a simple Kotlin¹ program to generate decimal strings and check them. I tested all float-range (including subnormal) decimal numbers of 9 digits or fewer, and tens of billions of random 10 to 17 digit float-range (normal only) numbers. I found examples, from 7 to 17 digits long, that suffer a double rounding error when converted to float through a double.
Here are examples I found:
| Decimal | # | To Float | To Double | To Double to Float |
A quick way to see the rounding errors is to look at the hexadecimal floating-point representations of the conversions. The decimal to float and decimal to double to float values differ by one ULP. (Remember that when used to represent a float, a 0 must be tacked on to fill out the last hex digit. This makes it appear like the float has 25 bits of precision, not 24, and that the double rounding error is two ULPs, when it is only ever one.)
You can verify the decimal to float and decimal to double conversions with my decimal to floating-point converter.
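If you would rather check the conversions locally, here is a minimal Java sketch of the comparison. It assumes Float.parseFloat and Double.parseDouble are correctly rounded (they go through the same FloatingDecimal code that Kotlin uses); the class and method names are just illustrative.

```java
public class DoubleRoundingCheck {
    // True if converting through an intermediate double gives a different
    // float than converting the decimal string directly.
    static boolean hasDoubleRoundingError(String decimal) {
        float direct = Float.parseFloat(decimal);               // decimal -> float
        float viaDouble = (float) Double.parseDouble(decimal);  // decimal -> double -> float
        return direct != viaDouble;
    }

    public static void main(String[] args) {
        String[] examples = {"0.0691026858985424", "5.79090352403e-4"};
        for (String s : examples) {
            float direct = Float.parseFloat(s);
            double d = Double.parseDouble(s);
            float viaDouble = (float) d;
            // Hexadecimal floating-point output makes the one-ULP difference visible.
            System.out.println(s);
            System.out.println("  to float:           " + Float.toHexString(direct));
            System.out.println("  to double:          " + Double.toHexString(d));
            System.out.println("  to double to float: " + Float.toHexString(viaDouble));
            System.out.println("  double rounding error: " + hasDoubleRoundingError(s));
        }
    }
}
```

Java's hexadecimal strings may be formatted differently from the ones my converter shows, so compare values rather than exact text.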
0.0691026858985424: Rounds up instead of down
To understand those hexadecimal floating-point representations of the conversions better, consider the full binary representation of 0.0691026858985424:
The single-precision rounding bit, significant bit 25 (the first bit highlighted), is 0, so the full binary representation rounds down to 0.000100011011000010110110101 when converted correctly to a float.
However, a double rounding error occurs when converting first to a double, and then from the double to a float. The double-precision rounding bit, significant bit 54 (the second bit highlighted), is 1, and there are 1 bits after it, so it rounds up to 0.0001000110110000101101101011 as a double. The last bit of the double, which would be bit 25 of a float, is 1, so round-to-nearest-even dictates it rounds up to a float as 0.00010001101100001011011011.
Here’s a summary of those three binary representations:
float: 0.000100011011000010110110101
double: 0.0001000110110000101101101011
double to float: 0.000100011011000010110110110
5.79090352403e-4: Rounds down instead of up
The full binary representation of 5.79090352403e-4 is
The pattern of bits between bits 24 and 54 is the “complement” of the first example: bit 24 is 0, bit 25 is 1, bits 26-53 are 0s, and bit 54 is a 0. A correctly converted float rounds up, but converted through an intermediate double it rounds down and then down again due to round-to-nearest-even.
About the Examples
There’s nothing special about the examples except that I chose ones with the smallest absolute value exponent.
Having tested all numbers with 9 digits or fewer, I found only one 7-digit number, nine 8-digit numbers, and 51 9-digit numbers. One of the 8-digit numbers was the 7-digit number with a trailing 0, and nine of the 9-digit numbers were a corresponding 8-digit number with a trailing 0.
Examples were hard to come by with testing of randomly generated decimal strings, although longer examples surfaced more frequently. I found 17-digit numbers roughly on the order of one per billion, whereas I found 10-digit numbers roughly on the order of one per ten billion. (Don’t quote me on those ratios: I did not run enough tests to establish them with high confidence.) Is this because shorter decimal numbers are sparser around eligible double-precision numbers?
While all the double rounding errors I found are verifiable, it’s possible that an incorrect conversion in Java (Kotlin uses Java’s FloatingDecimal class for conversions) could have missed some. (I have generally assumed these days though that FloatingDecimal is correct.)
Double Rounding Error Bit Patterns
Here are the significant bits of the examples above:
| Decimal | Significant Bits of Full Binary Representation |
These examples demonstrate two patterns in the significant bits of the full binary representation of a decimal number that cause these double rounding errors:
- Double round up pattern:
  - Bit 24 is 1
  - Bit 25 is 0
  - Bits 26-53 are 1s
  - Bit 54 is 1
- Double round down pattern:
  - Bit 24 is 0
  - Bit 25 is 1
  - Bits 26-53 are 0s
  - Bit 54 is 0
  - At least one bit after bit 54 is 1
The double round up pattern rounds the float up when it should be rounded down; the double round down pattern rounds the float down when it should be rounded up. Both patterns create a double that becomes a float halfway case.
In the double round up case, the rounding up to double carries through bits 53 to 26 and into bit 25, setting bit 25 to 1 and bits 26-53 to 0. This in turn sets up the float rounding, which is a halfway case that’s resolved by rounding up to the nearest even value. In the double round down case, the rounding down to double discards bit 54 and the bits after it, information essential to proper float rounding.
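To make the patterns concrete, here is a rough Java sketch that extracts the first 54 significant bits of a decimal string's exact binary expansion and reports which pattern, if any, they match. It assumes a positive, nonzero input in the normal float range. The class, enum, and method names are mine, and the third enum value covers an exact-tie variant of the double round down pattern (bit 54 is 1 and nothing follows it).

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BitPatternClassifier {
    enum Pattern { ROUND_UP, ROUND_DOWN, ROUND_DOWN_ALT, NONE }

    static Pattern classify(String decimal) {
        BigDecimal value = new BigDecimal(decimal);      // assumed positive and nonzero
        BigInteger num = value.unscaledValue();
        BigInteger den = BigInteger.ONE;
        if (value.scale() > 0) {
            den = BigInteger.TEN.pow(value.scale());
        } else {
            num = num.multiply(BigInteger.TEN.pow(-value.scale()));
        }

        // Scale by a power of two so floor(num / den) has 54 or 55 bits.
        int shift = 54 - (num.bitLength() - den.bitLength());
        if (shift > 0) num = num.shiftLeft(shift); else den = den.shiftLeft(-shift);
        BigInteger[] qr = num.divideAndRemainder(den);
        if (qr[0].bitLength() > 54) {                     // one bit too many: halve the ratio
            qr = num.divideAndRemainder(den.shiftLeft(1));
        }

        long bits = qr[0].longValueExact();               // significant bits 1-54, bit 1 first
        boolean sticky = qr[1].signum() != 0;             // any 1 bit after bit 54
        boolean bit24 = ((bits >>> 30) & 1) == 1;
        boolean bit25 = ((bits >>> 29) & 1) == 1;
        long bits26to53 = (bits >>> 1) & ((1L << 28) - 1);
        boolean bit54 = (bits & 1) == 1;
        long allOnes = (1L << 28) - 1;

        if (bit24 && !bit25 && bits26to53 == allOnes && bit54) return Pattern.ROUND_UP;
        if (!bit24 && bit25 && bits26to53 == 0 && !bit54 && sticky) return Pattern.ROUND_DOWN;
        if (!bit24 && bit25 && bits26to53 == 0 && bit54 && !sticky) return Pattern.ROUND_DOWN_ALT;
        return Pattern.NONE;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0.0691026858985424", "5.79090352403e-4"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Working from the exact rational value (big integers, no floating-point) keeps the classifier independent of the conversions it is meant to explain.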
A double round up scenario only triggered by exact decimal numbers
In the double round up pattern, the values of bits 55 and beyond are irrelevant. But to see a case where all those bits are 0s, you have to construct an exactly representable example, such as:
- 0.500000089406967107574786268742172978818416595458984375, which is $2^{-1} + 2^{-24} + 2^{-25} - 2^{-53} + 2^{-54}$
- 9007200865353727, which is $2^{53} + 2^{30} + 2^{29} - 2^{1} + 2^{0}$
To get a string of 1s from bits 26 to 53, the first example uses $2^{-25} - 2^{-53}$, and the second example uses $2^{29} - 2^{1}$.
(In this case, if you round the first example to 17 digits, 0.50000008940696711, you still get the double rounding error, although the bits beyond bit 54 are no longer all 0s.)
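As a sketch of how such an example can be put together, the following Java builds both numbers from their powers of two and checks the 17-digit rounding as well. The helper names are illustrative, and the check again assumes Java's string conversions are correctly rounded.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class ExactRoundUpExample {
    // Exact decimal value of 2^k: Math.scalb is exact for these exponents,
    // and new BigDecimal(double) preserves the double's exact value.
    static BigDecimal pow2(int k) {
        return new BigDecimal(Math.scalb(1.0, k));
    }

    // Bit 1, bit 24, bits 26-53 (as 2^-25 - 2^-53), and bit 54.
    static BigDecimal fractionExample() {
        return pow2(-1).add(pow2(-24)).add(pow2(-25)).subtract(pow2(-53)).add(pow2(-54));
    }

    // The same bit pattern scaled up to an integer.
    static long integerExample() {
        return (1L << 53) + (1L << 30) + (1L << 29) - (1L << 1) + 1L;
    }

    static boolean hasDoubleRoundingError(String decimal) {
        return Float.parseFloat(decimal) != (float) Double.parseDouble(decimal);
    }

    public static void main(String[] args) {
        BigDecimal exact = fractionExample();
        String full = exact.toPlainString();
        // MathContext(17) rounds to 17 significant digits (HALF_UP by default).
        String rounded = exact.round(new MathContext(17)).toPlainString();
        System.out.println(full + " -> " + hasDoubleRoundingError(full));
        System.out.println(rounded + " -> " + hasDoubleRoundingError(rounded));
        System.out.println(integerExample());
    }
}
```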
A third pattern
There is a third double rounding error pattern that’s a variation of the double round down pattern, but it’s not demonstrated in the examples:
- Double round down alternative pattern:
  - Bit 24 is 0
  - Bit 25 is 1
  - Bits 26-53 are 0s
  - Bit 54 is 1
  - All bits after bit 54 are 0
To invoke this scenario you also have to construct an exactly representable example, like:
- 0.500000029802322443206463731257827021181583404541015625, which is $2^{-1} + 2^{-25} + 2^{-54}$
- 9007199791611905, which is $2^{53} + 2^{29} + 2^{0}$
(If you round the first example to 17 digits, 0.50000002980232244, you still get the double rounding error, but the bit pattern switches from the double round down alternative pattern to the double round down pattern.)
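The integer example lends itself to a quick walkthrough in Java, because the casts perform the roundings without any string parsing. This sketch assumes the JVM converts long to float in a single round-to-nearest step, as the Java Language Specification describes; the class and method names are illustrative.

```java
public class ThirdPatternWalkthrough {
    // 2^53 + 2^29 + 2^0 = 9007199791611905
    static final long EXAMPLE = (1L << 53) + (1L << 29) + 1L;

    // One rounding: long -> float.
    static float direct(long n) {
        return (float) n;
    }

    // Two roundings: long -> double (a tie, resolved to even), then double -> float.
    static float viaDouble(long n) {
        double d = (double) n;
        return (float) d;
    }

    public static void main(String[] args) {
        // Casting back to long is exact here, so the printed integers are the
        // exact values of the float and double results.
        System.out.println("integer:            " + EXAMPLE);
        System.out.println("to float:           " + (long) direct(EXAMPLE));
        System.out.println("to double:          " + (long) (double) EXAMPLE);
        System.out.println("to double to float: " + (long) viaDouble(EXAMPLE));
    }
}
```

Following the pattern, the direct conversion should round up to $2^{53} + 2^{30}$, while the double drops the final 1 and the second rounding then resolves the tie down to $2^{53}$.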
Fast Path Conversions (the prompt for this article)
Per was interested in fast path conversion, where numbers with 15 decimal digits or fewer and power-of-ten exponents between -22 and 22 (relative to the decimal digits viewed as an integer) can be converted correctly using just double-precision arithmetic. Specifically, he wanted to know if double rounding errors were possible if you used double-precision fast path as an intermediate for converting to float.
The shortest fast path case is 9 digits, and there are five of them among the 51 9-digit numbers: 9.67498269e-11 (described below), 5.85052973e21, 9.49766107e23, 8.04624287e26, and 8.96981543e28. (Remember, I tested all numbers through 9 digits.) Two of the nine 8-digit numbers though, 4.1358803e34 and 8.2717606e34, are fast path cases in disguise, meaning zeros can be tacked on to make each a fast path case: 4135880300000e22 and 8271760600000e22, respectively.
Here are the 9-15 digit examples from above (which I selected in the first place because they are fast-path eligible), expressed as a fast path calculation:
| Decimal | Significant Digits | Power of Ten | IEEE Operation |
(It’s just arbitrary that all the examples I selected require division, and that none require multiplication.)
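For reference, a fast path conversion used as the intermediate step might look like the following Java sketch. It is only an illustration of the idea under the eligibility limits stated above, not Per's code or any particular library's; the names are hypothetical.

```java
public class FastPathSketch {
    // Powers of ten that are exactly representable as doubles (10^0 to 10^22).
    static final double[] POW10 = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
        1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
    };

    // digits: the significant digits viewed as an integer (15 digits or fewer).
    // exp10:  the power-of-ten exponent relative to that integer.
    static double fastPathToDouble(long digits, int exp10) {
        if (digits < 0 || digits >= 1_000_000_000_000_000L || exp10 < -22 || exp10 > 22) {
            throw new IllegalArgumentException("not fast path eligible");
        }
        double d = (double) digits;   // exact: 15 digits fit in 53 bits
        // Both operands are exact, so the single IEEE operation rounds once.
        return exp10 < 0 ? d / POW10[-exp10] : d * POW10[exp10];
    }

    // Rounding the fast path double to float is the second rounding.
    static float fastPathToFloat(long digits, int exp10) {
        return (float) fastPathToDouble(digits, exp10);
    }

    public static void main(String[] args) {
        // 9.67498269e-11 written as 967498269 / 10^19
        float viaFastPath = fastPathToFloat(967498269L, -19);
        float direct = Float.parseFloat("9.67498269e-11");
        System.out.println("via fast path double: " + Float.toHexString(viaFastPath));
        System.out.println("direct to float:      " + Float.toHexString(direct));
    }
}
```

Because the fast path double is itself correctly rounded, it matches what a full decimal to double conversion produces, so it inherits the same double rounding exposure when narrowed to float.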
The necessary condition that sets up a double rounding error is a double representation of a decimal number that rounds to a tie-breaking rounding case for a float. If the float representation in turn is rounded in the same direction as the decimal to double conversion, the error occurs.
To be correct, the original decimal number (its binary representation, that is) must be rounded relative to bits 25 and beyond. If those bits are altered in a certain way by the rounding to the intermediate double, the float rounding goes in the opposite direction from the one intended.
While finishing this article I thought of a way to randomly generate double rounding cases directly, using Java’s BigDecimal. You can generate long, exactly represented decimal numbers and round them down (to 17 digits, 16 digits, etc.) until you find the shortest one that still causes the double rounding error.
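Here is a rough Java sketch of that idea, limited to the double round up pattern. It is an untuned illustration, not the program I used for the search: the names are made up, the exponent range is an arbitrary choice that stays within normal float range, and the rounding uses MathContext's default HALF_UP mode.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;
import java.util.Random;

public class ShortestDoubleRounding {
    static boolean hasDoubleRoundingError(String decimal) {
        return Float.parseFloat(decimal) != (float) Double.parseDouble(decimal);
    }

    // Random exactly representable number whose 54 significant bits follow
    // the double round up pattern, scaled by a random power of two.
    static BigDecimal randomRoundUpCase(Random rnd) {
        long sig = (1L << 22) | rnd.nextInt(1 << 22);  // bits 1-23, leading bit set
        sig = (sig << 1) | 1;                          // bit 24 = 1
        sig = sig << 1;                                // bit 25 = 0
        sig = (sig << 28) | ((1L << 28) - 1);          // bits 26-53 = 1s
        sig = (sig << 1) | 1;                          // bit 54 = 1
        int exp2 = rnd.nextInt(151) - 100;             // 2^-100 to 2^50 (arbitrary range)
        BigDecimal scale = new BigDecimal(Math.scalb(1.0, exp2));  // exact power of two
        return new BigDecimal(BigInteger.valueOf(sig)).multiply(scale);
    }

    // Rounds the exact value to 17, 16, ..., 1 significant digits and returns
    // the shortest rounding that still shows the error, or null if none does.
    static String shortest(BigDecimal exact) {
        String best = null;
        for (int digits = 17; digits >= 1; digits--) {
            String candidate = exact.round(new MathContext(digits)).toString();
            if (hasDoubleRoundingError(candidate)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 5; i++) {
            BigDecimal exact = randomRoundUpCase(rnd);
            System.out.println(exact.toString() + " -> " + shortest(exact));
        }
    }
}
```

The loop checks every length rather than stopping at the first failure, since a shorter rounding can land back inside the error window even when a longer one does not.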
¹ Why Kotlin? It’s just the language I’ve been using lately.