Double Rounding Errors in Decimal to Double to Float Conversions

In my previous exploration of double rounding errors in decimal to float conversions I showed two decimal numbers that experienced a double rounding error when converted to float (single-precision) through an intermediate double (double-precision). I generated the examples indirectly by setting bit combinations that forced the error, using their corresponding exact decimal representations. As a result, the decimal numbers were long (55 digits each). Mark Dickinson derived a much shorter 17 digit example, but I hadn’t contemplated how to generate even shorter numbers — or whether they existed at all — until Per Vognsen wrote me recently to ask.

The easiest way for me to approach Per’s question was to search for examples, rather than try to find a way to construct them. As such, I wrote a simple Kotlin1 program to generate decimal strings and check them. I tested all float-range (including subnormal) decimal numbers of 9 digits or fewer, and tens of billions of random 10 to 17 digit float-range (normal only) numbers. I found example 7 to 17 digit numbers that, when converted to float through a double, suffer a double rounding error.

Examples

Here are examples I found:

Double Rounding Error Examples
Decimal # To Float To Double To Double to Float
1.3006255030632019 17 0x1.4cf5cap0 0x1.4cf5cbp0 0x1.4cf5ccp0
6.467822313308716 16 0x1.9df0cep2 0x1.9df0cdp2 0x1.9df0ccp2
0.0691026858985424 15 0x1.1b0b6ap-4 0x1.1b0b6bp-4 0x1.1b0b6cp-4
0.025306879542768 14 0x1.9ea0bep-6 0x1.9ea0bfp-6 0x1.9ea0c0p-6
4.456769842065e-9 13 0x1.324452p-28 0x1.324453p-28 0x1.324454p-28
5.79090352403e-4 12 0x1.2f9c32p-11 0x1.2f9c31p-11 0x1.2f9c30p-11
3.0128387285e-10 11 0x1.4b43dep-32 0x1.4b43dfp-32 0x1.4b43e0p-32
7.582917533e-5 10 0x1.3e0cf6p-14 0x1.3e0cf5p-14 0x1.3e0cf4p-14
9.67498269e-11 9 0x1.a9829ep-34 0x1.a9829fp-34 0x1.a982a0p-34
4.1358803e34 8 0x1.fdc95ep114 0x1.fdc95fp114 0x1.fdc960p114
7.038531E-26 7 0x1.5c87fap-84 0x1.5c87fbp-84 0x1.5c87fcp-84

A quick way to see the rounding errors is to look at the hexadecimal floating-point representations of the conversions. The decimal to float and decimal to double to float values differ by one ULP. (Remember that when used to represent a float, a 0 must be tacked on to fill out the last hex digit. This makes it appear like the float has 25 bits of precision, not 24, and that the double rounding error is two ULPs, when it is only ever one.)

You can verify the decimal to float and decimal to double conversions with my decimal to floating-point converter.

0.0691026858985424: Rounds up instead of down

To understand those hexadecimal floating-point representations of the conversions better, consider the full binary representation of 0.0691026858985424:

0.00010001101100001011011010101111111111111111111111111111101100...

The single-precision rounding bit, significant bit 25 (the first bit highlighted), is 0, so the full binary representation rounds down to 0.000100011011000010110110101 when converted correctly to a float.

However, a double rounding error occurs when converting first to a double, and then from the double to a float. The double-precision rounding bit, significant bit 54 (the second bit highlighted), is 1, and there are 1 bits after it, so it rounds up to 0.0001000110110000101101101011 as a double. The last bit of the double, which would be bit 25 of a float, is 1, so round-to-nearest-even dictates it rounds up to a float as 0.00010001101100001011011011.

Here’s a summary of those three binary representations:

float:           0.000100011011000010110110101
double:          0.0001000110110000101101101011
double to float: 0.000100011011000010110110110

5.79090352403e-4: Rounds down instead of up

The full binary representation of 5.79090352403e-4 is

0.000000000010010111110011100001100010000000000000000000000000000000001...

The pattern of bits between bits 24 and 54 is the “complement” of the first example: bit 24 is 0, bit 25 is 1, bits 26-53 are 0s, and bit 54 is a 0. A correctly converted float rounds up, but converted through an intermediate double it rounds down and then down again due to round-to-nearest-even.

About the Examples

There’s nothing special about the examples except that I chose ones with the smallest absolute value exponent.

Having tested all numbers with 9 digits or fewer, I found only one 7-digit number, nine 8-digit numbers, and 51 9-digit numbers. One of the 8-digit numbers was the 7-digit number with a trailing 0, and nine of the 9-digit numbers were a corresponding 8-digit number with a trailing 0.

Examples were hard to come by with testing of randomly generated decimal strings, although longer examples surfaced more frequently. I found 17-digit numbers roughly on the order of one per billion, whereas I found 10-digit numbers roughly on the order of one per ten billion. (Don’t quote me on those ratios: I did not run enough tests to establish them with high confidence.) Is this because shorter decimal numbers are sparser around eligible double-precision numbers?

While all the double rounding errors I found are verifiable, it’s possible that an incorrect conversion in Java (Kotlin uses Java’s FloatingDecimal class for conversions) could have missed some. (I have generally assumed these days though that FloatingDecimal is correct.)

Double Rounding Error Bit Patterns

Here are the significant bits of the examples above:

Binary Analysis of Double Rounding Error Examples
Decimal Significant Bits of Full Binary Representation
1.3006255030632019 10100110011110101110010101111111111111111111111111111111110…
6.467822313308716 11001110111110000110011010000000000000000000000000000001100…
0.0691026858985424 10001101100001011011010101111111111111111111111111111101100…
0.025306879542768 11001111010100000101111101111111111111111111111111111100011…
4.456769842065e-9 10011001001000100010100101111111111111111111111111111100010…
5.79090352403e-4 10010111110011100001100010000000000000000000000000000000001…
3.0128387285e-10 10100101101000011110111101111111111111111111111111111100011…
7.582917533e-5 10011111000001100111101010000000000000000000000000000001000…
9.67498269e-11 11010100110000010100111101111111111111111111111111111110101…
4.1358803e34 11111110111001001010111101111111111111111111111111111101001…
7.038531E-26 10101110010000111111110101111111111111111111111111111110011…

These examples demonstrate two patterns in the significant bits of the full binary representation of a decimal number that cause these double rounding errors:

  • Double round up pattern:
    • Bit 24 is 1
    • Bit 25 is 0
    • Bits 26-53 are 1s
    • Bit 54 is 1
  • Double round down pattern:
    • Bit 24 is 0
    • Bit 25 is 1
    • Bits 26-53 are 0s
    • Bit 54 is 0
    • At least one bit after bit 54 is 1

The double round up pattern rounds the float up when it should be rounded down; the double round down pattern rounds the float down when it should be rounded up. Both patterns create a double that becomes a float halfway case.

In the double round up case, the rounding up propagates all the way down to bit 25, setting it to 1 and all bits above it to 0. This in turn sets up the float rounding, which is a halfway case that’s resolved by rounding up to the nearest even value. In the double round down case, the rounding down to double removes bits 54 and above, information essential to proper float rounding.

A double round up scenario only triggered by exact decimal numbers

In the double round up pattern, the value of bits 55 and beyond are irrelevant. But to see a case where all those bits are 0s, you have to construct an exactly representable example, such as:

  • 0.500000089406967107574786268742172978818416595458984375, which is 2-1 + 2-24 + 2-25 2-53 + 2-54
  • 9007200865353727, which is 253 + 230 + 229 21 + 20

To get a string of 1s from bits 26 to 53, the first example uses 2-25 2-53, and the second example uses 229 21.

(In this case, if you round the first example to 17 digits, 0.50000008940696711, you still get the double rounding error, although there won’t be all 0s beyond bit 55.)

A third pattern

There is a third double rounding error pattern that’s a variation of the double round down pattern, but it’s not demonstrated in the examples:

  • Double round down alternative pattern:
    • Bit 24 is 0
    • Bit 25 is 1
    • Bits 26-53 are 0s
    • Bit 54 is 1
    • All bits after bit 54 are 0

To invoke this scenario you also have to construct an exactly representable example, like:

  • 0.500000029802322443206463731257827021181583404541015625, which is 2-1 + 2-25 + 2-54
  • 9007199791611905, which is 253 + 229 + 20

(If you round the first example to 17 digits, 0.50000002980232244, you still get the double rounding error, but the bit pattern switches from the double round down alternative pattern to the double round down pattern.)

Fast Path Conversions (the prompt for this article)

Per was interested in fast path conversion, where numbers with 15 decimal digits or fewer and power-of-ten exponents between -22 and 22 (relative to the decimal digits viewed as an integer) can be converted correctly using just double-precision arithmetic. Specifically, he wanted to know if double rounding errors were possible if you used double-precision fast path as an intermediate for converting to float.

The shortest fast path case is 9 digits, and there are five of them among the 51 9-digit numbers: 9.67498269e-11 (described below), 5.85052973e21, 9.49766107e23, 8.04624287e26, and 8.96981543e28. (Remember, I tested all numbers through 9 digits.) Two of the nine 8-digit numbers though, 4.1358803e34 and 8.2717606e34, are fast path case in disguise, meaning zeros can be tacked on to make each a fast path case: 4135880300000e22 and 8271760600000e22, respectively.

Here are the 9-15 digit examples from above (which I selected in the first place because they are fast-path eligible), expressed as a fast path calculation:

Fast Path Calculation of Double Rounding Error Examples
Decimal Significant Digits Power of Ten IEEE Operation
0.0691026858985424 691026858985424 1016 Divide
0.025306879542768 25306879542768 1015 Divide
4.456769842065e-9 4456769842065 1021 Divide
5.79090352403e-4 579090352403 1015 Divide
3.0128387285e-10 30128387285 1020 Divide
7.582917533e-5 7582917533 1014 Divide
9.67498269e-11 967498269 1019 Divide

(It’s just arbitrary that all the examples I selected require division, and that none require multiplication.)

Summary

The necessary condition that sets up a double rounding error is a double representation of a decimal number that rounds to a tie-breaking rounding case for a float. If the float representation in turn is rounded in the same direction as the decimal to double conversion, the error occurs.

To be correct, the original decimal number (its binary representation that is) must be rounded relative to bits 25 and beyond. If those bits are altered in a certain way by the rounding to the intermediate double, rounding goes in the opposite direction than intended.

Endnote

While finishing this article I thought of a way to randomly generate double rounding cases directly, using Java’s BigDecimal. You can generate long, exactly represented decimal numbers and round them down (to 17 digits, 16 digits, etc.) until you find the shortest one that still causes the double rounding error.

1Why Kotlin? It’s just the language I’ve been using lately.

Dingbat

Copyright © 2008-2024 Exploring Binary

Privacy policy

Powered by WordPress

css.php