In the 1960s, I. Bennett Goldberg and David W. Matula published papers relating floating-point number systems of different bases, showing the conditions under which conversions between them round-trip; that is, when conversion to another base and back returns the original number. Independently, both authors derived the formula that specifies the number of significant digits required for round-trip conversions.

In his paper “27 Bits Are Not Enough for 8-Digit Accuracy”, Goldberg presents the formula in the context of decimal to binary floating-point conversions. He starts with a simple example — a 7-bit binary floating-point system — and shows that it does not have enough precision to round-trip all 2-digit decimal floating-point numbers. I took his example and put it into diagrams, giving you a high-level view of what governs round-trip conversions. I also extended his example to show that the same concept applies to binary to decimal floating-point round-trips.

The well-known digit counts for round-trip conversions to and from IEEE 754 floating-point are dictated by these same principles.

## Seven Bits Looks Like Enough But It Isn’t

With 7 bits, you can represent 2^{7} = 128 binary integers. With two digits, you can represent 10^{2} = 100 decimal integers. Clearly, seven bits is enough to represent two-digit integers. But floating-point numbers need to be analyzed differently; the spacing of decimal and binary floating-point numbers comes into play.

## Overlapping Gaps

To understand why two digits can’t round-trip through seven bits, you have to look at how the decimal and binary spacings interact. Decimal spacing increases between increasing powers of ten, and binary spacing increases between increasing powers of two. Since powers of ten and powers of two are interleaved, the relative spacing of decimal and binary floating-point numbers is going to change at power of ten and power of two boundaries.

In the example system, there are 90 decimal numbers (and 90 gaps) between consecutive powers of ten, and 64 binary numbers (and 64 gaps) between consecutive powers of two. Because of the interleaving of powers, these will be divided into regions at power of ten and power of two boundaries. This table shows how this plays out for the range [1,10000):

Region | Decimal spacing | Binary spacing | Decimal count | Binary count |
---|---|---|---|---|
[1,2) | 0.1 | 0.015625 | 10 | 64 |
[2,4) | 0.1 | 0.03125 | 20 | 64 |
[4,8) | 0.1 | 0.0625 | 40 | 64 |
[8,10) | 0.1 | 0.125 | 20 | 16 |
[10,16) | 1 | 0.125 | 6 | 48 |
[16,32) | 1 | 0.25 | 16 | 64 |
[32,64) | 1 | 0.5 | 32 | 64 |
[64,100) | 1 | 1 | 36 | 36 |
[100,128) | 10 | 1 | 3 | 28 |
[128,256) | 10 | 2 | 13 | 64 |
[256,512) | 10 | 4 | 26 | 64 |
[512,1000) | 10 | 8 | 48 | 61 |
[1000,1024) | 100 | 8 | 1 | 3 |
[1024,2048) | 100 | 16 | 10 | 64 |
[2048,4096) | 100 | 32 | 20 | 64 |
[4096,8192) | 100 | 64 | 41 | 64 |
[8192,10000) | 100 | 128 | 18 | 15 |

The problems occur in the regions [8,10) and [8192,10000): there, binary numbers are spaced further apart than decimal numbers, and so there are fewer of them.
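If you want to check these figures, here is a short Python sketch of my own (not from Goldberg’s paper) that recomputes the table by counting grid points in each region:

```python
from fractions import Fraction
from math import ceil

# Region boundaries: the powers of two and ten that fall in [1, 10000].
bounds = sorted({10**k for k in range(5)} | {2**k for k in range(14)})

def grid_count(lo, hi, ulp):
    # Number of multiples of ulp in the half-open interval [lo, hi).
    return ceil(Fraction(hi) / ulp) - ceil(Fraction(lo) / ulp)

for lo, hi in zip(bounds, bounds[1:]):
    dec_ulp = Fraction(10) ** (len(str(lo)) - 2)    # 2-digit decimal spacing
    bin_ulp = Fraction(2) ** (lo.bit_length() - 7)  # 7-bit binary spacing
    print(f"[{lo},{hi})", float(dec_ulp), float(bin_ulp),
          grid_count(lo, hi, dec_ulp), grid_count(lo, hi, bin_ulp))
```

Each line it prints is one row of the table above.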

## 20 Decimal Numbers Map To 16 Binary Numbers

In the region [8,10) there are 20 decimal numbers, spaced 0.1 apart: 8.0, 8.1, 8.2, … , 9.8, 9.9. Also in this region are 16 binary numbers, spaced 0.125 apart: 1000, 1000.001, 1000.01, … , 1001.11, 1001.111. (It’s easier to compare decimal and binary numbers when binary numbers are written in decimal; conveniently, all binary numbers have exact decimal representations, so let’s write these as 8.0, 8.125, 8.25, … , 9.75, 9.875.) Since we have more decimal numbers than binary numbers, some decimal numbers will convert (i.e., round) to the same binary number. When such a binary number is converted back to decimal, only one of the multiple decimal numbers can be chosen — the one closest to it. The ones not chosen are not recoverable.

Mapping each of the 20 decimal numbers to its nearest binary number, we find four pairs of duplicates; each pair is equidistant from the binary number it maps to. (It’s just a coincidence that in this example all the duplicates map to a halfway point in this region; for instance, this isn’t the case for the duplicates in [8192,10000), which I have not shown.) Regardless of how halfway cases are rounded back to decimal, only four of the eight numbers will be recoverable, so only 16 of the 20 decimal numbers will round-trip.
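A few lines of Python (a sketch of my own, using exact rationals to avoid any double-precision noise) make the duplicates visible:

```python
from fractions import Fraction

ulp = Fraction(1, 8)  # 7-bit binary spacing in [8,10) is 0.125
targets = {}
for i in range(80, 100):          # the decimals 8.0, 8.1, ..., 9.9
    d = Fraction(i, 10)
    b = round(d / ulp) * ulp      # nearest 7-bit binary number
    targets.setdefault(b, []).append(d)

# Show the binary numbers that more than one decimal number rounds to.
for b, ds in sorted(targets.items()):
    if len(ds) > 1:
        print(float(b), "<-", [float(d) for d in ds])
```

This prints the four halfway points 8.25, 8.75, 9.25, and 9.75, each receiving a pair of decimal numbers.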

## So How Many Bits Do You Need?

Informally, “you need one more bit than you think you need” to round-trip numbers. In this case, that’s 8 bits. (I will not state or derive the formula in this article.) With 8 bits, the binary spacing in [8,10) is 0.0625, giving 32 binary numbers for the 20 decimal numbers.

Every decimal number in this region converts to a unique binary number, which means they will all round-trip. And it turns out 8 bits covers *all* two-digit numbers, with any exponent.
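That claim can be spot-checked over a range of exponents with a sketch of my own (assuming round-to-nearest, ties-to-even conversions throughout):

```python
from fractions import Fraction

def round_sig(x, base, prec):
    # Round positive rational x to prec significant base-`base` digits,
    # ties to even (the IEEE 754 default rounding).
    b = Fraction(base)
    e = 0
    while x >= b ** (e + 1):
        e += 1
    while x < b ** e:
        e -= 1
    ulp = b ** (e - prec + 1)
    return round(x / ulp) * ulp

def survivors(bits, exponents):
    # Count 2-digit decimals d with d -> binary(bits) -> decimal == d.
    ok = total = 0
    for k in exponents:
        for i in range(10, 100):              # significands 1.0 .. 9.9
            d = Fraction(i, 10) * Fraction(10) ** k
            b = round_sig(d, 2, bits)         # decimal -> binary
            ok += (round_sig(b, 10, 2) == d)  # binary -> decimal
            total += 1
    return ok, total

print(survivors(7, range(-3, 4)))   # 7 bits: some decimals are lost
print(survivors(8, range(-3, 4)))   # 8 bits: every decimal round-trips
```

With 7 bits, the decade [1,10) alone loses the four numbers discussed above; with 8 bits, every decimal in every decade survives.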

## Review: Gap Size Is Key

The relative gap size between decimal and binary floating-point numbers determines whether decimal numbers will round-trip through binary floating-point. For a region containing only integers, the gap size for both sets of numbers is 1, and the gaps will line up (integers convert exactly). In all other regions, the gaps will be different. Occasionally, endpoints of gaps will line up — for numbers like 8.5 with exact binary representations. All other numbers will not line up and hence need to be rounded to the closest binary number.

To round-trip these numbers, the gaps between decimal numbers must be greater than the gaps between binary numbers. Since gap size changes at power of ten and power of two boundaries, the relative spacing between the two sets of numbers is constantly changing. There must be enough binary precision to ensure that the binary gaps are smaller over the entire range of exponents.

## 3 Digits Are Not Enough For 7-Bit Accuracy

This analysis works equally well for round-trip conversions originating from the binary side.

To round-trip 7-bit numbers, we know right away that two digits is not enough: 2^{7} = 128 > 10^{2} = 100. You may think three digits is enough since 128 < 10^{3} = 1000, but it isn’t.

In the region [0.1,0.125), decimal numbers are spaced 0.001 apart, and binary numbers are spaced 0.0009765625 apart. These gaps are very close, but the decimal numbers are spaced further apart. This sets the stage for duplicate mappings. I examined the values in that region and found one pair of binary numbers that convert to the same decimal number: 0.000110101 (0.103515625) and 0.0001101011 (0.1044921875) both convert to 0.104. 0.103515625 is closer, so 0.1044921875 is not recoverable.

It turns out that *four* digits is enough. In the example, 0.103515625 would round to 0.1035, and 0.1044921875 would round to 0.1045.
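The same kind of sketch (mine, again with exact rationals and ties-to-even rounding) confirms both claims for this pair of values:

```python
from fractions import Fraction

def round_sig(x, base, prec):
    # Round positive rational x to prec significant base-`base` digits,
    # ties to even (the IEEE 754 default rounding).
    b = Fraction(base)
    e = 0
    while x >= b ** (e + 1):
        e += 1
    while x < b ** e:
        e -= 1
    ulp = b ** (e - prec + 1)
    return round(x / ulp) * ulp

u = Fraction("0.103515625")     # 0.000110101 in binary
v = Fraction("0.1044921875")    # 0.0001101011 in binary

# Three digits: both convert to 0.104, which converts back to u, so v is lost.
print(round_sig(u, 10, 3) == round_sig(v, 10, 3))  # True
print(round_sig(round_sig(v, 10, 3), 2, 7) == v)   # False

# Four digits: 0.1035 and 0.1045 are distinct, and both round-trip.
print(round_sig(round_sig(u, 10, 4), 2, 7) == u)   # True
print(round_sig(round_sig(v, 10, 4), 2, 7) == v)   # True
```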

## Round-Trips To and From IEEE 754 Floating-Point

The concepts behind this example explain the well-known digit counts for round-trip conversions to and from IEEE 754 floating-point:

- Decimal floating-point numbers of 15 significant digits or less will round-trip through double-precision binary floating-point.
- Decimal floating-point numbers of 6 significant digits or less will round-trip through single-precision binary floating-point.
- Double-precision numbers converted to decimal floating-point numbers of 17 significant digits or more will round-trip.
- Single-precision numbers converted to decimal floating-point numbers of 9 significant digits or more will round-trip.
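These counts are easy to spot-check with Python’s built-in `float`, which is an IEEE 754 double (a quick random test of my own, not a proof):

```python
import random

random.seed(2)

# Decimal -> binary64 -> decimal: 15 significant digits always survive.
for _ in range(100_000):
    s = f"{random.uniform(1, 10):.14e}"  # a random 15-significant-digit decimal
    assert f"{float(s):.14e}" == s

# binary64 -> decimal -> binary64: 17 significant digits always round-trip.
for _ in range(100_000):
    x = random.uniform(0, 1e308)
    assert float(f"{x:.16e}") == x

print("all round-trips succeeded")
```

(The `.14e` and `.16e` format precisions mean 14 and 16 digits *after* the leading digit, i.e., 15 and 17 significant digits.)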

I wonder what the round-trip digit counts are for half precision, i.e. IEEE-754 binary16. I don’t recall seeing that information published anywhere. I ask because half precision has recently been mentioned frequently in connection with deep learning applications.

It looks like the round-trip digit counts for IEEE-754 binary16 may be {3, 5}, but a rigorous proof of that would be welcome. As for IEEE-754 binary128, a.k.a. quadruple precision, sources seem to disagree. A write-up by W. Kahan suggests the round-trip digit counts are {33, 36} [http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF], while this standard proposal suggests the numbers are {34, 36} [http://www2.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1822.pdf].

@njuffa,

The formula, which I have not discussed, derives from this inequality, as stated in I. Goldberg’s paper: for decimal->floating-point->decimal round-trips, if 10^{p} < 2^{q-1}, then q bits are enough for p-digit accuracy. So if binary16 has an 11-bit significand, the longest decimal you can round-trip is 3 digits (just play around with the numbers; you can also solve the inequality using logarithms). If binary128 has a 113-bit significand, then 33 digits is the max.

For floating-point->decimal->floating-point round-trips, you can just switch the bases around (not stated in I. Goldberg’s paper): if 2^{r} < 10^{s-1}, then s digits are enough for r-bit accuracy. So for binary16, you need at least 5 digits; for binary128, you need at least 36 digits. So I agree with {3, 5} and {33, 36}, respectively.
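In code, the two inequalities give (my sketch; the bit counts are significand bits including the implicit bit):

```python
def digits_that_round_trip(q):
    # Largest p with 10^p < 2^(q-1): p-digit decimals survive q binary bits.
    p = 0
    while 10 ** (p + 1) < 2 ** (q - 1):
        p += 1
    return p

def digits_needed(r):
    # Smallest s with 2^r < 10^(s-1): s-digit decimals recover r binary bits.
    s = 1
    while 2 ** r >= 10 ** (s - 1):
        s += 1
    return s

for name, bits in [("binary16", 11), ("binary32", 24),
                   ("binary64", 53), ("binary128", 113)]:
    print(name, digits_that_round_trip(bits), digits_needed(bits))
```

This prints {3, 5} for binary16 and {33, 36} for binary128, matching the counts above (and {6, 9} and {15, 17} for single and double precision).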

@njuffa,

I contacted the author of http://www2.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1822.pdf and he in fact confirmed that 34 is an error (should be 33).

A very clear explanation of floating-point characteristics. Thank you for an excellent collection of articles.