The maximum digit counts are useful if you want to print the full decimal value of a floating-point number (worst case format specifier and buffer size) or if you are writing or trying to understand a decimal to floating-point conversion routine (worst case number of input digits that must be converted).
The binary fraction 0.101 converts to the decimal fraction 0.625; the binary fraction 0.1010001 converts to the decimal fraction 0.6328125; the binary fraction 0.00111011011 converts to the decimal fraction 0.23193359375. In each of those examples, the binary fraction converts to a decimal fraction — that is, a terminating decimal representation — that has the same number of digits as the binary fraction has bits.
One digit per bit? We know that’s not true for binary integers. But it is true for binary fractions; every binary fraction of length n has a corresponding equivalent decimal fraction of length n.
This is the reason why you get all those “extra” digits when you print the full decimal value of an IEEE binary floating-point fraction, and why glibc strtod() and Visual C++ strtod() were once broken.
In a computer, decimal floating-point numbers are converted to binary floating-point numbers for calculation, and binary floating-point numbers are converted to decimal floating-point numbers for display or storage. In general, these conversions are inexact; they are rounded, and rounding is governed by the spacing of numbers in each set.
Floating-point numbers are unevenly spaced, and the spacing varies with the base of the number system. Binary floating-point numbers have power of two sized gaps that change size at power of two boundaries. Decimal floating-point numbers are similarly spaced, but with power of ten sized gaps changing size at power of ten boundaries. In this article, I will discuss the spacing of decimal floating-point numbers.
An IEEE 754 binary floating-point number is a number that can be represented in normalized binary scientific notation. This is a number like 1.00000110001001001101111 x 2-10, which has two parts: a significand, which contains the significant digits of the number, and a power of two, which places the “floating” radix point. For this example, the power of two turns the shorthand 1.00000110001001001101111 x 2-10 into this ‘longhand’ binary representation: 0.000000000100000110001001001101111.
The significands of IEEE binary floating-point numbers have a limited number of bits, called the precision; single-precision has 24 bits, and double-precision has 53 bits. The range of power of two exponents is also limited: the exponents in single-precision range from -126 to 127; the exponents in double-precision range from -1022 to 1023. (The example above is a single-precision number.)
Limited precision makes binary floating-point numbers discontinuous; there are gaps between them. Precision determines the number of gaps, and precision and exponent together determine the size of the gaps. Gap size is the same between consecutive powers of two, but is different for every consecutive pair.
(Updated June 22, 2015: added a tenth display form, “decimal integer times a power of ten”.)
In the strictest sense, converting a decimal number to binary floating-point means putting it in IEEE 754 format — a multi-byte structure composed of a sign field, an exponent field, and a significand field. Viewing it in this raw form (binary or hex) is useful, but there are other forms that are more enlightening.
I’ve written an online converter that takes a decimal number as input, converts it to floating-point, and then displays its exact floating-point value in ten forms (including the two raw IEEE forms). I will show examples of these forms in this article.
I’ve written about two implementations of decimal string to double-precision binary floating-point conversion: David Gay’s strtod(), and glibc’s strtod(). GCC, the GNU Compiler Collection, has yet another implementation; it uses it to convert decimal floating-point literals to double-precision. It is much simpler than David Gay’s and glibc’s implementations, but there’s a hitch: limited precision causes it to produce some incorrect conversions. Nonetheless, I wanted to explain how it works, since I’ve been studying it recently. (I looked specifically at the conversion of floating-point literals in C code, although the same code is used for other languages.)
In decimal, “0.9 repeating”, or 0.9, equals 1. In binary, a similar thing is true: “0.1 repeating”, or 0.1, equals 1. I’ll show you three ways to prove it, using the three bicimal to fraction conversion algorithms I described recently.
Many new programmers become aware of binary floating-point after seeing their programs give odd results: “Why does my program print 0.10000000000000001 when I enter 0.1?”; “Why does 0.3 + 0.6 = 0.89999999999999991?”; “Why does 6 * 0.1 not equal 0.6?” Questions like these are asked every day, on online forums like stackoverflow.com.
The answer is that most decimals have infinite representations in binary. Take 0.1 for example. It’s one of the simplest decimals you can think of, and yet it looks so complicated in binary:
The bits go on forever; no matter how many of those bits you store in a computer, you will never end up with the binary equivalent of decimal 0.1.
In my article “Binary Division” I showed how binary long division converts a fraction to a repeating bicimal. In this article, I’ll show you a well-known procedure — what I call the subtraction method — to do the reverse: convert a repeating bicimal to a fraction.
There is no widely accepted term for fractional binary numbers like 0.11001. A fractional decimal number like 0.427 is called a decimal or decimal fraction. A fractional binary number is called many things, including binary fraction, binary decimal, binary expansion, bicimal, binimal, binary radix fraction, and binary fractional (my term). In this article, I’m going to argue that bicimal should be the universal term.
(Please let me know what you think — take the poll at the end of this article.)
The pencil-and-paper method of binary division is the same as the pencil-and-paper method of decimal division, except that binary numerals are manipulated instead. As it turns out though, binary division is simpler. There is no need to guess and then check intermediate quotients; they are either 0 are 1, and are easy to determine by sight.