The maximum digit counts are useful if you want to print the full decimal value of a floating-point number (worst case format specifier and buffer size) or if you are writing or trying to understand a decimal to floating-point conversion routine (worst case number of input digits that must be converted).
(Updated June 22, 2015: added a tenth display form, “decimal integer times a power of ten”.)
In the strictest sense, converting a decimal number to binary floating-point means putting it in IEEE 754 format — a multi-byte structure composed of a sign field, an exponent field, and a significand field. Viewing it in this raw form (binary or hex) is useful, but there are other forms that are more enlightening.
I’ve written an online converter that takes a decimal number as input, converts it to floating-point, and then displays its exact floating-point value in ten forms (including the two raw IEEE forms). I will show examples of these forms in this article.
Many new programmers become aware of binary floating-point after seeing their programs give odd results: “Why does my program print 0.10000000000000001 when I enter 0.1?”; “Why does 0.3 + 0.6 = 0.89999999999999991?”; “Why does 6 * 0.1 not equal 0.6?” Questions like these are asked every day, on online forums like stackoverflow.com.
The answer is that most decimals have infinite representations in binary. Take 0.1 for example. It’s one of the simplest decimals you can think of, and yet it looks so complicated in binary:
The bits go on forever; no matter how many of those bits you store in a computer, you will never end up with the binary equivalent of decimal 0.1.
In my article “Binary Division” I showed how binary long division converts a fraction to a repeating bicimal. In this article, I’ll show you a well-known procedure — what I call the subtraction method — to do the reverse: convert a repeating bicimal to a fraction.
In my article “Using Integers to Check a Floating-Point Approximation,” I briefly mentioned “bigcomp,” an optimization strtod() uses to reduce big integer overhead when checking long decimal inputs. bigcomp does a floating-point to decimal conversion — right in the middle of a decimal to floating-point conversion mind you — to generate the decimal expansion of the number halfway between two target floating-point numbers. This decimal expansion is compared to the input decimal string, and the result of the comparison dictates which of the two target numbers is the correctly rounded result.
In this article, I’ll explain how bigcomp works, and when it applies. Also, I’ll talk briefly about its performance; my informal testing shows that, under the default setting, bigcomp actually worsens performance for some inputs.
In general, to convert an arbitrary decimal number into a binary floating-point number, arbitrary-precision arithmetic is required. However, a subset of decimal numbers can be converted correctly with just ordinary limited-precision IEEE floating-point arithmetic, taking what I call the fast path to conversion. Fast path conversion is an optimization used in practice: it’s in David Gay’s strtod() function and in Java’s FloatingDecimal class. I will explain how fast path conversion works, and describe the set of numbers that qualify for it.
Producing correctly rounded decimal to floating-point conversions is hard, but only because it is made to be done efficiently. There is a simple algorithm that produces correct conversions, but it’s too slow — it’s based entirely on arbitrary-precision integer arithmetic. Nonetheless, you should know this algorithm, because it will help you understand the highly-optimized conversion routines used in practice, like David Gay’s strtod() function. I will outline the algorithm, which is easily implemented in a language like C, using a “big integer” library like GMP.
The decimal representations of oppositely signed powers of two and powers of five look alike, as seen in these examples: 2-3 = 0.125 and 53 = 125; 5-5 = 0.00032 and 25 = 32. The significant digits in each pair of powers is the same, even though one is a fraction and one is an integer. In other words, a negative power of one base looks like a positive power of the other.
This relationship is not coincidence; it’s a by-product of how fractions are represented as decimals. I’ll show you simple algebra that proves it, as well as algebra that proves similar properties — in products involving negative powers.
PARI/GP is an open source computer algebra system I use frequently in my study of binary numbers. It doesn’t manipulate binary numbers directly — input, and most output, is in decimal — so I use it mainly to do the next best thing: calculate with powers of two. Calculations with powers of two are, indirectly, calculations with binary numbers.
PARI/GP is a sophisticated tool, with several components — yet it’s easy to install and use. I use its command shell in particular, the PARI/GP calculator, or gp for short. I will show you how to use simple gp commands to explore binary numbers.
Interestingly, programming languages vary in how much precision they allow in printed floating-point fractions. You would think they’d all be the same, allowing you to print as many decimal places as you ask for. After all, a floating-point fraction is a dyadic fraction; it has as many decimal places as it has bits in its fractional representation.
Consider the dyadic fraction 5,404,319,552,844,595/253. Its decimal expansion is 0.59999999999999997779553950749686919152736663818359375, and its binary expansion is 0.10011001100110011001100110011001100110011001100110011. Both are 53 digits long. The ideal programming language lets you print all 53 decimal places, because all are meaningful. Unfortunately, many languages won’t let you do that; they typically cap the number of decimal places at between 15 and 17, which for our example might be 0.59999999999999998.
Now that you know how the powers of two are named, lets look at other, nonstandard ways to name them. You will see these names on the internet as well as in books. We will not use them on this site other than in this article, and we only discuss them here to make you aware of their use. As a by-product of this discussion, you may gain some insight into the nature of the powers of two. But beware — you may become confused as well!