Recently I wrote about my retesting of the gcc C compiler’s string to double conversions and how it appeared that its incorrect conversions were due to an architecture-dependent bug. My examples converted incorrectly on 32-bit systems, but worked on 64-bit systems — at least most of them. I decided to dig into gcc’s source code and trace its execution, and I found the architecture dependency I was looking for. But I found more than that: due to limited precision, gcc will do incorrect conversions on any system. I’ve constructed an example to demonstrate this.
Three years ago I wrote about how the gcc C compiler and the glibc strtod() function do some decimal to double-precision floating-point conversions incorrectly. I recently retested their conversions and found out two things: glibc’s strtod() has been fixed, and gcc’s conversion code, while still unfixed, produces correct conversions on some machines.
A reader of my blog, John Harrison, suggested a way to improve how David Gay’s strtod() converts large integers to doubles. Instead of approximating the conversion and going through the correction loop to check and correct it — the signature processes of strtod() — he proposed doing the conversion directly from a binary big integer representation of the decimal input. strtod() does lots of processing with big integers, so the facility to do this is already there.
I implemented John’s idea in a copy of strtod(). The path for large integers is so much simpler and faster that I can’t believe it never occurred to me to do it this way. It’s also surprising that strtod() never implemented it this way to begin with.
I read about an interesting method for decimal/binary conversion in chapter two of Gerald R. Rising’s book “Inside Your Calculator”. Unlike the standard string-oriented conversion algorithms, the algorithms in his book perform arithmetic on an encoding I’ve dubbed “deci-binary”. Using this encoding, binary numbers are input and output as decimal strings consisting of only ones and zeros, exploiting built-in language facilities for decimal input and output. I will demonstrate this conversion process with Java code I have written.
Paul Bristow, a Boost.Math library author and reader of my blog, recently alerted me to a problem he discovered many years ago in Visual C++: some double-precision floating-point values fail to round-trip through a stringstream as a 17-digit decimal string. Interestingly, the 17-digit strings that C++ generates are not the problem; they are correctly rounded. The problem is that the conversion of those strings to floating-point is sometimes incorrect, off by one binary ULP.
I’ve previously discovered that Visual Studio makes incorrect decimal to floating-point conversions, and that Microsoft is OK with it — at least based on their response to my now deleted bug report. But incorrect decimal to floating-point conversions in this context seems like a problem that needs fixing. When you serialize a double to a 17-digit decimal string, shouldn’t you get the same double back later? Apparently Microsoft doesn’t think so, because Paul’s bug report has also been deleted.
The exact decimal equivalent of an arbitrary double-precision binary floating-point number is typically an unwieldy looking number, like this one:
In general, when you print a floating-point number, you don’t want to see all its digits; most of them are “garbage” in a sense anyhow. But how many digits do you need? You’d like a short string, yet you’d want it long enough so that it identifies the original floating-point number. A well-known result in computer science is that you need 17 significant decimal digits to identify an arbitrary double-precision floating-point number. If you were to round the exact decimal value of any floating-point number to 17 significant digits, you’d have a number that, when converted back to floating-point, gives you the original floating-point number; that is, a number that round-trips. For our example, that number is 0.10000000000000001.
But 17 digits is the worst case, which means that fewer digits — even as few as one — could work in many cases. The number required depends on the specific floating-point number. For our example, the short string 0.1 does the trick. This means that 0.1000000000000000055511151231257827021181583404541015625 and 0.10000000000000001 and 0.1 are the same, at least as far as their floating-point representations are concerned.
Every integer has an equivalent representation in decimal and binary. Except for 0 and 1, the binary representation of an integer has more digits than its decimal counterpart. To find the number of binary digits (bits) corresponding to any given decimal integer, you could convert the decimal number to binary and count the bits. For example, the two-digit decimal integer 29 converts to the five-digit binary integer 11101. But there’s a way to compute the number of bits directly, without the conversion.
Sometimes you want to know, not how many bits are required for a specific integer, but how many are required for a d-digit integer — a range of integers. A range of integers has a range of bit counts. For example, four-digit decimal integers require between 10 and 14 bits. For any d-digit range, you might want to know its minimum, maximum, or average number of bits. Those values can be computed directly as well.
Numbers are entered into computers as strings of text. These strings are converted to binary, into the numeric form understood by the computer’s hardware. Numbers with a decimal point — numbers we think of as real numbers — are converted into a format called binary floating-point. The procedure that converts decimal strings to binary floating-point — IEEE double-precision binary floating-point in particular — goes by the name strtod(), which stands for string to double.
Converting decimal strings to doubles correctly and efficiently is a surprisingly complex task; fortunately, David M. Gay did this for us when he wrote this paper and released this code over 20 years ago. (He maintains this reference copy of strtod() to this day.) His code appears in many places, including in the Python, PHP, and Java programming languages, and in the Firefox, Chrome, and Safari Web browsers.
I’ve spent considerable time reverse engineering strtod(); neither the paper nor the code are easy reads. I’ve written articles about how each of its major pieces work, and I’ve discovered bugs (as have a few of my readers) along the way. This article ties all of my strtod() research together.
I’ve discussed how David Gay’s strtod() function computes an initial floating-point approximation and then uses a loop to check and correct it, but I have not discussed how the correction is made. That is the last piece of the strtod() puzzle, and I will cover it in this article.
Many new programmers become aware of binary floating-point after seeing their programs give odd results: “Why does my program print 0.10000000000000001 when I enter 0.1?”; “Why does 0.3 + 0.6 = 0.89999999999999991?”; “Why does 6 * 0.1 not equal 0.6?” Questions like these are asked every day, on online forums like stackoverflow.com.
The answer is that most decimals have infinite representations in binary. Take 0.1 for example. It’s one of the simplest decimals you can think of, and yet it looks so complicated in binary:
The bits go on forever; no matter how many of those bits you store in a computer, you will never end up with the binary equivalent of decimal 0.1.
I’ve shown you two ways to convert a bicimal to a fraction: the subtraction method and the direct method. In this article, I will show you a third method — a common method I call the series method — that uses the formula for infinite geometric series to create the fraction.
There are several ways to convert a repeating bicimal to a fraction. I’ve shown you the subtraction method; now I’ll show you the direct method, my name for the method that creates a fraction directly, using a numerator and denominator of well-known form.