GCC was recently fixed so that its decimal to floating-point conversions are done correctly; it now calls the MPFR function mpfr_strtofr() instead of using its own algorithm. However, GCC still does its conversion in two steps: first it converts to an intermediate precision (160 or 192 bits), and then it rounds that result to a target precision (53 bits for double-precision floating-point). That is double rounding — how does it avoid double rounding errors? It uses round-to-odd rounding on the intermediate result.

Continue reading …

I want to contribute to the Hour of Code event happening now during Computer Science Education Week.

I don’t write about computer programming, but I do write extensively about how computers work — in particular, about how they do arithmetic with binary numbers. For your “hour of code” I’d like to introduce you to binary numbers and binary addition. I’ve selected several of my articles for you to read, and I’ve written some exercises you can try on my online calculators.

Continue reading …

GCC, the GNU Compiler Collection, recently fixed this eight and a half year old bug: “GCC Bugzilla – Bug 21718: real.c rounding not perfect.” This bug was the cause of incorrect decimal string to binary floating-point conversions. I first wrote about it over three years ago, and then recently in September and October. I also just wrote a detailed description of GCC’s conversion algorithm last month.

This fix, which will be available in version 4.9, scraps the old algorithm and replaces it with a call to MPFR function mpfr_strtofr(). I tested the fix on version 4.8.1, replacing its copy of gcc/real.c with the fixed one. I found no incorrect conversions.

Continue reading …

While running some of GCC’s string to double conversion testcases I discovered a bug in David Gay’s strtod(): it converts some very small subnormal numbers incorrectly. Unlike numbers 2^{-1075} or smaller, which should convert to zero under round-to-nearest/ties-to-even rounding, numbers between 2^{-1075} and 2^{-1074} should convert to 2^{-1074}, the smallest number representable in double-precision binary floating-point. strtod() correctly converts the former to 0, but it incorrectly converts the latter to 0 as well.

(**Update 11/25/13**: This bug has been fixed.)

Continue reading …

In my article “Floating-Point Questions Are Endless on stackoverflow.com” I showed examples of the many questions asked that demonstrate lack of knowledge of the most basic property of floating-point — that not all decimal values are representable in binary. In response to a reader’s comment on my article I wrote:

It would be interesting to know how it’s taught today (it’s been a very long time since I was taught it). I can’t imagine though that the person teaching it wouldn’t say — within a sentence or two of saying “floating-point” — that it “can’t represent all decimal numbers accurately”.

That prompted me to look through my box of thirty plus year old college (undergraduate) notebooks. I found notebooks for four classes in which I was taught floating-point. The notes from three of those classes confirm what I thought — that we were warned early of the decimal/binary mismatch. But in the first class of the four — the beginner’s class — it’s less clear what we were told. I’ll show you images of the relevant excerpts from my notes. (I notice I had some elements of cursive in my handwriting back then.)

Continue reading …

A reader of my blog, Water Qian, reported a bug to me after reading my article “How GLIBC’s strtod() Works”. I recently tested strtod(), which was was fixed to do correct rounding in glibc 2.17; I had found no incorrect conversions.

Water tested the conversion of 2^{-1075} — in retrospect an obvious corner case I should have tried — and found that it converted incorrectly to 0x0.0000000000001p-1022. That’s 2^{-1074}, the smallest double-precision value. It should have converted to 0, under round-to-nearest/ties-to-even rounding.

(**Update 11/13/13**: This bug has been fixed for version 2.19.)

Continue reading …

For years I’ve followed, through RSS, floating-point related questions on stackoverflow.com. Every day it seems there is a question like “why does 19.24 plus 6.95 equal 26.189999999999998?” I decided to track these questions, to see if my sense of their frequency was correct. I found that, in the last 40 days, there were 18 such questions. That’s not one per day, but still — a lot!

Continue reading …

I’ve written about two implementations of decimal string to double-precision binary floating-point conversion: David Gay’s strtod(), and glibc’s strtod(). GCC, the GNU Compiler Collection, has yet another implementation; it uses it to convert decimal floating-point literals to double-precision. It is much simpler than David Gay’s and glibc’s implementations, but there’s a hitch: limited precision causes it to produce some incorrect conversions. Nonetheless, I wanted to explain how it works, since I’ve been studying it recently. (I looked specifically at the conversion of floating-point literals in C code, although the same code is used for other languages.)

Continue reading …

The *string to double* function, strtod(), converts decimal numbers represented as strings into binary numbers represented in IEEE double-precision floating-point. Many programming environments implement their string to double conversions with David Gay’s strtod(); glibc, the GNU C Library, does not.

Like David Gay’s strtod(), glibc’s strtod() produces correctly rounded conversions. But it uses a simpler algorithm: it doesn’t have a floating-point only fast path for small inputs; it doesn’t compute a floating-point approximation to the correct result; it doesn’t check the approximation with big integers; it doesn’t adjust the approximation and recheck it; it doesn’t have an optimization for really long inputs. Instead, it handles all inputs uniformly, converting their integer and fractional parts separately, using only big integers. I will give an overview of how glibc’s strtod() works.

Continue reading …

Recently I wrote about my retesting of the gcc C compiler’s string to double conversions and how it appeared that its incorrect conversions were due to an architecture-dependent bug. My examples converted incorrectly on 32-bit systems, but worked on 64-bit systems — at least most of them. I decided to dig into gcc’s source code and trace its execution, and I found the architecture dependency I was looking for. But I found more than that: due to limited precision, gcc will do incorrect conversions on *any* system. I’ve constructed an example to demonstrate this.

Continue reading …

Three years ago I wrote about how the gcc C compiler and the glibc strtod() function do some decimal to double-precision floating-point conversions incorrectly. I recently retested their conversions and found out two things: glibc’s strtod() has been fixed, and gcc’s conversion code, while still unfixed, produces correct conversions on some machines.

Continue reading …

A reader of my blog, John Harrison, suggested a way to improve how David Gay’s strtod() converts large integers to doubles. Instead of approximating the conversion and going through the correction loop to check and correct it — the signature processes of strtod() — he proposed doing the conversion directly from a binary big integer representation of the decimal input. strtod() does lots of processing with big integers, so the facility to do this is already there.

I implemented John’s idea in a copy of strtod(). The path for large integers is so much simpler and faster that I can’t believe it never occurred to me to do it this way. It’s also surprising that strtod() never implemented it this way to begin with.

Continue reading …