GCC, the GNU Compiler Collection, recently fixed this eight and a half year old bug: “GCC Bugzilla – Bug 21718: real.c rounding not perfect.” This bug was the cause of incorrect decimal string to binary floating-point conversions. I first wrote about it over three years ago, and then recently in September and October. I also just wrote a detailed description of GCC’s conversion algorithm last month.

This fix, which will be available in version 4.9, scraps the old algorithm and replaces it with a call to MPFR function mpfr_strtofr(). I tested the fix on version 4.8.1, replacing its copy of gcc/real.c with the fixed one. I found no incorrect conversions.

## Description of The Fix

GCC calls mpfr_strtofr() with a precision of SIGNIFICAND_BITS and a rounding mode of round-to-zero. GCC’s computation of SIGNIFICAND_BITS remains unchanged; it is 160 bits for 32-bit systems, and 192 bits for 64-bit systems. (I still have no explanation for why this constant is architecture-dependent. But it ends up not mattering anymore.) mpfr_strtofr() uses sufficient precision for its calculations internally, but then rounds its result to SIGNIFICAND_BITS. It also returns a value indicating whether the conversion was exact or not.

GCC uses its existing MPFR infrastructure to convert mpfr_strtofr()’s result to GCC’s *real* type. Interestingly, the process involves calling mpfr_get_str() to create a hexadecimal floating-point constant, and then calling real_from_string() — the same function that calls mpfr_strtofr(), but a different path — to convert that to the *real* type. GCC then rounds that *real* result to the target precision as before, using round-to-nearest/ties-to-even rounding. (Use of an intermediate rounded-to-zero result in combination with knowledge of an inexact conversion avoids double rounding errors.)

## Discussion

Here are a few things I wanted to note:

- The old conversion code still exists in real.c, presumably because it is used for other purposes.
- In the process of testing this fix, I found a bug in David Gay’s strtod().
- Since glibc strtod() has also recently been fixed,
**compile time and run time conversions will now match**.

“I still have no explanation for why this constant is architecture-dependent”

Maybe the reason is to use all bits of an integer

5 * 32 == 160 bits for 32-bit systems

3 * 64 == 192 bits for 64-bit systems.

@Jeorg,

SIGNIFICAND_BITS is defined as 128 + HOST_BITS_PER_LONG. Couldn’t it just have been defined as 192? 192 = 6 * 32 or 3 * 64.

“Use of an intermediate rounded-to-zero result in combination with knowledge of an inexact conversion avoids double rounding.”

Rick, can you provide a little more idea about this? It seems the double rounding is very hard to avoid with high-precision floating. If you get a return of high-precision floating which is close to mid-point between 2 IEEE doubles and just a little below the point, how can you know if the round off part is significant?

Water,

I just learned how this works myself, during the process of investigating this fix (see discussion in the bug report). Check out When Double Rounding is Odd and “What Every Computer Scientist Should Know About Floating-Point Arithmetic” (p. 43) for explanations. Let me know what you think.

Presumably 160 bits is used on 32-bit systems because that is all that is needed, and going to 192 bits on a 32-bit system would increase the cost. On a 64-bit system it is actually simpler to use 192 bits (3 64-bit values) rather than 160 bits (2.5 64-bit values).

Bruce,

I don’t see why the precision would vary by system. For IEEE double-precision, 160 bits wasn’t enough (well neither was 192). GCC apparently handles other floating-point formats, but I would think they would need more precision as well. When I posted my findings of this architecture dependency in the bug report, there was no discussion about why it exists (existed).

Rick Regan, thank you for pointing me to the correct information. I read the “What Every Computer Scientist Should Know About Floating-Point Arithmetic” several times before but I was unable to catch the technique described in P.43. I agree round-to-zero+inexact bit resolves the issue.