GCC Avoids Double Rounding Errors With Round-To-Odd

GCC was recently fixed so that its decimal to floating-point conversions are done correctly; it now calls the MPFR function mpfr_strtofr() instead of using its own algorithm. However, GCC still does its conversion in two steps: first it converts to an intermediate precision (160 or 192 bits), and then it rounds that result to a target precision (53 bits for double-precision floating-point). That is double rounding — how does it avoid double rounding errors? It uses round-to-odd rounding on the intermediate result.

Double Rounding Error Cases

Decimal to floating-point conversions that are inexact are rounded, using round-to-nearest (ties-to-even) rounding. But two successive roundings to nearest can result in a double rounding error. Specifically, this happens when rounding to the intermediate precision creates a midpoint (a tie) between two floating-point numbers in the target precision. The two scenarios are:

  • The first rounding is down to a midpoint, and the second rounding is down again, to the next lower floating-point number.
  • The first rounding is up to a midpoint, and the second rounding is up again, to the next higher floating-point number.

These two scenarios are shown in detail on page 77 of the Handbook of Floating-Point Arithmetic. (There is an error in the description of the second case. It says there must be “at least one 0 somewhere” in the trailing bits, but the value of these bits does not matter.)
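
Here’s a small example of the first scenario (using tiny precisions, of my own choosing, to keep it short): take a 3-bit target precision and a 4-bit intermediate precision, and consider the 6-bit value

1.00101

Rounding to nearest directly to 3 bits gives 1.01, since the rounding bit is 1 and a bit beyond it is non-zero. But rounding to nearest to 4 bits first gives 1.001, which is down to a midpoint in 3-bit precision; rounding that to 3 bits with ties-to-even gives 1.00, down again to the next lower number. The second scenario is the mirror image, with both roundings going up.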

These errors can be prevented by using round-to-odd for the first rounding and round-to-nearest (ties-to-even) for the second.

Round-to-Odd

Round-to-odd rounding, also called Von Neumann rounding or jamming, works as follows: the trailing bits of the higher precision value are discarded, but if any of them are non-zero, the low order bit of the result is set to 1. (This makes the low order bit act like a “sticky” bit.) For example, let’s round some 12-bit values to 8 bits using round-to-odd:

100010100110 rounds to 10001011

100010110110 rounds to 10001011

100010100000 rounds to 10001010

100010110000 rounds to 10001011

Round-to-odd rounding is essentially truncation, except when the low order bit is 0 and there is a 1 among the trailing bits (as in the first example above).
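
Here’s a small C sketch of that rule applied to integer significands (my own illustration, not GCC’s code); it reproduces the four examples above:

#include <stdio.h>
#include <stdint.h>

/* Round a value of `from` significant bits to `to` significant bits using
   round-to-odd: drop the trailing bits, and if any of them are non-zero,
   set the low order bit of the result to 1. */
static uint64_t round_to_odd(uint64_t sig, int from, int to)
{
    int shift = from - to;
    uint64_t trailing = sig & ((1ULL << shift) - 1);  /* the discarded bits */
    uint64_t result = sig >> shift;                   /* truncate */
    if (trailing != 0)
        result |= 1;                                  /* the "sticky" low order bit */
    return result;
}

/* Print the low n bits of v as a binary string. */
static void print_bits(uint64_t v, int n)
{
    for (int i = n - 1; i >= 0; i--)
        putchar(((v >> i) & 1) ? '1' : '0');
}

int main(void)
{
    /* The four 12-bit values above, in hexadecimal. */
    uint64_t examples[] = { 0x8A6, 0x8B6, 0x8A0, 0x8B0 };

    for (int i = 0; i < 4; i++) {
        print_bits(examples[i], 12);
        printf(" rounds to ");
        print_bits(round_to_odd(examples[i], 12, 8), 8);
        putchar('\n');
    }
    return 0;
}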

Why Round-To-Odd Prevents Double Rounding Errors

Round-to-odd rounding prevents double rounding errors because it prevents the intermediate precision rounding from creating a midpoint in the target precision: a midpoint in the target precision, written out to the intermediate precision, ends in a run of 0s, so its low order bit is 0, whereas an inexact round-to-odd result always has a low order bit of 1. (If the value is already a midpoint before rounding then there’s no problem, since round-to-odd leaves exact values unchanged.) The second rounding, round-to-nearest in the target precision, will then give the correct result.

For round-to-odd to work for this purpose, the intermediate precision must be at least two bits greater than the target precision; this keeps the rounding bit of the target precision intact. For example, the 12-bit value 100010110001 rounded to 9 bits using round-to-odd is 100010111, and that rounded to nearest to 8 bits is 10001100. But the correct result is 10001011, which is what you get when you round the 12 bits directly to 8.

How GCC Implements Round-To-Odd

Because round-to-odd is not a rounding mode offered by MPFR, GCC implements it in two steps. First, it calls mpfr_strtofr(), asking it to round the intermediate value towards zero; the function’s return value is non-zero when the conversion is inexact, that is, when non-zero trailing bits were discarded. Here’s the code in real.c:

mpfr_init2 (m, SIGNIFICAND_BITS);
inexact = mpfr_strtofr (m, str, NULL, 10, GMP_RNDZ);

Next, it sets the low order bit of the intermediate result based on the sticky bit (the inexact flag):

/* Set a sticky bit if mpfr_strtofr was inexact.  */
r->sig[0] |= inexact;
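
GCC can set the sticky bit this way because it has direct access to its own significand array. To experiment with the same two-step technique outside of GCC, you could emulate it with MPFR’s public API; the following is a sketch of mine (assuming MPFR 3.0 or later), not GCC’s code. It tests the parity of the low significand bit with mpfr_get_z_2exp(), and sets the bit by stepping one ulp away from zero:

#include <stdio.h>
#include <gmp.h>
#include <mpfr.h>

int main(void)
{
    mpfr_t m;
    mpz_t sig;
    int inexact;

    mpfr_init2(m, 192);   /* the intermediate precision */
    mpz_init(sig);

    /* Step 1: convert, rounding toward zero (MPFR_RNDZ is the modern name
       for GMP_RNDZ); the return value is non-zero if the conversion was
       inexact. */
    inexact = mpfr_strtofr(m, "0.1", NULL, 10, MPFR_RNDZ);

    /* Step 2: if the conversion was inexact and the low significand bit
       is 0, set it to 1. Because the value was truncated toward zero,
       stepping one ulp away from zero does exactly that, with no carry. */
    if (inexact != 0) {
        mpfr_get_z_2exp(sig, m);   /* the 192-bit significand as an integer */
        if (mpz_even_p(sig)) {
            if (mpfr_sgn(m) >= 0)
                mpfr_nextabove(m);
            else
                mpfr_nextbelow(m);
        }
    }

    /* The second rounding: round to nearest at the target precision. */
    mpfr_prec_round(m, 53, MPFR_RNDN);
    mpfr_printf("%Ra\n", m);   /* prints 0x1.999999999999ap-4 for "0.1" */

    mpz_clear(sig);
    mpfr_clear(m);
    return 0;
}

Compile with something like gcc roundodd.c -lmpfr -lgmp.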

GCC Examples

I will show you two examples that GCC would convert incorrectly if not for round-to-odd. In the first example, the double rounding error would occur because both roundings would be downward; in the second example, the double rounding error would occur because both roundings would be upward. These examples are easiest to understand in binary, so that’s how I’ll show them.

In each example, keep an eye on the two rounding bits: the first rounding bit is significand bit 193, and the second rounding bit is significand bit 54. The values of each rounding bit, the bit that precedes it, and the bits that follow it were manipulated to force the rounding behavior I desired.

Example 1

In this example, bit 54 is a 1 and the value of the bits beyond is non-zero:

1.0000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001

Rounding to nearest directly to 53 bits would give the correct result:

1.0000000000000000000000000000000000000000000000000001

which is 0x1.0000000000001p+0 as a hexadecimal floating-point constant.

However, rounding to nearest to 192 bits and then rounding to nearest to 53 bits gives the wrong answer. Rounding to 192 bits rounds down, since bit 193 is 0, and gives this 54-bit result:

1.00000000000000000000000000000000000000000000000000001

This is a halfway case for double-precision. To 53 bits, that rounds down to 1 (0x1p+0).

Rounding to odd would set bit 192 to 1, forcing the 53-bit rounding to be upward, as expected.

Testing in GCC

To test this example in GCC, I converted it to this 194-digit decimal value:

1.0000000000000001110223024625156540423631668090820312500000796545955566226138514440198883855902795552277596309393036942926693081450756529080471544937360091342970491723463055677711963653564453125

GCC converts this decimal literal correctly.
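
You can run the same check with any C compiler that supports C99’s %a conversion; this quick test is my own, not part of GCC’s test suite:

#include <stdio.h>

int main(void)
{
    double d = 1.0000000000000001110223024625156540423631668090820312500000796545955566226138514440198883855902795552277596309393036942926693081450756529080471544937360091342970491723463055677711963653564453125;

    /* A correct conversion prints 0x1.0000000000001p+0 (the exact form
       of %a output can vary slightly between platforms). */
    printf("%a\n", d);
    return 0;
}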

To prove that two round-to-nearest roundings would fail, I modified real.c to use round-to-nearest instead of round-to-odd: I changed the call to mpfr_strtofr(), replacing GMP_RNDZ (round-to-zero) with GMP_RNDN (round-to-nearest), and I commented out the setting of the sticky bit. The modified code returned the wrong answer, as shown above.

Example 2

In this example, bit 54 is a 0 and the bits beyond are all 1s:

1.0000000000000000000000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111

Rounding to nearest directly to 53 bits would give the correct result:

1.0000000000000000000000000000000000000000000000000001

which is 0x1.0000000000001p+0.

However, two roundings to nearest give the wrong answer. Rounding to 192 bits rounds up, since bit 193 is 1, and the carry propagates all the way to bit 54, giving this 54-bit result:

1.00000000000000000000000000000000000000000000000000011

This is a halfway case for double-precision; rounded to 53 bits it is

1.000000000000000000000000000000000000000000000000001

which is 0x1.0000000000002p+0.

Rounding to odd would just truncate the intermediate value to 192 bits, preventing a 1 from propagating to bit 54. This forces the 53-bit rounding to be downward, as expected.

Testing in GCC

To test this in GCC, I used the decimal value of the input:

1.0000000000000003330669073875469621270895004272460937499999203454044433773861485559801116144097204447722403690606963057073306918549243470919528455062639908657029508276536944322288036346435546875

GCC converts it correctly. (It converts it incorrectly when I change round-to-odd to round-to-nearest.)
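
The same quick check works for this example; a conversion that suffered the double rounding error would print 0x1.0000000000002p+0 instead:

#include <stdio.h>

int main(void)
{
    double d = 1.0000000000000003330669073875469621270895004272460937499999203454044433773861485559801116144097204447722403690606963057073306918549243470919528455062639908657029508276536944322288036346435546875;

    /* Correct: 0x1.0000000000001p+0; double-rounded: 0x1.0000000000002p+0. */
    printf("%a\n", d);
    return 0;
}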

6 comments

  1. Rick, precisely the phenomenon you describe here prevents all multi-precision libraries that represent values by chained doubles from yielding correct results with an odds ratio of about 1:4096 (if I remember Muller’s handbook correctly) when being executed on the double-extended precision registers of any x87-like FPU (64 bits x87-FPU registers, 53 bits in memory). The only way to make these libraries always work correctly is to set appropriate compiler flags and/or the floating-point control word in order to get the calculations done by the SSE part of such processors, because then no double rounding occurs at all. Of course, this kind of error compromises all floating-point calculations on any x87 FPU, but it has dramatic consequences only in the case of chained doubles, because then the leading double might not be the nearest one.

  2. Hi Pascal,

    Thanks for the links.

    I can’t say I follow your “use round-to-nearest only” code — at least not on my first read. Have you written about this?

  3. I wrote the function as a contrarian, in response to Bruce Dawson’s remark in his own blog:

    > For extra credit, explain why a float precision version of fmadd cannot easily be implemented on processors that lack it by using double math (or alternately, prove the opposite by implementing fmadd).

    I may have under-documented it in order to show how easy it was 🙂

    Bruce had the same complaint so I wrote these comments separately from the code:

    http://randomascii.wordpress.com/2013/07/16/floating-point-determinism/#comment-7513

  4. There’s a terminology error in “Round-to-odd rounding, also called Von Neumann rounding or jamming, works as follows”. Actually in Von Neumann rounding, all the values (even the exact ones) are rounded to an odd significand: it consists in replacing the least significant bit of the truncation by a 1; the goal was to get a statistically unbiased rounding without carry propagation. For more information about it, see A. W. Burks, H. H. Goldstine, and J. von Neumann, “Preliminary discussion of the logical design of an electronic computing instrument”, 1963 (taken from report to U.S. Army Ordnance Department, 1946). For the more useful round-to-odd rounding, I prefer the less confusing term “sticky rounding”, used by J. S. Moore, T. Lynch, and M. Kaufmann, “A Mechanically Checked Proof of the Correctness of the Kernel of the AMD5K86™ Floating-Point Division Algorithm”, 1996. This rounding was already suggested in 1972 by Brent, in his article “On the precision attainable with various floating-point number systems”.

  5. Vincent,

    Thanks for the correction and references. (I have struck out the reference to Von Neumann rounding and jamming.) In my defense, I made that statement based on Boldo and Melquiond’s slide presentation “When Double Rounding is Odd” (https://www.lri.fr/~melquion/doc/05-imacs17_1-expose.pdf); in it they say “Rounding to odd, also called Von Neumann’s rounding”. I had also done a Web search for the term, finding others describing it the same way. I had not read the original Burks, Goldstine, and von Neumann paper, but I just found a copy based on your citation — can you point me to the page where it is defined?

    Update: On p. 106 I found this description: “The round-off procedures…fall into two broad classes. The first class is characterized by its ignoring all digits beyond the nth, and even the nth digit itself, which it replaces by a 1.”
