Correctly Rounded Conversions
I’ve written about how decimal numbers are sometimes rounded to floating-point numbers incorrectly. A decimal number, in general, can only be approximated in binary floating-point; as such, it needs to be rounded to one of two floating-point numbers surrounding it. Conventionally, it is rounded to the nearest floating-point number, with ties broken using the round-half-to-even rule. Alas, the nearest number is not always chosen, although fortunately — at least in the implementations I’ve tested — the second nearest is chosen. This results in a floating-point number that is one unit in the last place (ULP) away from the correct one.
A similar problem exists in the other direction; that is, for floating-point to decimal conversions. Although every binary floating-point number has an exact decimal representation, rounding is required when printing to a fixed number of decimal digits. To be correctly rounded, the nearest of the two n-digit decimal numbers surrounding the floating-point number must be chosen, with ties broken according to a rule — typically round-half-away-from-zero or round-half-to-even. Sometimes the nearest number is not chosen, resulting in a decimal number that is one ULP away from the correct one.
To round floating-point numbers to decimal numbers correctly, two things must be done:
1. The full-precision decimal equivalent of the floating-point number — or at least enough digits to make a correct rounding decision — must be generated.
2. The full-precision decimal equivalent (or a sufficiently long prefix thereof) must be rounded properly to the specified number of digits.
Step 1 is the hard part, so I assume this is where things go wrong when conversions are done incorrectly — and things do go wrong, as demonstrated by the examples below.
Using sprintf() for Floating-Point to Decimal Conversions
In C, floating-point numbers are converted to decimal strings using the printf() family of functions, which are supplied by the runtime library associated with a compiler. I used the sprintf() function with the “%.*e” conversion specification, which allowed me to automate the search for incorrectly converted values.
The “%.*e” format specifier prints a floating-point number in normalized scientific notation, with a selected number of digits after the decimal point (the digit before the decimal point, though significant, is not counted in the precision).
For example, in Visual C++, printing a double whose value is close to 0.8438 with a precision of 3 sets decimalString to 8.438e-001, and printing the same value with a precision of 1 sets it to 8.4e-001.
Finding Incorrect Floating-Point to Decimal Conversions
I generated example floating-point numbers to convert by randomly generating decimal numbers and then converting them using David Gay’s strtod() function. I formatted the floating-point numbers with sprintf() and compared its output to the correctly rounded output of David Gay’s dtoa() function. (I wrapped dtoa() in a version of David Gay’s g_fmt() function, modified to account for exponent and trailing zero formatting differences.) Of the many examples I found in Visual C++, MinGW, and Digital Mars, I selected a few for analysis and presentation.
For each example I show five things:
- The randomly generated input number.
- The correctly rounded double-precision floating-point equivalent of the input number, written as a hexadecimal floating-point constant.
- The correctly rounded double-precision floating-point equivalent of the input number, written in decimal (I computed this by converting the input number to binary, rounding it to 53 significant bits by hand, and then converting it back to decimal.)
- The decimal equivalent of the double-precision floating-point number, rounded correctly to the specified number of digits.
- The decimal equivalent of the double-precision floating-point number, as rounded incorrectly to the specified number of digits by sprintf().
I present the examples without ‘e’ notation, since I think it makes comparison of the rounded numbers easier.
Visual C++ (2010) / MinGW GCC C (4.5.0) on Windows
In this section, I’ll show three examples that are incorrectly rounded under Visual C++ and MinGW (Visual C++ and MinGW use the same runtime library, so they produce the same results).
| Example 1 | |
|---|---|
| Input | 1.0551955 |
| Nearest Double (hex) | 0x1.0e214ad362e90p+0 |
| Nearest Double (decimal) | 1.055195499999999952933649183250963687896728515625 |
| Rounded to 7 Digits | 1.055195 |
| Printed to 7 Digits | 1.055196 |
The input number 1.0551955 converts correctly to the double-precision floating-point number 1.055195499999999952933649183250963687896728515625 (which equals 0x1.0e214ad362e90p+0 as a hexadecimal floating-point constant). This double-precision value rounded correctly to seven digits is 1.055195, since the value of decimal place seven and beyond is less than one-half ULP (the last place being the sixth decimal place). Visual C++ and MinGW round it incorrectly to 1.055196.
(For the remaining examples, the analysis is similar; I will let the tables speak for themselves.)
| Example 2 | |
|---|---|
| Nearest Double (hex) | 0x1.0a92a4efa9e08p+3 |
| Nearest Double (decimal) | 8.3304009133271534892628551460802555084228515625 |
| Rounded to 16 Digits | 8.330400913327153 |
| Printed to 16 Digits | 8.330400913327154 |
| Example 3 | |
|---|---|
| Nearest Double (hex) | 0x1.30bbe881f761fp+3 |
| Nearest Double (decimal) | 9.5229380167393724576641034218482673168182373046875 |
| Rounded to 16 Digits | 9.522938016739372 |
| Printed to 16 Digits | 9.522938016739373 |
A Note About the “%a” Format Specifier in MinGW
Interestingly, MinGW printf() does not appear to support “%a”, the format specifier that prints hexadecimal floating-point constants. Using it caused my program to crash. This was unexpected — does it use the same runtime library as Visual C++ or not?
To verify that the floating point values tested in MinGW were as expected — without using “%a” — I printed them using my function print_double_binsci().
Digital Mars C (v852) on Windows
In this section, I’ll show three examples that are incorrectly rounded under Digital Mars C.
| Example 4 | |
|---|---|
| Nearest Double (hex) | 0x1.1f5201256a42ap+13 |
| Nearest Double (decimal) | 9194.25055964485000004060566425323486328125 |
| Rounded to 14 Digits | 9194.2505596449 |
| Printed to 14 Digits | 9194.2505596448 |
| Example 5 | |
|---|---|
| Nearest Double (hex) | 0x1.98221fc83c830p+9 |
| Nearest Double (decimal) | 816.266594914957750006578862667083740234375 |
| Rounded to 16 Digits | 816.2665949149578 |
| Printed to 16 Digits | 816.2665949149577 |
| Example 6 | |
|---|---|
| Nearest Double (hex) | 0x1.7de2cfc5d1761p+6 |
| Nearest Double (decimal) | 95.4714957150549849984599859453737735748291015625 |
| Rounded to 16 Digits | 95.47149571505498 |
| Printed to 16 Digits | 95.47149571505499 |
Linux GCC (4.4.3) / eglibc (2.11.1)
I found no incorrect floating-point to decimal conversions with Linux GCC. This makes sense, since its runtime library is the only one of the four tested that can generate all of the significant decimal digits of a floating-point number.
On Round-Trip Conversions
This article is about floating-point to decimal conversions, not round-trip decimal to floating-point to decimal conversions. Nonetheless, I selected four of my examples purposely to bring up an interesting discussion about round-trip conversions.
Examples 3 and 6 show conversions that round-trip but shouldn’t, and examples 2 and 5 show conversions that don’t round-trip but should. You might get fooled into judging the correctness of these conversions based on whether the output matches the input. But this thinking is wrong; the output decimal number is determined solely by the floating-point number, which may only be an approximation to the input decimal number.