Quick and Dirty Floating-Point to Decimal Conversion

http://www.exploringbinary.com/quick-and-dirty-floating-point-to-decimal-conversion/


In my article “Quick and Dirty Decimal to Floating-Point Conversion” I presented a small C program that uses double-precision floating-point arithmetic to convert decimal strings to binary floating-point numbers. The program converts some numbers incorrectly, despite using an algorithm that’s mathematically correct; its limited precision calculations are to blame. I dubbed the program “quick and dirty” because it’s simple, and overall converts reasonably accurately.

For this article, I took a similar approach to the conversion in the opposite direction — from binary floating-point to decimal string. I wrote a small C program that combines two mathematically correct algorithms: the classic “repeated division by ten” algorithm to convert integer values, and the classic “repeated multiplication by ten” algorithm to convert fractional values. The program uses double-precision floating-point arithmetic, so like its quick and dirty decimal to floating-point counterpart, its conversions are not always correct — though reasonably accurate. I’ll present the program and analyze some example conversions, both correct and incorrect.

Correct Floating-Point to Decimal Conversion

Every binary floating-point number has an exact decimal equivalent, which can be expressed as a decimal string of finite length. For example, the double-precision floating point representation of 3.14159, which in binary scientific notation is 1.100100100001111110011111000000011011100001100110111 x 21, is equal to the decimal value 3.14158999999999988261834005243144929409027099609375.

For the purposes of this article, I’ll consider a floating-point to decimal conversion correct if it’s the exact decimal equivalent of the binary floating-point number. I won’t consider rounding to a smaller number of digits, except to note — in specific examples — how rounding to 17 significant digits affects round trip conversions.

Program to Convert a Floating-Point Number to a Decimal String

Here’s the “quick and dirty” C program I wrote to convert a floating-point number to a decimal string; it attempts to generate all the digits of the decimal number (which it would do if it used arbitrary-precision floating-point):

#include <math.h>

int main(void)
{
 char conversion[1076], intPart_reversed[311];
 int i, charCount = 0;
 double fp_int, fp_frac, fp = 3.14159;

 fp_frac = modf(fp,&fp_int); //Separate integer/fractional parts

 while (fp_int > 0) //Convert integer part, if any
   {
    intPart_reversed[charCount++] = '0' + (int)fmod(fp_int,10);
    fp_int = floor(fp_int/10);
   }

 for (i=0; i<charCount; i++) //Reverse the integer part, if any
   conversion[i] = intPart_reversed[charCount-i-1];

 conversion[charCount++] = '.'; //Decimal point

 while (fp_frac > 0) //Convert fractional part, if any
   {
    fp_frac*=10;
    fp_frac = modf(fp_frac,&fp_int);
    conversion[charCount++] = '0' + (int)fp_int;
   }

 conversion[charCount] = 0; //String terminator
}

The Algorithm

The program converts binary to decimal using binary arithmetic. In a nutshell, here’s how it works:

  • It converts the integer part of a floating-point number by repeatedly dividing it by ten and appending the remainders to form the equivalent decimal string — in reverse. The process stops when the integer part becomes zero.
  • It converts the fractional part of a floating-point number by repeatedly multiplying it by ten and stripping off the integer part to form the equivalent decimal string. The process stops when the fractional part becomes zero.

These are the same algorithms used in my PHP conversion routines dec2bin_i() and dec2bin_f() and in my C conversion program fp2bin.c, except for using 10 as a divisor and multiplier instead of 2.

The Code

The code is a condensed version of fp2bin.c. I could have copied fp2bin.c in its entirety, renamed it fp2dec.c, renamed some variables, and substituted occurrences of 10 for 2. But my goal for this article was to keep the program simple, to show the essential elements of the conversion.

Here are some things to note about the code:

  • The number to be converted starts out as a hardcoded decimal literal, which in the program as shown is 3.14159; it is converted by the compiler to double-precision floating-point. The program converts that floating-point number back to a decimal string. (Whether the compiler does the conversion to floating-point correctly or not does not matter — all I’m testing is that the resulting floating-point value, whatever it is, converts to its decimal equivalent. But for what it’s worth, I did verify that the examples in this article were converted to floating-point correctly, by comparing them to the conversions done by David Gay’s strtod() function.)
  • Pure integer values are formatted as “i.”, and pure fractional values are formatted as “.f”. I did that to keep the program simple, rather than handle the special cases (printing integers without a decimal point, printing fractions with a leading ‘0.’).
  • The program only works with positive numbers.
  • The program doesn’t handle the special IEEE values for not-a-number (NaN) and infinity.
  • The constants 311 and 1076 represent the maximum size for a converted integer value and fractional value, respectively. The maximum positive double-precision integer has 309 digits, and the minimum positive double-precision fraction has 1074 digits. Each constant also accounts for the decimal point and the string terminator.

Testing the Conversions

To test conversions I packaged the program above in some additional C code, not shown. I used David Gay’s dtoa() function, and formatted its output so that the decimal number was displayed with all its digits and not in scientific notation. I generated random values to convert and then compared the quick and dirty conversion to dtoa()’s formatted output. I selected three example conversions for analysis below, one correct and two incorrect.

I also verified each of the three conversions by hand, by converting the initial decimal string to binary, rounding it by hand to 53 significant bits, and then converting it back to decimal.

Example 1: A Correct Conversion

The double-precision floating-point number 0x1.921f9f01b866ep+1, converted by the compiler from the decimal literal 3.14159, is converted correctly by the quick and dirty program, to this decimal number:

3.14158999999999988261834005243144929409027099609375

The conversion succeeds because at no point during the computation does it require more than 53 significant bits. To show this, I did two things:

  • I modified the quick and dirty program to use GMP arbitrary-precision floating-point arithmetic, to produce correct conversions.
  • I traced both the quick and dirty and the GMP-based conversions, using my function fp2bin(). I printed the binary value of the fractional part at each step, after it was multiplied by ten and before it had its integer part subtracted out. (I didn’t trace the conversion of the integer part ‘3’; this trivial conversion incurs no floating-point errors.)

Both programs produced the same trace; here it is (step 0 shows the initial value of the fractional part):

0:     0.00100100001111110011111000000011011100001100110111
1:     1.0110101001111000011011000010001001101000000010011
2:   100.001010001011010000111001010110000001000001011111
3:     1.10010111000010100011110101110000101000111011011
4:   101.1110011001100110011001100110011001100101000111
5:  1000.111111111111111111111111111111111111001100011
6:  1001.11111111111111111111111111111111011111101111
7:  1001.1111111111111111111111111111101011110101011
8:  1001.111111111111111111111111110011011001010111
9:  1001.11111111111111111111111000000111110110011
10: 1001.1111111111111111111011000100111001111111
11: 1001.111111111111111100111011000100001111011
12: 1001.11111111111110000100111010101001100111
13: 1001.1111111110110011000100101010000000011
14: 1001.111111001111111010111010010000001111
15: 1001.11100001111100110100011010001001011
16: 1000.1101001110000000110000010101110111
17: 1000.010000110000011110001101101010011
18:   10.10011110010010111000100010011111
19:  110.0010111011110011010101100011011
20:    1.110101011000000101011110000111
21: 1000.01010111000011011010110100011
22:   11.0110011010001000110000101111
23:  100.000000010101011110011101011
24:    0.00001101011011000010010111
25:    0.1000011000111001011110011
26:  101.001111100011111010111111
27:   10.01101110011100110111011
28:  100.0101000010000010100111
29:   11.001001010001101000011
30:    1.01110011000001001111
31:  100.0111111000110001011
32:  100.111011011110110111
33: 1001.01001011010010011
34:   10.1111000011011111
35: 1001.011010001011011
36:  100.00010111000111
37:    0.1110011100011
38: 1001.000001101111
39:    0.01000101011
40:   10.1011010111
41:  111.000110011
42:    0.11111111
43: 1001.1111011
44: 1001.100111
45:  110.00011
46:    0.1111
47: 1001.011
48:   11.11
49:  111.1
50:  101.

Here are two observations:

  • The maximum precision required during the computation is 51 bits, reached at step 2.
  • The fractional part shrinks by one bit at each step. There are 50 steps, equaling the number of fractional bits of the initial floating-point number (this includes leading zeros). One decimal digit is produced at each step, giving a conversion with a 50 digit fractional part. (This neat “lose one bit per step” behavior is only guaranteed to occur when there is no loss of floating-point precision.)

Example 2: An Incorrect Conversion of a Fractional Value

The double-precision floating-point number 0x1.9eb851eb851ecp-1, converted by the compiler from the decimal literal 0.81, is converted incorrectly by the quick and dirty program:

Correct = 0.810000000000000053290705182007513940334320068359375
Q and D = 0.810000000000000142108547152020037174224853515625

(For presentation, I prefixed the “Q and D” output with a ‘0’ before the decimal point.)

To see why the conversion goes wrong, let’s look at the trace of the correct conversion:

0:     0.110011110101110000101000111101011100001010001111011
1:  1000.00011001100110011001100110011001100110011001100111
2:     1.0000000000000000000000000000000000000000000000011
3:     0.000000000000000000000000000000000000000000001111
4:     0.00000000000000000000000000000000000000001001011
5:     0.0000000000000000000000000000000000000101110111
6:     0.000000000000000000000000000000000011101010011
7:     0.00000000000000000000000000000010010010011111
8:     0.0000000000000000000000000001011011100011011
9:     0.000000000000000000000000111001001110000111
10:    0.00000000000000000000100011110000110100011
11:    0.0000000000000000010110010110100000101111
12:    0.000000000000001101111110000100011101011
13:    0.00000000001000101110110010110010010111
14:    0.0000000101011101001111101111011110011
15:    0.000011011010010001110101101010111111
16:    0.10001000011011001001100010110111011
17:  101.0101010000111101111101110010100111
18:   11.010010100110101110100111101000011
19:   10.11101000001101001000110001001111
20: 1001.0001001000001101011110110001011
21:    0.101101001000011011001110110111
22:  111.00001101010001000001010010011
23:    0.1000010010101000110011011111
24:  101.001011101001100000001011011
25:    1.11010001111100000111000111
26: 1000.0011001101100100011100011
27:   10.000000011110110001101111
28:    0.00010011001111000101011
29:    0.1100000001011011010111
30:  111.100000111001000110011
31:  101.00100011101011111111
32:    1.0110010011011111011
33:   11.111100001011100111
34: 1001.01100111010000011
35:  100.0000100010001111
36:    0.010101011001011
37:   11.01010111110111
38:   11.0110111010011
39:  100.010100011111
40:   11.00110011011
41:   10.0000000111
42:    0.000100011
43:    0.10101111
44:  110.1101011
45: 1000.010111
46:   11.10011
47:  101.1111
48: 1001.011
49:   11.11
50:  111.1
51:  101.

Step 1 requires a 54 significant bit result, so in double-precision floating-point it will be rounded; here is the trace of the quick and dirty conversion, which shows the rounding:

0:     0.110011110101110000101000111101011100001010001111011
1:  1000.00011001100110011001100110011001100110011001101   
2:     1.0000000000000000000000000000000000000000000001
3:     0.000000000000000000000000000000000000000000101
4:     0.00000000000000000000000000000000000000011001
5:     0.0000000000000000000000000000000000001111101
6:     0.000000000000000000000000000000001001110001
7:     0.00000000000000000000000000000110000110101
8:     0.0000000000000000000000000011110100001001
9:     0.000000000000000000000010011000100101101
10:    0.00000000000000000001011111010111100001
11:    0.0000000000000000111011100110101100101
12:    0.000000000000100101010000001011111001
13:    0.00000000010111010010000111011011101
14:    0.0000001110100011010100101001010001
15:    0.001001000110000100111001110010101
16:    1.01101011110011000100000111101001
17:  100.0011010111111010100100110001101
18:   10.000110111100100110111111000001
19:    1.00010101111000010111011000101
20:    0.1101101011001110100111011001
21: 1000.100011000001001000100111101
22:  101.01111000101101011000110001
23:  100.1011011100010111011110101
24:  111.001001101110101011001001
25:    1.10000101001010111101101
26:  101.0011001110110110100001
27:   10.000001010010000100101
28:    0.00110011010010111001
29:   10.0000000011110011101
30:    0.000010011000010001
31:    0.01011111001010101
32:   11.1011011110101001
33:  111.001011001001101
34:    1.10111110000001
35:  111.0110110000101
36:  100.001110011001
37:   10.00111111101
38:   10.0111110001
39:  100.110110101
40: 1000.10001001
41:  101.0101101
42:   11.100001
43:  101.00101
44:    1.1001
45:  101.101
46:  110.01
47:   10.1
48:  101.

Here are three observations:

  • Only the first 15 digits of the quick and dirty conversion are correct.
  • The quick and dirty conversion has three less decimal digits than the correct one. That’s because the rounding at bit 54 propagates to bit 51, effectively shortening the floating-point number by three significant bits (the grayed segments in the traces — in the first step — show where the rounding occurs).
  • Hand-rounded to 17 digits, the correct conversion is 0.81000000000000005 and the quick and dirty conversion is 0.81000000000000014; that is a nine decimal ULP error. This error is big enough that the quick and dirty conversion does not round-trip, as it’s supposed to as a 17 significant digit number. The quick and dirty conversion would convert back to floating-point (using a correctly rounding conversion routine) as 0x1.9eb851eb851edp-1, which is one binary ULP away from 0x1.9eb851eb851ecp-1.

Example 3: An Incorrect Conversion of an Integer Value

The double-precision floating-point number 0x1.0000000000000p+57, converted by the compiler from the decimal literal 144115188075855877, is converted incorrectly by the quick and dirty program:

Correct = 144115188075855872
Q and D = 144115188075855882

(A different way to look at those numbers: 144115188075855877 = 257+5, 144115188075855872 = 257, and 144115188075855882 = 257+10.)

I traced both the quick and dirty and correct conversions. For these traces, I printed the running integer quotient and the remainder at each step. In the quick and dirty conversion, things go wrong because the quotient, at some point, needs more than 53 significant bits to be represented accurately. Let’s look at the correct conversion first:

0:  1000000000000000000000000000000000000000000000000000000000
1:  110011001100110011001100110011001100110011001100110011   r   10
2:  101000111101011100001010001111010111000010100011110      r  111
3:  100000110001001001101110100101111000110101001111         r 1000
4:  11010001101101110001011101011000111000100001             r  101
5:  10100111110001011010110001000111000110110                r  101
6:  10000110001101111011110100000101101011                   r 1000
7:  1101011010111111100101001101010111                       r  101
8:  1010101111001100011101110001000                          r  111
9:  1000100101110000010111110100                             r    0
10: 110110111110011011111110                                 r 1000
11: 101011111110101111111                                    r 1000
12: 100011001011110011                                       r    1
13: 11100001001011                                           r  101
14: 10110100001                                              r    1
15: 10010000                                                 r    1
16: 1110                                                     r  100
17: 1                                                        r  100
18: 0                                                        r    1

Step 1 requires a 54 significant bit result, so in double-precision floating-point it will be rounded; here is the trace of the quick and dirty conversion:

0:  1000000000000000000000000000000000000000000000000000000000
1:  110011001100110011001100110011001100110011001100110100   r   10
2:  101000111101011100001010001111010111000010100011110      r 1000
3:  100000110001001001101110100101111000110101001111         r 1000
4:  11010001101101110001011101011000111000100001             r  101
5:  10100111110001011010110001000111000110110                r  101
6:  10000110001101111011110100000101101011                   r 1000
7:  1101011010111111100101001101010111                       r  101
8:  1010101111001100011101110001000                          r  111
9:  1000100101110000010111110100                             r    0
10: 110110111110011011111110                                 r 1000
11: 101011111110101111111                                    r 1000
12: 100011001011110011                                       r    1
13: 11100001001011                                           r  101
14: 10110100001                                              r    1
15: 10010000                                                 r    1
16: 1110                                                     r  100
17: 1                                                        r  100
18: 0                                                        r    1

Here are three observations:

  • Only the first 16 digits of the quick and dirty conversion are correct.
  • Both conversions are 18 decimal digits long; the loss of precision does not affect the number of digits in the number.
  • Hand-rounded to 17 digits, the correct conversion is 144115188075855870 and the quick and dirty conversion is 144115188075855880. That’s only a one decimal ULP error, and the rounded quick and dirty conversion round-trips back to floating-point correctly.

On Using Floating-Point to Convert To and From Floating-Point

The two quick and dirty programs I’ve written — decimal to floating-point and floating-point decimal — use double-precision floating-point to convert to and from double-precision floating-point. It seems like a reasonable approach: use IEEE 754 arithmetic to produce IEEE 754 numbers. However, double-precision floating-point isn’t up to the task; it takes higher-precision floating-point to give correct results for all conversions.

Dealing with higher-precision floating-point is a little tricky; for one, you need to figure out how much precision is enough. That’s why, in practice, different conversion algorithms are used, ones that work with high-precision integer arithmetic — at least for the cases that need more than double-precision arithmetic. David Gay’s conversion routines are a good example of this approach.

Endnote

While writing this article I discovered a bug in David Gay’s strtod() function. On a certain class of inputs it would give wildly inaccurate results (see the change log for a description). The bug was fixed 11/5/10.

(See my follow-up article 15-Digit Quick and Dirty Conversions Don’t Round-Trip.)

Dingbat

9 Responses to “Quick and Dirty Floating-Point to Decimal Conversion”

  1. Jonas Tärnström Says:

    Nice article.

    I’d like to ask you if I could use the source code in a project of ours?

    We simply need a quick and dirty double to string encoder

    Thanks!

  2. Rick Regan Says:

    @Jonas,

    I don’t have an explicit policy in place, but I fully expect that the code snippets in my articles will be used by my readers (although my expectations to this point would have been that my code would be useful mostly for educational purposes). As long as you accept the code as is, feel free to use it (and if you’re at liberty to share details of your project, and why this code meets your needs, I’d be interested to hear about it).

    Thanks for asking.

  3. Don Williamson Says:

    Thanks for this post, Rick!

    I believe the integer construction is a little broken, however. An input value of 23 will print out 22 because 23/10 is 2.999… in floating point.

    I’ve added a +0.5 before the integer truncation and it seems to handle that just fine.

    Cheers,
    – Don

  4. Rick Regan Says:

    @Don,

    I tried my program with “fp = 23″ and it works just fine. (BTW, I think you meant 23/10 = 2.299…). On what machine/compiler are you running?

    Thanks for the feedback.

    | Additional thoughts:

    The digits of the integer are generated by fmod(x,y), which according to the man pages ‘returns x-i*y for some integer i’. I believe if x and y are exact floating-point integers then the return value will be an exact integer as well (I don’t think fmod() ever divides x by y — at least not a floating-point divide). In any case, I tested my program on all integers up to 232-1 and it worked fine.

    The fp_int/10 calculation can tolerate the inexactness that results in integers ending in digits 1 through 9 — I only need the floor of the result, which will be the nearest multiple of ten below it.

  5. Don Williamson Says:

    Good catch, I’m using a version of fmod that does a divide (the standard doesn’t mention that divides ).

    Thanks!

  6. Rick Regan Says:

    @Don,

    What library are you using? I’ll have to check out different fmods now to see how they work. (I wasn’t counting on different fmod results — that makes my method even ‘dirtier’.)

  7. Don Williamson Says:

    Yeah it’s minix, here: https://gforge.cs.vu.nl/gf/project/minix/scmsvn/?action=browse&path=%2Ftrunk%2Fsrc%2Flib%2Flibc%2Fmath%2Ffmod.c&view=markup

    I think their fmod implementation is simply broken!

  8. Don Williamson Says:

    This version of fmod works without modifying dtoa: http://www.opensource.apple.com/source/groff/groff-12/groff/libgroff/fmod.c

  9. Rick Regan Says:

    @Don,

    The MINIX version fails for me too (the first positive integer it fails for is 12). The opensource.apple.com version appears to work, despite the floating-point division — and being just 2 lines of code! Contrast this with the glibc implementation of fmod(), which is about 90 lines of code! I wonder — why the difference?

    This is very interesting — thanks for bringing it to my attention.

Leave a Comment

(To reduce spam, cookies must be enabled)


css.php