A Better Way to Convert Integers in David Gay’s strtod()

A reader of my blog, John Harrison, suggested a way to improve how David Gay’s strtod() converts large integers to doubles. Instead of approximating the conversion and going through the correction loop to check and correct it — the signature processes of strtod() — he proposed doing the conversion directly from a binary big integer representation of the decimal input. strtod() does lots of processing with big integers, so the facility to do this is already there.

I implemented John’s idea in a copy of strtod(). The path for large integers is so much simpler and faster that I can’t believe it never occurred to me to do it this way. It’s also surprising that strtod() wasn’t written this way to begin with.

Big Integer Conversion By Hand

The idea of conversion from a big integer should have been obvious to me since I do this frequently with my arbitrary-precision decimal to binary converter. I take the converter’s output and round it to 53 bits by hand. For example, 1.2345689012e37 (my converter does not accept scientific notation, so you have to enter 12345689012000000000000000000000000000) converts to

10010100100110110001011101101000110110100111010101111[1]0100010000001010011001011110111000101100100000000000000000000000000000

The bracketed bit is bit 54, where the rounding starts. I’ll look at this and see that it’s 1, and that there are 1 bits beyond it, and thus know to round it up to 53 bits:

1001010010011011000101110110100011011010011101011000000000000000000000000000000000000000000000000000000000000000000000000000

(In decimal this is 12345689012000000866809654767368798208.)

Big Integer Conversion In Code

A big integer can be converted to a double in code similarly, except you would exploit an optimization that reduces the size of the big integer. If you recall, strtod() expresses the input as an integer times a power of ten; for my example, that would be 12345689012 x 10^27. You can separate out the power of two and write that as 12345689012 x 5^27 x 2^27. You can create a big integer from 12345689012 x 5^27 = 91982551008462905883789062500, which converts to

1001010010011011000101110110100011011010011101010111110100010000001010011001011110111000101100100

This contains all the significant bits of 12345689012 x 10^27, but has 27 fewer trailing zeros — the factor 2^27. This will be accounted for in the double’s exponent.

You can put the 53 most significant bits in a 64-bit integer, and round by inspecting the remaining bits. If the 53-bit significand must be rounded up, the 64-bit integer is incremented.

You can compute the exponent by adding the factored-out power of two exponent to the length of the big integer in bits, minus one. For my example, that’s 27 + 97 – 1 = 123. The resulting double, in normalized binary scientific notation, is

1.001010010011011000101110110100011011010011101011 x 2^123

The Code In strtod()

I put the code to implement this in a copy of strtod(), after the fast path check and before computing the approximation. (The fast path still handles “small” integers.) Specifically, right before this comment

/* Get starting approximation = rv * 10**e1 */

I added this code

if (e >= 0) {
    bd = s2b(s0, nd0, nd, y, 1); // Convert digits to a big integer
    bd = pow5mult(bd, e);        // Multiply by 5^e
    dval(&rv) = bigint_to_double(bd, e);
    Bfree(bd);
    goto ret;
}

My bigint_to_double() function iterates through the words of the big integer representation, extracts the top 53 bits, and then rounds them. It computes the exponent and sets the fields of the resulting double directly.

Discussion

  • This method is much faster than the existing code, but it’s hard to say what the overall performance impact will be. What percentage of conversions in the real world are for large integers?
  • I’m adding a special case to strtod(), but it is already full of special cases.
  • With this new implementation, the “bigcomp” optimization would no longer apply to integer inputs. It would be easy to work in, but I don’t see the value. I think it would take an input of hundreds, if not thousands of digits before the full-blown correction loop with bigcomp() would perform better than this new calculation.
  • In my article “Correct Decimal To Floating-Point Using Big Integers” I describe a simple big integer based algorithm that does decimal to floating-point conversion. This new algorithm is conceptually like the big integer algorithm I describe there, except instead of using math — scaling, division, and checking remainders — it uses low-level operations on the binary representation.

4 comments

  1. I’d be interested to know if any of the various languages that use versions of David Gay’s code make use of this optimization.

  2. John,

    Every copy of dtoa.c I had looked at (except Python’s) was pretty vanilla, with perhaps minor changes to make it work in a particular environment. Python made extensive changes, but did not add anything like this. (Which reminds me — I’ll contact Mark Dickinson to see what he thinks of this.)

    gcc and glibc strtod() have their own conversion routines. I took a look at them years ago before I knew what I was looking for. They work quite differently than Gay’s strtod(), so it’s possible they incorporate this idea. I will look when I get the chance (my notes say gcc’s code is in /gcc/real.c and glibc’s is in /stdlib/strtod_l.c if you want to get a head start).


Copyright © 2008-2024 Exploring Binary
