Bigcomp: Deciding Truncated, Near Halfway Conversions

In my article “Using Integers to Check a Floating-Point Approximation,” I briefly mentioned “bigcomp,” an optimization strtod() uses to reduce big integer overhead when checking long decimal inputs. bigcomp does a floating-point to decimal conversion — right in the middle of a decimal to floating-point conversion mind you — to generate the decimal expansion of the number halfway between two target floating-point numbers. This decimal expansion is compared to the input decimal string, and the result of the comparison dictates which of the two target numbers is the correctly rounded result.

In this article, I’ll explain how bigcomp works, and when it applies. Also, I’ll talk briefly about its performance; my informal testing shows that, under the default setting, bigcomp actually worsens performance for some inputs.

When Bigcomp Applies

Bigcomp is enabled by default in strtod() (it can be disabled by “#defining” the macro NO_STRTOD_BIGCOMP). It applies to decimal strings longer than STRTOD_DIGLIM significant digits, which by default is “#defined” to 40. For decimal inputs dStr of 40 significant digits or fewer, the normal approximation check is done, using all significant digits of dStr for its big integer representation d; for inputs longer than 40 digits, only the first 18 digits are used for d. This truncation of d leads to smaller big integers, but it can also leave the approximation check unable to decide between adjacent floating-point values. When that happens, the bigcomp procedure kicks in, through a call to the bigcomp() function.

Specifically, bigcomp() is called when the approximation check cannot decide between the approximation b and its successor, b + u. (As with all my articles in this series, I will only be discussing how strtod() works for round-to-nearest/round-half-to-even rounding.) This happens when the truncated value of d falls between b and b + h; that is, between b and one-half ULP above b.

(In this article, I distinguish between the input dStr and its big integer representation d, which may or may not be a truncation of dStr.)

Overview of the Algorithm

bigcomp() compares the significant digits of dStr to a string representation of the significant digits of b + h. It goes left-to-right, digit by significant digit, stopping when the two strings differ. (In the bulk of cases, the difference occurs between the 16th and 18th digits, although in the worst case the difference can occur at the 768th digit; it depends on how close the decimal number is to b + h.) If dStr is less than b + h, bigcomp() returns b; if dStr is greater than b + h, bigcomp() returns b + u; if dStr equals b + h, either b or b + u is chosen, according to round-half-to-even rounding.

bigcomp() generates the significant digits of b + h using essentially the same algorithm as the dtoa() function, the floating-point to decimal conversion routine co-located in David Gay’s dtoa.c. It uses big integer arithmetic, including big integer division.

bigcomp() starts by creating a fraction that represents the significant digits of b + h (not b + h itself). In this form, it generates a digit by multiplying the numerator of the fraction by ten, and then using integer division to form a quotient and remainder. The quotient represents the next digit; the remainder represents the remaining digits, which are accessed one at a time by repeating this process. Digits are generated until there is a mismatch between dStr and b + h.
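As a minimal sketch of this digit-generation loop, here is a Python version (Python’s native big integers stand in for dtoa.c’s big integer routines; the function name is mine, not dtoa.c’s):

```python
def generate_digits(num, den, n):
    """Generate the first n significant decimal digits of the fraction
    num/den, assuming it has been scaled so that 1 <= num/den < 10."""
    digits = []
    for _ in range(n):
        dig, rem = divmod(num, den)  # next digit; rest of the fraction
        digits.append(dig)
        num = rem * 10               # shift left one decimal place
    return digits

# For instance, generate_digits(5, 4, 3) yields [1, 2, 5], the digits of 1.25.
```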

The Algorithm, By Example

I’ll break the algorithm into five steps, and illustrate it with two examples. I set STRTOD_DIGLIM to 19 so I could generate shorter examples (both are 20 digits).

Example 1: Check 1.3694713649464322631e-11

In this example, we want to check if dStr = 1.3694713649464322631e-11 is closer to b = 0x1.e1d703bb5749cp-37 or b + u = 0x1.e1d703bb5749dp-37. We do this by comparing dStr to b + h = 0x1.e1d703bb5749c8p-37. (I describe b + h using a hexadecimal floating-point constant, even though it won’t be represented as an IEEE floating-point number. This number has 54 bits, with bit 54 being a 1. It is b with a hexadecimal ‘8’ tacked on at the end.)

1. Represent b + h as an integer times a power of two

We start by representing b as a 53-bit integer times a power of two, as we do in the basic approximation check:

b = bInt x 2^bExp
  = 8476617176609948 x 2^-89

Since bInt is a 53-bit integer, u = 2^bExp, and h = 2^(bExp-1). Therefore,

b + h = bInt x 2^bExp + 2^(bExp-1)
      = (bInt x 2 + 1) x 2^(bExp-1)
      = (8476617176609948 x 2 + 1) x 2^-90
      = 16953234353219897 x 2^-90

In code: We create bInt and bExp from b’s underlying floating-point representation: we treat its significand as a 53-bit integer, and subtract 52 from its power of two exponent. We create a big integer from bInt, and then we make it the integer portion of b + h by shifting it left and ‘ORing’ a 1 into its least significant bit. We maintain the exponent of the power of two as a native integer, setting it to bExp-1.
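This step can be sketched in Python with math.frexp, assuming b is a positive, normal double (dtoa.c instead picks the fields out of the underlying representation directly):

```python
import math

def b_plus_h(b):
    """Represent b + h (b plus half a ULP) as bhInt * 2**bhExp."""
    m, e = math.frexp(b)           # b = m * 2**e, with 0.5 <= m < 1
    bInt = int(m * (1 << 53))      # the 53-bit significand as an integer
    bExp = e - 53                  # so that b = bInt * 2**bExp
    return bInt * 2 + 1, bExp - 1  # b + h = (2*bInt + 1) * 2**(bExp-1)

bhInt, bhExp = b_plus_h(float.fromhex('0x1.e1d703bb5749cp-37'))
# bhInt = 16953234353219897, bhExp = -90
```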

2. Scale b + h so its first significant digit is just to the left of the decimal point

If you were to compute b + h from the expression above, you’d see its value is

0.0000000000136947136494643226044…

(We’re not going to do that; I just wanted to show you where we’re headed. By the way, I computed it using PARI/GP.)

What we want to do is scale b + h so that it looks like

1.36947136494643226044…

In other words, we want to multiply it by a power of ten so that, when viewed as a decimal number, its significant digits look normalized; 10^11 does the trick, effectively shifting b + h left by 11 decimal places:

Scaled b + h = 16953234353219897 x 2^-90 x 10^11

This “normalization” sets up the digit generation algorithm.

In code: A full-blown floating-point to decimal conversion would need to use logarithms to compute the power of ten; but in this case, we can obtain the power of ten exponent directly from dStr.

3. Combine powers of two

If we decompose the power of ten into a power of two and a power of five, we get

Scaled b + h = 16953234353219897 x 2^-90 x 2^11 x 5^11

We can now combine powers of two to reduce the size of the resulting big integers.

Scaled b + h = 16953234353219897 x 2^-79 x 5^11

In code: The code is simple: just add the exponents of the power of two from b + h and the power of two from the power of ten scaling factor.

4. Express scaled b + h as a fraction

The negative power of two, 2^-79, makes this a fraction:

Scaled b + h = (16953234353219897 x 5^11)/2^79
             = 827794646153315283203125/604462909807314587353088

In code: Compute the powers as big integers, using the absolute values of the exponents; then, multiply the terms out, as appropriate.
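For Example 1, steps 2 through 4 amount to just a few lines of Python (a sketch; the exponent arithmetic here is specific to this example, where the power of ten is positive and the power of two is negative):

```python
bhInt, bhExp = 16953234353219897, -90  # b + h from step 1
decExp = 11                            # power-of-ten scaling from step 2

# Scaled b + h = bhInt * 2**bhExp * 2**decExp * 5**decExp
#              = (bhInt * 5**11) / 2**79
num = bhInt * 5**decExp
den = 1 << -(bhExp + decExp)
# num = 827794646153315283203125, den = 604462909807314587353088
```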

5. Generate and check digits

Scaled b + h has a numerator num and denominator den:

num = 827794646153315283203125
den = 604462909807314587353088

The significant digits are generated from these integers using the “repeated multiplication by ten” algorithm; this is what’s done at each iteration (dig stands for ‘digit’, and rem stands for ‘remainder’):

  1. dig = num/den (just the integer part of the quotient)
  2. rem = num - dig x den
  3. num = rem x 10

In other words, at each iteration, a digit is peeled off and the rest of the number is shifted left one decimal place. This step is repeated until corresponding digits of dStr and b + h don’t match; here are the first three iterations:

1. 827794646153315283203125/den  = 1 rem 223331736346000695850037
2. 2233317363460006958500370/den = 3 rem 419928634038063196441106
3. 4199286340380631964411060/den = 6 rem 572508881536744440292532
4. ...

This stops after step 19; that is, dStr and b + h differ at the 19th digit. Here are the digits of dStr and b + h aligned, to show where they differ:

Digits of dStr  = 13694713649464322631
Digits of b + h = 1369471364946432260444165412355907332584564750632

Since 3 is greater than 0, dStr is closer to b + u than it is to b, so b + u is the answer.
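The whole comparison for Example 1 can be sketched in a few lines of Python (native ints in place of dtoa.c’s big integers; the function name is mine, not dtoa.c’s):

```python
def compare_digits(d_digits, num, den):
    """Compare the significant digits of dStr (a string) against those of
    b + h = num/den, scaled so that 1 <= num/den < 10.
    Returns -1, 0, or 1 as dStr is below, equal to, or above b + h."""
    for ch in d_digits:
        dig, rem = divmod(num, den)
        if int(ch) != dig:
            return -1 if int(ch) < dig else 1
        num = rem * 10
    return 0 if num == 0 else -1  # dStr's digits ran out first

# dStr = 1.3694713649464322631e-11, compared against b + h:
result = compare_digits("13694713649464322631",
                        827794646153315283203125,
                        604462909807314587353088)
# result = 1: dStr > b + h, so b + u is the correctly rounded result
```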

Example 2: Check 9.3170532238714134438e+16

In this example, we want to check if dStr = 9.3170532238714134438e+16 is closer to b = 0x1.4b021afd9f651p+56 or b + u = 0x1.4b021afd9f652p+56. We do this by comparing dStr to b + h = 0x1.4b021afd9f6518p+56.

1. Represent b + h as an integer times a power of two

b = bInt x 2^bExp
  = 5823158264919633 x 2^4

b + h = (bInt x 2 + 1) x 2^(bExp-1)
      = (5823158264919633 x 2 + 1) x 2^3
      = 11646316529839267 x 2^3

2. Scale b + h so its first significant digit is just to the left of the decimal point

If you were to compute b + h from the expression above, you’d see its value is

93170532238714136

We want to shift it right 16 decimal places so it looks like

9.3170532238714136
Scaled b + h = 11646316529839267 x 2^3 x 10^-16

3. Combine powers of two

If we decompose the power of ten into a power of two and a power of five, we get

Scaled b + h = 11646316529839267 x 2^3 x 2^-16 x 5^-16

Combining powers of two we get

Scaled b + h = 11646316529839267 x 2^-13 x 5^-16

4. Express scaled b + h as a fraction

The negative powers of two and five make this a fraction:

Scaled b + h = 11646316529839267/(2^13 x 5^16)
             = 11646316529839267/1250000000000000

5. Generate and check digits

Scaled b + h has a numerator num and denominator den:

num = 11646316529839267
den = 1250000000000000

Here are the first three iterations of the repeated multiplication by ten algorithm:

1. 11646316529839267/den = 9 rem 396316529839267
2. 3963165298392670/den  = 3 rem 213165298392670
3. 2131652983926700/den  = 1 rem 881652983926700
4. ...

This stops after step 17; that is, dStr and b + h differ at the 17th digit. Here are the digits of dStr and b + h aligned, to show where they differ:

Digits of dStr  = 93170532238714134438
Digits of b + h = 93170532238714136

Since 4 is less than 6, dStr is closer to b than it is to b + u, so b is the answer.
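Example 2’s digits check out the same way; here is a quick Python verification (native ints standing in for dtoa.c’s big integers):

```python
# Scaled b + h for Example 2, as a fraction num/den:
num, den = 11646316529839267, 1250000000000000

digits = ""
for _ in range(17):
    dig, rem = divmod(num, den)  # peel off the next digit
    digits += str(dig)
    num = rem * 10
# digits = "93170532238714136"; dStr's 17th digit (4) is less than 6
```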


strtod() has an extra step, which I’ve not described. It scales the numerator and denominator by a common power of two, which is necessary for its ‘quorem()’ function to compute the quotient and remainder properly. The power of two exponent is determined by a function called ‘dshift().’

If dStr represents a fractional value, step 2 (scale by power of ten) and step 3 (combine powers of two) are technically unnecessary. Without them, step 5 (generate digits) would generate leading zeros, which could be ignored. (Of course, the pre-scaling is more elegant, as it reduces everything to one form.)

The 40-Digit Cutoff Is Too Low

bigcomp() does a lot of work with big integers, which made me wonder why it performed better than the basic approximation check. I ran some tests, and found that it doesn’t perform better — not at the default STRTOD_DIGLIM value of 40! My tests indicate that bigcomp() only performs better starting at somewhere between 80 and 83 digits. (I don’t know why 40 was chosen as the default.) bigcomp() outperforms the basic check for really long inputs; for example, the difference in performance is great at 500 digits, and substantial at 1000 digits.

Let’s look at a 1000-digit input to see the difference in the big integers involved:

dStr =

Without bigcomp, using the basic approximation check, these are the variables involved:

dS =

bS =

hS =

|dS-bS| =

|dS-bS| is less than hS, so b is chosen.

With bigcomp, there are two sets of big integers involved: one set for the basic approximation check (on the truncated value), and one set for bigcomp. Here are the big integers used for the basic approximation check:

dS = 539393324717760434
bS = 539393324717760400
hS = 40
|dS-bS| = 34

Here are the big integers used for bigcomp (not shown is the extra factor of 2^8 in the numerator and denominator due to dshift()):

num = 13484833117944011
den = 2500000000000000

bigcomp() finds that digit 17 of dStr (3) is less than digit 17 of b + h (4), so it chooses b.


Thanks to Mark Dickinson for his commentary on bigcomp, both in Python’s version of David Gay’s dtoa.c, and in private communication.


