Why “Volatile” Fixes the 2.2250738585072011e-308 Bug

Recently I discovered a serious bug in x87 builds of PHP: PHP’s decimal to floating-point conversion routine, zend_strtod(), went into an infinite loop when converting the decimal string 2.2250738585072011e-308 to double-precision binary floating-point. This problem was fixed with a simple one line of code change to zend_strtod.c:

This line

double aadj, aadj1, adj;

was changed to

volatile double aadj, aadj1, adj;

Why does this fix the problem? I uncovered the very specific reason: it prevents a double rounding on underflow error.

The Failing Calculation in zend_strtod()

For input “2.2250738585072011e-308”, zend_strtod() is supposed to return 0x0.fffffffffffffp-1022, or in normalized binary scientific notation,

1.111111111111111111111111111111111111111111111111111 x 2-1023.

This is a subnormal number, 2-1074 below DBL_MIN, which is 2-1022.

Instead, zend_strtod() goes into an infinite loop. This section of code, appearing inside a loop, is responsible for the problem:

/* Compute adj so that the IEEE rounding rules will
 * correctly round rv + adj in some half-way cases.
 * If rv * ulp(rv) is denormalized (i.e.,
 * y <= (P-1)*Exp_msk1), we must adjust aadj to avoid
 * trouble from bits lost to denormalization;
 * example: 1.2e-307 .
 */
if (y <= (P-1)*Exp_msk1 && aadj >= 1.) {
	aadj1 = (double)(int)(aadj + 0.5);
	if (!dsign)
		aadj1 = -aadj1;
}
adj = aadj1 * ulp(value(rv));
value(rv) += adj;

The two lines I’ve highlighted adjust the double-precision converted result, value(rv), so that it converges to the correctly rounded result. Although the source code expression of the calculation is correct, it fails when implemented in extended-precision arithmetic — which is what happens when the variable adj is not declared volatile.

zend_strtod() Executes In Double-Precision Mode

zend_strtod() executes in double-precision mode, so this is not your run-of-the-mill double-precision/extended-precision mismatch. So how does extended-precision come into play? Extended-precision numbers in the double-precision subnormal exponent range (-1023 through -1074) have 53 bits of precision, not the usual 1 to 52 bits. As I’ll show, this leads to a special case of double rounding error — a double rounding on underflow error. The result is that value(rv) remains unchanged — and zend_strtod() goes into an infinite loop!

Volatile Almost Forces Double-Precision Semantics

Floating-point calculations done with x87 FPU instructions are done in extended-precision registers, even when the processor is in double-precision mode and the variables being operated on are double-precision. This means that true double-precision semantics can’t be emulated. Why? Because extended-precision values have a wider exponent range. This can almost be fixed by declaring double variables ‘volatile’.

The ‘volatile’ keyword forces the compiler to store a variable in memory each time it’s written, and to load it from memory each time it’s read. This ensures that the exponent is constrained to the double-precision range; a number that is too large for double-precision will overflow, and a number that is too small will underflow. So far, so good.

But there remains one problem, in the handling of underflow: subnormal numbers will be doubly rounded. In double-precision mode, extended-precision results are rounded to 53 bits. If the result is subnormal relative to double-precision, it will be rounded again when written to memory — to between 1 and 52 significant bits, the range of precision for a subnormal double-precision number.

That said, the ‘volatile’ keyword does fix the PHP problem at hand. It doesn’t eliminate double rounding, but it does eliminate the specific double rounding error that causes the problem. (Does this mean there are other double rounding errors lurking? If so, hopefully the result is just an incorrect conversion, and not another infinite loop.)

The Generated Code Without ‘Volatile’

I took a fixed version of PHP — PHP 5.3.5 — and removed the ‘volatile’ keyword to restore it to its pre-fixed state. I built it on an Ubuntu Linux system and traced it using the gdb debugger. Without the ‘volatile’ keyword, the two highlighted lines of source code compile to these instructions:

0x8302a4a  call 0x8301290 <ulp>
           ... 
           0x83012b9  fldl -0x10(%ebp)    ; load ulp(value(rv)) 
           0x83012bc  leave
           0x83012bd  ret
           ...
0x8302a4f  mov -0x8c(%ebp),%eax
0x8302a55  fldl -0x30(%ebp)         ; load value(rv) 
0x8302a58  fldl -0x98(%ebp)         ; load aadj1 
0x8302a5e  fmulp %st,%st(2)         ; adj = aadj1 * ulp(value(rv)) 
0x8302a60  faddp %st,%st(1)         ; value(rv) += adj 
0x8302a62  fstpl -0x30(%ebp)        ; store value(rv) 

The problem occurs because the result of the multiplication, adj, is left on the FPU register stack, where it is then added to value(rv). This leads to a double rounding on underflow error, which I’ll explain below.

The result of the fmulp instruction (adj)

Before the fmulp instruction executes, the value of its two operands — aadj1 and ulp(value(rv)) — are on the stack (not shown). This is the FPU stack after the fmulp instruction at 0x8302a5e executes (I formatted gdb’s output for display):

st0  2.2250738585072013830902327173324041e-308  (raw 0x3c018000000000000000)
st1  -2.8309023271733244206437071197035129e-324 (raw 0xbbcc92aee22acdf2d000)
st2  0	                                        (raw 0x00000000000000000000)
st3  0	                                        (raw 0x00000000000000000000)
st4  0                                          (raw 0x00000000000000000000)
st5  0	                                        (raw 0x00000000000000000000)
st6  0	                                        (raw 0x00000000000000000000)
st7  -0.57298100991279032889735844946699217	(raw 0xbffe92aee22acdf2d000)

The result of the multiplication, adj, is in st1: 0xbbcc92aee22acdf2d000. That hexadecimal value represents an extended-precision floating-point number, which has three fields: a 1 bit sign field, a 15 bit excess-16383 exponent field, and a 64 bit fraction field (there is no implicit leading 1 bit in extended-precision). Translated into binary and separated into fields, 0xbbcc92aee22acdf2d000 is

1
011101111001100
1001001010101110111000100010101011001101111100101101000000000000

The exponent field 011101111001100 is 15308 in decimal, and after subtracting 16383, represents an exponent of -1075. So the number this floating-point value represents is

-1.001001010101110111000100010101011001101111100101101 x 2-1075.

The Result of the faddp Instruction (value(rv))

Before the faddp instruction executes, the stack contains its two operands: the initial value of value(rv) in st0 (loaded with fldl instruction at address 0x8302a55) and adj in st1. value(rv) is 0x3c018000000000000000:

0
011110000000001
1000000000000000000000000000000000000000000000000000000000000000.

This represents 2-1022.

This is the FPU stack after the faddp instruction at 0x8302a60 executes:

st0  2.225073858507201136057409796709132e-308   (raw 0x3c00fffffffffffff800)
st1  0	                                        (raw 0x00000000000000000000)
st2  0	                                        (raw 0x00000000000000000000)
st3  0	                                        (raw 0x00000000000000000000)
st4  0                                          (raw 0x00000000000000000000)
st5  0	                                        (raw 0x00000000000000000000)
st6  -0.57298100991279032889735844946699217	(raw 0xbffe92aee22acdf2d000)
st7  2.2250738585072013830902327173324041e-308	(raw 0x3c018000000000000000)

The result — the new value of value(rv) — is in st0; it is 0x3c00fffffffffffff800:

0
011110000000000
1111111111111111111111111111111111111111111111111111100000000000

This represents the 53 significant bit value

1.1111111111111111111111111111111111111111111111111111 x 2-1023.

The Result of the fstpl Instruction (value(rv))

The fstpl at address 0x8302a62 stores the copy of value(rv) in st0 back to memory, which converts it to double-precision. Since the value of st0 is a subnormal number in double-precision, it must be rounded to 52 bits. This rounds value(rv) to 2-1022 (which ironically makes it normal). In other words, value(rv) remains unchanged!

What Went Wrong: A Double Rounding on Underflow Error

You have to look at the calculation of value(rv) + adj more closely to see what happened.

value(rv) = 2-1022
adj = -1.001001010101110111000100010101011001101111100101101 x 2-1075

If you add them by hand, using binary addition, you get this 104-bit result (I’d have shown you my calculation, but alignment of the exponents made the lines too long):

1.1111111111111111111111111111111111111111111111111110110110101010001
000111011101010100110010000011010011 x 2-1023

The highlighted bit is significant bit 54; since the value of bit 54 and beyond is greater than 1/2 ULP, this number, to 53 bits, rounds up to

1.1111111111111111111111111111111111111111111111111111 x 2-1023

As I said above, this is a subnormal number in double-precision, so it must now be rounded to 52 bits. The highlighted bit — bit 53 — is 1, and all bits beyond are zero. This is a halfway case, so the tie is broken with the round-half-to-even rule. In this case, that means rounding up to 2-1022.

If the result were rounded only once, directly to 52 bits, it would have been correct:

1.111111111111111111111111111111111111111111111111111 x 2-1023

In other words, this is a double rounding error. It essentially undoes the subtraction. The second rounding adds back 2-1074, which is the amount we were really trying to subtract in the first place!

The Generated Code With ‘Volatile’

After restoring the ‘volatile’ keyword to the source code, I rebuilt PHP and traced it again. With the ‘volatile’ keyword, the two highlighted lines of source code compile to these instructions:

0x8302ba4  call 0x8301290 <ulp>
           ... 
           0x83012b9  fldl -0x10(%ebp)    ; load ulp(value(rv)) 
           0x83012bc  leave
           0x83012bd  ret
           ...
0x8302ba9  mov -0xac(%ebp),%eax
0x8302baf  fldl -0x38(%ebp)         ; load aadj1
0x8302bb2  fmulp %st,%st(1)         ; adj = aadj1 * ulp(value(rv)) 
0x8302bb4  fstpl -0x40(%ebp)        ; store adj
0x8302bb7  fldl -0x48(%ebp)         ; load value(rv)
0x8302bba  fldl -0x40(%ebp)         ; load adj
0x8302bbd  faddp %st,%st(1)         ; value(rv) += adj 
0x8302bbf  fstpl -0x48(%ebp)        ; store value(rv) 

In this code, the fmulp and faddp instructions are separated by a load/store sequence, forcing adj to double-precision before adding it to value(rv). This sidesteps the double rounding issue.

The Result of the fmulp Instruction (adj)

This is the FPU stack after the fmulp instruction at 0x8302bb2 executes:

st0  -2.8309023271733244206437071197035129e-324 (raw 0xbbcc92aee22acdf2d000)
st1  0                                          (raw 0x00000000000000000000)
st2  0	                                        (raw 0x00000000000000000000)
st3  0	                                        (raw 0x00000000000000000000)
st4  0                                          (raw 0x00000000000000000000)
st5  0	                                        (raw 0x00000000000000000000)
st6  2.2250738585072013830902327173324041e-308  (raw 0x3c018000000000000000)
st7  -0.57298100991279032889735844946699217	(raw 0xbffe92aee22acdf2d000)

The result of the multiplication, adj, is the same as in the unfixed code, except it lives in st0:

-1.001001010101110111000100010101011001101111100101101 x 2-1075.

The Result of the fstpl Instruction (value(rv))

The fstpl at address 0x8302bb4 stores the copy of adj in st0 back to memory, which converts it to double-precision. Since the value of st0 is a subnormal number in double-precision, it must be rounded to one bit, since that’s all that remains of the 52 maximum bits of subnormal precision at exponent 2-1074.

To see where the rounding occurs, rewrite st0 as

-0.1001001010101110111000100010101011001101111100101101 x 2-1074

This rounds up to -2-1074, the smallest value double-precision can represent.

The Result of the fldl Instruction (adj)

This is the FPU stack after the fldl instruction at 0x8302bba executes:

st0  -4.9406564584124654417656879286822137e-324 (raw 0xbbcd8000000000000000)
st1  2.2250738585072013830902327173324041e-308  (raw 0x3c018000000000000000)
st2  0	                                        (raw 0x00000000000000000000)
st3  0	                                        (raw 0x00000000000000000000)
st4  0                                          (raw 0x00000000000000000000)
st5  0	                                        (raw 0x00000000000000000000)
st6  0	                                        (raw 0x00000000000000000000)
st7  2.2250738585072013830902327173324041e-308	(raw 0x3c018000000000000000)

This confirms that adj, which is now in st0 as 0xbbcd8000000000000000, contains -2-1074:

1
011101111001101
1000000000000000000000000000000000000000000000000000000000000000

The Result of the faddp Instruction (value(rv))

This is the FPU stack after the faddp instruction at 0x8302bbd executes:

st0  2.2250738585072008890245868760858599e-308  (raw 0x3c00fffffffffffff000)
st1  0                                          (raw 0x00000000000000000000)
st2  0	                                        (raw 0x00000000000000000000)
st3  0	                                        (raw 0x00000000000000000000)
st4  0                                          (raw 0x00000000000000000000)
st5  0	                                        (raw 0x00000000000000000000)
st6  2.2250738585072013830902327173324041e-308  (raw 0x3c018000000000000000)
st7  -4.9406564584124654417656879286822137e-324	(raw 0xbbcd8000000000000000)

The addition works this time, changing value(rv), making it 0x3c00fffffffffffff000:

0
011110000000000
1111111111111111111111111111111111111111111111111111000000000000

This is 1.111111111111111111111111111111111111111111111111111 x 2-1023, the correctly rounded result. (It is 52 bits, so when it’s stored back to memory with the subsequent fstpl instruction, no rounding occurs.)

So the loop now terminates, after only one adjustment to value(rv).

Summary

PHP got stuck converting a decimal number whose double-precision binary floating-point representation is DBL_MIN – 2-1074, a subnormal number at the normalized/unnormalized number boundary. Its conversion routine started with an approximation of DBL_MIN and tried to adjust it by subtracting a number a tad smaller than 2-1074 — a number too small for a double to represent. The addition of the ‘volatile’ keyword forced the adjustment to be rounded to 2-1074, which allowed the subtraction to work as intended — and avoid a double rounding on underflow error.

A Good Quote

I found this quote regarding double rounding on underflow, in an appendix of “What Every Computer Scientist Should Know About Floating-Point Arithmetic” titled “Differences Among IEEE 754 Implementations” (see subsection “Programming Language Support for Extended Precision”, bullet 4):

“Of course, this form of double-rounding is highly unlikely to affect any practical program adversely.”

🙂

Endnote

I was initially using the kdbg debugger (version 2.2.1) but switched to gdb after discovering a bug in it: it displayed the decimal values of some FPU registers incorrectly. In my case, it displayed the raw values 0xbbcd8000000000000000 and 0xbbcc92aee22acdf2d000 both as -4.94065645841246544177e-324. (I reported this bug to Johannes Sixt.)

Dingbat

17 comments

  1. Hi!

    Do you have an idea why Sun’s Java 6 hangs when trying to assign the value ‘2.2250738585072012e-308’ to a ‘double’ variable? Note that the number is slightly different from the one which hangs PHP; (I think) it should convert to the smallest normalized floating-point number.

    It happens at least under Windows with latest Sun JRE/JDK 1.6.0_23 (with both the 32 bit and the 64 bit editions), I also got the problem at OpenSuse Linux (32 bit). For example, if you do the following in a Java program:

    System.out.println(“Test:”);
    double d = Double.parseDouble(“2.2250738585072012e-308”);
    System.out.println(“Value: ” + d);

    Then the current executing thread hangs (loops).
    If you do this:

    double d = 2.2250738585072012e-308;
    System.out.println(“Value: ” + d);

    and try to compile it (e.g. with Javac), then even the Java Compiler hangs.
    If you try other numbers around that value (e.g. PHP’s 2.2250738585072011e-308), it does not hang.

    I don’t know if this has anything to do with the Bug in PHP, but anyway it’s a bit strange.

    I think such a bug in Java is less critical than in PHP, because although Java is used for dynamic webseites (I use Java Servlets, JSP), it has a strong type system, so you would have to explicitly use Double.parseDouble() on a Java-based website to make the site “vulnerable” to attacks.

  2. Konstantin,

    Very cool find!! I can reproduce both problems on my 32-bit Windows system, using jdk1.6.0_20/jre1.6.0_16. (And yes, 2.2250738585072012e-308 should convert to the smallest normal: DBL_MIN = 2-1022.)

    I don’t know if java uses David Gay’s dtoa.c, which is what PHP’s zend_strtod() is based on, but I am under the impression it uses its own implementation. I’ll poke around; but in the meantime, you should report this as a java bug. Please let us know how it turns out! Thanks.

  3. Hi!

    Well, I already issued a bug report three weeks ago (shortly after I read your article about the PHP bug), but I have not yet received a response from Sun/Oracle. Maybe the bug gets fixed – in 10 years or so. 😀

    I do not have exact knowledge about internal Java code and have only basic knowledge about converting a String containing a decimal floating point number to a binary floating point number (although I have “computer arithmetics” this semester at my university :D), but I suspect that the loop happens in sun.misc.FloatingDecimal.doubleValue() inside the “correction loop” at the bottom of the function (lines 1459-1584 in the JDK 1.6.0_23 source file “j2se/src/share/classes/sun/misc/FloatingDecimal.java”).
    If I comment that block out, the function does not hang any more.

  4. Konstantin,

    In my experience, bugs like this get fixed quickly when they’re publicized 🙂 .

    Do you have a link to the bug report? I could not find it.

    | BTW, when you comment it out, does it “fix” both the runtime hang and the compile time hang?

  5. Hello,

    unfortunetaly I also don’t have a link to the bug report, because Sun publishes Bugs only after they reviewed them, and I didn’t get a response for my report, so I don’t have a Bug ID yet.

    When I out-comment that part, it can only fix the runtime hang, as the Double.parseDouble(String s) which calls sun.misc.FloatingDecimal.readJavaFormatString(s).doubleValue() is purely Java code, not native code.
    However it uses floating point arithmetics, so maybe it is also a problem with the compile settings with wich the Java Runtime and the Compiler itself were compiled (from C code).

    Whithout that correction loop, the bits of the double value that I get are these (Big Endian):
    00000000 00001111 11111111 11111111 11111111 11111111 11111111 11111111
    e.g. it converts to the biggest subnormal floating-point value, as the E is 0.
    The same happens with “2.2250738585072013e-308” (without the correction loop), but If I add the correction loop back, this converts correctly into
    00000000 00010000 00000000 00000000 00000000 00000000 00000000 00000000
    but with “2.2250738585072012e-308”, it hangs.
    I think the “correction loop” is probably the same as with PHP, it tries to make the current double value a better approximation to the decimal number, but hangs as the rounding error does not get smaller (or something like that). 🙂

  6. Konstantin,

    Great detective work. It looks like Java is having the “opposite” problem to PHP’s: it has an estimate just below DBL_MIN — 0x0.fffffffffffffp-1022 — and is trying to get up to DBL_MIN — 0x1.0000000000000p-1022.

    (I think you’ll probably get an “A” in your computer arithmetic class 🙂 .)

  7. This is an awesome find! Way to go! Waiting for news about hackers abusing this bug around the world for financial gain

  8. Nice analysis.

    My very simple solution to the problem would be not to use the FPU at all. When I compile programs, I try to make the compiler use vector math as much as possible, it’s usually a tad faster too. But of course, if you want software that runs on processors that don’t have SSE/VFP/Neon/Altivec/etc. units, you can’t do that. Luckily, most modern CPUs integrate such vector coprocessors.

    In the above case, I guess the SSE version wouldn’t run faster (unless you had FMA instructions as in the forthcoming AVX), but it would at least solve the rounding problem…

  9. Great article; Rick. Thanks for the detective work and the fine write-up. Re: Chetan Bhat: It’s been done — have you see “Office Space?”

  10. Pingback: Year of the Snail

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

(Cookies must be enabled to leave a comment...it reduces spam.)

Copyright © 2008-2024 Exploring Binary

Privacy policy

Powered by WordPress

css.php