Numbers Greater Than DBL_MAX Should Convert To DBL_MAX When Rounding Toward Zero

I was testing David Gay’s most recent fixes to strtod() with different rounding modes and discovered that Apple Clang C++ (Xcode) and Microsoft Visual Studio C++ produce incorrect results for round towards zero and round down modes: their strtod()s convert numbers greater than DBL_MAX to infinity, not DBL_MAX. At first I thought Gay’s strtod() was wrong, but Dave pointed out that the IEEE 754 spec requires such conversions to be monotonic.

I submitted bug reports to Microsoft (“strtod()/strtof() convert values greater than DBL_MAX/FLT_MAX incorrectly when using directed rounding”) and Apple (FB13656697, “strtod() converts values greater than DBL_MAX incorrectly when using directed rounding”). (Apple’s bug reports don’t appear to be public.) Both bug reports also cover strtof(), which converts numbers greater than FLT_MAX to infinity, not FLT_MAX.

Testcase

This code tests the conversion of 2e308 by strtod() and strtof() (2e308 is an arbitrary value greater than DBL_MAX, which is 1.7976931348623157e+308, and FLT_MAX, which is 3.40282347e+38):

#include <fenv.h>    // fesetround() and the FE_* rounding-mode macros
#include <stdlib.h>  // strtod(), strtof()
#include <iostream>

int main()
{
    // std::hexfloat prints each result as a hexadecimal floating-point constant
    fesetround(FE_TONEAREST);
    std::cout << std::hexfloat << "strtod(\"2e308\") FE_TONEAREST = " << strtod("2e308", NULL) << "\n";

    fesetround(FE_UPWARD);
    std::cout << std::hexfloat << "strtod(\"2e308\") FE_UPWARD = " << strtod("2e308", NULL) << "\n";

    fesetround(FE_DOWNWARD);
    std::cout << std::hexfloat << "strtod(\"2e308\") FE_DOWNWARD = " << strtod("2e308", NULL) << "\n";

    fesetround(FE_TOWARDZERO);
    std::cout << std::hexfloat << "strtod(\"2e308\") FE_TOWARDZERO = " << strtod("2e308", NULL) << "\n";

    fesetround(FE_TONEAREST);
    std::cout << std::hexfloat << "strtof(\"2e308\") FE_TONEAREST = " << strtof("2e308", NULL) << "\n";

    fesetround(FE_UPWARD);
    std::cout << std::hexfloat << "strtof(\"2e308\") FE_UPWARD = " << strtof("2e308", NULL) << "\n";

    fesetround(FE_DOWNWARD);
    std::cout << std::hexfloat << "strtof(\"2e308\") FE_DOWNWARD = " << strtof("2e308", NULL) << "\n";

    fesetround(FE_TOWARDZERO);
    std::cout << std::hexfloat << "strtof(\"2e308\") FE_TOWARDZERO = " << strtof("2e308", NULL) << "\n";
}

This is the expected output (as hexadecimal floating-point constants, DBL_MAX = 0x1.fffffffffffffp+1023 and FLT_MAX = 0x1.fffffep+127):

strtod("2e308") FE_TONEAREST = inf
strtod("2e308") FE_UPWARD = inf
strtod("2e308") FE_DOWNWARD = 0x1.fffffffffffffp+1023
strtod("2e308") FE_TOWARDZERO = 0x1.fffffffffffffp+1023
strtof("2e308") FE_TONEAREST = inf
strtof("2e308") FE_UPWARD = inf
strtof("2e308") FE_DOWNWARD = 0x1.fffffep+127
strtof("2e308") FE_TOWARDZERO = 0x1.fffffep+127

(David Gay’s strtod() and GNU GCC v11.3.0 give these expected results.)

This is the output from Visual Studio and Xcode:

strtod("2e308") FE_TONEAREST = inf
strtod("2e308") FE_UPWARD = inf
strtod("2e308") FE_DOWNWARD = inf
strtod("2e308") FE_TOWARDZERO = inf
strtof("2e308") FE_TONEAREST = inf
strtof("2e308") FE_UPWARD = inf
strtof("2e308") FE_DOWNWARD = inf
strtof("2e308") FE_TOWARDZERO = inf
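The testcase changes the rounding mode globally and never restores it. If you want to fold these checks into a larger test program, a small wrapper can set the mode, convert, and put the old mode back. This is only a sketch: the names strtod_in_mode and strtof_in_mode are hypothetical helpers, not part of the original test or of any library.

```cpp
#include <cfenv>
#include <cstdlib>

// Hypothetical helper (not from the original testcase): convert `text` with
// strtod() while `rounding_mode` is in effect, then restore the caller's mode.
double strtod_in_mode(const char* text, int rounding_mode) {
    const int saved_mode = std::fegetround();   // remember the current mode
    std::fesetround(rounding_mode);
    const double value = std::strtod(text, nullptr);
    std::fesetround(saved_mode);                // leave the mode as we found it
    return value;
}

// Same idea for single precision, using strtof().
float strtof_in_mode(const char* text, int rounding_mode) {
    const int saved_mode = std::fegetround();
    std::fesetround(rounding_mode);
    const float value = std::strtof(text, nullptr);
    std::fesetround(saved_mode);
    return value;
}
```

For example, strtod_in_mode("2e308", FE_TOWARDZERO) should return DBL_MAX on an implementation that rounds correctly, and returns infinity on one with the bug described here.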

Visual Studio Used to Get This Right

Visual Studio apparently had the correct behavior until it was undone with this issue: “C standard library: under FE_TOWARDZERO and FE_DOWNWARD NAN and INFINITY incorrectly evaluates to 0.000000 and FLT_MAX”.

IEEE Arithmetic Shows The Correct Behavior

To convince yourself of what the correct rounding is, look at how IEEE arithmetic rounds a result that overflows:

#include <fenv.h>   // fesetround() and the FE_* rounding-mode macros
#include <float.h>  // DBL_MAX
#include <iostream>

int main()
{
    double dblMax = __DBL_MAX__; // Use __DBL_MAX__ for Xcode, DBL_MAX for Visual Studio

    fesetround(FE_TONEAREST);
    std::cout << std::hexfloat << "dblMax * dblMax FE_TONEAREST = " << dblMax * dblMax << "\n";

    fesetround(FE_UPWARD);
    std::cout << std::hexfloat << "dblMax * dblMax FE_UPWARD = " << dblMax * dblMax << "\n";

    fesetround(FE_DOWNWARD);
    std::cout << std::hexfloat << "dblMax * dblMax FE_DOWNWARD = " << dblMax * dblMax << "\n";

    fesetround(FE_TOWARDZERO);
    std::cout << std::hexfloat << "dblMax * dblMax FE_TOWARDZERO = " << dblMax * dblMax << "\n";
}

Visual Studio and Xcode match the expected output:

dblMax * dblMax FE_TONEAREST = inf
dblMax * dblMax FE_UPWARD = inf
dblMax * dblMax FE_DOWNWARD = 0x1.fffffffffffffp+1023
dblMax * dblMax FE_TOWARDZERO = 0x1.fffffffffffffp+1023

Note that if you write __DBL_MAX__ * __DBL_MAX__ (or DBL_MAX * DBL_MAX for Visual Studio) instead of dblMax * dblMax, the compiler does the arithmetic at compile time, and the compiler is not sensitive to the rounding mode (that is another issue).
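An optimizing build might also fold the multiplication through a plain local variable. One way to keep it at run time is to route the operand and the result through volatile objects, so the multiply has to happen between the two fesetround() calls. This is a sketch under that assumption; overflow_product is a hypothetical name, not code from the test above.

```cpp
#include <cfenv>
#include <cfloat>

// Hypothetical helper: compute DBL_MAX * DBL_MAX at run time under
// `rounding_mode`, then restore the caller's rounding mode.
double overflow_product(int rounding_mode) {
    const int saved_mode = std::fegetround();
    std::fesetround(rounding_mode);
    volatile double x = DBL_MAX;      // volatile read: value is not a compile-time constant
    volatile double product = x * x;  // volatile write: multiply stays before the restore
    std::fesetround(saved_mode);
    return product;
}
```

On hardware with IEEE arithmetic, overflow_product(FE_TOWARDZERO) should be DBL_MAX and overflow_product(FE_TONEAREST) should be infinity, matching the output listed above.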

Bug Applies to Negative Numbers

For negative numbers, the expected behavior for FE_UPWARD and FE_DOWNWARD “swaps”. For example, for input -2e308, FE_UPWARD should give -DBL_MAX (-FLT_MAX for single-precision) and FE_DOWNWARD should give -infinity. Xcode and Visual Studio return -infinity for FE_UPWARD, which is incorrect.
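The sign and rounding-mode combinations can be written down as a small function. This sketch encodes the IEEE 754 overflow rule for double precision; expected_overflow is a hypothetical helper, and for FE_TONEAREST it assumes the magnitude is at least 2^1024 (true for 2e308), since values only slightly above DBL_MAX still round to DBL_MAX in that mode.

```cpp
#include <cfenv>
#include <cfloat>
#include <limits>

// Hypothetical helper: the double that IEEE 754 prescribes when a value
// overflows, given the sign of the exact value and the rounding mode.
double expected_overflow(bool negative, int rounding_mode) {
    bool to_infinity;
    switch (rounding_mode) {
        case FE_TOWARDZERO: to_infinity = false;     break; // always the largest finite
        case FE_UPWARD:     to_infinity = !negative; break; // +inf, but -DBL_MAX
        case FE_DOWNWARD:   to_infinity = negative;  break; // -inf, but +DBL_MAX
        default:            to_infinity = true;      break; // FE_TONEAREST
    }
    const double magnitude =
        to_infinity ? std::numeric_limits<double>::infinity() : DBL_MAX;
    return negative ? -magnitude : magnitude;
}
```

For input -2e308, expected_overflow(true, FE_UPWARD) gives -DBL_MAX and expected_overflow(true, FE_DOWNWARD) gives -infinity, which is the “swap” described above. A single-precision version would use FLT_MAX in place of DBL_MAX.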

Copyright © 2008-2024 Exploring Binary
