Visual C++ and GLIBC strtod() Ignore Rounding Mode

When a decimal number is converted to a binary floating-point number, the floating-point number, in general, is only an approximation to the decimal number. Large integers, and most decimal fractions, require more significant bits than can be represented in the floating-point format. This means the decimal number must be rounded, to one of the two floating-point numbers that surround it.

Common practice considers a decimal number correctly rounded when the nearest of the two floating-point numbers is chosen (and when both are equally near, when the one with significant bit number 53 equal to 0 is chosen). This makes sense intuitively, and also reflects the default IEEE 754 rounding mode — round-to-nearest. However, there are three other IEEE 754 rounding modes, which allow for directed rounding: round toward positive infinity, round toward negative infinity, and round toward zero. For a conversion to be considered truly correctly rounded, it must honor all four rounding modes — whichever is currently in effect.

I evaluated the Visual C++ and glibc strtod() functions under the three directed rounding modes, like I did for round-to-nearest mode in my articles “Incorrectly Rounded Conversions in Visual C++” and “Incorrectly Rounded Conversions in GCC and GLIBC”. What I discovered was this: they only convert correctly about half the time — pure chance! — because they ignore the rounding mode altogether.

Continue reading “Visual C++ and GLIBC strtod() Ignore Rounding Mode”

Incorrectly Rounded Conversions in GCC and GLIBC

Visual C++ rounds some decimal to double-precision floating-point conversions incorrectly, but it’s not alone; the gcc C compiler and the glibc strtod() function do the same. In this article, I’ll show examples of incorrect conversions in gcc and glibc, and I’ll present a C program that demonstrates the errors.

Continue reading “Incorrectly Rounded Conversions in GCC and GLIBC”

Incorrectly Rounded Conversions in Visual C++

In my analysis of decimal to floating-point conversion I noted an example that was converted incorrectly by the Microsoft Visual C++ compiler. I’ve found more examples — including a class of examples — that it converts incorrectly. I will analyze those examples in this article.

Continue reading “Incorrectly Rounded Conversions in Visual C++”

Copyright © 2008-2022 Exploring Binary

Privacy policy

Powered by WordPress

css.php