Visual C++ and GLIBC strtod() Ignore Rounding Mode

When a decimal number is converted to a binary floating-point number, the floating-point number, in general, is only an approximation to the decimal number. Large integers, and most decimal fractions, require more significant bits than the floating-point format can represent. This means the decimal number must be rounded to one of the two floating-point numbers that surround it.

Common practice considers a decimal number correctly rounded when the nearer of the two floating-point numbers is chosen (and when both are equally near, the one whose significant bit number 53 is 0 — the “even” one — is chosen). This makes sense intuitively, and also reflects the default IEEE 754 rounding mode — round-to-nearest. However, there are three other IEEE 754 rounding modes, which allow for directed rounding: round toward positive infinity, round toward negative infinity, and round toward zero. For a conversion to be considered truly correctly rounded, it must honor all four rounding modes — whichever is currently in effect.

I evaluated the Visual C++ and glibc strtod() functions under the three directed rounding modes, as I did for round-to-nearest mode in my articles “Incorrectly Rounded Conversions in Visual C++” and “Incorrectly Rounded Conversions in GCC and GLIBC”. What I discovered is this: they convert correctly only about half the time — pure chance! — because they ignore the rounding mode altogether.

Incorrectly Rounded Conversions in GCC and GLIBC

Visual C++ rounds some decimal to double-precision floating-point conversions incorrectly, but it’s not alone; the gcc C compiler and the glibc strtod() function do the same. In this article, I’ll show examples of incorrect conversions in gcc and glibc, and I’ll present a C program that demonstrates the errors.

Decimal to Floating-Point Needs Arbitrary Precision

In my article “Quick and Dirty Decimal to Floating-Point Conversion” I presented a small C program that converts a decimal string to a double-precision binary floating-point number. The number it produces, however, is not necessarily the closest — or so-called correctly rounded — double-precision binary floating-point number. This is not a failing of the algorithm; mathematically speaking, the algorithm is correct. The flaw lies in its implementation, which uses limited-precision binary floating-point arithmetic.

The quick and dirty program is implemented in native C, so it’s limited to double-precision floating-point arithmetic (although on some systems, extended precision may be used). Higher precision arithmetic — in fact, arbitrary precision arithmetic — is needed to ensure that all decimal inputs are converted correctly. I will demonstrate the need for high precision by analyzing three examples, all taken from Vern Paxson’s paper “A Program for Testing IEEE Decimal–Binary Conversion”.

Quick and Dirty Decimal to Floating-Point Conversion

This little C program converts a decimal value — represented as a string — into a double-precision floating-point number:

```
#include <stdio.h>
#include <string.h>

int main (void)
{
    double intPart = 0, fracPart = 0, conversion;
    size_t i;
    char decimal[] = "3.14159";

    i = 0; /* Left to right */
    while (decimal[i] != '.') {
        intPart = intPart*10 + (decimal[i] - '0');
        i++;
    }

    i = strlen(decimal) - 1; /* Right to left */
    while (decimal[i] != '.') {
        fracPart = (fracPart + (decimal[i] - '0'))/10;
        i--;
    }

    conversion = intPart + fracPart;
    printf("%.17g\n", conversion);
}
```

The conversion is done using the elegant Horner’s method, summing each digit according to its decimal place value. So why do I call it “quick and dirty”? Because the binary floating-point value it produces is not necessarily the closest approximation to the input decimal value — the so-called correctly rounded result. (Remember that most real numbers cannot be represented exactly in floating-point.) Most of the time it will produce the correctly rounded result, but sometimes it won’t — the result will be off in its least significant bit(s). There’s just not enough precision in floating-point to guarantee the result is correct every time.

I will demonstrate this program with different input values, some of which convert correctly, and some of which don’t. In the end, you’ll appreciate one reason why library functions like strtod() exist — to perform efficient, correctly rounded conversion.

Are there any multiple digit hexadecimal number palindromes that are also palindromic in binary and decimal? I have been searching but have not found any.

I started my search with my program that finds multiple-base palindromes. I generated palindromes in binary, and then checked them to see if they were also palindromes in hexadecimal and decimal. I looked for decimal/binary/hexadecimal palindromes up to 16 hex digits long, but did not find any.

To continue my search into bigger numbers, I wrote a program that uses arbitrary-precision integer arithmetic and a more efficient algorithm. Despite being able to search much further, I still have not found any.

In this article, I’ll analyze the size of the palindrome “search space”, explain my improved algorithm, and discuss the state of my search.

When Doubles Don’t Behave Like Doubles

In my article “When Floats Don’t Behave Like Floats” I explained how calculations involving single-precision floating-point variables may be done, under the covers, in double or extended precision. This leads to anomalies in expected results, which I demonstrated with two C programs — compiled with Microsoft Visual C++ and run on a 32-bit Intel Core Duo processor.

In this article, I’ll do a similar analysis for double-precision floating-point variables, showing how similar anomalies arise when extended precision calculations are done. I modified my two example programs to use doubles instead of floats. Interestingly, the doubles version of program 2 does not exhibit the anomaly. I’ll explain.

When Floats Don’t Behave Like Floats

These two programs — compiled with Microsoft Visual C++ and run on a 32-bit Intel Core Duo processor — demonstrate an anomaly that occurs when using single-precision floating point variables:

Program 1

```
#include <stdio.h>

int main (void)
{
    float f1 = 0.1f, f2 = 3.0f, f3;

    f3 = f1 * f2;
    if (f3 != f1 * f2)
        printf("Not equal\n");
}
```

Prints “Not equal”.

Program 2

```
#include <stdio.h>

int main (void)
{
    float f1 = 0.7f, f2 = 10.0f, f3;
    int i1, i2;

    f3 = f1 * f2;
    i1 = (int)f3;
    i2 = (int)(f1 * f2);
    if (i1 != i2)
        printf("Not equal\n");
}
```

Prints “Not equal”.

In each case, f3 and f1 * f2 differ. But why? I’ll explain what’s going on.

Ending Digits of Powers of Five Form a Binary Tree

In my article “Patterns in the Last Digits of the Positive Powers of Five” I showed that the cycles of ending digits of the positive powers of five could be represented with a binary tree (the tree diagram appears in that article).

The tree layout shows that certain pairs of ending digits are related, and that these pairs differ by five in their starting digits. I will show why this is true.

In my article “Counting Binary and Hexadecimal Palindromes” I derived formulas for counting binary palindromes and hexadecimal palindromes. For each type of palindrome, I derived two pairs of formulas: one pair to count n-digit palindromes, and one pair to count palindromes of n digits or less.

In this article, I will derive similar formulas to count binary/hexadecimal palindromes — multi-base palindromes I’ve shown to have an algorithmically defined structure.

How to Install and Run GMP on Windows Using MPIR

To perform arbitrary-precision arithmetic in C and C++ programs on Windows, I use GMP. In particular, I use MPIR, a Windows port of GMP. MPIR is a simple alternative to using GMP under Cygwin or MinGW.

I will show you how to install MPIR in Microsoft Visual C++ as a static, 32-bit library. I will also show you how to install the optional C++ interface — also as a static, 32-bit library. I will provide two example C programs that call the GMP integer and floating-point functions, and two equivalent C++ programs — programs that use the same GMP functions, only indirectly through the C++ interface.