## Decimal to Floating-Point Needs Arbitrary Precision

In my article “Quick and Dirty Decimal to Floating-Point Conversion” I presented a small C program that converts a decimal string to a double-precision binary floating-point number. The number it produces, however, is not necessarily the closest — or so-called correctly rounded — double-precision binary floating-point number. This is not a failing of the algorithm; mathematically speaking, the algorithm is correct. The flaw comes in its implementation in limited precision binary floating-point arithmetic.

The quick and dirty program is implemented in native C, so it’s limited to double-precision floating-point arithmetic (although on some systems, extended precision may be used). Higher precision arithmetic — in fact, arbitrary precision arithmetic — is needed to ensure that all decimal inputs are converted correctly. I will demonstrate the need for high precision by analyzing three examples, all taken from Vern Paxson’s paper “A Program for Testing IEEE Decimal–Binary Conversion”.

## Quick and Dirty Decimal to Floating-Point Conversion

This little C program converts a decimal value — represented as a string — into a double-precision floating-point number:

```#include <string.h>

int main (void)
{
double intPart = 0, fracPart = 0, conversion;
unsigned int i;
char decimal[] = "3.14159";

i = 0; /* Left to right */
while (decimal[i] != '.') {
intPart = intPart*10 + (decimal[i] - '0');
i++;
}

i = strlen(decimal)-1; /* Right to left */
while (decimal[i] != '.') {
fracPart = (fracPart + (decimal[i] - '0'))/10;
i--;
}

conversion = intPart + fracPart;
}
```

The conversion is done using the elegant Horner’s method, summing each digit according to its decimal place value. So why do I call it “quick and dirty?” Because the binary floating-point value it produces is not necessarily the closest approximation to the input decimal value — the so-called correctly rounded result. (Remember that most real numbers cannot be represented exactly in floating-point.) Most of the time it will produce the correctly rounded result, but sometimes it won’t — the result will be off in its least significant bit(s). There’s just not enough precision in floating-point to guarantee the result is correct every time.

I will demonstrate this program with different input values, some of which convert correctly, and some of which don’t. In the end, you’ll appreciate one reason why library functions like strtod() exist — to perform efficient, correctly rounded conversion.

## When Doubles Don’t Behave Like Doubles

In my article “When Floats Don’t Behave Like Floats” I explained how calculations involving single-precision floating-point variables may be done, under the covers, in double or extended precision. This leads to anomalies in expected results, which I demonstrated with two C programs — compiled with Microsoft Visual C++ and run on a 32-bit Intel Core Duo processor.

In this article, I’ll do a similar analysis for double-precision floating-point variables, showing how similar anomalies arise when extended precision calculations are done. I modified my two example programs to use doubles instead of floats. Interestingly, the doubles version of program 2 does not exhibit the anomaly. I’ll explain.

## When Floats Don’t Behave Like Floats

These two programs — compiled with Microsoft Visual C++ and run on a 32-bit Intel Core Duo processor — demonstrate an anomaly that occurs when using single-precision floating point variables:

Program 1

```#include "stdio.h"
int main (void)
{
float f1 = 0.1f, f2 = 3.0f, f3;

f3 = f1 * f2;
if (f3 != f1 * f2)
printf("Not equal\n");
}
```

Prints “Not equal”.

Program 2

```#include "stdio.h"
int main (void)
{
float f1 = 0.7f, f2 = 10.0f, f3;
int i1, i2;

f3 = f1 * f2;
i1 = (int)f3;
i2 = (int)(f1 * f2);
if (i1 != i2)
printf("Not equal\n");
}
```

Prints “Not equal”.

In each case, f3 and f1 * f2 differ. But why? I’ll explain what’s going on.

## Displaying the Raw Fields of a Floating-Point Number

A double-precision floating-point number is represented internally as 64 bits, divided into three fields: a sign field, an exponent field, and a fraction field. You don’t need to know this to use floating-point numbers, but knowing it can help you understand them. This article shows you how to access those fields in C code, and how to print them — in binary or hex.

## Converting Floating-Point Numbers to Binary Strings in C

If you want to print a floating-point number in binary using C code, you can’t use printf() — it has no format specifier for it. That’s why I wrote a program to do it, a program I describe in this article.

(If you’re wondering why you’d want to print a floating-point number in binary, I’ll tell you that too.)

## A Simple C Program That Prints 2,098 Powers of Two

To write a computer program to print the first 1000 nonnegative powers of two, do you think you’d need to use arbitrary precision arithmetic? After all, 21000 is a 302-digit number. How about printing the first 1000 negative powers of two? 2-1000 weighs in at a whopping 1000 decimal places. It turns out all you need is standard double-precision floating-point arithmetic — and the right compiler!

## Print Precision of Floating-Point Integers Varies Too

Recently I showed that programming languages vary in how much precision they allow in printed floating-point fractions. Not only do they vary, but most don’t meet my standard — printing, to full precision, decimal values that have exact floating-point representations. Here I’ll present a similar study for floating-point integers, which had similar results.

## Print Precision of Dyadic Fractions Varies by Language

Interestingly, programming languages vary in how much precision they allow in printed floating-point fractions. You would think they’d all be the same, allowing you to print as many decimal places as you ask for. After all, a floating-point fraction is a dyadic fraction; it has as many decimal places as it has bits in its fractional representation.

Consider the dyadic fraction 5,404,319,552,844,595/253. Its decimal expansion is 0.59999999999999997779553950749686919152736663818359375, and its binary expansion is 0.10011001100110011001100110011001100110011001100110011. Both are 53 digits long. The ideal programming language lets you print all 53 decimal places, because all are meaningful. Unfortunately, many languages won’t let you do that; they typically cap the number of decimal places at between 15 and 17, which for our example might be 0.59999999999999998.

## Ten Ways to Check if an Integer Is a Power Of Two in C

To write a program to check if an integer is a power of two, you could follow two basic strategies: check the number based on its decimal value, or check it based on its binary representation. The former approach is more human-friendly but generally less efficient; the latter approach is more machine-friendly but generally more efficient. We will explore both approaches, comparing ten different but equivalent C functions.

## What Powers of Two Look Like Inside a Computer

A power of two, when expressed as a binary number, is easy to spot: it has one, and only one, 1 bit. For example, 1000, 10, and 0.001 are powers of two. Inside a computer, however, numbers are more generally represented in binary code, not as “pure” binary numbers. As a result, you may not be able to look at the binary representation of a number and tell at a glance whether it’s a power of two or not; it depends on how it’s encoded.