Decimal to Floating-Point Needs Arbitrary Precision

In my article “Quick and Dirty Decimal to Floating-Point Conversion” I presented a small C program that converts a decimal string to a double-precision binary floating-point number. The number it produces, however, is not necessarily the closest — or so-called correctly rounded — double-precision binary floating-point number. This is not a failing of the algorithm; mathematically speaking, the algorithm is correct. The flaw comes in its implementation in limited precision binary floating-point arithmetic.

The quick and dirty program is implemented in native C, so it’s limited to double-precision floating-point arithmetic (although on some systems, extended precision may be used). Higher precision arithmetic — in fact, arbitrary precision arithmetic — is needed to ensure that all decimal inputs are converted correctly. I will demonstrate the need for high precision by analyzing three examples, all taken from Vern Paxson’s paper “A Program for Testing IEEE Decimal–Binary Conversion”.

Continue reading “Decimal to Floating-Point Needs Arbitrary Precision”

Quick and Dirty Decimal to Floating-Point Conversion

This little C program converts a decimal value — represented as a string — into a double-precision floating-point number:

#include <string.h>

int main (void)
 double intPart = 0, fracPart = 0, conversion;
 unsigned int i;
 char decimal[] = "3.14159";

 i = 0; /* Left to right */
 while (decimal[i] != '.') {
    intPart = intPart*10 + (decimal[i] - '0');

 i = strlen(decimal)-1; /* Right to left */
 while (decimal[i] != '.') {
    fracPart = (fracPart + (decimal[i] - '0'))/10;

 conversion = intPart + fracPart;

The conversion is done using the elegant Horner’s method, summing each digit according to its decimal place value. So why do I call it “quick and dirty?” Because the binary floating-point value it produces is not necessarily the closest approximation to the input decimal value — the so-called correctly rounded result. (Remember that most real numbers cannot be represented exactly in floating-point.) Most of the time it will produce the correctly rounded result, but sometimes it won’t — the result will be off in its least significant bit(s). There’s just not enough precision in floating-point to guarantee the result is correct every time.

I will demonstrate this program with different input values, some of which convert correctly, and some of which don’t. In the end, you’ll appreciate one reason why library functions like strtod() exist — to perform efficient, correctly rounded conversion.

Continue reading “Quick and Dirty Decimal to Floating-Point Conversion”

When Doubles Don’t Behave Like Doubles

In my article “When Floats Don’t Behave Like Floats” I explained how calculations involving single-precision floating-point variables may be done, under the covers, in double or extended precision. This leads to anomalies in expected results, which I demonstrated with two C programs — compiled with Microsoft Visual C++ and run on a 32-bit Intel Core Duo processor.

In this article, I’ll do a similar analysis for double-precision floating-point variables, showing how similar anomalies arise when extended precision calculations are done. I modified my two example programs to use doubles instead of floats. Interestingly, the doubles version of program 2 does not exhibit the anomaly. I’ll explain.

Continue reading “When Doubles Don’t Behave Like Doubles”

When Floats Don’t Behave Like Floats

These two programs — compiled with Microsoft Visual C++ and run on a 32-bit Intel Core Duo processor — demonstrate an anomaly that occurs when using single-precision floating point variables:

Program 1

#include "stdio.h"
int main (void)
 float f1 = 0.1f, f2 = 3.0f, f3;

 f3 = f1 * f2;
 if (f3 != f1 * f2)
   printf("Not equal\n");

Prints “Not equal”.

Program 2

#include "stdio.h"
int main (void)
 float f1 = 0.7f, f2 = 10.0f, f3;
 int i1, i2;

 f3 = f1 * f2;
 i1 = (int)f3;
 i2 = (int)(f1 * f2);
 if (i1 != i2)
   printf("Not equal\n");

Prints “Not equal”.

In each case, f3 and f1 * f2 differ. But why? I’ll explain what’s going on.

Continue reading “When Floats Don’t Behave Like Floats”

Finding Numbers That Are Palindromic In Multiple Bases

A palindromic number, or number palindrome, is a number like 74347, which is the same written forward and backward.

A number can be palindromic in any base, not just decimal. For example, 101101 is a palindrome in binary. A number can also be palindromic in more than one base, like decimal 719848917, which is 101010111010000000010111010101 in binary and 5272002725 in octal.

An efficient way to find palindromes in a single base is to generate them, iterating through each integer and constructing palindromes from them. An efficient way to find numbers that are palindromic in multiple bases is to take a palindrome in one base and test if it’s a palindrome in one or more additional bases.

In this article, I’ll show you C code I wrote that finds multi-base numeric palindromes. I used this code to generate tables of numbers that are palindromic in decimal and binary, decimal and hexadecimal, and decimal and octal. I also used this code to solve Euler problem 36, which asks for the sum of all numbers, less than one million, that are palindromic in decimal and binary.

Continue reading “Finding Numbers That Are Palindromic In Multiple Bases”

Exploring Binary Numbers With PARI/GP Calculator

PARI/GP is an open source computer algebra system I use frequently in my study of binary numbers. It doesn’t manipulate binary numbers directly — input, and most output, is in decimal — so I use it mainly to do the next best thing: calculate with powers of two. Calculations with powers of two are, indirectly, calculations with binary numbers.

PARI/GP is a sophisticated tool, with several components — yet it’s easy to install and use. I use its command shell in particular, the PARI/GP calculator, or gp for short. I will show you how to use simple gp commands to explore binary numbers.

PARI/GP Calculator (Example Calculations) (My Setup on Windows)
PARI/GP Calculator (Sample of Calculations Used to Explore Binary Numbers)

Continue reading “Exploring Binary Numbers With PARI/GP Calculator”

Nines in Binary

I discovered a cool property of positive integers of the form 10n-1, that is, integers made up of n digits of 9s: they have binary representations that have exactly n digits of trailing 1s. For example, 9,999,999 in decimal is 100110001001011001111111 in binary.

The property is interesting in and of itself, but what is more interesting is the process I went through to discover it. It’s a small-scale example of experimental mathematics: I observed something interesting, experimented to collect more data, developed a hypothesis, and constructed a proof.

Continue reading “Nines in Binary”

How I Taught My Mother Binary Numbers

I introduced my mother to binary numbers a few weeks ago when I showed her my One Hundred Cheerios in Binary poster. It shows the decimal number 100 in binary — 1100100. She’s not an engineer but she’s good with numbers, so I knew she would get it — if only I could find the right way to explain it. Two days ago, I found the right way.

Continue reading “How I Taught My Mother Binary Numbers”

Converting Floating-Point Numbers to Binary Strings in C

If you want to print a floating-point number in binary using C code, you can’t use printf() — it has no format specifier for it. That’s why I wrote a program to do it, a program I describe in this article.

(If you’re wondering why you’d want to print a floating-point number in binary, I’ll tell you that too.)

Continue reading “Converting Floating-Point Numbers to Binary Strings in C”

Base Conversion in PHP Using BCMath

PHP has a component called BCMath which does arbitrary-precision, decimal arithmetic. I used BCMath in my decimal/binary converter because:

  • Arbitrary-precision lets it operate on very large and very small numbers, numbers that can’t be represented in standard computer word sizes.
  • Decimal arithmetic lets it use the same algorithms I’d use to convert between decimal and binary by hand.

(If you’ve written a conversion routine in standard code, especially one to convert decimal fractions to binary, you’ll see the advantage of the second point.)

This article describes the implementation of my conversion routines with BCMath.

Continue reading “Base Conversion in PHP Using BCMath”

Base Conversion In PHP Using Built-In Functions

The PHP programming language has many built-in functions for converting numbers from one base to another. In fact, it has so many functions that it can be hard to know which to use. Some functions have similar capabilities, and some work with parameters of different types. We’ll sort through the differences in this article, and explain the proper context in which to use each function.

Continue reading “Base Conversion In PHP Using Built-In Functions”