Nine Ways to Display a Floating-Point Number

(Updated June 22, 2015: added a tenth display form, “decimal integer times a power of ten”.)

In the strictest sense, converting a decimal number to binary floating-point means putting it in IEEE 754 format — a multi-byte structure composed of a sign field, an exponent field, and a significand field. Viewing it in this raw form (binary or hex) is useful, but there are other forms that are more enlightening.

I’ve written an online converter that takes a decimal number as input, converts it to floating-point, and then displays its exact floating-point value in ten forms (including the two raw IEEE forms). I will show examples of these forms in this article.


Pi and e In Binary

Some people are curious about the binary representations of the mathematical constants pi and e. Mathematically, they’re like every other irrational number — infinite strings of 0s and 1s (with no discernible pattern). In a computer, they’re finite, making them only approximations to their true values. I will show you what their approximations look like in five different levels of binary floating-point precision.

The first 43 bits of pi and e


15-Digit Quick and Dirty Conversions Don’t Round-Trip

In my article “Fifteen Digits Don’t Round-Trip Through SQLite Reals” I showed examples of decimal floating-point numbers — 15 significant digits or less — that don’t round-trip through double-precision binary floating-point variables stored in SQLite. The round-trip failures occur because SQLite’s floating-point to decimal conversion routine uses limited-precision floating-point arithmetic.

My quick and dirty floating-point to decimal conversion routine, which I wrote to demonstrate conversion inaccuracies caused by limited-precision arithmetic, also fails to round-trip some decimal numbers of 15 digits or less. Since I hadn’t demonstrated this failure previously, I will do so here.


Quick and Dirty Floating-Point to Decimal Conversion

In my article “Quick and Dirty Decimal to Floating-Point Conversion” I presented a small C program that uses double-precision floating-point arithmetic to convert decimal strings to binary floating-point numbers. The program converts some numbers incorrectly, despite using an algorithm that’s mathematically correct; its limited-precision calculations are to blame. I dubbed the program “quick and dirty” because it’s simple and, overall, converts reasonably accurately.

For this article, I took a similar approach to the conversion in the opposite direction, from binary floating-point to decimal string. I wrote a small C program that combines two mathematically correct algorithms: the classic “repeated division by ten” algorithm to convert integer values, and the classic “repeated multiplication by ten” algorithm to convert fractional values. The program uses double-precision floating-point arithmetic, so, like its quick and dirty decimal to floating-point counterpart, it does not always convert correctly, although it is reasonably accurate. I’ll present the program and analyze some example conversions, both correct and incorrect.


Inconsistent Rounding of Printed Floating-Point Numbers

What does this C program print?

#include <stdio.h>
int main (void)
{
 printf ("%.1f\n",0.25);
}

The answer depends on which compiler you use. If you compile the program with Visual C++ and run it on Windows, it prints 0.3; if you compile it with gcc and run it on Linux, it prints 0.2.

The compilers (actually, their run-time libraries) are using different rules to break decimal rounding ties. The two-digit number 0.25, which has an exact binary floating-point representation, is equally near two one-digit decimal numbers, 0.2 and 0.3, so either is an acceptable answer. Visual C++ uses the round-half-away-from-zero rule, and gcc (actually, glibc) uses the round-half-to-even rule, also known as bankers’ rounding.

This inconsistency of printed output is not limited to C — it spans many programming environments. In all, I tested fixed-format printing in nineteen environments: in thirteen of them, round-half-away-from-zero was used; in the remaining six, round-half-to-even was used. I also discovered an anomaly in some environments: numbers like 0.15 — which look like halfway cases but are actually not when viewed in binary — may be rounded incorrectly. I’ll report my results in this article.


Hexadecimal Floating-Point Constants

Hexadecimal floating-point constants, also known as hexadecimal floating-point literals, are an alternative way to represent floating-point numbers in a computer program. A hexadecimal floating-point constant is shorthand for binary scientific notation, which is an abstract — yet direct — representation of a binary floating-point number. As such, hexadecimal floating-point constants have exact representations in binary floating-point, unlike decimal floating-point constants, which in general do not.

Hexadecimal floating-point constants are useful for two reasons: they bypass decimal to floating-point conversions, which are sometimes done incorrectly, and they bypass floating-point to decimal conversions which, even if done correctly, are often limited to a fixed number of decimal digits. In short, their advantage is that they allow for direct control of floating-point variables, letting you read and write their exact contents.

In this article, I’ll show you what hexadecimal floating-point constants look like, and how to use them in C.


The Structure of Binary/Hexadecimal Palindromes

Binary/hexadecimal palindromes are integers that are palindromic in both binary and hexadecimal. Unlike binary/decimal palindromes, for example, they have a predictable structure. This means they can be generated directly, rather than searched for. So what is their structure?

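Certainly, some of them are made up entirely of the hexadecimal digits that are themselves palindromic in binary: 0, 6, 9, F; for example, F060F₁₆ = 11110000011000001111₂ and 9F9₁₆ = 100111111001₂. Each of these four hexadecimal digits maps to a 4-bit binary palindrome (0000, 0110, 1001, 1111), so a hexadecimal palindrome made from them is also palindromic in binary, provided it begins (and therefore ends) with 9 or F. A leading 6 doesn’t work, because its leading 0 bit is dropped; for example, 66₁₆ = 1100110₂ is not a binary palindrome.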

But there are other binary/hexadecimal palindromes, like 525₁₆ = 10100100101₂ and 70207₁₆ = 1110000001000000111₂, that contain hexadecimal digits that are not palindromic in binary. In these cases, the binary palindromes are produced by combinations of hexadecimal digits. It turns out there are a limited number of valid combinations, and that they’re localized: they span only two hexadecimal digits.

In this article, I’ll analyze binary/hexadecimal palindromes and describe their structure — a structure due to the relationship of the two bases, binary and hexadecimal.

Example Binary/Hexadecimal Palindromes
