Maximum Number of Decimal Digits In Binary Floating-Point Numbers

I’ve written about the formulas used to compute the number of decimal digits in a binary integer and the number of decimal digits in a binary fraction. In this article, I’ll use those formulas to determine the maximum number of digits required by the double-precision (double), single-precision (float), and quadruple-precision (quad) IEEE binary floating-point formats.

The maximum digit counts are useful if you want to print the full decimal value of a floating-point number (worst case format specifier and buffer size) or if you are writing or trying to understand a decimal to floating-point conversion routine (worst case number of input digits that must be converted).

Continue reading “Maximum Number of Decimal Digits In Binary Floating-Point Numbers”

Number of Decimal Digits In a Binary Fraction

The binary fraction 0.101 converts to the decimal fraction 0.625; the binary fraction 0.1010001 converts to the decimal fraction 0.6328125; the binary fraction 0.00111011011 converts to the decimal fraction 0.23193359375. In each of those examples, the binary fraction converts to a decimal fraction — that is, a terminating decimal representation — that has the same number of digits as the binary fraction has bits.

One digit per bit? We know that’s not true for binary integers. But it is true for binary fractions; every binary fraction of length n has a corresponding equivalent decimal fraction of length n.

This is the reason why you get all those “extra” digits when you print the full decimal value of an IEEE binary floating-point fraction, and why glibc strtod() and Visual C++ strtod() were once broken.

Continue reading “Number of Decimal Digits In a Binary Fraction”

17 Digits Gets You There, Once You’ve Found Your Way

Every double-precision floating-point number can be specified with 17 significant decimal digits or less. A simple way to generate this 17-digit number is to round the full-precision decimal value of the double to 17 digits. For example, the double-precision value 0x1.6d4c11d09ffa1p-3, which in decimal is 1.783677474777478899614635565740172751247882843017578125 x 10-1, can be recovered from the decimal floating-point literal 1.7836774747774789e-1. The extra digits are unnecessary, since they will only take you to the same double.

On the other hand, an arbitrary, arbitrarily long decimal literal rounded or truncated to 17 digits may not convert to the double-precision value it’s supposed to. This is a subtle point, one that has even tripped up implementers of widely used decimal to floating-point conversion routines (glibc strtod() and Visual C++ strtod(), for example).

Continue reading “17 Digits Gets You There, Once You’ve Found Your Way”

Decimal Precision of Binary Floating-Point Numbers

How many decimal digits of precision does a binary floating-point number have?

For example, does an IEEE single-precision binary floating-point number, or float as it’s known, have 6-8 digits? 7-8 digits? 6-9 digits? 6 digits? 7 digits? 7.22 digits? 6-112 digits? (I’ve seen all those answers on the Web.)

Continue reading “Decimal Precision of Binary Floating-Point Numbers”

The Safe Range For PHP’s base_convert()

PHP’s base_convert() is a useful function that converts integers between any pair of bases, 2 through 36. However, you might hesitate to use it after reading this vague and mysterious warning in its documentation:

base_convert() may lose precision on large numbers due to properties related to the internal “double” or “float” type used.

The truth is that it works perfectly for integers up to a certain maximum — you just have to know what that is. I will show you this maximum value in each of the 35 bases, and how to check if the values you are using are within this limit.

Continue reading “The Safe Range For PHP’s base_convert()”

An Hour of Code… A Lifelong Lesson in Floating-Point

The 2015 edition of Hour of Code includes a new blocks-based, Star Wars themed coding lesson. In one of the exercises — a simple sprite-based game — you are asked to code a loop that adds 100 to your score every time R2-D2 encounters a Rebel Pilot. But instead of 100, I plugged in a floating-point number; I got the expected “unexpected” results.
Floating-point score in Star Wars Hour of Code exercise

Continue reading “An Hour of Code… A Lifelong Lesson in Floating-Point”

Visual C++ strtod(): Still Broken

About a year ago Bruce Dawson informed me that Microsoft is fixing their decimal to floating-point conversion routine in the next release of Visual Studio; I finally made the time to test the new code. I installed Visual Studio Community 2015 Release Candidate and ran my old C++ testcases. The good news: all of the individual conversion errors that I wrote about are fixed. The bad news: many errors remain.

Continue reading “Visual C++ strtod(): Still Broken”

The Inequality That Governs Round-Trip Conversions: A Partial Proof

I have been writing about the spacing of decimal and binary floating-point numbers, and about how their relative spacing determines whether numbers round-trip between those two bases. I’ve stated an inequality that captures the required spacing, and from it I have derived formulas that specify the number of digits required for round-trip conversions. I have not proven that this inequality holds, but I will prove “half” of it here. (I’m looking for help to prove the other half.)

Continue reading “The Inequality That Governs Round-Trip Conversions: A Partial Proof”

Number of Digits Required For Round-Trip Conversions

In my article “7 Bits Are Not Enough for 2-Digit Accuracy” I showed how the relative spacing of decimal and binary floating-point numbers dictates when all conversions between those two bases round-trip. There are two formulas that capture this relationship, and I will derive them in this article. I will also show that it takes one more digit (or bit) of precision to round-trip a floating-point number than it does to round-trip an integer of equal significance.

Continue reading “Number of Digits Required For Round-Trip Conversions”

7 Bits Are Not Enough for 2-Digit Accuracy

In the 1960s, I. Bennett Goldberg and David W. Matula published papers relating floating-point number systems of different bases, showing the conditions under which conversions between them round-trip; that is, when conversion to another base and back returns the original number. Independently, both authors derived the formula that specifies the number of significant digits required for round-trip conversions.

In his paper “27 Bits Are Not Enough for 8-Digit Accuracy”, Goldberg shows the formula in the context of decimal to binary floating-point conversions. He starts with a simple example — a 7-bit binary floating-point system — and shows that it does not have enough precision to round-trip all 2-digit decimal floating-point numbers. I took his example and put it into diagrams, giving you a high level view of what governs round-trip conversions. I also extended his example to show that the same concept applies to binary to decimal floating-point round-trips.
Relative Spacing Governs Round-Trips

The well-known digit counts for round-trip conversions to and from IEEE 754 floating-point are dictated by these same principles.

Continue reading “7 Bits Are Not Enough for 2-Digit Accuracy”

Nine Ways to Display a Floating-Point Number

(Updated June 22, 2015: added a tenth display form, “decimal integer times a power of ten”.)

In the strictest sense, converting a decimal number to binary floating-point means putting it in IEEE 754 format — a multi-byte structure composed of a sign field, an exponent field, and a significand field. Viewing it in this raw form (binary or hex) is useful, but there are other forms that are more enlightening.

I’ve written an online converter that takes a decimal number as input, converts it to floating-point, and then displays its exact floating-point value in ten forms (including the two raw IEEE forms). I will show examples of these forms in this article.

Continue reading “Nine Ways to Display a Floating-Point Number”

PHP Converts 2.2250738585072012e-308 Incorrectly

While testing my new decimal to floating-point converter I discovered a bug in old territory: PHP incorrectly converts the number 2.2250738585072012e-308.

<?php printf("%.17g",2.2250738585072012e-308); ?>

This prints 2.2250738585072009E-308; it should print 2.2250738585072014e-308. (I verified that the internal double value is wrong; the printed value correctly represents it.)

Continue reading “PHP Converts 2.2250738585072012e-308 Incorrectly”