## Numbers Greater Than DBL_MAX Should Convert To DBL_MAX When Rounding Toward Zero

I was testing David Gay’s most recent fixes to strtod() with different rounding modes and discovered that Apple Clang C++ (Xcode) and Microsoft Visual Studio C++ produce incorrect results for round towards zero and round down modes: their strtod()s convert numbers greater than DBL_MAX to infinity, not DBL_MAX. At first I thought Gay’s strtod() was wrong, but Dave pointed out that the IEEE 754 spec requires such conversions to be monotonic.

## Anomalies In IntelliJ Kotlin Floating-Point Literal Inspection

IntelliJ IDEA has a code inspection for Kotlin that will warn you if a decimal floating-point literal exceeds the precision of its type (Float or Double). It will suggest an equivalent literal (one that maps to the same binary floating-point number) that has fewer digits, or has the same number of digits but is closer to the floating-point number.

For Doubles for example, every literal over 17-digits should be flagged, since it never takes more than 17 digits to specify any double-precision binary floating-point value. Literals with 16 or 17 digits should be flagged if there is a replacement that is shorter or closer. And no literal with 15 digits or fewer should ever be flagged, since doubles have of 15-digits of precision.

But IntelliJ doesn’t always adhere to that, like when it suggests an 18-digit replacement for a 13-digit literal!

## Direct Generation of Double Rounding Error Conversions in Kotlin

For my recent search for short examples of double rounding errors in decimal to double to float conversions I wrote a Kotlin program to generate and test random decimal strings. While this was sufficient to find examples, I realized I could do a more direct search by generating only decimal strings with the underlying double rounding error bit patterns. I’ll show you the Java BigDecimal based Kotlin program I wrote for this purpose.

## Double Rounding Errors in Decimal to Double to Float Conversions

In my previous exploration of double rounding errors in decimal to float conversions I showed two decimal numbers that experienced a double rounding error when converted to float (single-precision) through an intermediate double (double-precision). I generated the examples indirectly by setting bit combinations that forced the error, using their corresponding exact decimal representations. As a result, the decimal numbers were long (55 digits each). Mark Dickinson derived a much shorter 17 digit example, but I hadn’t contemplated how to generate even shorter numbers — or whether they existed at all — until Per Vognsen wrote me recently to ask.

The easiest way for me to approach Per’s question was to search for examples, rather than try to find a way to construct them. As such, I wrote a simple Kotlin1 program to generate decimal strings and check them. I tested all float-range (including subnormal) decimal numbers of 9 digits or fewer, and tens of billions of random 10 to 17 digit float-range (normal only) numbers. I found example 7 to 17 digit numbers that, when converted to float through a double, suffer a double rounding error.

## Maximum Number of Decimal Digits In Binary Floating-Point Numbers

I’ve written about the formulas used to compute the number of decimal digits in a binary integer and the number of decimal digits in a binary fraction. In this article, I’ll use those formulas to determine the maximum number of digits required by the double-precision (double), single-precision (float), and quadruple-precision (quad) IEEE binary floating-point formats.

The maximum digit counts are useful if you want to print the full decimal value of a floating-point number (worst case format specifier and buffer size) or if you are writing or trying to understand a decimal to floating-point conversion routine (worst case number of input digits that must be converted).

## Number of Decimal Digits In a Binary Fraction

The binary fraction 0.101 converts to the decimal fraction 0.625; the binary fraction 0.1010001 converts to the decimal fraction 0.6328125; the binary fraction 0.00111011011 converts to the decimal fraction 0.23193359375. In each of those examples, the binary fraction converts to a decimal fraction — that is, a terminating decimal representation — that has the same number of digits as the binary fraction has bits.

One digit per bit? We know that’s not true for binary integers. But it is true for binary fractions; every binary fraction of length n has a corresponding equivalent decimal fraction of length n.

This is the reason why you get all those “extra” digits when you print the full decimal value of an IEEE binary floating-point fraction, and why glibc strtod() and Visual C++ strtod() were once broken.

## 17 Digits Gets You There, Once You’ve Found Your Way

Every double-precision floating-point number can be specified with 17 significant decimal digits or less. A simple way to generate this 17-digit number is to round the full-precision decimal value of the double to 17 digits. For example, the double-precision value 0x1.6d4c11d09ffa1p-3, which in decimal is 1.783677474777478899614635565740172751247882843017578125 x 10-1, can be recovered from the decimal floating-point literal 1.7836774747774789e-1. The extra digits are unnecessary, since they will only take you to the same double.

On the other hand, an arbitrary, arbitrarily long decimal literal rounded or truncated to 17 digits may not convert to the double-precision value it’s supposed to. This is a subtle point, one that has even tripped up implementers of widely used decimal to floating-point conversion routines (glibc strtod() and Visual C++ strtod(), for example).

## Decimal Precision of Binary Floating-Point Numbers

How many decimal digits of precision does a binary floating-point number have?

For example, does an IEEE single-precision binary floating-point number, or float as it’s known, have 6-8 digits? 7-8 digits? 6-9 digits? 6 digits? 7 digits? 7.22 digits? 6-112 digits? (I’ve seen all those answers on the Web.)

## The Safe Range For PHP’s base_convert()

PHP’s base_convert() is a useful function that converts integers between any pair of bases, 2 through 36. However, you might hesitate to use it after reading this vague and mysterious warning in its documentation:

base_convert() may lose precision on large numbers due to properties related to the internal “double” or “float” type used.

The truth is that it works perfectly for integers up to a certain maximum — you just have to know what that is. I will show you this maximum value in each of the 35 bases, and how to check if the values you are using are within this limit.

## An Hour of Code… A Lifelong Lesson in Floating-Point

The 2015 edition of Hour of Code includes a new blocks-based, Star Wars themed coding lesson. In one of the exercises — a simple sprite-based game — you are asked to code a loop that adds 100 to your score every time R2-D2 encounters a Rebel Pilot. But instead of 100, I plugged in a floating-point number; I got the expected “unexpected” results.

## Visual C++ strtod(): Still Broken

About a year ago Bruce Dawson informed me that Microsoft is fixing their decimal to floating-point conversion routine in the next release of Visual Studio; I finally made the time to test the new code. I installed Visual Studio Community 2015 Release Candidate and ran my old C++ testcases. The good news: all of the individual conversion errors that I wrote about are fixed. The bad news: many errors remain.