Paul Bristow, a Boost.Math library author and reader of my blog, recently alerted me to a problem he discovered many years ago in Visual C++: some double-precision floating-point values fail to round-trip through a stringstream as a 17-digit decimal string. Interestingly, the 17-digit strings that C++ generates are not the problem; they are correctly rounded. The problem is that the conversion of those strings to floating-point is sometimes incorrect, off by one binary ULP.
I’ve previously discovered that Visual Studio makes incorrect decimal to floating-point conversions, and that Microsoft is OK with it — at least based on their response to my now deleted bug report. But incorrect decimal to floating-point conversions in this context seems like a problem that needs fixing. When you serialize a double to a 17-digit decimal string, shouldn’t you get the same double back later? Apparently Microsoft doesn’t think so, because Paul’s bug report has also been deleted.
Continue reading …
The exact decimal equivalent of an arbitrary double-precision binary floating-point number is typically an unwieldy looking number, like this one:
0.1000000000000000055511151231257827021181583404541015625
In general, when you print a floating-point number, you don’t want to see all its digits; most of them are “garbage” in a sense anyhow. But how many digits do you need? You’d like a short string, yet you’d want it long enough so that it identifies the original floating-point number. A well-known result in computer science is that you need 17 significant decimal digits to identify an arbitrary double-precision floating-point number. If you were to round the exact decimal value of any floating-point number to 17 significant digits, you’d have a number that, when converted back to floating-point, gives you the original floating-point number; that is, a number that round-trips. For our example, that number is 0.10000000000000001.
But 17 digits is the worst case, which means that fewer digits — even as few as one — could work in many cases. The number required depends on the specific floating-point number. For our example, the short string 0.1 does the trick. This means that 0.1000000000000000055511151231257827021181583404541015625 and 0.10000000000000001 and 0.1 are the same, at least as far as their floating-point representations are concerned.
Continue reading …
This is a companion article to “Number of Bits in a Decimal Integer”
If you have an integer expressed in decimal and want to know how many bits are required to express it in binary, you can perform a simple calculation. If you want to know how many bits are required to express a d-digit decimal integer in binary, you can perform other simple calculations for that.
What if you want to go in the opposite direction, that is, from binary to decimal? There are similar calculations for determining the number of decimal digits required for a specific binary integer or for a b-bit binary integer. I will show you these calculations, which are essentially the inverses of their decimal to binary counterparts.
Continue reading …
Numbers are entered into computers as strings of text. These strings are converted to binary, into the numeric form understood by the computer’s hardware. Numbers with a decimal point — numbers we think of as real numbers — are converted into a format called binary floating-point. The procedure that converts decimal strings to binary floating-point — IEEE double-precision binary floating-point in particular — goes by the name strtod(), which stands for string to double.
Converting decimal strings to doubles correctly and efficiently is a surprisingly complex task; fortunately, David M. Gay did this for us when he wrote this paper and released this code over 20 years ago. (He maintains this reference copy of strtod() to this day.) His code appears in many places, including in the Python, PHP, and Java programming languages, and in the Firefox, Chrome, and Safari Web browsers.
I’ve spent considerable time reverse engineering strtod(); neither the paper nor the code are easy reads. I’ve written articles about how each of its major pieces work, and I’ve discovered bugs (as have a few of my readers) along the way. This article ties all of my strtod() research together.
Continue reading …
Many new programmers become aware of binary floating-point after seeing their programs give odd results: “Why does my program print 0.10000000000000001 when I enter 0.1?”; “Why does 0.3 + 0.6 = 0.89999999999999991?”; “Why does 6 * 0.1 not equal 0.6?” Questions like these are asked every day, on online forums like stackoverflow.com.
The answer is that most decimals have infinite representations in binary. Take 0.1 for example. It’s one of the simplest decimals you can think of, and yet it looks so complicated in binary:

Decimal 0.1 In Binary ( To 1369 Places)
The bits go on forever; no matter how many of those bits you store in a computer, you will never end up with the binary equivalent of decimal 0.1.
Continue reading …
I’ve shown you two ways to convert a bicimal to a fraction: the subtraction method and the direct method. In this article, I will show you a third method — a common method I call the series method — that uses the formula for infinite geometric series to create the fraction.
Continue reading …
There are several ways to convert a repeating bicimal to a fraction. I’ve shown you the subtraction method; now I’ll show you the direct method, my name for the method that creates a fraction directly, using a numerator and denominator of well-known form.

Example of Direct Method (7/12 in Decimal)
Continue reading …
In my article “Binary Division” I showed how binary long division converts a fraction to a repeating bicimal. In this article, I’ll show you a well-known procedure — what I call the subtraction method — to do the reverse: convert a repeating bicimal to a fraction.

Equivalent Representations of 47/12, in Binary
Continue reading …
This is the fourth of a four part series on “pencil and paper” binary arithmetic, which I’ve written as a supplement to my binary calculator. The first article discusses binary addition; the second article discusses binary subtraction; the third article discusses binary multiplication; this article discusses binary division.

Example of Binary Division
The pencil-and-paper method of binary division is the same as the pencil-and-paper method of decimal division, except that binary numerals are manipulated instead. As it turns out though, binary division is simpler. There is no need to guess and then check intermediate quotients; they are either 0 are 1, and are easy to determine by sight.
Continue reading …
This is the third of a four part series on “pencil and paper” binary arithmetic, which I’m writing as a supplement to my binary calculator. The first article discusses binary addition; the second article discusses binary subtraction; this article discusses binary multiplication.

Example of Binary Multiplication
The pencil-and-paper method of binary multiplication is just like the pencil-and-paper method of decimal multiplication; the same algorithm applies, except binary numerals are manipulated instead. The way it works out though, binary multiplication is much simpler. The multiplier contains only 0s and 1s, so each multiplication step produces either zeros or a copy of the multiplicand. So binary multiplication is not multiplication at all — it’s just repeated binary addition!
Continue reading …
Today is the 100th day of school at my son’s elementary school. I’ve had my binary influence on prior 100th day projects, and this year was to be no different. But alas, his class is not doing one this year. I didn’t want to waste the acorn tops we saved though, so I made my own 100th day project (well not quite — I didn’t glue them):

A Binary Multiplication Problem Expressed With One Hundred Acorn Tops.
Continue reading …
This is the second of a four part series on “pencil and paper” binary arithmetic, which I’m writing as a supplement to my binary calculator. The first article discusses binary addition; this article discusses binary subtraction.

Example of Binary Subtraction
The pencil-and-paper method of binary subtraction is just like the pencil-and-paper method of decimal subtraction you learned in elementary school. Instead of manipulating decimal numerals, however, you manipulate binary numerals, according to a basic set of rules or “facts.”
Continue reading …