Many new programmers become aware of binary floating-point after seeing their programs give odd results: “Why does my program print 0.10000000000000001 when I enter 0.1?”; “Why does 0.3 + 0.6 = 0.89999999999999991?”; “Why does 6 * 0.1 not equal 0.6?” Questions like these are asked every day on online forums such as stackoverflow.com.
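These results are easy to reproduce. Here is a minimal Python sketch, assuming the interpreter's floats are IEEE 754 doubles (the case for CPython on common platforms). The `.17g` format asks for 17 significant digits; the variable names are only for illustration.

```python
# Assumes Python floats are IEEE 754 doubles (true for CPython on common platforms).
entered = format(0.1, ".17g")        # 17 significant digits of the stored value
total = format(0.3 + 0.6, ".17g")    # the sum is computed in binary, then printed
product_matches = (6 * 0.1 == 0.6)   # compares two different doubles

print(entered)          # 0.10000000000000001
print(total)            # 0.89999999999999991
print(product_matches)  # False
```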
The answer is that most decimals have infinite representations in binary. Take 0.1 for example. It’s one of the simplest decimals you can think of, and yet it looks so complicated in binary:
0.000110011001100110011001100110011001…
The bits go on forever; no matter how many of those bits you store in a computer, you will never end up with the binary equivalent of decimal 0.1.
0.1 In Binary
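As a fraction, 0.1 is 1/10, which in binary is 1 divided by 1010. Working through that binary long division, the process would repeat forever, and so too the digits in the quotient, because 100 (“one-zero-zero”) reappears as the working portion of the dividend. Recognizing this, we can abort the division and write the answer in repeating bicimal notation as 0.00011, with a bar over the final four digits (0011) to mark the part that repeats.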
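The long division can be sketched in a few lines of Python. This is an illustrative helper, not a standard routine: the name `bicimal_digits` and its return shape are my own assumptions. It tracks remainders rather than working dividends, which amounts to the same thing, since each working dividend is just the previous remainder shifted left by one bit.

```python
from typing import Tuple

def bicimal_digits(numerator: int, denominator: int) -> Tuple[str, str]:
    """Return (non-repeating, repeating) fractional bits of numerator/denominator.

    Assumes 0 <= numerator < denominator. The repeating part is "" when
    the bicimal terminates.
    """
    digits = []
    seen = {}  # remainder -> index of the digit that remainder produces
    remainder = numerator
    while remainder != 0 and remainder not in seen:
        seen[remainder] = len(digits)
        remainder *= 2                           # bring down a zero
        digits.append(str(remainder // denominator))
        remainder %= denominator
    bits = "".join(digits)
    if remainder == 0:
        return bits, ""                          # remainder of zero: terminates
    start = seen[remainder]                      # cycle begins where this remainder first appeared
    return bits[:start], bits[start:]

print(bicimal_digits(1, 10))  # ('0', '0011'), that is, 0.0 followed by 0011 repeating
print(bicimal_digits(1, 2))   # ('1', ''), that is, 0.1 exactly
```

For 1/10 the remainder 2 comes up a second time, and doubling it gives 4, which is the binary 100 that reappears in the hand division.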
0.1 In Floating-Point
The repeating notation 0.00011 (with a bar over 0011) is a finite representation of an infinite number of digits, but that doesn’t help us with floating-point. Floating-point does not represent numbers using repeat bars; it represents them with a fixed number of bits. In double-precision floating-point, for example, the significand holds 53 bits, so the otherwise infinite representation is rounded to 53 significant bits.
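Let’s see what 0.1 looks like in double-precision. First, let’s write it in binary, truncated to 57 significant bits:
0.000110011001100110011001100110011001100110011001100110011001…
Bits 54 and beyond total to greater than half the value of bit position 53, so this rounds up to
0.0001100110011001100110011001100110011001100110011001101
In decimal, this is
0.1000000000000000055511151231257827021181583404541015625
which is slightly greater than 0.1.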
If you were to print that to 17 significant decimal digits you’d get 0.10000000000000001 (printing rounds the result as well). Note that if you were to print to fewer than 17 digits, the answer would be 0.1. That’s just an illusion, though: the computer has not stored 0.1.
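You can inspect the stored value directly. The sketch below assumes Python floats are IEEE 754 doubles and relies on the fact that `decimal.Decimal` converts a float exactly, with no rounding.

```python
from decimal import Decimal

# Assumes Python floats are IEEE 754 doubles.
stored = Decimal(0.1)               # exact decimal value of the double nearest 0.1
seventeen = format(0.1, ".17g")     # rounded to 17 significant digits
sixteen = format(0.1, ".16g")       # rounded to 16 significant digits

print(stored)                   # 0.1000000000000000055511151231257827021181583404541015625
print(seventeen)                # 0.10000000000000001
print(sixteen)                  # 0.1
print(stored > Decimal("0.1"))  # True
```

Note the difference between `Decimal(0.1)`, which converts the double, and `Decimal("0.1")`, which parses the decimal string and really is one-tenth.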
It Can Be Slightly Greater or Slightly Less Than 0.1
Depending on how many bits of precision are used, the floating-point approximation of 0.1 could be less than 0.1. For example, in half-precision, which uses 11 significant bits, 0.1 rounds to 0.0001100110011 in binary, which is 0.0999755859375 in decimal.
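To see how the direction of the error depends on the precision, here is a rough sketch that rounds an exact fraction to a given number of significant bits. The function `round_to_bits` is a hypothetical helper of my own; it models precision only (it ignores exponent range limits) and assumes round-half-to-even, the usual default rounding mode.

```python
from fractions import Fraction

def round_to_bits(value: Fraction, bits: int) -> Fraction:
    """Round a positive fraction to `bits` significant bits, ties to even.

    Models precision only; exponent range limits are ignored.
    """
    two = Fraction(2)
    exponent = 0
    # Scale so that 2**(bits-1) <= value * 2**exponent < 2**bits.
    while value * two**exponent < 2**(bits - 1):
        exponent += 1
    while value * two**exponent >= 2**bits:
        exponent -= 1
    significand = round(value * two**exponent)  # round() on a Fraction rounds ties to even
    return significand / two**exponent

tenth = Fraction(1, 10)
double_tenth = round_to_bits(tenth, 53)   # double-precision significand width
half_tenth = round_to_bits(tenth, 11)     # half-precision significand width

print(double_tenth == Fraction(0.1))  # True: matches the double Python stores for 0.1
print(double_tenth > tenth)           # True: rounded up
print(float(half_tenth))              # 0.0999755859375
print(half_tenth < tenth)             # True: rounded down
```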
0.1 Is Just One of Many Examples
0.1 is the most commonly used example in discussions about floating-point “inaccuracies” — that is why I chose it. But there are many, many more examples. How can you tell if an arbitrary decimal has an equivalent bicimal that terminates or repeats?
Of course, you could do what I did above: convert the decimal to an integer over a power of ten and then do binary division. If you get a remainder of zero, the bicimal is terminating; if you encounter a working dividend you’ve seen before, the bicimal is repeating. This method is great because you see the binary representation unfold before your eyes. However, it’s tedious. Binary division is challenging, even if you know how to do it.
There is a simpler test: a decimal has an equivalent terminating bicimal if and only if the decimal, written as a proper fraction in lowest terms, has a denominator that is a power of two. (It takes a bit of number theory to understand why this works, but the explanation is similar to why decimals terminate only for fractions with powers of two and/or powers of five in their denominators.) By this rule, you can see that 0.1 has an infinite bicimal: 0.1 = 1/10, and 10 is not a power of two. 0.5, on the other hand, terminates: 0.5 = 5/10 = 1/2. If asked whether a decimal has a corresponding bicimal that terminates or repeats, this is the test to use.
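The test is short enough to write as code. In this sketch the function name `has_terminating_bicimal` is my own; it leans on `fractions.Fraction`, which parses a decimal string exactly and reduces it to lowest terms.

```python
from fractions import Fraction

def has_terminating_bicimal(decimal_text: str) -> bool:
    """True if the decimal, as a fraction in lowest terms, has a power-of-two denominator."""
    denominator = Fraction(decimal_text).denominator  # Fraction reduces to lowest terms
    # A positive integer is a power of two (including 1) when it has a single set bit.
    return (denominator & (denominator - 1)) == 0

print(has_terminating_bicimal("0.1"))    # False: 1/10
print(has_terminating_bicimal("0.5"))    # True: 1/2
print(has_terminating_bicimal("0.375"))  # True: 3/8
```

Passing the decimal as a string matters here: passing the float 0.1 would test the already-rounded double, whose denominator is always a power of two.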
Some Terminating Bicimals Don’t Exist in Floating-Point Either
It’s important to note that some decimals with terminating bicimals don’t exist in floating-point either. This happens when there are more bits than the precision allows for. For example, a decimal whose exact bicimal needs 54 significant bits has to be rounded to 53 bits in double-precision, and the rounded value corresponds to a different decimal than the one you started with.
Such precisely specified numbers are not likely to be used in real programs, so this is not an issue that’s likely to come up.
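Here is one way to construct such a number and watch it get rounded. The value used, 1/2 + 2^-53 + 2^-54, is an assumption of mine chosen for illustration: its bicimal terminates, but it needs 54 significant bits. The sketch assumes Python floats are IEEE 754 doubles.

```python
from fractions import Fraction

# Illustrative value (an assumption): 1/2 + 2**-53 + 2**-54.
exact = Fraction(1, 2) + Fraction(1, 2**53) + Fraction(1, 2**54)

denominator = exact.denominator
terminates = (denominator & (denominator - 1)) == 0  # power-of-two denominator
bits_needed = exact.numerator.bit_length()           # significant bits in the exact bicimal
stored = float(exact)                                # nearest double, ties to even

print(terminates)                 # True
print(bits_needed)                # 54
print(Fraction(stored) == exact)  # False: the double is not the original value
print(stored == 0.5 + 2**-52)     # True: the tie rounded up to the even neighbor
```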
In pure math, every decimal has an equivalent bicimal. In floating-point math, this is just not true.