Articles with the ‘Decimals’ Tag

Adjusting the Floating-Point Approximation in strtod()

I’ve discussed how David Gay’s strtod() function computes an initial floating-point approximation and then uses a loop to check and correct it, but I have not discussed how the correction is made. That is the last piece of the strtod() puzzle, and I will cover it in this article.

Continue reading …

“0.1 Repeating” In Binary Equals 1

In decimal, “0.9 repeating”, or 0.9, equals 1. In binary, a similar thing is true: “0.1 repeating”, or 0.1, equals 1. I’ll show you three ways to prove it, using the three bicimal to fraction conversion algorithms I described recently.

Continue reading …

Why 0.1 Does Not Exist In Floating-Point

Many new programmers become aware of binary floating-point after seeing their programs give odd results: “Why does my program print 0.10000000000000001 when I enter 0.1?”; “Why does 0.3 + 0.6 = 0.89999999999999991?”; “Why does 6 * 0.1 not equal 0.6?” Questions like these are asked every day, on online forums like stackoverflow.com.

The answer is that most decimals have infinite representations in binary. Take 0.1 for example. It’s one of the simplest decimals you can think of, and yet it looks so complicated in binary:

Decimal 0.1 In Binary ( To 1369 Places)

Decimal 0.1 In Binary ( To 1369 Places)

The bits go on forever; no matter how many of those bits you store in a computer, you will never end up with the binary equivalent of decimal 0.1.

Continue reading …

Converting a Bicimal to a Fraction (Subtraction Method)

In my article “Binary Division” I showed how binary long division converts a fraction to a repeating bicimal. In this article, I’ll show you a well-known procedure — what I call the subtraction method — to do the reverse: convert a repeating bicimal to a fraction.

Equivalent Representations of 47/12, in Binary

Equivalent Representations of 47/12, in Binary

Continue reading …

Bicimals

There is no widely accepted term for fractional binary numbers like 0.11001. A fractional decimal number like 0.427 is called a decimal or decimal fraction. A fractional binary number is called many things, including binary fraction, binary decimal, binary expansion, bicimal, binimal, binary radix fraction, and binary fractional (my term). In this article, I’m going to argue that bicimal should be the universal term.

(Please let me know what you think — take the poll at the end of this article.)

Continue reading …

Binary Division

This is the fourth of a four part series on “pencil and paper” binary arithmetic, which I’ve written as a supplement to my binary calculator. The first article discusses binary addition; the second article discusses binary subtraction; the third article discusses binary multiplication; this article discusses binary division.

An Example of Binary Division

Example of Binary Division

The pencil-and-paper method of binary division is the same as the pencil-and-paper method of decimal division, except that binary numerals are manipulated instead. As it turns out though, binary division is simpler. There is no need to guess and then check intermediate quotients; they are either 0 are 1, and are easy to determine by sight.

Continue reading …

Bigcomp: Deciding Truncated, Near Halfway Conversions

In my article “Using Integers to Check a Floating-Point Approximation,” I briefly mentioned “bigcomp,” an optimization strtod() uses to reduce big integer overhead when checking long decimal inputs. bigcomp does a floating-point to decimal conversion — right in the middle of a decimal to floating-point conversion mind you — to generate the decimal expansion of the number halfway between two target floating-point numbers. This decimal expansion is compared to the input decimal string, and the result of the comparison dictates which of the two target numbers is the correctly rounded result.

In this article, I’ll explain how bigcomp works, and when it applies. Also, I’ll talk briefly about its performance; my informal testing shows that, under the default setting, bigcomp actually worsens performance for some inputs.

Continue reading …

Using Integers to Check a Floating-Point Approximation

For decimal inputs that don’t qualify for fast path conversion, David Gay’s strtod() function does three things: first, it uses IEEE double-precision floating-point arithmetic to calculate an approximation to the correct result; next, it uses arbitrary-precision integer arithmetic (AKA big integers) to check if the approximation is correct; finally, it adjusts the approximation, if necessary. In this article, I’ll explain the second step — how the check of the approximation is done.

Continue reading …

strtod()’s Initial Decimal to Floating-Point Approximation

David Gay’s strtod() function does decimal to floating-point conversion using both IEEE double-precision floating-point arithmetic and arbitrary-precision integer arithmetic. For some inputs, a simple IEEE floating-point calculation suffices to produce the correct result; for other inputs, a combination of IEEE arithmetic and arbitrary-precision arithmetic is required. In the latter case, IEEE arithmetic is used to calculate an approximation to the correct result, which is then refined using arbitrary-precision arithmetic. In this article, I’ll describe the approximation calculation, which is based on a form of binary exponentiation.

Continue reading …

Fast Path Decimal to Floating-Point Conversion

In general, to convert an arbitrary decimal number into a binary floating-point number, arbitrary-precision arithmetic is required. However, a subset of decimal numbers can be converted correctly with just ordinary limited-precision IEEE floating-point arithmetic, taking what I call the fast path to conversion. Fast path conversion is an optimization used in practice: it’s in David Gay’s strtod() function and in Java’s FloatingDecimal class. I will explain how fast path conversion works, and describe the set of numbers that qualify for it.

Continue reading …

Correct Decimal To Floating-Point Using Big Integers

Producing correctly rounded decimal to floating-point conversions is hard, but only because it is made to be done efficiently. There is a simple algorithm that produces correct conversions, but it’s too slow — it’s based entirely on arbitrary-precision integer arithmetic. Nonetheless, you should know this algorithm, because it will help you understand the highly-optimized conversion routines used in practice, like David Gay’s strtod() function. I will outline the algorithm, which is easily implemented in a language like C, using a “big integer” library like GMP.

Ratio of Big Integers (2^119/10^20) Producing the 53-Bit Significand of 1e-20

Ratio of Big Integers (2119/1020) Producing the 53-Bit Significand of 1e-20

Continue reading …

Pi and e In Binary

Some people are curious about the binary representations of the mathematical constants pi and e. Mathematically, they’re like every other irrational number — infinite strings of 0s and 1s (with no discernible pattern). In a computer, they’re finite, making them only approximations to their true values. I will show you what their approximations look like in five different levels of binary floating-point precision.

The first 43 bits of pi and e

The first 43 bits of pi and e

Continue reading …