Articles with the ‘Binary arithmetic’ Tag

Why 0.1 Does Not Exist In Floating-Point

Many new programmers become aware of binary floating-point after seeing their programs give odd results: “Why does my program print 0.10000000000000001 when I enter 0.1?”; “Why does 0.3 + 0.6 = 0.89999999999999991?”; “Why does 6 * 0.1 not equal 0.6?” Questions like these are asked every day, on online forums like stackoverflow.com.

The answer is that most decimals have infinite representations in binary. Take 0.1 for example. It’s one of the simplest decimals you can think of, and yet it looks so complicated in binary:

Decimal 0.1 In Binary ( To 1369 Places)

Decimal 0.1 In Binary ( To 1369 Places)

The bits go on forever; no matter how many of those bits you store in a computer, you will never end up with the binary equivalent of decimal 0.1.

Continue reading …

Converting a Bicimal to a Fraction (Direct Method)

There are several ways to convert a repeating bicimal to a fraction. I’ve shown you the subtraction method; now I’ll show you the direct method, my name for the method that creates a fraction directly, using a numerator and denominator of well-known form.

Example of Direct Method (7/12 in Decimal)

Example of Direct Method (7/12 in Decimal)

Continue reading …

Converting a Bicimal to a Fraction (Subtraction Method)

In my article “Binary Division” I showed how binary long division converts a fraction to a repeating bicimal. In this article, I’ll show you a well-known procedure — what I call the subtraction method — to do the reverse: convert a repeating bicimal to a fraction.

Equivalent Representations of 47/12, in Binary

Equivalent Representations of 47/12, in Binary

Continue reading …

Binary Division

This is the fourth of a four part series on “pencil and paper” binary arithmetic, which I’ve written as a supplement to my binary calculator. The first article discusses binary addition; the second article discusses binary subtraction; the third article discusses binary multiplication; this article discusses binary division.

An Example of Binary Division

Example of Binary Division

The pencil-and-paper method of binary division is the same as the pencil-and-paper method of decimal division, except that binary numerals are manipulated instead. As it turns out though, binary division is simpler. There is no need to guess and then check intermediate quotients; they are either 0 are 1, and are easy to determine by sight.

Continue reading …

Binary Multiplication

This is the third of a four part series on “pencil and paper” binary arithmetic, which I’m writing as a supplement to my binary calculator. The first article discusses binary addition; the second article discusses binary subtraction; this article discusses binary multiplication.

An Example of Binary Multiplication

Example of Binary Multiplication

The pencil-and-paper method of binary multiplication is just like the pencil-and-paper method of decimal multiplication; the same algorithm applies, except binary numerals are manipulated instead. The way it works out though, binary multiplication is much simpler. The multiplier contains only 0s and 1s, so each multiplication step produces either zeros or a copy of the multiplicand. So binary multiplication is not multiplication at all — it’s just repeated binary addition!

Continue reading …

Binary Subtraction

This is the second of a four part series on “pencil and paper” binary arithmetic, which I’m writing as a supplement to my binary calculator. The first article discusses binary addition; this article discusses binary subtraction.

An Example of Binary Subtraction

Example of Binary Subtraction

The pencil-and-paper method of binary subtraction is just like the pencil-and-paper method of decimal subtraction you learned in elementary school. Instead of manipulating decimal numerals, however, you manipulate binary numerals, according to a basic set of rules or “facts.”

Continue reading …

Binary Addition

This is the first of a four part series on “pencil and paper” binary arithmetic, which I’m writing as a supplement to my binary calculator. This article introduces binary arithmetic, and then discusses binary addition.

An Example of Binary Addition

Example of Binary Addition

Continue reading …

Incorrect Decimal to Floating-Point Conversion In SQLite

SQLite has a limited-precision floating-point to decimal conversion routine which it uses to print double-precision floating-point values retrieved from a database. As I’ve discovered, its limited-precision conversion results in decimal numbers of 15 significant digits or less that won’t round-trip. For example, if you store the number 9.944932e+31, it will print back as 9.94493200000001e+31.

SQLite also has a limited-precision decimal to floating-point conversion routine, which it uses to convert input decimal numbers to double-precision floating-point numbers for storage in a database. I’ve found that some of its conversions are incorrect — by as many as four ULPs — and that some decimal numbers fail to round-trip because of this; “garbage in, garbage out” as they say.

Continue reading …

Fifteen Digits Don’t Round-Trip Through SQLite Reals

I’ve discovered that decimal floating-point numbers of 15 significant digits or less don’t always round-trip through SQLite. Consider this example, executed on version 3.7.3 of the pre-compiled SQLite command shell:

sqlite> create table t1(d real);
sqlite> insert into t1 values(9.944932e+31);
sqlite> select * from t1;
9.94493200000001e+31

SQLite represents a decimal floating-point number that has real affinity as a double-precision binary floating-point number — a double. A decimal number of 15 significant digits or less is supposed to be recoverable from its double-precision representation. In SQLite, however, this guarantee is not met; this is because its floating-point to decimal conversion routine is implemented in limited-precision floating-point arithmetic.

Continue reading …

The Answer is One (Unless You Use Floating-Point)

What does this C function do?

double f(double a)
{
 double b, c;

 b = 10*a - 10;
 c = a - 0.1*b;

 return (c);
}

Based solely on reading the code, you’ll conclude that it always returns 1: c = a – 0.1*(10*a – 10) = a – (a-1) = 1. But if you execute the code, you’ll find that it may or may not return 1, depending on the input. If you know anything about binary floating-point arithmetic, that won’t surprise you; what might surprise you is how far from 1 the answer can be — as far away as a large negative number!

Continue reading …

Quick and Dirty Floating-Point to Decimal Conversion

In my article “Quick and Dirty Decimal to Floating-Point Conversion” I presented a small C program that uses double-precision floating-point arithmetic to convert decimal strings to binary floating-point numbers. The program converts some numbers incorrectly, despite using an algorithm that’s mathematically correct; its limited precision calculations are to blame. I dubbed the program “quick and dirty” because it’s simple, and overall converts reasonably accurately.

For this article, I took a similar approach to the conversion in the opposite direction — from binary floating-point to decimal string. I wrote a small C program that combines two mathematically correct algorithms: the classic “repeated division by ten” algorithm to convert integer values, and the classic “repeated multiplication by ten” algorithm to convert fractional values. The program uses double-precision floating-point arithmetic, so like its quick and dirty decimal to floating-point counterpart, its conversions are not always correct — though reasonably accurate. I’ll present the program and analyze some example conversions, both correct and incorrect.

Continue reading …

Double Rounding Errors in Floating-Point Conversions

Double rounding is when a number is rounded twice, first from n0 digits to n1 digits, and then from n1 digits to n2 digits. Double rounding is often harmless, giving the same result as rounding once, directly from n0 digits to n2 digits. However, sometimes a doubly rounded result will be incorrect, in which case we say that a double rounding error has occurred.

For example, consider the 6-digit decimal number 7.23496. Rounded directly to 3 digits — using round-to-nearest, round half to even rounding — it’s 7.23; rounded first to 5 digits (7.2350) and then to 3 digits it’s 7.24. The value 7.24 is incorrect, reflecting a double rounding error.

In a computer, double rounding occurs in binary floating-point arithmetic; the typical example is a calculated result that’s rounded to fit into an x87 FPU extended precision register and then rounded again to fit into a double-precision variable. But I’ve discovered another context in which double rounding occurs: conversion from a decimal floating-point literal to a single-precision floating-point variable. The double rounding is from full-precision binary to double-precision, and then from double-precision to single-precision.

In this article, I’ll show example conversions in C that are tainted by double rounding errors, and how attaching the ‘f’ suffix to floating-point literals prevents them — in gcc C at least, but not in Visual C++!

Continue reading …