Double rounding is when a number is rounded twice, first from n_{0} digits to n_{1} digits, and then from n_{1} digits to n_{2} digits. Double rounding is often harmless, giving the same result as rounding once, directly from n_{0} digits to n_{2} digits. However, sometimes a doubly rounded result will be incorrect, in which case we say that a ** double rounding error** has occurred.

For example, consider the 6-digit decimal number 7.23496. Rounded directly to 3 digits — using round-to-nearest, round half to even rounding — it’s 7.23; rounded first to 5 digits (7.2350) and then to 3 digits it’s 7.24. The value 7.24 is incorrect, reflecting a double rounding error.

In a computer, double rounding occurs in binary floating-point arithmetic; the typical example is a calculated result that’s rounded to fit into an x87 FPU extended precision register and then rounded again to fit into a double-precision variable. But I’ve discovered another context in which double rounding occurs: conversion from a decimal floating-point literal to a single-precision floating-point variable. The double rounding is from full-precision binary to double-precision, and then from double-precision to single-precision.

In this article, I’ll show example conversions in C that are tainted by double rounding errors, and how attaching the ‘f’ suffix to floating-point literals prevents them — in gcc C at least, but not in Visual C++!

Continue reading “Double Rounding Errors in Floating-Point Conversions”