About the Decimal to Floating-Point Converter
This is a decimal to binary floating-point converter. It will convert a decimal number to its nearest single-precision and double-precision IEEE 754 binary floating-point number, using round-half-to-even rounding (the default IEEE rounding mode). It is implemented with arbitrary-precision arithmetic, so its conversions are correctly rounded. It will convert both normal and subnormal numbers, and will convert numbers that overflow (to infinity) or underflow (to zero).
The resulting floating-point number can be displayed in ten forms: in decimal, in binary, in normalized decimal scientific notation, in normalized binary scientific notation, as a normalized decimal times a power of two, as a decimal integer times a power of two, as a decimal integer times a power of ten, as a hexadecimal floating-point constant, in raw binary, and in raw hexadecimal. Each form represents the exact value of the floating-point number.
Why Use This Converter?
This converter will show you why numbers in your computer programs, like 0.1, do not behave as you’d expect.
Inside the computer, most numbers with a decimal point can only be approximated; another number, just a tiny bit away from the one you want, must stand in for it. For example, in single-precision floating-point, 0.1 becomes 0.100000001490116119384765625. If your program is printing 0.1, it is lying to you; if it is printing 0.100000001, it’s still lying, but at least it’s telling you that you really don’t have 0.1.
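You can reproduce this outside the converter. The following is a minimal Python sketch, not the converter's own code; the helper name to_single is made up for illustration. It relies on Decimal(float) being an exact conversion, and on the struct module to round a double to single precision. Note that it rounds twice (decimal to double, then double to single), which can differ from a direct decimal-to-single conversion in rare cases, though not for 0.1.

```python
import struct
from decimal import Decimal

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    # Packing as 'f' stores 4 bytes (IEEE single); unpacking widens it back exactly.
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Decimal(float) converts exactly, so these are the values actually stored.
exact_single = Decimal(to_single(0.1))
exact_double = Decimal(0.1)

print(exact_single)  # 0.100000001490116119384765625
print(exact_double)  # 0.1000000000000000055511151231257827021181583404541015625
```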
How to Use This Converter
Input
- Enter a positive or negative number, either in standard (e.g., 134.45) or exponent (e.g., 1.3445e2) form. Indicate fractional values with a decimal point (‘.’), and do not use commas. Essentially, you can enter what a computer program accepts as a floating-point literal, except without any suffix (like ‘f’).
- Check the boxes for the IEEE precision you want; choose Double, Single, or both. (Double is the default.) Double means a 53-bit significand (less if subnormal) with an 11-bit exponent; Single means a 24-bit significand (less if subnormal) with an 8-bit exponent.
- Check the boxes for any output format you want; choose one or all ten. (Decimal is the default.)
- Click ‘Convert’ to convert.
- Click ‘Clear’ to reset the form and start from scratch.
If you want to convert another number, just type over the original number and click ‘Convert’ — there is no need to click ‘Clear’ first.
Output
There are ten output forms to choose from:
- Decimal: Display the floating-point number in decimal. (Expand output box, if necessary, to see all digits.)
- Binary: Display the floating-point number in binary. (Expand output box, if necessary, to see all digits.)
- Normalized decimal scientific notation: Display the floating-point number in decimal, but compactly, using normalized scientific notation. (Expand output box, if necessary, to see all digits.)
- Normalized binary scientific notation: Display the floating-point number in binary, but compactly, using normalized binary scientific notation.
  - Note: Subnormal numbers are shown normalized, with their actual exponent.
- Normalized decimal times a power of two: Display the floating-point number in a hybrid normalized scientific notation, as a normalized decimal number times a power of two.
- Decimal integer times a power of two: Display the floating-point number as a decimal integer times a power of two. (The binary representation of the decimal integer is the bit pattern of the floating-point representation, less trailing zeros.) This form is most interesting for negative exponents, since it represents the floating-point number as a dyadic fraction.
- Decimal integer times a power of ten: Display the floating-point number as a decimal integer times a power of ten. This form is most interesting for negative exponents, since it represents the floating-point number as a fraction. (Expand output box, if necessary, to see all digits.)
- Hexadecimal floating-point constant: Display the floating-point number as a hexadecimal floating-point constant.
  - Note: There are many ways to format hexadecimal floating-point constants, as you would see if, for example, you compared the output of Java, Visual C++, gcc C, and Python programs. The differences across languages are superficial, though: trailing zeros may or may not be shown, positive exponents may or may not have a plus sign, etc. This converter formats the constants without trailing zeros and without plus signs.
  - Note: Like many programming languages, this converter shows subnormal numbers unnormalized, with their exponents set to the minimum normal exponent.
  - Note: The last hexadecimal digit in a hexadecimal floating-point constant may have trailing binary 0s within; this doesn’t necessarily imply those bits exist in the selected IEEE format.
- Raw binary: Display the floating-point number in its raw IEEE format (sign bit followed by the exponent field followed by the significand field).
- Raw hexadecimal: Display the floating-point number in its raw IEEE format, equivalent to the raw binary format but expressed compactly in hexadecimal.
(See here for more details on these output forms.)
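Several of these forms can be approximated with standard Python calls. This is an illustrative sketch for the double nearest 0.1, not the converter's code, and Python's formatting will not match the converter's character for character; the variable names are my own.

```python
import struct

x = 0.1  # Python floats are IEEE doubles on mainstream platforms

# Hexadecimal floating-point constant (Python's own formatting).
hex_constant = x.hex()

# Decimal integer times a power of two: as_integer_ratio() returns the
# exact fraction in lowest terms, and its denominator is a power of two.
numerator, denominator = x.as_integer_ratio()
power = denominator.bit_length() - 1  # denominator == 2**power

# Raw IEEE encoding: reinterpret the 8 bytes of the double as an integer.
raw = struct.unpack(">Q", struct.pack(">d", x))[0]
raw_hex = f"{raw:016X}"
raw_bin = f"{raw:064b}"

# Split the raw bits into sign (1 bit), exponent (11 bits), significand (52 bits).
sign_bit, exponent_field, significand_field = raw_bin[0], raw_bin[1:12], raw_bin[12:]

print(hex_constant)                 # 0x1.999999999999ap-4
print(f"{numerator} x 2^-{power}")  # 3602879701896397 x 2^-55
print(raw_hex)                      # 3FB999999999999A
print(sign_bit, exponent_field, significand_field)
```

The significand field holds 52 bits rather than 53 because the leading bit of a normal number is implicit.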
There are two output flags:
- Inexact: If checked, this shows that the conversion was inexact; that is, it had to be rounded to an approximation of the input number. (The conversion is inexact when the decimal output does not match the decimal input, but this is a quicker way to tell.)
  - Note: This converter flags overflow to infinity and underflow to zero as inexact.
- Subnormal: If checked, this shows that the number was too small, and converted with less than full precision (the actual precision is shown in parentheses).
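The two flags can be mimicked for doubles with exact rational arithmetic. This is a sketch under assumptions: the function names conversion_flags and significant_bits are hypothetical, float() is assumed to round correctly (as it does on mainstream CPython builds), and "actual precision" is read here as the number of bits from the leading 1 bit of the significand down to the last bit position, which may not be exactly how the converter counts it.

```python
import math
import sys
from fractions import Fraction

def conversion_flags(text):
    """Return (inexact, subnormal) for converting a decimal string to a double."""
    x = float(text)
    if math.isinf(x):
        # Overflow to infinity counts as inexact, and infinity is not subnormal.
        return True, False
    # Fraction parses the decimal string exactly; Fraction(x) is the exact double.
    inexact = Fraction(x) != Fraction(text)
    # Subnormals are the nonzero values below the smallest normal double.
    subnormal = 0.0 < abs(x) < sys.float_info.min
    return inexact, subnormal

def significant_bits(x):
    """Bits of precision carried by a finite, nonzero double (53 when normal)."""
    if abs(x) >= sys.float_info.min:
        return 53
    # A subnormal is an integer multiple of 2**-1074; count that integer's bits.
    return int(math.ldexp(abs(x), 1074)).bit_length()

print(conversion_flags("0.5"))     # (False, False)
print(conversion_flags("0.1"))     # (True, False)
print(conversion_flags("1e-320"))  # (True, True)
print(significant_bits(float("1e-320")))  # 11
```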
Implementation
I wrote this converter from scratch; it does not rely on native conversion functions like strtod(), strtof(), or printf(). It is based on the big-integer-based algorithm I describe in my article “Correct Decimal To Floating-Point Using Big Integers”. I’ve implemented it using BCMath.
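To give a feel for the general idea, here is a minimal Python sketch of decimal-to-double conversion using unbounded integers. It is not the converter's BCMath code and does not claim to follow the article step for step; the function name decimal_to_double is made up, and the approach shown (scale the exact fraction so the quotient has 53 bits, divide, then round half to even on the remainder) is just one way to do it. It leans on Python's Fraction to parse the decimal string exactly, and it returns positive zero for any zero input.

```python
import math
from fractions import Fraction

def decimal_to_double(text):
    """Correctly round a decimal string to a double using big integers."""
    value = Fraction(text)  # exact rational value of the decimal input
    if value == 0:
        return 0.0
    sign = -1.0 if value < 0 else 1.0
    n, d = abs(value.numerator), value.denominator

    def scale(e):
        # Numerator and denominator of (n/d) / 2**e, kept as integers.
        return (n << -e, d) if e < 0 else (n, d << e)

    # Pick e so that 2**52 <= (n/d) / 2**e < 2**53; the bit-length estimate
    # is either exact or one too small, so a single correction suffices.
    e = n.bit_length() - d.bit_length() - 53
    num, den = scale(e)
    if num >= den << 53:
        e += 1
    e = max(e, -1074)  # subnormals: the exponent cannot go below -1074

    num, den = scale(e)
    q, r = divmod(num, den)
    # Round half to even, decided exactly from the integer remainder.
    if 2 * r > den or (2 * r == den and q % 2 == 1):
        q += 1
    if q == 1 << 53:  # rounding carried into a 54th bit; renormalize
        q >>= 1
        e += 1
    if e > 971:  # larger than the largest finite double
        return sign * math.inf
    return sign * math.ldexp(float(q), e)

print(decimal_to_double("0.1") == 0.1)   # True
print(decimal_to_double("1e400"))        # inf
print(decimal_to_double("9007199254740993"))  # 9007199254740992.0
```

The last example is a halfway case: 9007199254740993 is 2^53 + 1, exactly between two doubles, and the tie goes to the one with the even significand.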
Limits
For practical reasons, I’ve set a somewhat arbitrary limit on the length of the decimal input; you’ll get an error message if you hit it. This will filter inputs that would otherwise overflow to infinity or underflow to zero, but it will also prevent you from entering some “hard” halfway rounding cases. (For the record, though, this converter accepts all the hard examples I’ve discussed on my site.) For all accepted inputs, however, the output is correct (notwithstanding any bugs that escaped my extensive testing).