Displaying IEEE Doubles in Binary Scientific Notation

An IEEE double-precision floating-point number, or double, is a 64-bit encoding of a rational number. Internally, the 64 bits are broken into three fields: a 1-bit sign field, which represents positive or negative; an 11-bit exponent field, which represents a power of two; and a 52-bit fraction field, which represents the significant bits of the number. These three fields — together with an implicit leading 1 bit — represent a number in binary scientific notation, with 1 to 53 bits of precision.

For example, consider the decimal number 33.75. It converts to a double with a sign field of 0, an exponent field of 10000000100, and a fraction field of 0000111000000000000000000000000000000000000000000000. The 0 in the sign field means it’s a positive number (1 would mean it’s negative). The value of 10000000100 in the exponent field, which equals 1028 in decimal, means the exponent of the power of two is 5 (the exponent field value is offset, or biased, by 1023). The fraction field, when prefixed with an implicit leading 1, represents the binary fraction 1.0000111. Written in normalized binary scientific notation — following the convention that the fraction is written in binary and the power of two is written in decimal — 33.75 equals 1.0000111 x 2^5.
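You can verify this encoding directly: 1.0000111 in binary is 1 + 1/16 + 1/32 + 1/64 = 1.0546875, and 1.0546875 x 32 = 33.75. You can also pack the three fields into a 64-bit integer and reinterpret it as a double; here’s a minimal sketch, assuming the usual 64-bit IEEE representation of double and a 64-bit unsigned long long:

#include <stdio.h>
#include <string.h>

int main (void)
{
 //Sign (0), exponent (10000000100), and fraction (0000111, then
 // 45 zeros) packed left to right into one 64-bit integer
 unsigned long long bits = 0x4040E00000000000ULL;
 double d;

 memcpy(&d, &bits, sizeof(d)); //Reinterpret the bit pattern as a double
 printf("%.2f\n", d); //Prints 33.75

 return (0);
}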

In this article, I’ll show you the C function I wrote to display a double in normalized binary scientific notation. This function is useful, for example, when verifying that decimal to floating-point conversions are correctly rounded.

Subnormal Numbers

In double-precision floating-point, most numbers are represented in normalized form, with an implicit 1 bit giving 53 bits of precision. However, very small numbers — the so-called subnormal numbers — are represented in unnormalized form, with no implicit leading 1 bit and zero to 51 leading zeros in their fraction field. These numbers are encoded with an exponent field of zero, with their true exponent equal to -1022 minus the position of the first 1 bit in their fraction field. This means that subnormal numbers are scaled by powers of two in the range 2^-1074 through 2^-1023, with accompanying precision of one to 52 bits.

Although subnormal numbers are encoded as unnormalized, they can still be written as normalized. For example, the decimal number 1e-310 converts to a subnormal double with a sign field of 0, an exponent field of 00000000000, and a fraction field of 0000000100100110100010001011011100001110011000101011. This can be printed as 1.00100110100010001011011100001110011000101011 x 2^-1030 — which is what my C function does.
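As a quick cross-check that doesn’t rely on my code, the standard C function frexp() also decomposes subnormals; it returns a significand in [0.5, 1), so its exponent is one greater than the normalized exponent. A small sketch (on an IEEE-conforming implementation, this prints an exponent of -1029, that is, a normalized exponent of -1030):

#include <stdio.h>
#include <math.h>

int main (void)
{
 int e;
 double m = frexp(1e-310, &e); //1e-310 == m x 2^e, with 0.5 <= m < 1

 printf("m = %g, e = %d\n", m, e); //e is -1029
 printf("normalized exponent: %d\n", e - 1); //Prints -1030 (1.f x 2^-1030)

 return (0);
}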

The Code

I wrote a function called print_double_binsci() that prints double-precision floating-point numbers in normalized binary scientific notation. It is based on a call to my function parse_double(), which isolates the three fields of a double.
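parse_double() is declared in rawdouble.h and defined in rawdouble.c, which I won’t reproduce here; if you want to write your own, a minimal version might look like the sketch below (it assumes a 64-bit IEEE double and a 64-bit unsigned long long; the actual rawdouble.c may be written differently):

#include <string.h>

//A sketch of a field-isolating function like parse_double()
void parse_double(double d,
                  unsigned char *sign_field,
                  unsigned short *exponent_field,
                  unsigned long long *fraction_field)
{
 unsigned long long bits;

 memcpy(&bits, &d, sizeof(bits)); //Get at the 64 bits of the double

 *sign_field = (unsigned char)(bits >> 63); //Top bit
 *exponent_field = (unsigned short)((bits >> 52) & 0x7FF); //Next 11 bits
 *fraction_field = bits & 0xFFFFFFFFFFFFFULL; //Low 52 bits
}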

I declared and defined print_double_binsci() in files I named binsci.h and binsci.c, respectively.

binsci.h

/***********************************************************/
/* binsci.h: Function to print an IEEE double-precision    */
/*           floating-point number in normalized binary    */
/*           scientific notation                           */
/*                                                         */
/* Rick Regan (https://www.exploringbinary.com)            */
/*                                                         */
/* Version 2 (support subnormals)                          */
/***********************************************************/
void print_double_binsci(double d);

binsci.c

/***********************************************************/
/* binsci.c: Function to print an IEEE double-precision    */
/*           floating-point number in normalized binary    */
/*           scientific notation                           */
/*                                                         */
/* Rick Regan (https://www.exploringbinary.com)            */
/*                                                         */
/* Version 2 (support subnormals)                          */
/***********************************************************/
#include <stdio.h>
#include "rawdouble.h"
#include "binsci.h"

void print_double_binsci(double d)
{
 unsigned char sign_field;
 unsigned short exponent_field;
 short exponent;
 unsigned long long fraction_field, significand;
 int i, start = 0, end = 52;

 //Isolate the three fields of the double
 parse_double(d,&sign_field,&exponent_field,&fraction_field);

 //Print a minus sign, if necessary
 if (sign_field == 1)
   printf("-");

 if (exponent_field == 0 && fraction_field == 0)
   printf("0\n"); //Number is zero
 else
  {
   if (exponent_field == 0 && fraction_field != 0)
     {//Subnormal number
      significand = fraction_field; //No implicit 1 bit
      exponent = -1022; //Exponents decrease from here
      while (((significand >> (52-start)) & 1) == 0)
        {
         exponent--;
         start++;
        }
     }
   else
     {//Normalized number (ignoring INFs, NANs)
      significand = fraction_field | (1ULL << 52); //Implicit 1 bit
      exponent = exponent_field - 1023; //Subtract bias
     }

   //Suppress trailing 0s
   while (((significand >> (52-end)) & 1) == 0)
     end--;

   //Print the significant bits
   for (i=start; i<=end; i++)
    {
     if (i == start+1)
       printf(".");
     if (((significand >> (52-i)) & 1) == 1)
       printf("1");
     else
       printf("0");
    }

   if (start == end) //Special case: 1 bit (a power of two)
     printf(".0");

   //Print the exponent
   printf(" x 2^%d\n",exponent);
  }
}

Notes

  • Numbers with a normalized exponent of zero (those with magnitude at least 1 but less than 2) are printed with the suffix “x 2^0”.
  • Not-a-number (NaN) and infinity values are not handled; a possible extension is sketched below.
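If you need infinities and NaNs, one way to extend print_double_binsci() is to test for an all-ones exponent field (2047, or 0x7FF) right after parse_double() returns: infinities have a zero fraction field, NaNs a nonzero one. Here’s a sketch of the extra test (my assumption of how it might be added, not part of the code above):

   //Possible extension: INF and NAN have an all-ones exponent field
   if (exponent_field == 0x7FF)
     {
      if (fraction_field == 0)
        printf("INF\n"); //The minus sign, if any, was already printed
      else
        printf("NAN\n");
      return;
     }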

Examples of Usage

I wrote a program, called binsciTest.c, that shows some example calls to print_double_binsci():

/***********************************************************/
/* binsciTest.c: Program to test printing of IEEE double   */
/*               precision floating-point numbers in       */
/*               binary scientific notation                */
/*                                                         */
/* Rick Regan (https://www.exploringbinary.com)            */
/*                                                         */
/* Version 2 (print subnormals)                            */
/***********************************************************/
#include <stdio.h>
#include "binsci.h"

int main (void)
{
 printf("33.75 =\n");
 print_double_binsci(33.75);
 printf("\n");

 printf("0.1 =\n");
 print_double_binsci(0.1);
 printf("\n");

 printf("-0.6 =\n");
 print_double_binsci(-0.6);
 printf("\n");

 printf("3.518437208883201171875e13 =\n");
 print_double_binsci(3.518437208883201171875e13);
 printf("\n");

 printf("9214843084008499.0 =\n");
 print_double_binsci(9214843084008499.0);
 printf("\n");

 printf("30078505129381147446200.0 =\n");
 print_double_binsci(30078505129381147446200.0);
 printf("\n");

 printf("1777820000000000000001.0 =\n");
 print_double_binsci(1777820000000000000001.0);
 printf("\n");
 
 printf("0.3932922657273 =\n");
 print_double_binsci(0.3932922657273);
 printf("\n");

 printf("4.9406564584124654e-324 =\n");
 print_double_binsci(4.9406564584124654e-324);
 printf("\n");

 printf("1.2e-321 =\n");
 print_double_binsci(1.2e-321);
 printf("\n");

 printf("2.2250738585072011e-308 =\n");
 print_double_binsci(2.2250738585072011e-308);
 
 return (0);
}

(Some of these examples were taken from my articles Incorrectly Rounded Conversions in Visual C++ and Incorrectly Rounded Conversions in GCC and GLIBC.)

I compiled and ran it on both Windows and Linux:

  • On Windows, I built a project in Visual C++ with files binsci.c, binsci.h, binsciTest.c, rawdouble.c, and rawdouble.h, and compiled and ran it there.
  • On Linux, I compiled with “gcc binsciTest.c binsci.c rawdouble.c -o binsciTest” and then ran it with “./binsciTest”.

Output

This is the Windows output (the Linux output is a little different; Visual C++ and gcc differ in some of their decimal to floating-point conversions):

33.75 =
1.0000111 x 2^5

0.1 =
1.100110011001100110011001100110011001100110011001101 x 2^-4

-0.6 =
-1.0011001100110011001100110011001100110011001100110011 x 2^-1

3.518437208883201171875e13 =
1.0000000000000000000000000000000000000000000000000001 x 2^45

9214843084008499.0 =
1.0000010111100110110011101100010101110111011000011001 x 2^53

30078505129381147446200.0 =
1.10010111101000111100011100100111000110110000001 x 2^74

1777820000000000000001.0 =
1.100000011000000011010101101110101101001011100011111 x 2^70

0.3932922657273 =
1.1001001010111011001101010010110001000110001000111001 x 2^-2

4.9406564584124654e-324 =
1.0 x 2^-1074

1.2e-321 =
1.1110011 x 2^-1067

2.2250738585072011e-308 =
1.111111111111111111111111111111111111111111111111111 x 2^-1023

Except for 33.75, which is exact, all the other examples are approximations to the decimal numbers they stand in for, accurate to at most 53 significant bits (type ‘0.1’ into my decimal to binary converter and see for yourself).
