Every integer has an equivalent representation in decimal and binary. Except for 0 and 1, the binary representation of an integer has more digits than its decimal counterpart. To find the number of binary digits (bits) corresponding to any given decimal integer, you could convert the decimal number to binary and count the bits. For example, the two-digit decimal integer 29 converts to the five-digit binary integer 11101. But there’s a way to compute the number of bits directly, without the conversion.
Sometimes you want to know, not how many bits are required for a specific integer, but how many are required for a d-digit integer — a range of integers. A range of integers has a range of bit counts. For example, four-digit decimal integers require between 10 and 14 bits. For any d-digit range, you might want to know its minimum, maximum, or average number of bits. Those values can be computed directly as well.
In this article, I will show you those calculations. I will be discussing pure binary and decimal, not computer encodings like two’s complement, fixed-point, floating-point, or BCD. All of the discussion assumes positive integers, although it applies to negative integers if you temporarily ignore their minus signs. 0 is a special case not covered by the formulas, but obviously it has only 1 bit.
(I use the terms decimal integer and binary integer when I really mean “an integer expressed in decimal numerals” and “an integer expressed in binary numerals”. An integer is an integer, independent of its base.)
A positive integer n has b bits when 2b-1 ≤ n ≤ 2b – 1. For example:
- 29 has 5 bits because 16 ≤ 29 ≤ 31, or 24 ≤ 29 ≤ 25 – 1
- 123 has 7 bits because 64 ≤ 123 ≤ 127, or 26 ≤ 123 ≤ 27 – 1
- 967 has 10 bits because 512 ≤ 967 ≤ 1023, or 29 ≤ 967 ≤ 210 – 1
For larger numbers, you could consult a table of powers of two to find the consecutive powers that contain your number.
To see why this works, think of the binary representations of the integers 24 through 25 – 1, for example. They are 10000 through 11111, all possible 5-bit values.
The above method can be stated another way: the number of bits is the exponent of the smallest power of two greater than your number. You can state that mathematically as:
bspec = ⌊log2(n)⌋ + 1
That formula has three parts:
- log2(n) means the logarithm in base 2 of n, which is the exponent to which 2 is raised to get n. For example, log2(123) ≈ 6.9425145. The presence of a fractional part means n is between powers of two.
- ⌊x⌋ is the floor of x, which is the integer part of x. For example, ⌊6.9425145⌋ = 6. You can think of ⌊log2(n)⌋ as the exponent of the highest power of two in the binary representation of n.
- + 1 takes the exponent to the next higher power of two. You can think of this step as accounting for the 20th place of your binary number, which then gives you its total number of bits. For our example, that’s 6 + 1 = 7.
You might be tempted to use the ceiling function — ⌈x⌉, which is the smallest integer greater than or equal to x — to compute the number of bits as such:
bspec = ⌈log2(n)⌉
However, this fails when n is a power of two.
A positive integer n has d decimal digits when 10d-1 ≤ n ≤ 10d – 1. How many bits do numbers in this range require? It varies. For example, consider four-digit decimal integers. Using the above formula you’ll see that the smallest four-digit number, 1000, requires 10 bits, and the largest four-digit number, 9999, requires 14 bits. The number of bits varies between those extremes. For example, 1344 requires 11 bits, 2527 requires 12 bits, and 5019 requires 13 bits. Why does this occur? Because that single power of ten range spans all or part of five consecutive power-of-two ranges. Here’s how the examples look from that viewpoint:
- 1000 has 10 bits because 512 ≤ 1000 ≤ 1023, or 29 ≤ 1000 ≤ 210 – 1
- 1344 has 11 bits because 1024 ≤ 1344 ≤ 2047, or 210 ≤ 1344 ≤ 211 – 1
- 2527 has 12 bits because 2048 ≤ 2527 ≤ 4095, or 211 ≤ 2527 ≤ 212 – 1
- 5019 has 13 bits because 4096 ≤ 5019 ≤ 8191, or 212 ≤ 5019 ≤ 213 – 1
- 9999 has 14 bits because 8192 ≤ 9999 ≤ 16383, or 213 ≤ 9999 ≤ 214 – 1
This diagram shows the ranges:
Minimum Number of Bits in a d-Digit Integer
The minimum number of bits required for a d-digit integer is computed simply by using the specific number formula on the minimum d-digit value:
bmin = ⌊log2(10d-1)⌋ + 1
We can make this a more efficient computation by using the logarithmic identity loga(xy) = y·loga(x):
bmin = ⌊log2(10d-1)⌋ + 1 = ⌊(d-1)·log2(10)⌋ + 1
In this form, we take the logarithm of a small constant instead of a large variable.
Since we are dealing with powers of ten we can use the ceiling function here (as long as d > 1); there is no positive power of ten that is also a power of two. Here’s the equivalent formula:
bmin = ⌈log2(10d-1)⌉ = ⌈(d-1)·log2(10)⌉
Maximum Number of Bits in a d-Digit Integer
The maximum number of bits required for a d-digit integer is computed simply by using the specific number formula on the maximum d-digit value:
bmax = ⌊log2(10d – 1)⌋ + 1
We can’t make the same simplification as for the minimum value, at least not on the face of it. But notice that ⌊log2(10d – 1)⌋ = ⌊log2(10d)⌋, since a power of ten and that power of ten minus one are both in the same power of two range. (A power of ten minus one cannot be a power of two — it’s odd). This allows us to use this more computationally efficient formula to the same effect:
bmax = ⌊log2(10d)⌋ + 1 = ⌊d·log2(10)⌋ + 1
And again, for d > 1, we can use the ceiling function instead:
bmax = ⌈d·log2(10)⌉
Average Number of Bits in a d-Digit Integer
The average number of bits required for a d-digit integer is the total number of bits required to represent all d-digit integers divided by the number of d-digit integers. For our example, the average is
bavg = (24·10 + 1024·11 + 2048·12 + 4096·13 + 1808·14)/9000 ≈ 12.74
Computing The Formulas
Of the above formulas, these two — in these forms — are the most commonly used:
- bspec = ⌊log2(n)⌋ + 1
- bmax = ⌈d·log2(10)⌉
How do you compute them?
First we have to figure out how to take the base-2 logarithm of a number. Many programming environments do not have a base-2 logarithm function. You can deal with that by doing a change of base:
log2(n) = loga(n)/loga(2)
‘a’ could represent any base; for example, base e, the natural logarithm. For simplicity, I’ll drop the ‘a’ and refer to the logarithm function generically as log(n):
log2(n) = log(n)/log(2)
Let’s restate the formula in the form you would most likely enter it in a computer:
bspec = floor(log(n)/log(2)) + 1
For example, the decimal integer 1,997,443,410 has floor(log(1997443410)/log(2)) + 1 = 31 bits.
You have to be careful when computing logarithms; floating-point inaccuracies could cause an incorrect result, particularly at power-of-two boundaries.
(If you are using a language like C you could avoid change of base — and improve performance and accuracy — by computing the floor of the base-2 logarithm using “Bit Twiddling Hacks”.)
For very large numbers, you’ll need arbitrary-precision arithmetic. I use PARI/GP, for example.
Let’s restate the formula for computer evaluation:
bmax = ceil(d*(log(10)/log(2)))
(The parentheses around the division are unnecessary; I just like thinking of log(10)/log(2) as a separate constant.)
log(10)/log(2), to 17 digits, is 3.3219280948873623. People commonly round this to 3.32, leading to this simple formula:
bmax = ceil(d*3.32)
For example, a 16-digit decimal integer requires up to ceil(16*3.32) = 54 bits.
You have to be careful with this though; for example, ceil(25*3.32) = 83 (since 25*3.32 = 83), but 25-digit integers require 84 bits. You need to specify the constant with more precision to get the correct result. (Here’s one place where this error is made.)