The Safe Range For PHP’s base_convert()

PHP’s base_convert() is a useful function that converts integers between any pair of bases, 2 through 36. However, you might hesitate to use it after reading this vague and mysterious warning in its documentation:

base_convert() may lose precision on large numbers due to properties related to the internal “double” or “float” type used.

The truth is that it works perfectly for integers up to a certain maximum — you just have to know what that is. I will show you this maximum value in each of the 35 bases, and how to check if the values you are using are within this limit.

Range of Integers Exactly Representable in Floating-Point

Many numbers can only be approximated in IEEE binary floating-point, while others can be represented exactly. Integers starting from 0 and going sequentially to some maximum are among those numbers that are exactly representable. In single-precision floating-point, the maximum is 2²⁴; in double-precision floating-point, the maximum is 2⁵³. (Technically the maximums are the largest 24 significant bit integer, 2²⁴ – 1, and the largest 53 significant bit integer, 2⁵³ – 1. But 2²⁴ and 2⁵³ are exactly representable by virtue of being powers of two.)

For the remainder of this article, I will assume PHP uses double-precision floating-point. (I’ve only ever seen double-precision implementations — are there implementations that uses single-precision?)

base_convert()

base_convert() takes three arguments: the input number (as a string), the input base, and the output base. The output number is returned as a string. The digits for bases 2 through 36 are drawn from the digits 0-9 and letters a-z (mixed case is accepted, but lower case is returned). You can pass any integer, but the output is guaranteed to be accurate only for integers less than or equal to 2⁵³; I will call these integers safe for base_convert().

Implementation

We have to look at base_convert()’s implementation to make sure it never produces intermediate calculations that exceed the safe range of integers.

base_convert() proceeds in two steps: first it converts the input string to numeric binary, using a C function called basetozval(), and then it converts that numeric binary value to the output string, using a C function called zvaltobase().

basetozval() is straightforward. It executes this line of code for every digit of the input string:

fnum = fnum * base + c;

fnum is the accumulating numeric binary value of the input, and c is the numeric value of the current digit. Although fnum is declared as a double, its intermediate values will all be integers, and its final integer sum cannot exceed 2⁵³ if the input itself does not.

zvaltobase() is also simple, but it generates intermediate floating-point values, so deserves a closer look. Here is the relevant loop:

do {
    *--ptr = digits[(int) fmod(fvalue, base)];
    fvalue /= base;
} while (ptr > buf && fabs(fvalue) >= 1);

This generates output digits until the numeric input value is consumed. For each iteration, the value of each digit is the remainder of fvalue/base, and the value of fvalue is reset to fvalue/base.

I would have written this code in a slightly different, but equivalent way:

do {
    *--ptr = digits[(int) fmod(fvalue, base)];
    fvalue = floor(fvalue/base);
} while (ptr > buf && fabs(fvalue) > 0);

In this form it is obvious that it works, assuming we start with a value of 2⁵³ or less. It essentially performs integer arithmetic: fvalue is always an integer, as is the fmod() result.

Unsafe Conversions

There’s only a very small chance that an integer greater than 2⁵³ will be converted correctly; three conditions have to be met:

The integer has to be exactly representable.
base_convert()’s “quick and dirty like” conversion routines have to convert the integer to the right double-precision floating-point number and print it correctly.
The converted to number has to be less than or equal to 64 digits (that’s the size of the output buffer used).

Furthermore, integers that are not exactly representable may not even be correctly rounded (in the IEEE 754 sense); this is a consequence of doing the conversion in limited precision floating-point.

Here are examples of unsafe conversions, with incorrect results:

Example 1

<?php echo base_convert('3674675646464747008',10,2) . '<br>'; ?>

3674675646464747008 is exactly representable, as the double-precision number

11001011111111000100100000011111111100010011110110011000000000

base_convert() returns

11001011111111000100100000011111111100010011110110100000000000

which is one ULP too high.

Example 2

<?php echo base_convert('123456789012345669025792',10,2) . '<br>'; ?>

123456789012345669025792 is exactly representable, as the double-precision number

11010001001001001101100011111000100001010000001101100000000000000000000000000

base_convert() returns

1001101100011111000100001010000001101100000000000000000000000000

which is way, way off. (The 64 digit limit was hit, cutting off the most significant digits.)

Example 3

<?php echo base_convert('1234567890123456789',10,2) . '<br>'; ?>

1234567890123456789 is not exactly representable; the closest double precision number is

1000100100010000100001111010001111101111010011000000100000000

base_convert() returns

1000100100010000100001111010001111101111010011000001000000000

which is one ULP too high (so is not correctly rounded).

Maximum Integer Representations By Base

This table shows the maximum integer safely convertable by base_convert() — 2⁵³ — as represented in each of the 35 bases:

**Maximum Safe Integer For base_convert(), By Base**
Base	Max Value (2⁵³)
2	100000000000000000000000000000000000000000000000000000
3	1121202011211211122211100012101112
4	200000000000000000000000000
5	33421042423033203202432
6	224404414114114022452
7	5350140446150306054
8	400000000000000000
9	47664754584305345
10	9007199254740992
11	2179a75830112628
12	702273685b77a28
13	2397b7325802696
14	b4c34aaccadc64
15	4964cdca1dc7b2
16	20000000000000
17	f7ded8c9e1f8f
18	7e2c925c889fe
19	416210bi7ca4a
20	23jc3e8722c9c
21	14f01e5ec7fdb
22	f92hf53a8cc8
23	9a9i7gmkbfj6
24	5m1bec25hbd8
25	3jb4ed3h3aeh
26	2bko8jf78bb6
27	1gk4mmhm95ae
28	12bd1h7b56h4
29	lbpf6d7shib
30	f7iboftrod2
31	aukoap6ali8
32	80000000000
33	5t2d3e17rj8
34	4cbreicjccw
35	399uaj5f5vw
36	2gosa7pa2gw

A Direct Test

A table of base-specific maximum values is interesting, but it does not lead to a simple, base-independent, programmatic test. Here is such a test:

<?php
function base_convert_is_safe($integer,$base)
{
  if (bindec(base_convert($integer,$base,2)) <= 9007199254740991) {
     $isSafe = TRUE;
  } else {
     $isSafe = FALSE;
  }
  return $isSafe;
}
?>

This test uses base_convert() to convert the input string to a binary string, and then uses another PHP conversion function — bindec(), which by the way is safe up to 2⁵³ as well — to convert that binary string to a numeric binary value. This value is then compared to the numeric binary value of (decimal) 9007199254740991, which is 2⁵³ – 1. (Testing for 2⁵³ is problematic; with the default IEEE rounding mode of round-to-nearest/ties-to-even, 2⁵³ + 1 is indistinguishable from 2⁵³. We thus have to cut the top number from the safe range.)

This test is not very efficient, since it uses a call to bindec() and a call to base_convert() before the actual base_convert() conversion you want to do.

Here are examples of how to use the tester function:

Example 4

<?php
$integer_from = '2gosa7pa2gv';
$base_from = 36;
if (base_convert_is_safe($integer_from,$base_from)) {
    $integer_to = base_convert($integer_from,$base_from,2);
    echo $integer_to . '<br>';
} else {
    echo 'NOT safe to use base_convert()<br>';
}
?>

The input value equals 2⁵³ – 1, so this prints

11111111111111111111111111111111111111111111111111111

Example 5

<?php
$integer_from = '2gosa7pa2gx';
$base_from = 36;
if (base_convert_is_safe($integer_from,$base_from)) {
    $integer_to = base_convert($integer_from,$base_from,2);
    echo $integer_to . '<br>';
} else {
    echo 'NOT safe to use base_convert()<br>';
}
?>

The input value equals 2⁵³ + 1, so this prints

NOT safe to use base_convert()

Maximum Digits By Base

There is another way to test for safe conversions that is a little less simple but much more efficient — but you have to sacrifice the top end of the safe range of integers to use it. For example, in decimal, 2⁵³ is the 16-digit integer 9007199254740992. If we limit the safe range of decimal integers to 15 digits — that is, integers up to and including 999999999999999 — we can simply take the length of the input and make sure it is less than or equal to 15. We don’t have to consider the value of the number per se.

This works similarly for the other bases; for example:

2⁵³ in base 7 is the 19-digit integer 5350140446150306054; if we cap it at 18 digits, we get 666666666666666666
2⁵³ in base 16 is the 14-digit integer 20000000000000; if we cap it at 13 digits, we get fffffffffffff
2⁵³ in base 36 is the 11-digit integer 2gosa7pa2gw; if we cap it at 10 digits, we get zzzzzzzzzz

(Taking the cutoff as one digit less than the representation for 2⁵³ is the same as taking the integer part of the logarithm to the base “from base” of 2⁵³: floor(log_fromBase(2⁵³) .)

Here is a table showing the maximum cap of digits for each base, and the corresponding values:

**Maximum Safe Number of Digits (and Corresponding Maximum Values) For base_convert(), By Base**
Base	Max Digits	Max Value
2	53	11111111111111111111111111111111111111111111111111111
3	33	222222222222222222222222222222222
4	26	33333333333333333333333333
5	22	4444444444444444444444
6	20	55555555555555555555
7	18	666666666666666666
8	17	77777777777777777
9	16	8888888888888888
10	15	999999999999999
11	15	aaaaaaaaaaaaaaa
12	14	bbbbbbbbbbbbbb
13	14	cccccccccccccc
14	13	ddddddddddddd
15	13	eeeeeeeeeeeee
16	13	fffffffffffff
17	12	gggggggggggg
18	12	hhhhhhhhhhhh
19	12	iiiiiiiiiiii
20	12	jjjjjjjjjjjj
21	12	kkkkkkkkkkkk
22	11	lllllllllll
23	11	mmmmmmmmmmm
24	11	nnnnnnnnnnn
25	11	ooooooooooo
26	11	ppppppppppp
27	11	qqqqqqqqqqq
28	11	rrrrrrrrrrr
29	10	ssssssssss
30	10	tttttttttt
31	10	uuuuuuuuuu
32	10	vvvvvvvvvv
33	10	wwwwwwwwww
34	10	xxxxxxxxxx
35	10	yyyyyyyyyy
36	10	zzzzzzzzzz

You can create an array of maximum digit values, indexed by base, and test the length of your number before calling base_convert().

The (Often Big) Tradeoff

Limiting the maximum number of digits cuts the top range of integers you can convert safely with base_convert() — sometimes significantly. It depends on the relative value of the top (most significant) digit you are cutting. This table shows the percentage of safe integers allowed by the maximum digit value — what I call coverage — by base:

**Coverage of Maximum Digit Safe Integers For base_convert(), By Base**
Base	Max Digits	Coverage
2	53	≈100%
3	33	≈61.7%
4	26	≈50%
5	22	≈26.5%
6	20	≈40.6%
7	18	≈18.1%
8	17	≈25%
9	16	≈20.6%
10	15	≈11.1%
11	15	≈46.4%
12	14	≈14.3%
13	14	≈43.7%
14	13	≈8.8%
15	13	≈21.6%
16	13	≈50%
17	12	≈6.5%
18	12	≈12.8%
19	12	≈24.6%
20	12	≈45.5%
21	12	≈81.7%
22	11	≈6.5%
23	11	≈10.6%
24	11	≈16.9%
25	11	≈26.5%
26	11	≈40.7%
27	11	≈61.7%
28	11	≈92.1%
29	10	≈4.7%
30	10	≈6.6%
31	10	≈9.1%
32	10	≈12.5%
33	10	≈17%
34	10	≈22.9%
35	10	≈30.6%
36	10	≈40.6%

You can improve your coverage significantly with a hybrid test: you can allow one digit more than the maximum length as long as the top digit occurs lexicographically before the value of the top digit in that base’s representation of 2⁵³. For example, In base 29, the maximum digit length number ssssssssss (10 digits) gives less than 5% coverage. 2⁵³ in base 29 is lbpf6d7shib (the top digit is the letter ‘l’, not the digit ‘1’ — which is why my arbitrary-precision base converter offers a character set that excludes that letter). Anything kssssssssss (which is 11 digits) or less is safe, giving over 98% coverage.

One More Option: Just Eyeball It

You can skip the programmatic checks if you know that the range of numbers you will be converting is safe — by looking at the tables beforehand.

A Better Warning

Perhaps this is a better warning for base_convert()’s documentation:

base_convert() can convert integers accurately only up to a certain size. This is due to the limits of the internal floating-point type used. If single-precision floating-point is used, the maximum integer is 2²⁴; if double-precision floating-point is used, the maximum integer is 2⁵³.

I would also add a note that said how to tell whether single or double precision is being used (this can be done by trial and error — but is there a better way?) Better yet, if there are no single-precision implementations, we could eliminate any talk of different precisions at all.

4 comments

Jovica says:

May 16, 2016 at 3:09 am

I made this website
binarydecimal.com
By using base_convert
example
binarydecimal.com/ff00ff-hexadecimal
Rick Regan says:

May 16, 2016 at 8:46 am

@Jovica,

Your converter seems not to honor base_convert()’s safe range (put a large enough number in and it returns garbage).
Jack Chen says:

February 22, 2018 at 10:59 pm

How to do Base36 to hex (base16) in php by using base_convert()? I always got a lot of “0” in string.
Rick Regan says:

February 23, 2018 at 9:23 am

@Jack Chen,

Please post an example of what you converted/are trying to convert.

Comments are closed.

Range of Integers Exactly Representable in Floating-Point

base_convert()

Implementation

Unsafe Conversions

Example 1

Example 2

Example 3

Maximum Integer Representations By Base

A Direct Test

Example 4

Example 5

Maximum Digits By Base

The (Often Big) Tradeoff

One More Option: Just Eyeball It

A Better Warning

Related

4 comments