The Safe Range For PHP’s base_convert()

http://www.exploringbinary.com/the-safe-range-for-phps-base-convert/


PHP’s base_convert() is a useful function that converts integers between any pair of bases, 2 through 36. However, you might hesitate to use it after reading this vague and mysterious warning in its documentation:

base_convert() may lose precision on large numbers due to properties related to the internal “double” or “float” type used.

The truth is that it works perfectly for integers up to a certain maximum — you just have to know what that is. I will show you this maximum value in each of the 35 bases, and how to check if the values you are using are within this limit.

Range of Integers Exactly Representable in Floating-Point

Many numbers can only be approximated in IEEE binary floating-point, while others can be represented exactly. Integers starting from 0 and going sequentially to some maximum are among those numbers that are exactly representable. In single-precision floating-point, the maximum is 224; in double-precision floating-point, the maximum is 253. (Technically the maximums are the largest 24 significant bit integer, 224 – 1, and the largest 53 significant bit integer, 253 – 1. But 224 and 253 are exactly representable by virtue of being powers of two.)

For the remainder of this article, I will assume PHP uses double-precision floating-point. (I’ve only ever seen double-precision implementations — are there implementations that uses single-precision?)

base_convert()

base_convert() takes three arguments: the input number (as a string), the input base, and the output base. The output number is returned as a string. The digits for bases 2 through 36 are drawn from the digits 0-9 and letters a-z (mixed case is accepted, but lower case is returned). You can pass any integer, but the output is guaranteed to be accurate only for integers less than or equal to 253; I will call these integers safe for base_convert().

Implementation

We have to look at base_convert()’s implementation to make sure it never produces intermediate calculations that exceed the safe range of integers.

base_convert() proceeds in two steps: first it converts the input string to numeric binary, using a C function called basetozval(), and then it converts that numeric binary value to the output string, using a C function called zvaltobase().

basetozval() is straightforward. It executes this line of code for every digit of the input string:

fnum = fnum * base + c;

fnum is the accumulating numeric binary value of the input, and c is the numeric value of the current digit. Although fnum is declared as a double, its intermediate values will all be integers, and its final integer sum cannot exceed 253 if the input itself does not.

zvaltobase() is also simple, but it generates intermediate floating-point values, so deserves a closer look. Here is the relevant loop:

do {
    *--ptr = digits[(int) fmod(fvalue, base)];
    fvalue /= base;
} while (ptr > buf && fabs(fvalue) >= 1);

This generates output digits until the numeric input value is consumed. For each iteration, the value of each digit is the remainder of fvalue/base, and the value of fvalue is reset to fvalue/base.

I would have written this code in a slightly different, but equivalent way:

do {
    *--ptr = digits[(int) fmod(fvalue, base)];
    fvalue = floor(fvalue/base);
} while (ptr > buf && fabs(fvalue) > 0);

In this form it is obvious that it works, assuming we start with a value of 253 or less. It essentially performs integer arithmetic: fvalue is always an integer, as is the fmod() result.

Unsafe Conversions

There’s only a very small chance that an integer greater than 253 will be converted correctly; three conditions have to be met:

  • The integer has to be exactly representable.
  • base_convert()’s “quick and dirty like” conversion routines have to convert the integer to the right double-precision floating-point number and print it correctly.
  • The converted to number has to be less than or equal to 64 digits (that’s the size of the output buffer used).

Furthermore, integers that are not exactly representable may not even be correctly rounded (in the IEEE 754 sense); this is a consequence of doing the conversion in limited precision floating-point.

Here are examples of unsafe conversions, with incorrect results:

Example 1

<?php echo base_convert('3674675646464747008',10,2) . '<br>'; ?>

3674675646464747008 is exactly representable, as the double-precision number

11001011111111000100100000011111111100010011110110011000000000

base_convert() returns

11001011111111000100100000011111111100010011110110100000000000

which is one ULP too high.

Example 2

<?php echo base_convert('123456789012345669025792',10,2) . '<br>'; ?>

123456789012345669025792 is exactly representable, as the double-precision number

11010001001001001101100011111000100001010000001101100000000000000000000000000

base_convert() returns

1001101100011111000100001010000001101100000000000000000000000000

which is way, way off. (The 64 digit limit was hit, cutting off the most significant digits.)

Example 3

<?php echo base_convert('1234567890123456789',10,2) . '<br>'; ?>

1234567890123456789 is not exactly representable; the closest double precision number is

1000100100010000100001111010001111101111010011000000100000000

base_convert() returns

1000100100010000100001111010001111101111010011000001000000000

which is one ULP too high (so is not correctly rounded).

Maximum Integer Representations By Base

This table shows the maximum integer safely convertable by base_convert() — 253 — as represented in each of the 35 bases:

Maximum Safe Integer For base_convert(), By Base
Base Max Value (253)
2 100000000000000000000000000000000000000000000000000000
3 1121202011211211122211100012101112
4 200000000000000000000000000
5 33421042423033203202432
6 224404414114114022452
7 5350140446150306054
8 400000000000000000
9 47664754584305345
10 9007199254740992
11 2179a75830112628
12 702273685b77a28
13 2397b7325802696
14 b4c34aaccadc64
15 4964cdca1dc7b2
16 20000000000000
17 f7ded8c9e1f8f
18 7e2c925c889fe
19 416210bi7ca4a
20 23jc3e8722c9c
21 14f01e5ec7fdb
22 f92hf53a8cc8
23 9a9i7gmkbfj6
24 5m1bec25hbd8
25 3jb4ed3h3aeh
26 2bko8jf78bb6
27 1gk4mmhm95ae
28 12bd1h7b56h4
29 lbpf6d7shib
30 f7iboftrod2
31 aukoap6ali8
32 80000000000
33 5t2d3e17rj8
34 4cbreicjccw
35 399uaj5f5vw
36 2gosa7pa2gw

A Direct Test

A table of base-specific maximum values is interesting, but it does not lead to a simple, base-independent, programmatic test. Here is such a test:

<?php
function base_convert_is_safe($integer,$base)
{
  if (bindec(base_convert($integer,$base,2)) <= 9007199254740991) {
     $isSafe = TRUE;
  } else {
     $isSafe = FALSE;
  }
  return $isSafe;
}
?>

This test uses base_convert() to convert the input string to a binary string, and then uses another PHP conversion function — bindec(), which by the way is safe up to 253 as well — to convert that binary string to a numeric binary value. This value is then compared to the numeric binary value of (decimal) 9007199254740991, which is 253 – 1. (Testing for 253 is problematic; with the default IEEE rounding mode of round-to-nearest/ties-to-even, 253 + 1 is indistinguishable from 253. We thus have to cut the top number from the safe range.)

This test is not very efficient, since it uses a call to bindec() and a call to base_convert() before the actual base_convert() conversion you want to do.

Here are examples of how to use the tester function:

Example 4

<?php
$integer_from = '2gosa7pa2gv';
$base_from = 36;
if (base_convert_is_safe($integer_from,$base_from)) {
    $integer_to = base_convert($integer_from,$base_from,2);
    echo $integer_to . '<br>';
} else {
    echo 'NOT safe to use base_convert()<br>';
}
?>

The input value equals 253 – 1, so this prints

11111111111111111111111111111111111111111111111111111

Example 5

<?php
$integer_from = '2gosa7pa2gx';
$base_from = 36;
if (base_convert_is_safe($integer_from,$base_from)) {
    $integer_to = base_convert($integer_from,$base_from,2);
    echo $integer_to . '<br>';
} else {
    echo 'NOT safe to use base_convert()<br>';
}
?>

The input value equals 253 + 1, so this prints

NOT safe to use base_convert()

Maximum Digits By Base

There is another way to test for safe conversions that is a little less simple but much more efficient — but you have to sacrifice the top end of the safe range of integers to use it. For example, in decimal, 253 is the 16-digit integer 9007199254740992. If we limit the safe range of decimal integers to 15 digits — that is, integers up to and including 999999999999999 — we can simply take the length of the input and make sure it is less than or equal to 15. We don’t have to consider the value of the number per se.

This works similarly for the other bases; for example:

  • 253 in base 7 is the 19-digit integer 5350140446150306054; if we cap it at 18 digits, we get 666666666666666666
  • 253 in base 16 is the 14-digit integer 20000000000000; if we cap it at 13 digits, we get fffffffffffff
  • 253 in base 36 is the 11-digit integer 2gosa7pa2gw; if we cap it at 10 digits, we get zzzzzzzzzz

(Taking the cutoff as one digit less than the representation for 253 is the same as taking the integer part of the logarithm to the base “from base” of 253: floor(logfromBase(253) .)

Here is a table showing the maximum cap of digits for each base, and the corresponding values:

Maximum Safe Number of Digits (and Corresponding Maximum Values) For base_convert(), By Base
Base Max Digits Max Value
2 53 11111111111111111111111111111111111111111111111111111
3 33 222222222222222222222222222222222
4 26 33333333333333333333333333
5 22 4444444444444444444444
6 20 55555555555555555555
7 18 666666666666666666
8 17 77777777777777777
9 16 8888888888888888
10 15 999999999999999
11 15 aaaaaaaaaaaaaaa
12 14 bbbbbbbbbbbbbb
13 14 cccccccccccccc
14 13 ddddddddddddd
15 13 eeeeeeeeeeeee
16 13 fffffffffffff
17 12 gggggggggggg
18 12 hhhhhhhhhhhh
19 12 iiiiiiiiiiii
20 12 jjjjjjjjjjjj
21 12 kkkkkkkkkkkk
22 11 lllllllllll
23 11 mmmmmmmmmmm
24 11 nnnnnnnnnnn
25 11 ooooooooooo
26 11 ppppppppppp
27 11 qqqqqqqqqqq
28 11 rrrrrrrrrrr
29 10 ssssssssss
30 10 tttttttttt
31 10 uuuuuuuuuu
32 10 vvvvvvvvvv
33 10 wwwwwwwwww
34 10 xxxxxxxxxx
35 10 yyyyyyyyyy
36 10 zzzzzzzzzz

You can create an array of maximum digit values, indexed by base, and test the length of your number before calling base_convert().

The (Often Big) Tradeoff

Limiting the maximum number of digits cuts the top range of integers you can convert safely with base_convert() — sometimes significantly. It depends on the relative value of the top (most significant) digit you are cutting. This table shows the percentage of safe integers allowed by the maximum digit value — what I call coverage — by base:

Coverage of Maximum Digit Safe Integers For base_convert(), By Base
Base Max Digits Coverage
2 53 ≈100%
3 33 ≈61.7%
4 26 ≈50%
5 22 ≈26.5%
6 20 ≈40.6%
7 18 ≈18.1%
8 17 ≈25%
9 16 ≈20.6%
10 15 ≈11.1%
11 15 ≈46.4%
12 14 ≈14.3%
13 14 ≈43.7%
14 13 ≈8.8%
15 13 ≈21.6%
16 13 ≈50%
17 12 ≈6.5%
18 12 ≈12.8%
19 12 ≈24.6%
20 12 ≈45.5%
21 12 ≈81.7%
22 11 ≈6.5%
23 11 ≈10.6%
24 11 ≈16.9%
25 11 ≈26.5%
26 11 ≈40.7%
27 11 ≈61.7%
28 11 ≈92.1%
29 10 ≈4.7%
30 10 ≈6.6%
31 10 ≈9.1%
32 10 ≈12.5%
33 10 ≈17%
34 10 ≈22.9%
35 10 ≈30.6%
36 10 ≈40.6%

You can improve your coverage significantly with a hybrid test: you can allow one digit more than the maximum length as long as the top digit occurs lexicographically before the value of the top digit in that base’s representation of 253. For example, In base 29, the maximum digit length number ssssssssss (10 digits) gives less than 5% coverage. 253 in base 29 is lbpf6d7shib (the top digit is the letter ‘l’, not the digit ‘1’ — which is why my arbitrary-precision base converter offers a character set that excludes that letter). Anything kssssssssss (which is 11 digits) or less is safe, giving over 98% coverage.

One More Option: Just Eyeball It

You can skip the programmatic checks if you know that the range of numbers you will be converting is safe — by looking at the tables beforehand.

A Better Warning

Perhaps this is a better warning for base_convert()’s documentation:

base_convert() can convert integers accurately only up to a certain size. This is due to the limits of the internal floating-point type used. If single-precision floating-point is used, the maximum integer is 224; if double-precision floating-point is used, the maximum integer is 253.

I would also add a note that said how to tell whether single or double precision is being used (this can be done by trial and error — but is there a better way?) Better yet, if there are no single-precision implementations, we could eliminate any talk of different precisions at all.

Dingbat

2 Responses to “The Safe Range For PHP’s base_convert()”

  1. Jovica Says:

    I made this website
    binarydecimal.com
    By using base_convert
    example
    binarydecimal.com/ff00ff-hexadecimal

  2. Rick Regan Says:

    @Jovica,

    Your converter seems not to honor base_convert()’s safe range (put a large enough number in and it returns garbage).

Leave a Comment

(To reduce spam, cookies must be enabled)


css.php