# The Safe Range For PHP’s base_convert()

Copyright © 2008-2016 Exploring Binary

http://www.exploringbinary.com/the-safe-range-for-phps-base-convert/

PHP’s base_convert() is a useful function that converts integers between any pair of bases, 2 through 36. However, you might hesitate to use it after reading this vague and mysterious warning in its documentation:

base_convert() may lose precision on large numbers due to properties related to the internal “double” or “float” type used.

The truth is that it works perfectly for integers up to a certain maximum — you just have to know what that is. I will show you this maximum value in each of the 35 bases, and how to check if the values you are using are within this limit.

## Range of Integers Exactly Representable in Floating-Point

Many numbers can only be approximated in IEEE binary floating-point, while others can be represented exactly. Integers starting from 0 and going sequentially to some maximum are among those numbers that *are* exactly representable. In single-precision floating-point, the maximum is 2^{24}; in double-precision floating-point, the maximum is 2^{53}. (Technically the maximums are the largest 24 significant bit integer, 2^{24} – 1, and the largest 53 significant bit integer, 2^{53} – 1. But 2^{24} and 2^{53} are exactly representable by virtue of being powers of two.)

For the remainder of this article, I will assume PHP uses double-precision floating-point. (I’ve only ever seen double-precision implementations — are there implementations that uses single-precision?)

## base_convert()

base_convert() takes three arguments: the input number (as a string), the input base, and the output base. The output number is returned as a string. The digits for bases 2 through 36 are drawn from the digits 0-9 and letters a-z (mixed case is accepted, but lower case is returned). You can pass any integer, but the output is guaranteed to be accurate only for integers less than or equal to 2^{53}; I will call these integers * safe* for base_convert().

### Implementation

We have to look at base_convert()’s implementation to make sure it never produces intermediate calculations that exceed the safe range of integers.

base_convert() proceeds in two steps: first it converts the input string to numeric binary, using a C function called *basetozval()*, and then it converts that numeric binary value to the output string, using a C function called *zvaltobase()*.

*basetozval()* is straightforward. It executes this line of code for every digit of the input string:

fnum = fnum * base + c;

fnum is the accumulating numeric binary value of the input, and c is the numeric value of the current digit. Although fnum is declared as a double, its intermediate values will all be integers, and its final integer sum cannot exceed 2^{53} if the input itself does not.

*zvaltobase()* is also simple, but it generates intermediate floating-point values, so deserves a closer look. Here is the relevant loop:

do { *--ptr = digits[(int) fmod(fvalue, base)]; fvalue /= base; } while (ptr > buf && fabs(fvalue) >= 1);

This generates output digits until the numeric input value is consumed. For each iteration, the value of each digit is the remainder of fvalue/base, and the value of fvalue is reset to fvalue/base.

I would have written this code in a slightly different, but equivalent way:

do { *--ptr = digits[(int) fmod(fvalue, base)]; fvalue =floor(fvalue/base); } while (ptr > buf && fabs(fvalue)> 0);

In this form it is obvious that it works, assuming we start with a value of 2^{53} or less. It essentially performs integer arithmetic: fvalue is always an integer, as is the fmod() result.

### Unsafe Conversions

There’s only a very small chance that an integer greater than 2^{53} will be converted correctly; three conditions have to be met:

- The integer has to be exactly representable.
- base_convert()’s “quick and dirty like” conversion routines have to convert the integer to the right double-precision floating-point number
*and*print it correctly. - The converted to number has to be less than or equal to 64 digits (that’s the size of the output buffer used).

Furthermore, integers that are not exactly representable may not even be correctly rounded (in the IEEE 754 sense); this is a consequence of doing the conversion in limited precision floating-point.

Here are examples of unsafe conversions, with incorrect results:

#### Example 1

<?php echo base_convert('3674675646464747008',10,2) . '<br>'; ?>

3674675646464747008 is exactly representable, as the double-precision number

11001011111111000100100000011111111100010011110110011000000000

base_convert() returns

11001011111111000100100000011111111100010011110110100000000000

which is one ULP too high.

#### Example 2

<?php echo base_convert('123456789012345669025792',10,2) . '<br>'; ?>

123456789012345669025792 is exactly representable, as the double-precision number

11010001001001001101100011111000100001010000001101100000000000000000000000000

base_convert() returns

1001101100011111000100001010000001101100000000000000000000000000

which is way, way off. (The 64 digit limit was hit, cutting off the most significant digits.)

#### Example 3

<?php echo base_convert('1234567890123456789',10,2) . '<br>'; ?>

1234567890123456789 is *not* exactly representable; the closest double precision number is

1000100100010000100001111010001111101111010011000000100000000

base_convert() returns

1000100100010000100001111010001111101111010011000001000000000

which is one ULP too high (so is not correctly rounded).

## Maximum Integer Representations By Base

This table shows the maximum integer safely convertable by base_convert() — 2^{53} — as represented in each of the 35 bases:

Base | Max Value (2^{53}) |
---|---|

2 | 100000000000000000000000000000000000000000000000000000 |

3 | 1121202011211211122211100012101112 |

4 | 200000000000000000000000000 |

5 | 33421042423033203202432 |

6 | 224404414114114022452 |

7 | 5350140446150306054 |

8 | 400000000000000000 |

9 | 47664754584305345 |

10 |
9007199254740992 |

11 | 2179a75830112628 |

12 | 702273685b77a28 |

13 | 2397b7325802696 |

14 | b4c34aaccadc64 |

15 | 4964cdca1dc7b2 |

16 | 20000000000000 |

17 | f7ded8c9e1f8f |

18 | 7e2c925c889fe |

19 | 416210bi7ca4a |

20 | 23jc3e8722c9c |

21 | 14f01e5ec7fdb |

22 | f92hf53a8cc8 |

23 | 9a9i7gmkbfj6 |

24 | 5m1bec25hbd8 |

25 | 3jb4ed3h3aeh |

26 | 2bko8jf78bb6 |

27 | 1gk4mmhm95ae |

28 | 12bd1h7b56h4 |

29 | lbpf6d7shib |

30 | f7iboftrod2 |

31 | aukoap6ali8 |

32 | 80000000000 |

33 | 5t2d3e17rj8 |

34 | 4cbreicjccw |

35 | 399uaj5f5vw |

36 | 2gosa7pa2gw |

## A Direct Test

A table of base-specific maximum values is interesting, but it does not lead to a simple, base-independent, programmatic test. Here is such a test:

<?php function base_convert_is_safe($integer,$base) { if (bindec(base_convert($integer,$base,2)) <= 9007199254740991) { $isSafe = TRUE; } else { $isSafe = FALSE; } return $isSafe; } ?>

This test uses base_convert() to convert the input string to a binary string, and then uses another PHP conversion function — bindec(), which by the way is safe up to 2^{53} as well — to convert that binary string to a numeric binary value. This value is then compared to the numeric binary value of (decimal) 9007199254740991, which is 2^{53} – 1. (Testing for 2^{53} is problematic; with the default IEEE rounding mode of *round-to-nearest/ties-to-even*, 2^{53} + 1 is indistinguishable from 2^{53}. We thus have to cut the top number from the safe range.)

This test is not very efficient, since it uses a call to bindec() and a call to base_convert() before the actual base_convert() conversion you want to do.

Here are examples of how to use the tester function:

### Example 4

<?php $integer_from = '2gosa7pa2gv'; $base_from = 36; if (base_convert_is_safe($integer_from,$base_from)) { $integer_to = base_convert($integer_from,$base_from,2); echo $integer_to . '<br>'; } else { echo 'NOT safe to use base_convert()<br>'; } ?>

The input value equals 2^{53} – 1, so this prints

11111111111111111111111111111111111111111111111111111

### Example 5

<?php $integer_from = '2gosa7pa2gx'; $base_from = 36; if (base_convert_is_safe($integer_from,$base_from)) { $integer_to = base_convert($integer_from,$base_from,2); echo $integer_to . '<br>'; } else { echo 'NOT safe to use base_convert()<br>'; } ?>

The input value equals 2^{53} + 1, so this prints

NOT safe to use base_convert()

## Maximum Digits By Base

There is another way to test for safe conversions that is a little less simple but much more efficient — but you have to sacrifice the top end of the safe range of integers to use it. For example, in decimal, 2^{53} is the 16-digit integer 9007199254740992. If we limit the safe range of decimal integers to 15 digits — that is, integers up to and including 999999999999999 — we can simply take the length of the input and make sure it is less than or equal to 15. We don’t have to consider the value of the number per se.

This works similarly for the other bases; for example:

- 2
^{53}in base 7 is the 19-digit integer 5350140446150306054; if we cap it at 18 digits, we get 666666666666666666 - 2
^{53}in base 16 is the 14-digit integer 20000000000000; if we cap it at 13 digits, we get fffffffffffff - 2
^{53}in base 36 is the 11-digit integer 2gosa7pa2gw; if we cap it at 10 digits, we get zzzzzzzzzz

(Taking the cutoff as one digit less than the representation for 2^{53} is the same as taking the integer part of the logarithm to the base “from base” of 2^{53}: floor(log_{fromBase}(2^{53}) .)

Here is a table showing the maximum cap of digits for each base, and the corresponding values:

Base | Max Digits | Max Value |
---|---|---|

2 | 53 | 11111111111111111111111111111111111111111111111111111 |

3 | 33 | 222222222222222222222222222222222 |

4 | 26 | 33333333333333333333333333 |

5 | 22 | 4444444444444444444444 |

6 | 20 | 55555555555555555555 |

7 | 18 | 666666666666666666 |

8 | 17 | 77777777777777777 |

9 | 16 | 8888888888888888 |

10 |
15 |
999999999999999 |

11 | 15 | aaaaaaaaaaaaaaa |

12 | 14 | bbbbbbbbbbbbbb |

13 | 14 | cccccccccccccc |

14 | 13 | ddddddddddddd |

15 | 13 | eeeeeeeeeeeee |

16 | 13 | fffffffffffff |

17 | 12 | gggggggggggg |

18 | 12 | hhhhhhhhhhhh |

19 | 12 | iiiiiiiiiiii |

20 | 12 | jjjjjjjjjjjj |

21 | 12 | kkkkkkkkkkkk |

22 | 11 | lllllllllll |

23 | 11 | mmmmmmmmmmm |

24 | 11 | nnnnnnnnnnn |

25 | 11 | ooooooooooo |

26 | 11 | ppppppppppp |

27 | 11 | qqqqqqqqqqq |

28 | 11 | rrrrrrrrrrr |

29 | 10 | ssssssssss |

30 | 10 | tttttttttt |

31 | 10 | uuuuuuuuuu |

32 | 10 | vvvvvvvvvv |

33 | 10 | wwwwwwwwww |

34 | 10 | xxxxxxxxxx |

35 | 10 | yyyyyyyyyy |

36 | 10 | zzzzzzzzzz |

You can create an array of maximum digit values, indexed by base, and test the length of your number before calling base_convert().

### The (Often Big) Tradeoff

Limiting the maximum number of digits cuts the top range of integers you can convert safely with base_convert() — sometimes significantly. It depends on the relative value of the top (most significant) digit you are cutting. This table shows the percentage of safe integers allowed by the maximum digit value — what I call *coverage* — by base:

Base | Max Digits | Coverage |
---|---|---|

2 | 53 | ≈100% |

3 | 33 | ≈61.7% |

4 | 26 | ≈50% |

5 | 22 | ≈26.5% |

6 | 20 | ≈40.6% |

7 | 18 | ≈18.1% |

8 | 17 | ≈25% |

9 | 16 | ≈20.6% |

10 |
15 |
≈11.1% |

11 | 15 | ≈46.4% |

12 | 14 | ≈14.3% |

13 | 14 | ≈43.7% |

14 | 13 | ≈8.8% |

15 | 13 | ≈21.6% |

16 | 13 | ≈50% |

17 | 12 | ≈6.5% |

18 | 12 | ≈12.8% |

19 | 12 | ≈24.6% |

20 | 12 | ≈45.5% |

21 | 12 | ≈81.7% |

22 | 11 | ≈6.5% |

23 | 11 | ≈10.6% |

24 | 11 | ≈16.9% |

25 | 11 | ≈26.5% |

26 | 11 | ≈40.7% |

27 | 11 | ≈61.7% |

28 | 11 | ≈92.1% |

29 | 10 | ≈4.7% |

30 | 10 | ≈6.6% |

31 | 10 | ≈9.1% |

32 | 10 | ≈12.5% |

33 | 10 | ≈17% |

34 | 10 | ≈22.9% |

35 | 10 | ≈30.6% |

36 | 10 | ≈40.6% |

You can improve your coverage significantly with a hybrid test: you can allow one digit more than the maximum length as long as the top digit occurs lexicographically before the value of the top digit in that base’s representation of 2^{53}. For example, In base 29, the maximum digit length number *ssssssssss* (10 digits) gives less than 5% coverage. 2^{53} in base 29 is *lbpf6d7shib* (the top digit is the letter ‘l’, not the digit ‘1’ — which is why my arbitrary-precision base converter offers a character set that excludes that letter). Anything kssssssssss (which is 11 digits) or less is safe, giving over 98% coverage.

## One More Option: Just Eyeball It

You can skip the programmatic checks if you know that the range of numbers you will be converting is safe — by looking at the tables beforehand.

## A Better Warning

Perhaps this is a better warning for base_convert()’s documentation:

base_convert() can convert integers accurately only up to a certain size. This is due to the limits of the internal floating-point type used. If single-precision floating-point is used, the maximum integer is 2

^{24}; if double-precision floating-point is used, the maximum integer is 2^{53}.

I would also add a note that said how to tell whether single or double precision is being used (this can be done by trial and error — but is there a better way?) Better yet, if there are no single-precision implementations, we could eliminate any talk of different precisions at all.

May 16th, 2016 at 3:09 am

I made this website

binarydecimal.com

By using base_convert

example

binarydecimal.com/ff00ff-hexadecimal

May 16th, 2016 at 8:46 am

@Jovica,

Your converter seems not to honor base_convert()’s safe range (put a large enough number in and it returns garbage).