SQLite has a limited-precision floating-point to decimal conversion routine which it uses to print double-precision floating-point values retrieved from a database. As I’ve discovered, its limited-precision conversion results in decimal numbers of 15 significant digits or less that won’t round-trip. For example, if you store the number 9.944932e+31, it will print back as 9.94493200000001e+31.
SQLite also has a limited-precision decimal to floating-point conversion routine, which it uses to convert input decimal numbers to double-precision floating-point numbers for storage in a database. I’ve found that some of its conversions are incorrect — by as many as four ULPs — and that some decimal numbers fail to round-trip because of this; “garbage in, garbage out” as they say.
Incorrect Decimal to Floating-Point Conversions
To find examples of incorrect decimal to floating-point conversions, I wrote a program that generates random decimal strings and converts them using SQLite’s sqlite3AtoF() function (to call sqlite3AtoF(), I had to modify the SQLite source code file sqlite3.c to make it a non-static function). I compared SQLite’s conversion to the correct one, as calculated by David Gay’s strtod() function.
Here are some incorrect conversions I found on a Windows Visual C++ build of SQLite:
Input | Correct | SQLite | ULPs in error |
---|---|---|---|
1e-23 | 0x1.82db34012b251p-77 | 0x1.82db34012b252p-77 | 1 |
8.533e+68 | 0x1.fa69165a8eea2p+228 | 0x1.fa69165a8eea0p+228 | 2 |
4.1006e-184 | 0x1.be0d1c7ea60c9p-610 | 0x1.be0d1c7ea60ccp-610 | 3 |
9.998e+307 | 0x1.1cc0a350ca87bp+1023 | 0x1.1cc0a350ca87ep+1023 | 3 |
9.9538452227e-280 | 0x1.2117ae45cde43p-927 | 0x1.2117ae45cde40p-927 | 3 |
6.47660115e-260 | 0x1.fdd9e333badadp-862 | 0x1.fdd9e333bada9p-862 | 4 |
Here are some incorrect conversions I found on a Linux build of SQLite:
Input | Correct | SQLite | ULPs in error |
---|---|---|---|
7.4e+47 | 0x1.033d7eca0adefp+159 | 0x1.033d7eca0adeep+159 | 1 |
5.92e+48 | 0x1.033d7eca0adefp+162 | 0x1.033d7eca0adeep+162 | 1 |
7.35e+66 | 0x1.172b70eababa9p+222 | 0x1.172b70eababaap+222 | 1 |
8.32116e+55 | 0x1.b2628393e02cdp+185 | 0x1.b2628393e02cep+185 | 1 |
On Windows, I found conversions that are off by up to four ULPs; on Linux, I could only find conversions that were off by at most one ULP. SQLite on Linux converts more accurately because it uses extended-precision arithmetic, whereas SQLite on Windows uses only double-precision.
Bad Round-Trips Due to Decimal to Floating-Point Conversion
Incorrect decimal to floating-point conversions can prevent decimal numbers from round-tripping. Of the six examples of incorrect conversions on Windows, two caused bad round-trips:
sqlite> create table t1(d real); sqlite> insert into t1 values(1e-23); sqlite> insert into t1 values(8.533e+68); sqlite> insert into t1 values(4.1006e-184); sqlite> insert into t1 values(9.998e+307); sqlite> insert into t1 values(9.9538452227e-280); sqlite> insert into t1 values(6.47660115e-260); sqlite> select * from t1; 1.0e-23 8.533e+68 4.1006e-184 9.99800000000001e+307 9.95384522269999e-280 6.47660115e-260
You can’t tell from the output that the cause is incorrect decimal to floating-point conversion, but I verified that it is: in debug mode, I manually changed SQLite’s double to the correctly converted value, and the correct decimal value was printed by its floating-point to decimal conversion routine.
I found many more examples that I didn’t show. Strangely, each had one of three exponents: -280, -277, or +307. Also strange is that all the bad round-trips were caused by three-ULP errors; if three ULPs can cause an error, why not four?
No Bad Round-Trips on Linux
On Linux, I found no bad round-trips caused by incorrect decimal to floating-point conversions. This is probably because no conversion was off by more than one ULP.
Fixing the Decimal to Floating-Point Conversion Routine
Ideally, all decimal to floating-point conversions in SQLite would be done correctly; however, this would require arbitrary-precision arithmetic. At the least, the Windows build of SQLite could use extended-precision arithmetic so that it can convert as accurately as the Linux build. This would reduce, if not eliminate, the possibility of bad 15-digit or less round-trip conversions.
Fixing the decimal to floating-point conversion routine would have an additional benefit: it would improve the accuracy of a user’s floating-point calculations. SQLite’s decimal to floating-point conversion routine is the first I’ve encountered that produces incorrect results exceeding one ULP (compare to Visual C++, GCC, and GLIBC, for example). I can’t say what impact these large conversion errors have, since I don’t know what kinds of calculations SQLite users — Windows users in particular — do with floating-point numbers.
A Related SQLite Bug
A SQLite bug titled “Insufficient precision of sqlite3_column_text16() for floating point values” addresses the round-trip from the other direction: floating-point to decimal to floating-point. That would require SQLite to return decimal numbers to up to 17 digits, not 15. The bug report calls for changing the hardcoded '15' to '17', but it would require more than that; the conversion routines would have to use higher precision.
This and the previous articles intrigue me. You may not know the impact of these errors, as you don’t know the nature of a users’ calculations, but I can’t help thinking that there is some havoc being raised. While it may be the case that people may catch large errors, I wonder about the smaller ones, ones analogous to taking 1/4 of a cent from every bank account. At some point the errors grow large, but it may be too late to backtrack and make changes. This would be really the case in a dynamic system, and with critical ones, such as infrastructure control (I think of a nuclear plant, for example), the problem can be critical. The other errors you point out in previous articles are disconcerting. On a larger scale, how much of a problem is there in basic routines in other applications upon which people depend? Yours is but a rather small sample of all applications, and I’d be curious to see how widespread these types of problems are.
Jeremy,
The interesting thing is that these conversion errors have existed for a long time (in the SQLite source, for instance, there’s this comment: “The “printf” code that follows dates from the 1980’s.”). I’d like to think that if it were a major problem, we’d know by now. But then again …
Decimal values (money, currency rates) shouldn’t be stored as floating-points value – just because of the errors you have described.
Instead you can use integer values + additional “wrapper” code.
For example: https://github.com/vpiotr/decimal_for_cpp
Second note:
Maybe this will help – “What Every Scientist Should Know About Floating-Point Arithmetic” – http://dlc.sun.com/pdf/800-7895/800-7895.pdf
It clarifies why 2.0 + 2.0 (not always) = 4.0
@Piotr,
I’m quite aware of the inaccuracies of floating-point and of the problems with storing currency in floating-point variables — but that’s besides the point. SQLite lets you store floating-point values, and as such, should be held to the same standard that guides other floating-point implementations.
Are you sure it’s SQLite problem and not it’s runtime library?
As I understood you are comparing custom version of strtod (David Gay’s) function with code inside SQLite (potentially using VisualStudio runtime conversions – if used).
Maybe including results for original strtod from VS, GCC should help in understanding where is the problem.
@Piotr,
SQLite is not using a runtime library to do its conversions (it has “rolled its own”).
The bad conversions manifest themselves in the precompiled binary for Windows (see the SQLite “transcript” above), so it’s not just my build that is causing the problem.
Looks like you are referring to sqlite3AtoF().
It handles UTF-8 and leading white spaces in addition to normal strtod.
If you can ignore these issues, I think it would be easy to modify the code to use standard / custom strtod() (#ifdef).
Did you try to add a ticket for it?
Like this one?
http://www.sqlite.org/cvstrac/tktview?tn=1662
@Piotr,
Yes, sqlite3AtoF(), as I mentioned in the article :).
I am an “anonymous” user so I’m not allowed to enter bug reports (see SQLite’s bug reporting policy). I tried reporting another SQLite bug through the mailing list, as per policy, but no one opened a bug report. If you have the authority, feel free to do so — for both bugs.
@Rick:
In fact you are fighting with very small differences, that will appear as long as SQLite uses own conversions. If you need more precision, maybe it would be better to store numbers as long double in integer form (for small numbers, up to 18 digits) or text form.
@Piotr,
I’m sure there are plenty of workarounds. But the fact remains that SQLite has floating-point support, and I think it should at least meet the standard of other floating-point implementations (max 1 ULP decimal to floating-point conversion error, and decimal numbers of 15 digits or less round-trip).
BTW, I’m not using this in any application — I was just evaluating it.
Today I encountered a similar issue with SQLite in the version of PHP that ships with OSX, only it is an issue with integers instead of floats. While doing some unit tests on a caching library I’m working on I had a test to confirm that the expiration date (a timestamp) put in was the same returned. Every few dozen runs of the test suite would fail on this test, as SQLite was passing back a timestamp that was one second less than what was originally put in-
Failed asserting that matches expected .
@Robert,
I just took a quick look at sqlite3.c. I see that there’s a function called parseDateOrTime() that calls sqlite3AtoF(). Maybe that comes into play in what you’re seeing?