Decimal to floating point conversion #27307

hanna-kruppe · 2015-07-26T19:13:28Z

Completely rewrite the conversion of decimal strings to f64 and f32. The code is intended to be absolutely positively completely 100% accurate (when it doesn't give up). To the best of my knowledge, it achieves that goal. Any input that is not rejected is converted to the floating point number that is closest to the true value of the input. This includes overflow, subnormal numbers, and underflow to zero. In other words, the rounding error is less than or equal to 0.5 units in the last place. Half-way cases (exactly 0.5 ULP error) are handled with half-to-even rounding, also known as banker's rounding.

This code implements the algorithms from the paper How to Read Floating Point Numbers Accurately by William D. Clinger, with extensions to handle underflow, overflow and subnormals, as well as some algorithmic optimizations.

Correctness

With such a large amount of tricky code, many bugs are to be expected. Indeed tracking down the obscure causes of various rounding errors accounts for the bulk of the development time. Extensive tests (taking in the order of hours to run through to completion) are included in src/etc/test-float-parse: Though exhaustively testing all possible inputs is impossible, I've had good success with generating millions of instances from various "classes" of inputs. These tests take far too long to be run by @bors so contributors who touch this code need the discipline to run them. There are #[test]s, but they don't even cover every stupid mistake I made in course of writing this.

Another aspect is integer overflow. Extreme (or malicious) inputs could cause overflow both in the machine-sized integers used for bookkeeping throughout the algorithms (e.g., the decimal exponent) as well as the arbitrary-precision arithmetic. There is input validation to reject all such cases I know of, and I am quite sure nobody will accidentally cause this code to go out of range. Still, no guarantees.

Limitations

Noticed the weasel words "(when it doesn't give up)" at the beginning? Some otherwise well-formed decimal strings are rejected because spelling out the value of the input requires too many digits, i.e., digits * 10^abs(exp) can't be stored in a bignum. This only applies if the value is not "obviously" zero or infinite, i.e., if you take a near-infinity or near-zero value and add many pointless fractional digits. At least with the algorithm used here, computing the precise value would require computing the full value as a fraction, which would overflow. The precise limit is number_of_digits + abs(exp) > 375 but could be raised almost arbitrarily. In the future, another algorithm might lift this restriction entirely.

This should not be an issue for any realistic inputs. Still, the code does reject inputs that would result in a finite float when evaluated with unlimited precision. Some of these inputs are even regressions that the old code (mostly) handled, such as 0.333...333 with 400+ 3s. Thus this might qualify as [breaking-change].

Performance

Benchmarks results are... tolerable. Short numbers that hit the fast paths (f64 multiplication or shortcuts to zero/inf) have performance in the same order of magnitude as the old code tens of nanoseconds. Numbers that are delegated to Algorithm Bellerophon (using floats with 64 bit significand, implemented in software) are slower, but not drastically so (couple hundred nanoseconds).

Numbers that need the AlgorithmM fallback (for f64, roughly everything below 1e-305 and above 1e305) take far, far longer, hundreds of microseconds. Note that my implementation is not quite as naive as the expository version in the paper (it needs one to four division instead of ~1000), but division is fundamentally pretty expensive and my implementation of it is extremely simple and slow.

All benchmarks run on a mediocre laptop with a i5-4200U CPU under light load.

Binary size

Unfortunately the implementation needs to duplicate almost all code: Once for f32 and once for f64. Before you ask, no, this cannot be avoided, at least not completely (but see the Future Work section). There's also a precomputed table of powers of ten, weighing in at about six kilobytes.

Running a stage1 rustc over a stand-alone program that simply parses pi to f32 and f64 and outputs both results reveals that the overhead vs. the old parsing code is about 44 KiB normally and about 28 KiB with LTO. It's presumably half of that + 3 KiB when only one of the two code paths is exercised.

rustc options	old	new	delta
[nothing]	2588375	`2633828`	44.39 KiB
-O	2585211	2630688	44.41 KiB
-O -C lto	1026353	1054981	27.96 KiB
-O -C lto -C link-args=-s	414208	442368	27.5 KiB

Future Work

Directory layout

The dec2flt code uses some types embedded deeply in the flt2dec module hierarchy, even though nothing about them it formatting-specific. They should be moved to a more conversion-direction-agnostic location at some point.

Performance

It could be much better, especially for large inputs. Some low-hanging fruit has been picked but much more work could be done. Some specific ideas are jotted down in FIXMEs all over the code.

Binary size

One could try to compress the table further, though I am skeptical. Another avenue would be reducing the code duplication from basically everything being generic over T: RawFloat. Perhaps one can reduce the magnitude of the duplication by pushing the parts that don't need to know the target type into separate functions, but this is finicky and probably makes some code read less naturally.

Other bases

This PR leaves f{32,64}::from_str_radix alone. It only replaces FromStr (and thus .parse()). I am convinced that from_str_radix should not exist, and have proposed its deprecation and speedy removal. Whatever the outcome of that discussion, it is independent from, and out of scope for, this PR.

Fixes #24557
Fixes #14353

r? @pnkfelix

cc @lifthrasiir @huonw

Gankra · 2015-07-26T20:18:54Z

CC @lifthrasiir

eefriedman · 2015-07-26T22:26:26Z

The decimal exponent is too big. This limitation is shared with the old code, though it accepts any exponent that fits into an i64 while this version stops at i64::MAX / 10, since it needs to do arithmetic with the exponent.

This seems like something which could be handled trivially: if the exponent overflows, either the correct result is inf or 0, or the string has a stretch of zeros way too long to fit into memory.

hanna-kruppe · 2015-07-26T22:48:22Z

That's a good point. Exponent and number of decimal digits used to be a smaller integer type and I didn't stop to reconsider when I switched to 64 bit integers. However, exploiting this insight requires restructuring some code: The exponent is parsed in the parsing module, which isn't expected to produce any floats. I'll sleep over the least hacky way to integrate this.

A related aspect is that the integral and fractional parts really can't be larger than 1.8 exabyte, which is the limit now. I'll remove those guards when I do the other change.

eefriedman · 2015-07-26T23:48:11Z

However, the third bullet point would have rejected some sensible inputs, such as 1.7976931348623157e-324 (the smallest positive subnormal number, represented with 17 fractional digits; note that the same float is usually expressed as 5e-324).

Umm, 1.7976931348623157e-324 ? Are you sure you didn't mean some other number?

eefriedman · 2015-07-27T00:01:08Z

It would probably be a good idea to deprecate f32/f64::from_str_radix. (We don't want to keep them around given that they don't actually work correctly.)

eefriedman · 2015-07-27T00:06:21Z

src/libcore/num/dec2flt/rawfp.rs

+
+// Find the smallest floating point number strictly larger than the argument.
+// This operation is saturating, i.e. next_float(inf) == inf.
+// Unlike most code in this module, this function does handle zero, subnormals, and infinities.


This should explicitly document that it only handles floats with a positive sign bit.

eefriedman · 2015-07-27T00:34:22Z

This might be a silly question... but does fast_path() work correctly on Linux x86-32 with SSE disabled? (The Clinger paper calls out a similar scenario in section 9.)

hanna-kruppe · 2015-07-27T10:13:32Z

Umm, 1.7976931348623157e-324 ? Are you sure you didn't mean some other number?

Gah! The decimal part belongs to another boundary case. The correct number is 4.9406564584124654e-324 Edited.

It would probably be a good idea to deprecate f32/f64::from_str_radix. (We don't want to keep them around given that they don't actually work correctly.)

Good to hear. I'd like to collect some more support before I go ahead and do it, and I'm not sure if it needs to go into this PR.

This might be a silly question... but does fast_path() work correctly on Linux x86-32 with SSE disabled? (The Clinger paper calls out a similar scenario in section 9.)

Not silly at all! Unfortunately the VM on which I'd normally test this currently has... issues. If anyone could compile a short program doing a float multiplication with a 32 bit rust and -C target-feature="-sse" and post the assembly, that'd be great. x87 can round correctly by setting the control word correctly, but GCC for example never does this. I'd have to check if LLVM allows doing so and if rustc instructs it to.

hanna-kruppe · 2015-07-27T11:15:19Z

Lifted the unnecessary restrictions @eefriedman pointed out. I immediately squashed that into the last commit, I hope that's okay.

eefriedman · 2015-07-27T17:59:21Z

src/libcore/num/dec2flt/parse.rs

+    // in the decimal digits only adjusts the exponent by +/- 1, at exp = 10^18 the input would
+    // have to be 17 exabyte (!) of zeros to get even remotely close to being finite.
+    // This is not exactly a use case we need to cater to.
+    if number.len() >= 18 {


Do you need to trim zeros here?

Indeed, thanks! That would have been sensible even when rejecting large exponents.

eefriedman · 2015-07-27T20:28:30Z

src/libcore/num/dec2flt/mod.rs

+        return Some(T::zero());
+    }
+    // This is a crude approximation of ceil(log10(the real value)).
+    let max_place = e + decimal.integral.len() as i64;


Might want to explicitly note why the math involving e in this file can't overflow; it's a bit subtle.

Done (in the long comment at the start of the file, and also in parse.rs).

Gankra · 2015-07-27T23:06:07Z

re deprecation: kill it with fire

pnkfelix · 2015-08-06T21:17:46Z

Okay, done with review! I didn't see anything major, and I trust @rkruppe will be good about addressing any of my nits as @rkruppe sees fit.

So, r=me once it is ready to land. (I'll be out of town for a week, feel free to ping other people in #rust-internals for r=pnkfelix marker once its ready.)

hanna-kruppe · 2015-08-06T22:00:36Z

So I finally got around to testing whether the fast path is correct on x86 without SSE (thanks, @eefriedman!). Turns out the exact same problem described in the paper (in the context of a Motorola 68881/68882 floating point coprocessor) also applies to more common hardware. 1.448997445238699 is rounded incorrectly, to the float whose shortest decimal representation is 1.4489974452386991 (off by one ULP). Bummer. I'll do something about this before merging, but I don't know yet what.

The intent is to allow decimal-to-float parsing to use Fp in its fast path. That code is added in a later commit.

This is necessary for decimal-to-float code (in a later commit) to handle inputs such as 4.9406564584124654e-324 (the smallest subnormal f64). According to the benchmarks for flt2dec::dragon, this does not affect performance measurably. It probably uses slightly more stack space though.

- Exposing digits and individual bits - Counting the number of bits - Add small (digit-sized) values - Multiplication by power of 5 - Division with remainder All are necessary for decimal to floating point conversions. All but the most trivial ones come with tests.

fast decimal-to-float conversions.

This commit primarily adds implementations of the algorithms from William Clinger's paper "How to Read Floating Point Numbers Accurately". It also includes a lot of infrastructure necessary for those algorithms, and some unit tests. Since these algorithms reject a few (extreme) inputs that were previously accepted, this could be seen as a [breaking-change]

@bors

Running these tests takes hours, so they are not run by @bors.

This matches the behavior of clang. See also discussion on rust-lang#27307.

hanna-kruppe · 2015-08-12T09:11:06Z

Since there are no more in-tree targets that disable SSE, "deal with it" now means "explain that it's theoretically broken and add a test that will expose said brokenness if some unfortunate soul is forced to compile without SSE".

eddyb · 2015-08-13T13:29:35Z

@bors r=pnkfelix

bors · 2015-08-13T13:29:36Z

📌 Commit 15518a9 has been approved by pnkfelix

@bors

Completely rewrite the conversion of decimal strings to `f64` and `f32`. The code is intended to be absolutely positively completely 100% accurate (when it doesn't give up). To the best of my knowledge, it achieves that goal. Any input that is not rejected is converted to the floating point number that is closest to the true value of the input. This includes overflow, subnormal numbers, and underflow to zero. In other words, the rounding error is less than or equal to 0.5 units in the last place. Half-way cases (exactly 0.5 ULP error) are handled with half-to-even rounding, also known as banker's rounding. This code implements the algorithms from the paper [How to Read Floating Point Numbers Accurately][paper] by William D. Clinger, with extensions to handle underflow, overflow and subnormals, as well as some algorithmic optimizations. # Correctness With such a large amount of tricky code, many bugs are to be expected. Indeed tracking down the obscure causes of various rounding errors accounts for the bulk of the development time. Extensive tests (taking in the order of hours to run through to completion) are included in `src/etc/test-float-parse`: Though exhaustively testing all possible inputs is impossible, I've had good success with generating millions of instances from various "classes" of inputs. These tests take far too long to be run by @bors so contributors who touch this code need the discipline to run them. There are `#[test]`s, but they don't even cover every stupid mistake I made in course of writing this. Another aspect is *integer* overflow. Extreme (or malicious) inputs could cause overflow both in the machine-sized integers used for bookkeeping throughout the algorithms (e.g., the decimal exponent) as well as the arbitrary-precision arithmetic. There is input validation to reject all such cases I know of, and I am quite sure nobody will *accidentally* cause this code to go out of range. Still, no guarantees. # Limitations Noticed the weasel words "(when it doesn't give up)" at the beginning? Some otherwise well-formed decimal strings are rejected because spelling out the value of the input requires too many digits, i.e., `digits * 10^abs(exp)` can't be stored in a bignum. This only applies if the value is not "obviously" zero or infinite, i.e., if you take a near-infinity or near-zero value and add many pointless fractional digits. At least with the algorithm used here, computing the precise value would require computing the full value as a fraction, which would overflow. The precise limit is `number_of_digits + abs(exp) > 375` but could be raised almost arbitrarily. In the future, another algorithm might lift this restriction entirely. This should not be an issue for any realistic inputs. Still, the code does reject inputs that would result in a finite float when evaluated with unlimited precision. Some of these inputs are even regressions that the old code (mostly) handled, such as `0.333...333` with 400+ `3`s. Thus this might qualify as [breaking-change]. # Performance Benchmarks results are... tolerable. Short numbers that hit the fast paths (`f64` multiplication or shortcuts to zero/inf) have performance in the same order of magnitude as the old code tens of nanoseconds. Numbers that are delegated to Algorithm Bellerophon (using floats with 64 bit significand, implemented in software) are slower, but not drastically so (couple hundred nanoseconds). Numbers that need the AlgorithmM fallback (for `f64`, roughly everything below 1e-305 and above 1e305) take far, far longer, hundreds of microseconds. Note that my implementation is not quite as naive as the expository version in the paper (it needs one to four division instead of ~1000), but division is fundamentally pretty expensive and my implementation of it is extremely simple and slow. All benchmarks run on a mediocre laptop with a i5-4200U CPU under light load. # Binary size Unfortunately the implementation needs to duplicate almost all code: Once for `f32` and once for `f64`. Before you ask, no, this cannot be avoided, at least not completely (but see the Future Work section). There's also a precomputed table of powers of ten, weighing in at about six kilobytes. Running a stage1 `rustc` over a stand-alone program that simply parses pi to `f32` and `f64` and outputs both results reveals that the overhead vs. the old parsing code is about 44 KiB normally and about 28 KiB with LTO. It's presumably half of that + 3 KiB when only one of the two code paths is exercised. | rustc options | old | new | delta | |--------------------------- |--------- |--------- |----------- | | [nothing] | 2588375 | 2633828 | 44.39 KiB | | -O | 2585211 | 2630688 | 44.41 KiB | | -O -C lto | 1026353 | 1054981 | 27.96 KiB | | -O -C lto -C link-args=-s | 414208 | 442368 | 27.5 KiB | # Future Work ## Directory layout The `dec2flt` code uses some types embedded deeply in the `flt2dec` module hierarchy, even though nothing about them it formatting-specific. They should be moved to a more conversion-direction-agnostic location at some point. ## Performance It could be much better, especially for large inputs. Some low-hanging fruit has been picked but much more work could be done. Some specific ideas are jotted down in `FIXME`s all over the code. ## Binary size One could try to compress the table further, though I am skeptical. Another avenue would be reducing the code duplication from basically everything being generic over `T: RawFloat`. Perhaps one can reduce the magnitude of the duplication by pushing the parts that don't need to know the target type into separate functions, but this is finicky and probably makes some code read less naturally. ## Other bases This PR leaves `f{32,64}::from_str_radix` alone. It only replaces `FromStr` (and thus `.parse()`). I am convinced that `from_str_radix` should not exist, and have proposed its [deprecation and speedy removal][deprecate-radix]. Whatever the outcome of that discussion, it is independent from, and out of scope for, this PR. Fixes #24557 Fixes #14353 r? @pnkfelix cc @lifthrasiir @huonw [paper]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.4152 [deprecate-radix]: https://internals.rust-lang.org/t/deprecate-f-32-64-from-str-radix/2405

bors · 2015-08-13T13:29:40Z

⌛ Testing commit 15518a9 with merge bb954cf...

bors · 2015-08-13T16:09:20Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-opt, auto-win-msvc-64-opt

IvanUkhov · 2015-08-14T01:21:33Z

@rkruppe, thank you for this work!

pnkfelix · 2015-08-18T20:14:08Z

\o/

donbright · 2018-12-17T01:02:45Z

"absolutely positively completely 100% accurate "

i question the basic assumption that base-10 numbers can be converted into base-2 with 100% accuracy. there will be information lost in the transfer, to me it seems that would result in accuracy of some number less than 100%. and that number should be measurable.

Change twice used large const table to static This table is used twice in core::num::dec2flt::algorithm::power_of_ten. According to the semantics of const, a separate huge definition of the table is inlined at both places. https://github.com/rust-lang/rust/blob/5233edcf1c7ee70ac25e4ec1115c3546f53d8a2d/library/core/src/num/dec2flt/algorithm.rs#L16-L22 Theoretically this gets cleaned up by optimization passes, but in practice I am experiencing a miscompile from LTO on this code. Making the table a static, which would only be defined a single time and not require attention from LTO, eliminates the miscompile and seems semantically more appropriate anyway. A separate bug report on the LTO bug is forthcoming. Original addition of `const` is from rust-lang#27307.

rust-highfive assigned pnkfelix Jul 26, 2015

eefriedman reviewed Jul 27, 2015
View reviewed changes

hanna-kruppe force-pushed the dec2flt branch from 8270600 to 216823c Compare July 27, 2015 09:46

hanna-kruppe force-pushed the dec2flt branch from 216823c to 9de3766 Compare July 27, 2015 11:13

eefriedman reviewed Jul 27, 2015
View reviewed changes

hanna-kruppe force-pushed the dec2flt branch from 9de3766 to 5e5b93d Compare July 27, 2015 20:15

eefriedman reviewed Jul 27, 2015
View reviewed changes

hanna-kruppe force-pushed the dec2flt branch 2 times, most recently from 3078a03 to 2df7da8 Compare August 8, 2015 11:40

Robin Kruppe added 5 commits August 8, 2015 17:12

Make core::num::dec2flt::strategy::grisu::Fp methods public.

b55806c

The intent is to allow decimal-to-float parsing to use Fp in its fast path. That code is added in a later commit.

Script for generating the powers-of-ten tables necessary for correct and

b7e39a1

fast decimal-to-float conversions.

hanna-kruppe force-pushed the dec2flt branch from 2df7da8 to a783ddc Compare August 8, 2015 15:17

Add optional, external tests for floating point parsing.

82dbc2e

Running these tests takes hours, so they are not run by @bors.

hanna-kruppe force-pushed the dec2flt branch 2 times, most recently from 6d622f5 to e0515f6 Compare August 10, 2015 21:21

eefriedman mentioned this pull request Aug 11, 2015

Use correct target CPU for iOS simulator. #27672

Merged

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Aug 12, 2015

rollup merge of rust-lang#27672: eefriedman/ios-target-cpu

d4f2ef9

This matches the behavior of clang. See also discussion on rust-lang#27307.

hanna-kruppe force-pushed the dec2flt branch from 3a60b04 to 6e1361a Compare August 12, 2015 09:07

Mention that the fast path is broken without SSE.

15518a9

hanna-kruppe force-pushed the dec2flt branch from 6e1361a to 15518a9 Compare August 12, 2015 09:10

bors merged commit 15518a9 into rust-lang:master Aug 13, 2015

hanna-kruppe mentioned this pull request Oct 5, 2015

Integer parsing should accept leading plus #28826

Merged

hanna-kruppe mentioned this pull request Oct 28, 2015

1.4 release notes #29416

Merged

hanna-kruppe mentioned this pull request Feb 4, 2016

Float parsing can fail on valid float literals #31407

Closed

hanna-kruppe mentioned this pull request Feb 3, 2017

Escaping char in libcore adds 2k of static data for no_std cases #39492

Open

lifthrasiir mentioned this pull request Mar 21, 2017

Add support for hexadecimal float literals #1433

Closed

dtolnay mentioned this pull request Aug 3, 2018

Performance cliff when parsing f64 above 1e305 #53015

Closed

dtolnay mentioned this pull request Mar 1, 2021

Change twice used large const table to static #82676

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decimal to floating point conversion #27307

Decimal to floating point conversion #27307

hanna-kruppe commented Jul 26, 2015

Gankra commented Jul 26, 2015

eefriedman commented Jul 26, 2015

hanna-kruppe commented Jul 26, 2015

eefriedman commented Jul 26, 2015

eefriedman commented Jul 27, 2015

eefriedman Jul 27, 2015

hanna-kruppe Jul 27, 2015

eefriedman commented Jul 27, 2015

hanna-kruppe commented Jul 27, 2015

hanna-kruppe commented Jul 27, 2015

eefriedman Jul 27, 2015

hanna-kruppe Jul 27, 2015

eefriedman Jul 27, 2015

hanna-kruppe Jul 27, 2015

Gankra commented Jul 27, 2015

pnkfelix commented Aug 6, 2015

hanna-kruppe commented Aug 6, 2015

hanna-kruppe commented Aug 12, 2015

eddyb commented Aug 13, 2015

bors commented Aug 13, 2015

bors commented Aug 13, 2015

bors commented Aug 13, 2015

IvanUkhov commented Aug 14, 2015

pnkfelix commented Aug 18, 2015

donbright commented Dec 17, 2018

Decimal to floating point conversion #27307

Decimal to floating point conversion #27307

Conversation

hanna-kruppe commented Jul 26, 2015

Correctness

Limitations

Performance

Binary size

Future Work

Directory layout

Performance

Binary size

Other bases

Gankra commented Jul 26, 2015

eefriedman commented Jul 26, 2015

hanna-kruppe commented Jul 26, 2015

eefriedman commented Jul 26, 2015

eefriedman commented Jul 27, 2015

eefriedman Jul 27, 2015

Choose a reason for hiding this comment

hanna-kruppe Jul 27, 2015

Choose a reason for hiding this comment

eefriedman commented Jul 27, 2015

hanna-kruppe commented Jul 27, 2015

hanna-kruppe commented Jul 27, 2015

eefriedman Jul 27, 2015

Choose a reason for hiding this comment

hanna-kruppe Jul 27, 2015

Choose a reason for hiding this comment

eefriedman Jul 27, 2015

Choose a reason for hiding this comment

hanna-kruppe Jul 27, 2015

Choose a reason for hiding this comment

Gankra commented Jul 27, 2015

pnkfelix commented Aug 6, 2015

hanna-kruppe commented Aug 6, 2015

hanna-kruppe commented Aug 12, 2015

eddyb commented Aug 13, 2015

bors commented Aug 13, 2015

bors commented Aug 13, 2015

bors commented Aug 13, 2015

IvanUkhov commented Aug 14, 2015

pnkfelix commented Aug 18, 2015

donbright commented Dec 17, 2018