Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Returning number of characters processed in ryu_parse #147

Open
ecorm opened this issue Jan 11, 2020 · 15 comments
Open

Feature request: Returning number of characters processed in ryu_parse #147

ecorm opened this issue Jan 11, 2020 · 15 comments

Comments

@ecorm
Copy link

ecorm commented Jan 11, 2020

ryu_parse does not return the number of characters processed, so we are forced to tokenize the input before feeding it to s2d. s2d already knows when to stop consuming the input at the first non-number character, so there ends up being a duplication of effort and parsing a list of numbers is slower than it should be.

Both std::strtod and std::from_chars return a pointer to the first character not matching the number pattern. ryu_parse should do the same.

With this proposal, the INPUT_TOO_LONG status code would become obsolete.

@ecorm
Copy link
Author

ecorm commented Jan 11, 2020

I propose that ryu_parse provides a drop-in replacement for strtod with the same behavior except that it's locale-independant.

@ecorm
Copy link
Author

ecorm commented Jan 12, 2020

@expnkx thanks for suggesting your library, but I only need low-level parsing primitives for a library I'm working on. I want the number parsing dependency for my library to be as small as possible, with std::strtod as default for users who don't care about speed or switching locales.

@ecorm
Copy link
Author

ecorm commented Jan 12, 2020

@expnkx I agree the API for std::strtod is terrible. I'm more concerned about the penalty of pre-tokenizing the number than the actual API. Matching the std::strtod API would make it simpler to choose between the standard library and Ryu implementations at compile time using preprocessor directives.

@ecorm
Copy link
Author

ecorm commented Jan 12, 2020

@expnkx The purpose of this issue is to improve this project, not to promote your library.

@ecorm
Copy link
Author

ecorm commented Jan 12, 2020

@expnkx I don't mind your suggestions for a better API within the scope of this project. What I do mind is your insistance to use your library instead of this one.

If you think nobody should be using this library, then please go away - you're not helping. This is an issue tracker specific to this project, not your personal soap box.

@ulfjack
Copy link
Owner

ulfjack commented Jan 15, 2020

@ecorm, I'm not sure I follow. s2d_n is supposed to consume the entire input. I can imagine an API that consumes as much as possible and leaves it to the caller to check if the entire input is consumed, although I think it's a bit risky if there's a mismatch between the format expected here and the format expected one level up.

@ecorm
Copy link
Author

ecorm commented Jan 15, 2020

@ulfjack Consuming as much as possible is exactly what std::strtod, std::from_chars, and double_conversion::StringToDoubleConverter::StringToDouble do (up to a maximum specified by the caller), and they return the number of characters consumed. When parsing a number from a larger string (such as a JSON document), having ryu_parse return the number of characters consumed avoids the need to look ahead and determine the number boundary, which hurts the speed advantage of using ryu_parse in the first place.

If you think returning the number of characters consumed would somehow hurt the performance, or you want to avoiding breaking the existing API, then I'd propose extra parsing functions that do return the number of characters consumed.

It turns out after all that I need to find the number boundary anyway within my JSON document, so I longer need the number of bytes consumed. However, I suggest that you leave this feature request open for a while to see if others might be interested in it.

@ecorm ecorm changed the title ryu_parse does not return number of characters processed Feature request: Returning number of characters processed in ryu_parse Jan 15, 2020
@rlespinet
Copy link
Contributor

rlespinet commented Jun 13, 2020

Hi, I tweaked the s2d and s2f implementation for my needs, which involves returning the number of characters processed. I also removed the status code, and replaced them by asserts, which makes these functions unsafe, but I'd be happy to create a PR which only returns the number of character processed if you're interested.

rlespinet@52db0bb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@ulfjack @ecorm @rlespinet and others