Add new version of icu_pattern crate #4579
/// If set to `false`, ASCII letters can only appear in quoted literals,
/// like "{0} 'days'".
question: how is "ASCII letters" defined? Space doesn't seem to be part of it, what about all the other non a-z ASCII characters?
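One possible reading of the rule in that doc comment, sketched here only to make the question concrete: "ASCII letters" means exactly `a-z` and `A-Z`, as tested by Rust's standard `char::is_ascii_alphabetic`, so spaces, digits, and punctuation may appear unquoted. The helper name `has_unquoted_ascii_letter` is hypothetical and is not taken from the PR; the crate may define the set differently.

```rust
/// Hypothetical helper (not from the PR): returns true if `literal` contains
/// an ASCII letter (a-z or A-Z) outside of a single-quoted span.
/// Assumes quotes are balanced and there is no escaped-quote syntax.
fn has_unquoted_ascii_letter(literal: &str) -> bool {
    let mut in_quotes = false;
    for c in literal.chars() {
        if c == '\'' {
            // Entering or leaving a quoted span.
            in_quotes = !in_quotes;
        } else if !in_quotes && c.is_ascii_alphabetic() {
            return true;
        }
    }
    false
}

fn main() {
    // Letters appear only inside quotes, so the literal would be accepted.
    assert!(!has_unquoted_ascii_letter("{0} 'days'"));
    // Bare letters would be rejected under this reading.
    assert!(has_unquoted_ascii_letter("{0} days"));
    // Spaces, digits, and punctuation are not ASCII letters here.
    assert!(!has_unquoted_ascii_letter("{0} - 12%"));
}
```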
#[derive(PartialEq, Debug, Clone)]
pub(crate) enum PatternToken<'s, P> {
    Placeholder(P),
    Literal { content: Cow<'s, str>, quoted: bool },
What is the `quoted` output used for? Do any consumers behave differently depending on whether a literal was quoted or not?
/// Zero copy parsing is a model which allows the parser to produce tokens that are de-facto
/// slices of the input without ever having to modify the input or copy from it.
///
/// In case of ICU patterns that decision brings a trade-off around handling of quoted literals.
what are "ICU patterns"?
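To make the zero-copy trade-off in the doc comment above more concrete, here is a minimal sketch under stated assumptions. The `Literal` struct is a simplified stand-in for the `Literal` variant of `PatternToken` (placeholders are omitted), and `split_literals` is a hypothetical function, not the PR's parser. It illustrates one plausible reading of the trade-off: a zero-copy parser can only hand out slices of the input, so a literal with a quoted portion comes back as several tokens, whereas a copying parser could concatenate them into one owned string.

```rust
use std::borrow::Cow;

/// Simplified stand-in for the `Literal` variant of the PR's `PatternToken`.
#[derive(Debug, PartialEq)]
struct Literal<'s> {
    content: Cow<'s, str>,
    quoted: bool,
}

/// Hypothetical zero-copy splitter: every token borrows from `input`.
/// Assumes quotes are balanced and there is no escaped-quote syntax.
fn split_literals(input: &str) -> Vec<Literal<'_>> {
    let mut tokens = Vec::new();
    let mut quoted = false;
    // Each apostrophe toggles between unquoted and quoted text.
    for part in input.split('\'') {
        if !part.is_empty() {
            tokens.push(Literal { content: Cow::Borrowed(part), quoted });
        }
        quoted = !quoted;
    }
    tokens
}

fn main() {
    let tokens = split_literals("about 'days' left");
    // Three borrowed tokens instead of one merged literal.
    assert_eq!(tokens.len(), 3);
    assert!(tokens.iter().all(|t| matches!(t.content, Cow::Borrowed(_))));
    assert!(tokens[1].quoted);
    // A copying parser could instead allocate and produce a single string.
    let merged: String = tokens.iter().map(|t| &*t.content).collect();
    assert_eq!(merged, "about days left");
}
```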
@robertbastian The
design seems fine overall. didn't look too closely at the parser
@zbraniecki Are you OK if I replace
Discussed with @robertbastian and agreed to make a version that includes a generic for the number of placeholders, with potential data model optimizations.
Based on feedback from @robertbastian I opened a new PR with a different data model, #4610. I think both data models have pros and cons. For reference, here is the data model for this PR: I have not run performance benchmarks and can do so if that is the deciding factor. As for data size, I expect the model implemented in this PR to be slightly smaller than the one implemented in #4610.
I'm quite happy now with #4610, so I will close this PR. Archived as https://github.com/sffc/omnicu/tree/archive/NumericPlaceholderPattern
We need this in so many places that I decided to build it.
I added a new type, `NumericPlaceholderPattern`, which, as the name suggests, supports only numeric placeholders. It stores all of its data in a string, so it can be stored easily in a data provider and interpolated zero-copy. I took some inspiration from the current `icu_pattern` when designing the API, but my proposed API has fewer generics and instead specializes in this one common use case.
Two things we don't support, and the path to add them:
- `StringPlaceholderPattern`.
- `ContextualPlaceholderPattern`.
The only code I copied from the existing crate was the `Parser` impl, which turns out to be more lines of code than the fresh code that I wrote.
@zbraniecki, if you like this direction, I can just move this code into the existing path in the file tree.
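As a rough illustration of interpolating a numeric-placeholder pattern that lives entirely in a string, here is a minimal sketch. It is not the PR's data model: the encoding used by `NumericPlaceholderPattern` is not shown in this conversation, so the sketch assumes a plain pattern string with `{N}` placeholders, and the `interpolate` function and its signature are hypothetical. Literal segments are written to the sink as borrowed slices of the pattern, without building intermediate strings.

```rust
use std::fmt::{self, Write};

/// Hypothetical sketch, not the PR's actual encoding: the pattern is a plain
/// string whose placeholders are written as `{N}` with a decimal index N.
/// Quoting and escaping are deliberately left out.
fn interpolate<W: Write>(pattern: &str, values: &[&str], sink: &mut W) -> fmt::Result {
    let mut rest = pattern;
    while let Some(open) = rest.find('{') {
        // Literal text before the placeholder is written as a borrowed slice.
        sink.write_str(&rest[..open])?;
        let close = rest[open..].find('}').ok_or(fmt::Error)? + open;
        let index: usize = rest[open + 1..close].parse().map_err(|_| fmt::Error)?;
        // An out-of-range index is reported as an error rather than a panic.
        sink.write_str(values.get(index).copied().ok_or(fmt::Error)?)?;
        rest = &rest[close + 1..];
    }
    // Trailing literal text after the last placeholder.
    sink.write_str(rest)
}

fn main() {
    let mut out = String::new();
    interpolate("{1} of {0}", &["ten", "three"], &mut out).unwrap();
    assert_eq!(out, "three of ten");
}
```

Writing into any `fmt::Write` sink, rather than returning a `String`, keeps the choice of allocation with the caller.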