[API Proposal]: [RegexGenerator(...)] attribute #58880
Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
Issue Details
Background and motivation
Attribute to trigger regex source generation:
API Proposal
namespace System.Text.RegularExpressions
{
    [AttributeUsage(AttributeTargets.Method, AllowMultiple = false, Inherited = false)]
    public sealed class RegexGeneratorAttribute : Attribute
    {
        // Same three constructors as on Regex, except the timeout is an int (ms) instead of a TimeSpan
        public RegexGeneratorAttribute(string pattern);
        public RegexGeneratorAttribute(string pattern, RegexOptions options);
        public RegexGeneratorAttribute(string pattern, RegexOptions options, int matchTimeout);
        public string Pattern { get; }
        public RegexOptions Options { get; }
        public int MatchTimeout { get; }
    }
}
API Usage
[RegexGenerator(@"\b[t]\w+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)]
private static partial Regex GetRegexToFindWordsBeginningWithT();
...
Regex r = GetRegexToFindWordsBeginningWithT();
foreach (Match m in r.Matches(input))
{
    ...
}
Risks
n/a
|
Is the instance cached or do you get a new regex every call? |
It hasn't been implemented yet, so, neither :) But my intention is that it gets cached as a singleton. |
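A minimal hand-written sketch of the "cached as a singleton" intent described above. This is not generator output; the class name `WordFinder` and the field name `s_wordsBeginningWithT` are hypothetical, and the pattern is simply compiled once with the ordinary `Regex` constructor to show the caching shape a caller would observe.

```csharp
using System.Text.RegularExpressions;

public static class WordFinder
{
    // Hypothetical stand-in for what a generator might emit: one instance,
    // created once during type initialization and reused on every call.
    private static readonly Regex s_wordsBeginningWithT =
        new Regex(@"\b[t]\w+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Each call returns the same cached instance rather than a new Regex.
    public static Regex GetRegexToFindWordsBeginningWithT() => s_wordsBeginningWithT;
}
```

With this shape, calling `WordFinder.GetRegexToFindWordsBeginningWithT()` repeatedly is cheap, because no pattern parsing happens after the first use of the type.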
If the method were to return a cached singleton or would otherwise be cheap to call, could/should this be done via a field/property rather than a method?
[RegexGenerator(@"\b[t]\w+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)]
private static readonly Regex RegexToFindWordsBeginningWithT;
// or
[RegexGenerator(@"\b[t]\w+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)]
private static Regex RegexToFindWordsBeginningWithT { get; }
I'm still not very familiar with what source generators can do, but I assume they would be able to support this by outputting a static constructor to set the field/property? Although this would be a slight footgun, as you could technically include this attribute for source generation and also initialise the field yourself (which would be a silly thing to do), probably resulting in strange behaviour...
Edit: Ah, I see a class can only have a single static constructor, so my earlier idea of having a source generator output a static constructor would also preclude users from including their own static constructors, so it seems like a no-go. |
If C# adds support for partial properties in the future, it could support that as well. |
This is a good use case for it. It also makes me wonder if it would be better to support regex literals in the compiler (that's essentially what this does). |
Why? |
It would feel more ergonomic and could potentially optimize more than a source generator could (it can rewrite the expression). |
But one thing I'm overlooking is the regex options. I'm not sure what that would look like with literal syntax. |
So can the source generator. We already do optimizing / rewriting in Regex.
And timeout. |
It's about it feeling more natural and ergonomic. The attribute on top of a partial method doesn't feel very nice. It's working around language limitations. |
Which language limitations? If you mean lack of partial properties, the source generator can easily be augmented when that's in the language. We shouldn't work around the work around by jumping straight to adding an entirely new sublanguage to C#. |
Side-note: If this is done, it'd be great to have it followed-up in Roslyn by having the RegEx highlighting feature recognize this attribute. |
I was thinking of this more like a generalization of string literals. We had a similar discussion with utf8 literals; now with regex we have another potential pattern for the same thing. The other benefit is that you would be able to use it outside of field declarations, so it could be used for locals as well.
Regex re = /^hello+/;
Console.WriteLine(re.IsMatch(args[0])); |
That should already be supported via
[RegexGenerator(/* lang=regex */ @"\b[t]\w+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)]
string foo = /* lang=regex */ "^hey joe!$";
// lang=regex
string bar = "^hey joe!$";
CallFoo(/* lang=regex */ "^hey joe!$"); |
namespace System.Text.RegularExpressions
{
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false, Inherited = false)]
public sealed class RegexGeneratorAttribute : Attribute
{
public RegexGeneratorAttribute(string pattern);
public RegexGeneratorAttribute(string pattern, RegexOptions options);
public RegexGeneratorAttribute(string pattern, RegexOptions options, int matchTimeoutMilliseconds);
public string Pattern { get; }
public RegexOptions Options { get; }
public int MatchTimeoutMilliseconds { get; }
}
} |
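A rough sketch of why the timeout is carried as an `int` of milliseconds: attribute arguments must be compile-time constants, and a `TimeSpan` is not one, so a consumer of the attribute would convert back when constructing the `Regex`. `SketchRegexAttribute` and `TimeoutSketch` are hypothetical stand-ins written for illustration, not the proposed type or any generator internals, and the attribute is read via reflection here only to keep the example self-contained.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical attribute mirroring the reviewed shape above.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false, Inherited = false)]
public sealed class SketchRegexAttribute : Attribute
{
    public SketchRegexAttribute(string pattern, RegexOptions options, int matchTimeoutMilliseconds)
    {
        Pattern = pattern;
        Options = options;
        MatchTimeoutMilliseconds = matchTimeoutMilliseconds;
    }

    public string Pattern { get; }
    public RegexOptions Options { get; }
    public int MatchTimeoutMilliseconds { get; }
}

public static class TimeoutSketch
{
    // 250 is a constant int, so it is legal as an attribute argument;
    // TimeSpan.FromMilliseconds(250) would not compile in this position.
    [SketchRegex(@"\d+", RegexOptions.None, 250)]
    public static Regex Digits() => Build(nameof(Digits));

    // Reads the attribute back and converts the int milliseconds to the
    // TimeSpan that the Regex constructor expects.
    private static Regex Build(string methodName)
    {
        MethodInfo method = typeof(TimeoutSketch).GetMethod(methodName);
        SketchRegexAttribute attr = method.GetCustomAttribute<SketchRegexAttribute>();
        return new Regex(attr.Pattern, attr.Options,
            TimeSpan.FromMilliseconds(attr.MatchTimeoutMilliseconds));
    }
}
```

In this sketch `TimeoutSketch.Digits().MatchTimeout` reports the same 250 ms that was written in the attribute.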
@stephentoub as you pointed out offline, our current regex cache uses the actual culture name in the key (i.e., not just Invariant vs. Current as in RegexOptions). Either we have to accept that this is information baked into the regex at the time it is compiled, or that's a bug we should fix. In the former case, this attribute will need to have a CultureInfo as well (the string form thereof at least). Thoughts? |
We need to revisit the expected behavior here around culture all-up, not just with regards to the source generator. In particular, does "current culture" (when RegexOptions.CultureInvariant isn't specified) apply at construction time or at match time? Right now, for both the interpreter and RegexOptions.Compiled, it's in a weird middle world where data from the culture that's current at the time of the ctor is factored in and then the culture that's current at the time of the match is factored in, and the implementations don't respond well to those being different. From my perspective it's a bug that we factor it in from both. If we choose to only factor it in at match time, the generator will match whatever we choose to do for RegexOptions.Compiled (which will likely have a perf impact). If we choose to only factor it in from ctor time, we may want to do something special for the source generator, since (as we discussed in the API review) it's currently going to bake in the culture current at the time of build. Options there could include just saying "yeah, you get what you get", it could include adding another ctor that takes a culture name, it could include throwing if you don't specify CultureInvariant but did specify either IgnoreCase or a pattern that employs |
Agreed. What would be nice here is some data on what customers rely on. I'm guessing changing to "you get what you get" or "always construction time" would be breaking. We at least should be consistent if we can. |
Any change here is likely to break someone somewhere. We'll need to determine what we believe is a) the "right" behavior we desire in an ideal world, b) the perf impact of that decision, and c) how widespread we expect breaks to be from that change, and make a decision. We shouldn't do that here, though. I'll be opening a handful of issues related to the source generator as follow-ups to address after it's initially in. |
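To make the construction-time versus match-time question above concrete, here is a small probe, offered as a sketch rather than a statement of what any runtime does. `CultureSketch.MatchAcrossCultures` is a hypothetical helper: it constructs a `Regex` while one culture is current, switches the current culture, and then matches, so the two cases can be compared on whatever runtime is at hand.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class CultureSketch
{
    // Constructs the regex under ctorCulture, then matches under matchCulture.
    // The original current culture is restored before returning.
    public static bool MatchAcrossCultures(
        string pattern,
        string input,
        RegexOptions options,
        CultureInfo ctorCulture,
        CultureInfo matchCulture)
    {
        CultureInfo original = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = ctorCulture;
            var regex = new Regex(pattern, options);

            CultureInfo.CurrentCulture = matchCulture;
            return regex.IsMatch(input);
        }
        finally
        {
            CultureInfo.CurrentCulture = original;
        }
    }
}
```

One way to use it is to call it with the pattern `"i"`, the input `"I"`, and `RegexOptions.IgnoreCase`, swapping a Turkish culture and an English culture between the two culture arguments. Whether the results differ depends on the runtime version and on which of the behaviors discussed above it implements, so no expected output is claimed here. Adding `RegexOptions.CultureInvariant` takes the current culture out of the picture.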
Sure, but it'd be nice if this attribute is recognized so the magic comment isn't needed. |
Background and motivation
Attribute to trigger regex source generation:
#44676
API Proposal
Alternate names:
(Note there's already a System.ComponentModel.DataAnnotations.RegularExpressionAttribute that's very different.)
API Usage
Risks
n/a