Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start using the Regex source generator in this repo #62105

Closed
danmoseley opened this issue Nov 28, 2021 · 25 comments · Fixed by #66142
Closed

Start using the Regex source generator in this repo #62105

danmoseley opened this issue Nov 28, 2021 · 25 comments · Fixed by #66142
Labels
area-System.Text.RegularExpressions help wanted [up-for-grabs] Good issue for external contributors

Comments

@danmoseley
Copy link
Member

We now have a source generator in .NET 7 SDK builds to generate regex code at compile time. We should start using it in this repo. (Not in the System.Text.RegularExpressions tests, of course - but in tests elsewhere in the repo there are some regexes.)

Up for grabs. If this works out we can open an issue in aspnetcore to do the same.

This needs main to start using the .NET 7 SDK first. @dotnet/runtime-infrastructure is that planned soon?

cc @stephentoub

@danmoseley danmoseley added area-System.Text.RegularExpressions help wanted [up-for-grabs] Good issue for external contributors labels Nov 28, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Nov 28, 2021
@ghost
Copy link

ghost commented Nov 28, 2021

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

We now have a source generator in .NET 7 SDK builds to generate regex code at compile time. We should start using it in this repo. (Not in the System.Text.RegularExpressions tests, of course - but in tests elsewhere in the repo there are some regexes.)

Up for grabs. If this works out we can open an issue in aspnetcore to do the same.

This needs main to start using the .NET 7 SDK first. @dotnet/runtime-infrastructure is that planned soon?

cc @stephentoub

Author: danmoseley
Assignees: -
Labels:

area-System.Text.RegularExpressions, up-for-grabs

Milestone: -

@Clockwork-Muse
Copy link
Contributor

It's been a while since I've done anything. I suppose I can take this on (will keep me out of trouble).

@am11
Copy link
Member

am11 commented Nov 29, 2021

The naming difference in [RegexGenerator] and [GeneratedDllImport] (both corresponding to upcoming .NET 7 generators) might look a bit strange when working in common configurations like the ones in eng/generators.targets. It was mentioned under Alternative names in #58880. It would be nice to align their names, but not sure if it is too late to propose name change; GeneratedDllImport to match regex one (or vice versa) before .NET 7 ships.

@stephentoub
Copy link
Member

not sure if it is too late to propose name change; GeneratedDllImport to match regex one (or vice versa) before .NET 7 ships.

It's not too late. I agree we should be consistent between them. Can you please open a separate issue?

(Personally, I prefer RegexGeneratorAttribute / DllImportGeneratorAttribute, as it's the thing causing the generation to happen rather than being the result of the generation process.)

@danmoseley
Copy link
Member Author

danmoseley commented Nov 29, 2021

This needs main to start using the .NET 7 SDK first. @dotnet/runtime-infrastructure is that planned soon?

Seems that has now happened. #61472

Edit: excuse me, that's the GA SDK. We will need a newer one. @Clockwork-Muse I guess you will have to use a newer SDK locally, then before we can merge we will need main to be using a .NET 7 SDK. I suspect they would normally not do that until Preview 1 (Feb) but perhaps it can happen earlier.

@jkoritzinsky
Copy link
Member

Why don't we use the live generator instead of the LKG one from the SDK? That way we have a better loop for fixing bugs in the generator and we don't have to update to use a nightly unsigned SDK.

@danmoseley
Copy link
Member Author

If that's technically possible (sounds like build phases all over again?) then I agree that's better.

@jkoritzinsky
Copy link
Member

That's what we're doing with the DllImport source generator. We're using the same mechanisms used to test the generator to use them in the rest of the repo itself.

@danmoseley
Copy link
Member Author

Perfect, then @Clockwork-Muse you should be unblocked. thanks for offering to do this!

We will also need to solve this problem for aspnetcore, or they will, if they want to opt into this in the near term. Let's do our repo first.

@jkoritzinsky
Copy link
Member

If the Regex generator is shipping in the .NET 7 ref pack, then ASP.NET Core should be able to consume it by the same mechanism that they consume the nightly ref packs today.

@Clockwork-Muse
Copy link
Contributor

Confirm that I can use the in-repo generator.

If there are non-test projects that are using regex, should I switch them as well?

@danmoseley
Copy link
Member Author

If there are non-test projects that are using regex, should I switch them as well?

I see no reason why not (there is hardly any usage there though) but we might want to do some perf measurements for some of those. I don't believe we expect them to be any slower but this is new.

@Clockwork-Muse
Copy link
Contributor

Style question: Do we have a preferred naming convention for these? Currently I've been doing something like:

[RegexGenerator(@"^\d{4}\D\d{2}\D\d{2}\D\d{2}\D\d{2}\D\d{2}\D\d{2}\s$")]
private static partial Regex GetFormatterNameSimpleRegex();

@ericstj
Copy link
Member

ericstj commented Nov 30, 2021

cc @joperezr

Why don't we use the live generator instead of the LKG one from the SDK?

Yes. Here's a pointer:

<ProjectReference

If the Regex generator is shipping in the .NET 7 ref pack, then ASP.NET Core should be able to consume it by the same mechanism that they consume the nightly ref packs today.

Correct. That was enabled with #61329

@danmoseley
Copy link
Member Author

Do we have a preferred naming convention for these?

I think you might be helping figure that out! What you have seems reasonable - to me at least. I could imagine if it's used in several places the calling code might wrap it in a property so it's FormatterNameSimpleRegex.Match or whatnot rather than GetformatterNameSimpleRegex().Match.

@joperezr
Copy link
Member

I see no reason why not (there is hardly any usage there though) but we might want to do some perf measurements for some of those. I don't believe we expect them to be any slower but this is new.

For all of the benchmarks we have so far, source generator is much faster than the interpreted engine (which is the default). Only places I'd imagine this might be an issue is when hitting places that the source generator doesn't support yet, like using NonBacktracking engine for instance, but I really doubt any of our tests is doing that.

@stephentoub
Copy link
Member

The source generator should be as fast or faster than RegexOptions.Compiled for everything; if it's not, it's a bug. We should be comfortable switching anything to use the source generator from a throughput perspective.

@Clockwork-Muse
Copy link
Contributor

.... also, because I can be terrible about this, how many PRs do you want? One for everything? There's ~600 files, but most of the changes are relatively minor.

@danmoseley
Copy link
Member Author

Do you want to just put up the smallest possible one first so we see what it looks like?

@Clockwork-Muse
Copy link
Contributor

@danmoseley #62325

Only one regex to replace on a project (although it's a source one, not a test one).

@Clockwork-Muse
Copy link
Contributor

... replacing this feels wrong:

[ConditionalTheory(typeof(PlatformDetection), nameof(PlatformDetection.IsThreadingSupported))]
[MemberData(nameof(FloatingPointValues))]
public void Log_StateAndScopeContainsFloatingPointType_SerializesValue(object value)
{
    // Arrange
    var t = SetUp(
        new ConsoleLoggerOptions { FormatterName = ConsoleFormatterNames.Json },
        simpleOptions: null,
        systemdOptions: null,
        jsonOptions: new JsonConsoleFormatterOptions
        {
            JsonWriterOptions = new JsonWriterOptions() { Indented = false },
            IncludeScopes = true
        }
    );
    var logger = (ILogger)t.Logger;
    var sink = t.Sink;

    // Act
    using (logger.BeginScope("{Value}", value))
    {
        logger.LogInformation("{LogEntryValue}", value);
    }

    // Assert
    string message = sink.Writes[0].Message;
    AssertMessageValue(message, "Value");
    AssertMessageValue(message, "LogEntryValue");

    static void AssertMessageValue(string message, string propertyName)
    {
        var serializedValueMatch = Regex.Match(message, "\"" + propertyName + "\":(.*?),");
        Assert.Equal(2, serializedValueMatch.Groups.Count);
        string jsonValue = serializedValueMatch.Groups[1].Value;
        Assert.True(double.TryParse(jsonValue, NumberStyles.Any, CultureInfo.InvariantCulture, out var floatingPointValue), "The json doesn not contain a floating point value: " + jsonValue);
        Assert.Equal(1.2, floatingPointValue, 2);
    }
}

(It would require two separate regex patterns)
Thoughts? Comments?

@danmoseley
Copy link
Member Author

I agree I'd leave that one.

@kasperk81
Copy link
Contributor

is it because Regex.Match() lighter than source generator here? or because it's test code?
two calls to Regex.Match() vs. two calls to Match() on generator returned object, i'd like if the latter won in both size and perf overall and code remain equally succinct.

@danmoseley
Copy link
Member Author

@kasperk81 it's test code, clarity is more important than performance. Since we hardly at all use regex in our product (in this repo) changing to the source generator is essentially to test it.

@joperezr
Copy link
Member

joperezr commented Dec 7, 2021

Thoughts? Comments?

Agreed with @danmoseley. Source generator is great when you know the pattern already. For patterns that can change based on input not using the source generator is fine, even in this case when we kind of know what the values could be as this could be broken if the test is refactored.

@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Dec 15, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 3, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 4, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.RegularExpressions help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants