Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict the rewriting to specific file types #643

Open
wetneb opened this issue Oct 18, 2023 · 10 comments
Open

Restrict the rewriting to specific file types #643

wetneb opened this issue Oct 18, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@wetneb
Copy link

wetneb commented Oct 18, 2023

What problem are you trying to solve?

When running OpenRewrite on a Maven project, with only Java recipes active, OpenRewrite still tries to parse files of other types. According to #185 this should normally not happen, but somehow it does happen on my side. See my comment there for details (including how to reproduce the issue).

Describe the solution you'd like

In the Maven plugin configuration, I would like to be able to manually restrict the plugin to only process .java files. For instance via an <inclusions> configuration option, which would be the inverse of <exclusions>.

Have you considered any alternatives or workarounds?

I could use the <exclusions> configuration option to blacklist the file types that I see are processed erroneously, but because I do not know an exhaustive list of all the file types that OpenRewrite could potentially parse, I would prefer to be able to directly state that I only want Java files to be processed.

Additional context

Arguably it would also make sense to make sure that #185 is really solved.

Are you interested in contributing this feature to OpenRewrite?

Why not!

@wetneb wetneb added the enhancement New feature or request label Oct 18, 2023
@timtebeek
Copy link
Contributor

Hi! Nice bit or archaeology in finding that #185 there! I wasn't even aware we did that once upon a time. Nowadays we parse projects in full, as the line of what recipes expect and need which files has started to blur, where dependencies are only added if certain types are used, or scanning recipes that can operate across multiple file types. As such I don't immediately see selective parsing make a comeback now, as it's be hard to track what's needed where, and often still result in parsing everything, just with more complicated logic to arrive at the same situation. Does that make sense?

That said, let's see what we can do for you instead; is it safe to assume you want to selectively parse files in an effort to speed things up? Or did you have a different reason motivation that might help to have context about?

@wetneb
Copy link
Author

wetneb commented Oct 18, 2023

Yes the motivation here is to speed up the parsing: running mvn rewrite:run currently takes about 1 min on my machine, so I suspect it will add quite some overhead to our CI jobs if we run it there systematically. Also, I would prefer to keep the build log clear of those irrelevant warnings.

@timtebeek
Copy link
Contributor

I'd already spent some time earlier this year to try to get OpenRefine ingested into the Moderne platform at app.moderne.io; we ingest some 31K (mostly Java) OSS projects there nightly, such that anyone can run recipes against those projects quickly through our platform. Right now ingestion for OpenRefine fails with errors that I'm hoping I'm able to resolve with some additional flags passed into the Maven command. That would unlock near instance recipe runs with the Moderne platform UI and CLI, as the LST model is already built; not sure if that's an option there, but that works well for large projects at least.

@wetneb
Copy link
Author

wetneb commented Oct 18, 2023

Sounds exciting! Do you know if that technology can also help with enforcing a particular style (such as import order) in a CI job? I assume it's difficult, because the LST model for the PR cannot be built ahead of the PR being opened. But perhaps that's not the intended workflow - we should rather accept PRs even if they deviate from the style and regularly add linting commits via the Moderne app?

@timtebeek
Copy link
Contributor

We're working towards such CI workflow checks, but not there yet. A pattern we see some users adopt already is periodic runs to clean up anything that might have been introduced for instance the past couple weeks, and keep the code base clean that way. That also helps pick up any recipe improvements, or even new recipes when running the composite recipes we provide for common static analysis issues for instance, or upgrading to newer versions of Java (continuously).

@timtebeek
Copy link
Contributor

Was finally able to do a manual import to Moderne, for now, and run recipes there:

Not quite sure you'd immediately want to apply those directle, but at least it shows quicker runs with the platform. Feel free to explore and let me know what you think.

Also: with the added context above, and the platform as an alternative, would you then still want to keep this issue open? As indicated we're unlikely to limit the model building and recipe execution now that the lines have blurred. If you see any alternatives let me know.

@wetneb
Copy link
Author

wetneb commented Oct 19, 2023

Thanks for your work on this!

I see the value in a platform like Moderne but personally I prefer to be able to run such a tool locally. So I think it would still make sense (and should intuitively not be too complicated) to have a way to restrict the rewriting done by the Maven plugin to some specific files, if restoring the fix of #185 is out of reach.

@timtebeek
Copy link
Contributor

You might be interested in following along to this PR, as it aims to improve performance in parsing web resources

That said, for OpenRefine it might make most sense to exclude the resources in https://github.com/OpenRefine/OpenRefine/tree/master/main/webapp, for instance by configuring the rewrite-maven-plugin in a separate profile that you can enable and run at will. That ought to speed things up considerably, and prevent parsing and modifying the resources in there.

We've had similar requests for inclusion filters rather than the existing exclusions, but held off on such effort as it complicates the logic of what is and is not included, particularly when options are combined. Plus then there's documenting those options, and supporting them going forward, which as you've seen from #185 is already hard to keep track of.

Given the above I'd lean towards calling this a feature not planned, at least for now. We do hope to resolve the performance issues you're seeing, such that there's less of a need. Until then exclusions are likely the way to go.

@leveretconey
Copy link

Hi, we also need this feature. We are using OpenRewrite to help us upgrade our internal Java applications. During usage, I noticed some performance issues when the project includes large files, which significantly prolongs the listSourceFiles� step. For example, there is an application with a huge JSON configuration file, and our recipe does not modify it. However, we cannot exclude all JSON files because some recipes do modify them. Our Java applications usually have a fixed file directory structure, and we ourselves know which files may need modification. Therefore, we hope to have a whitelist functionality.

@timtebeek
Copy link
Contributor

Hi, we also need this feature. We are using OpenRewrite to help us upgrade our internal Java applications. During usage, I noticed some performance issues when the project includes large files, which significantly prolongs the listSourceFiles� step. For example, there is an application with a huge JSON configuration file, and our recipe does not modify it. However, we cannot exclude all JSON files because some recipes do modify them. Our Java applications usually have a fixed file directory structure, and we ourselves know which files may need modification. Therefore, we hope to have a whitelist functionality.

For that use case I hope you can use exclusions, possibly with wildcards, to not process that large JSON files involved there. We've had suggestions to add inclusions before, but that quite complicates the logic, especially when options are combined, so we've decided against that for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Backlog
Development

No branches or pull requests

3 participants