Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Eagerly validate search paths before constructing Programs #12684

Closed
wants to merge 2 commits into from

Conversation

AlexWaygood
Copy link
Member

Summary

This is a prototype for a way by which we could move settings resolution for search paths out of the module resolver. It may not be the best way; I'm not wedded to it necessarily!

Moving settings resolution for search paths out of the module resolver enables us to validate that these settings are correct before constructing Program instances, which is desirable because Program instances should ideally store validated, normalized settings on them rather than unvalidated, unnormalized settings.

The way this PR works is that in crates/ruff_db/src/program.rs, new SearchPathInner and SearchPath types are added. These are extremely minimal, but in terms of the data they hold they have 1:1 correspondence with the SearchPathInner and SearchPath types in crates/red_knot_module_resolver/path.rs. Having public types in ruff_db that can be cheaply and easily converted to and from the private types in red_knot_module_resolver means that the Program struct can use a SearchPath type in its fields (representing a validated, normalized search path) without depending on anything from the module-resolver crate.

The module resolver, meanwhile, exposes a new public function for creating a Program instance from the raw, unvalidated search-path settings. This function uses internals of the module resolver to attempt to construct validated red_knot_module_resolver::SearchPaths, and then cheaply converts those search paths to ruff_db::SearchPaths in order to construct a Program instance.

I initially experimented with simply moving the SearchPathInner and SearchPath types out of the module resolver crate entirely and into ruff_db. While this is doable, it is quite painful, because validating search paths requires knowledge of the internals of the module resolver. For example: one of the validation steps we need to perform is to check that the file at stdlib/VERSIONS parses as a valid VERSIONS file if the user has given us a custom typeshed directory. Attempting to parse VERSIONS requires knowledge of the structure of the VERSIONS file, which in turn requires knowledge of the ModuleName type, etc. etc.

Guide to reviewing

I would recommend looking at the changes to ruff_db first, then the changes to red_knot_module_resolver, and everything else after that.

Test Plan

cargo test

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Aug 5, 2024
Copy link
Contributor

github-actions bot commented Aug 5, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@carljm
Copy link
Contributor

carljm commented Aug 5, 2024

I initially experimented with simply moving the SearchPathInner and SearchPath types out of the module resolver crate entirely and into ruff_db. While this is doable, it is quite painful, because validating search paths requires knowledge of the internals of the module resolver.

It seems to me that a lot of the pain here arises from the fact that Program depends on module resolver, but module resolver must depend on ruff_db crate for the basic Salsa setup. But this seems like a constraint we've artificially placed on ourselves by having Program defined in ruff_db. As far as I can see nothing else in ruff_db depends on Program, so can't we just move Program out to ruff_workspace crate instead? Then Program can simply depend directly on the module resolver, and module resolver can depend on ruff_db, and we eliminate these issues.

@AlexWaygood
Copy link
Member Author

As far as I can see nothing else in ruff_db depends on Program, so can't we just move Program out to ruff_workspace crate instead? Then Program can simply depend directly on the module resolver, and module resolver can depend on ruff_db, and we eliminate these issues.

That sounds reasonable to me, but I'll defer to @MichaReiser on this :-)

@MichaReiser
Copy link
Member

Then Program can simply depend directly on the module resolver, and module resolver can depend on ruff_db, and we eliminate these issues.

I don't think this will work because the module resolver depends on Program to resolve the module resolution settings. I also suspect that some of the semantic model may require access to Program.

I haven't looked at the specific implementation here and it may be worth thinking this through more carefully and re-reading through #12318.

To me, the most natural place for Program is in the semantic model. That leaves the problem that the module resolver needs access to the search path settings and target version. We can either:

  • Move the module resolver into the semantic model crate (as a sub module) and be done with it
  • Expose a method on the module-resolver's Db trait to access the search paths settings. This leads to some boilerplate for all higher Db traits but would allow us to keep the two crates separated.

@AlexWaygood AlexWaygood force-pushed the alex/eager-search-validation branch 3 times, most recently from 6c9d976 to 9d483c4 Compare August 6, 2024 16:09
@MichaReiser
Copy link
Member

I plan to look at this once I start working on validation. Sorry for the delay. Let me know if you want feedback earlier

@AlexWaygood
Copy link
Member Author

No worries; I don't see that as urgent. It's a pain that it touches so many files as merge conflicts keep cropping up, but it is what it is 😄

Please take your time!

@MichaReiser
Copy link
Member

I thought more about this and I'm leaning towards merging red_knot_module_resolver into red_knot_python_semantic because the module resolver is small(ish) and I don't see strong reasons that they have to be separate crates. Having both in a single crate greatly simplifies the problem because we can move Program into the semantic crate and be "done" with it.

@AlexWaygood what do you think of unifying the two crates?

@AlexWaygood
Copy link
Member Author

Yeah I think combining the two crates could definitely make sense. I wasn't 100% sure it made sense for them to be separate crates to begin with. In theory it's nice to have a clean separation of concerns between the semantic-model crate and the module resolver but in practice it doesn't really work:

  • The module resolver is only used by the semantic-model crate, so exactly what the return type of functions like resolve_module() is depends heavily on the implementation details of exactly what information the semantic-model crate needs from the module resolver
  • There are already certain types, like ModuleName, which are included in (and exposed by) the module-resolver crate, but probably should really belong in the semantic-model crate. ModuleName isn't an implementation detail of the module resolver, it's a type that both the module resolver and the semantic model need to make use of. Arguably it would make more "sense" for it to be in the semantic-model crate, but it needs to be in the module-resolver crate currently betcause the module resolver needs access to it, and the module-resolver crate is lower level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants