[red-knot] Eagerly validate search paths before constructing `Program`s #12684

AlexWaygood · 2024-08-05T10:46:02Z

Summary

This is a prototype for a way by which we could move settings resolution for search paths out of the module resolver. It may not be the best way; I'm not wedded to it necessarily!

Moving settings resolution for search paths out of the module resolver enables us to validate that these settings are correct before constructing Program instances, which is desirable because Program instances should ideally store validated, normalized settings on them rather than unvalidated, unnormalized settings.

The way this PR works is that in crates/ruff_db/src/program.rs, new SearchPathInner and SearchPath types are added. These are extremely minimal, but in terms of the data they hold they have 1:1 correspondence with the SearchPathInner and SearchPath types in crates/red_knot_module_resolver/path.rs. Having public types in ruff_db that can be cheaply and easily converted to and from the private types in red_knot_module_resolver means that the Program struct can use a SearchPath type in its fields (representing a validated, normalized search path) without depending on anything from the module-resolver crate.

The module resolver, meanwhile, exposes a new public function for creating a Program instance from the raw, unvalidated search-path settings. This function uses internals of the module resolver to attempt to construct validated red_knot_module_resolver::SearchPaths, and then cheaply converts those search paths to ruff_db::SearchPaths in order to construct a Program instance.

I initially experimented with simply moving the SearchPathInner and SearchPath types out of the module resolver crate entirely and into ruff_db. While this is doable, it is quite painful, because validating search paths requires knowledge of the internals of the module resolver. For example: one of the validation steps we need to perform is to check that the file at stdlib/VERSIONS parses as a valid VERSIONS file if the user has given us a custom typeshed directory. Attempting to parse VERSIONS requires knowledge of the structure of the VERSIONS file, which in turn requires knowledge of the ModuleName type, etc. etc.

Guide to reviewing

I would recommend looking at the changes to ruff_db first, then the changes to red_knot_module_resolver, and everything else after that.

Test Plan

cargo test

github-actions · 2024-08-05T10:59:17Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

carljm · 2024-08-05T20:42:26Z

I initially experimented with simply moving the SearchPathInner and SearchPath types out of the module resolver crate entirely and into ruff_db. While this is doable, it is quite painful, because validating search paths requires knowledge of the internals of the module resolver.

It seems to me that a lot of the pain here arises from the fact that Program depends on module resolver, but module resolver must depend on ruff_db crate for the basic Salsa setup. But this seems like a constraint we've artificially placed on ourselves by having Program defined in ruff_db. As far as I can see nothing else in ruff_db depends on Program, so can't we just move Program out to ruff_workspace crate instead? Then Program can simply depend directly on the module resolver, and module resolver can depend on ruff_db, and we eliminate these issues.

AlexWaygood · 2024-08-05T20:44:34Z

As far as I can see nothing else in ruff_db depends on Program, so can't we just move Program out to ruff_workspace crate instead? Then Program can simply depend directly on the module resolver, and module resolver can depend on ruff_db, and we eliminate these issues.

That sounds reasonable to me, but I'll defer to @MichaReiser on this :-)

MichaReiser · 2024-08-06T06:00:54Z

Then Program can simply depend directly on the module resolver, and module resolver can depend on ruff_db, and we eliminate these issues.

I don't think this will work because the module resolver depends on Program to resolve the module resolution settings. I also suspect that some of the semantic model may require access to Program.

I haven't looked at the specific implementation here and it may be worth thinking this through more carefully and re-reading through #12318.

To me, the most natural place for Program is in the semantic model. That leaves the problem that the module resolver needs access to the search path settings and target version. We can either:

Move the module resolver into the semantic model crate (as a sub module) and be done with it
Expose a method on the module-resolver's Db trait to access the search paths settings. This leads to some boilerplate for all higher Db traits but would allow us to keep the two crates separated.

MichaReiser · 2024-08-07T10:18:17Z

I plan to look at this once I start working on validation. Sorry for the delay. Let me know if you want feedback earlier

AlexWaygood · 2024-08-07T10:19:51Z

No worries; I don't see that as urgent. It's a pain that it touches so many files as merge conflicts keep cropping up, but it is what it is 😄

Please take your time!

MichaReiser · 2024-08-08T06:00:29Z

I thought more about this and I'm leaning towards merging red_knot_module_resolver into red_knot_python_semantic because the module resolver is small(ish) and I don't see strong reasons that they have to be separate crates. Having both in a single crate greatly simplifies the problem because we can move Program into the semantic crate and be "done" with it.

@AlexWaygood what do you think of unifying the two crates?

AlexWaygood · 2024-08-08T11:31:47Z

Yeah I think combining the two crates could definitely make sense. I wasn't 100% sure it made sense for them to be separate crates to begin with. In theory it's nice to have a clean separation of concerns between the semantic-model crate and the module resolver but in practice it doesn't really work:

The module resolver is only used by the semantic-model crate, so exactly what the return type of functions like resolve_module() is depends heavily on the implementation details of exactly what information the semantic-model crate needs from the module resolver
There are already certain types, like ModuleName, which are included in (and exposed by) the module-resolver crate, but probably should really belong in the semantic-model crate. ModuleName isn't an implementation detail of the module resolver, it's a type that both the module resolver and the semantic model need to make use of. Arguably it would make more "sense" for it to be in the semantic-model crate, but it needs to be in the module-resolver crate currently betcause the module resolver needs access to it, and the module-resolver crate is lower level.

AlexWaygood added the red-knot Multi-file analysis & type inference label Aug 5, 2024

AlexWaygood requested review from carljm and MichaReiser as code owners August 5, 2024 10:46

AlexWaygood force-pushed the alex/eager-search-validation branch from aa6d0a2 to 7fb0168 Compare August 5, 2024 21:07

AlexWaygood force-pushed the alex/eager-search-validation branch 3 times, most recently from 6c9d976 to 9d483c4 Compare August 6, 2024 16:09

AlexWaygood added 2 commits August 7, 2024 11:07

better docs

4accfa3

[red-knot] Eagerly validate search paths before constructing Programs

c5581f9

AlexWaygood force-pushed the alex/eager-search-validation branch from 9d483c4 to c5581f9 Compare August 7, 2024 10:11

AlexWaygood closed this Aug 8, 2024

AlexWaygood mentioned this pull request Aug 9, 2024

Eagerly validate search paths #12783

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] Eagerly validate search paths before constructing `Program`s #12684

[red-knot] Eagerly validate search paths before constructing `Program`s #12684

AlexWaygood commented Aug 5, 2024

github-actions bot commented Aug 5, 2024 •

edited

Loading

carljm commented Aug 5, 2024 •

edited

Loading

AlexWaygood commented Aug 5, 2024

MichaReiser commented Aug 6, 2024

MichaReiser commented Aug 7, 2024

AlexWaygood commented Aug 7, 2024

MichaReiser commented Aug 8, 2024

AlexWaygood commented Aug 8, 2024

[red-knot] Eagerly validate search paths before constructing Programs #12684

[red-knot] Eagerly validate search paths before constructing Programs #12684

Conversation

AlexWaygood commented Aug 5, 2024

Summary

Guide to reviewing

Test Plan

github-actions bot commented Aug 5, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

carljm commented Aug 5, 2024 • edited Loading

AlexWaygood commented Aug 5, 2024

MichaReiser commented Aug 6, 2024

MichaReiser commented Aug 7, 2024

AlexWaygood commented Aug 7, 2024

MichaReiser commented Aug 8, 2024

AlexWaygood commented Aug 8, 2024

[red-knot] Eagerly validate search paths before constructing `Program`s #12684

[red-knot] Eagerly validate search paths before constructing `Program`s #12684

github-actions bot commented Aug 5, 2024 •

edited

Loading

`ruff-ecosystem` results

carljm commented Aug 5, 2024 •

edited

Loading