Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Add support for editable installs to the module resolver #12307

Merged
merged 19 commits into from
Jul 16, 2024

Conversation

AlexWaygood
Copy link
Member

Summary

This PR adds support for editable installations in site-packages to the module resolver. Still a few details to be checked vis-à-vis Python's runtime semantics.

Work towards #11653.

Test Plan

cargo test -p red_knot_module_resolver

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Jul 12, 2024
Copy link
Contributor

github-actions bot commented Jul 12, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@@ -73,6 +73,7 @@ enum ModuleResolutionPathBufInner {
FirstParty(SystemPathBuf),
StandardLibrary(FilePath),
SitePackages(SystemPathBuf),
EditableInstall(SystemPathBuf),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered just using the SitePackages variant for editable installs, since these search paths do come about as a consequence of .pth files in the site-packages directory. However, the search paths for editable installs aren't necessarily children of the site-packages search path (and indeed most often won't be), and I think it would be too easy to assume that all instances of the SitePackages variant have the site-packages directory as their root.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a problem with introducing a new enum variant for pth-added paths, though I'm also not totally convinced by it: I don't know when we'd ever handle them differently, and it requires one more chunk of duplicated code wherever we handle this enum. I don't think the confusion you mention should occur: in general search paths should never be children of other search paths (this is a really bad setup that leads to the same module being importable under two different names), so it doesn't seem to me to be a likely confusion to assume this would be the case.

I'm not sure about the name EditableInstall. pth files are a feature of the Python import system that are used by the editable-install feature of some package managers, but the feature predates that use, and pth files have been (and I'm sure still are) used in many other ways as well (e.g. for the zipped-egg installation format of setuptools). So I think it's potentially a little misleading to imply that every path discovered in a .pth file is an editable install.

I'd suggest PthEntry or something similar instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the name EditableInstall. pth files are a feature of the Python import system that are used by the editable-install feature of some package managers, but the feature predates that use, and pth files have been (and I'm sure still are) used in many other ways as well (e.g. for the zipped-egg installation format of setuptools). So I think it's potentially a little misleading to imply that every path discovered in a .pth file is an editable install.

Hrm, true, but the validation we do elsewhere should ensure that we only add paths that point to directories when we iterate through .pth files to try to find paths that could potentially be editable installs. While it's true that a directoy path in a .pth file could point to something which wasn't intended to be an editable install, the Python runtime will still consider Python modules at that path to be importable, correct (since they'll be added to sys.path by site.py)? So IIUC a directory path in a .pth file is in effect an editable install, even if it... wasn't meant to be.

All of which is to say, from the perspective of the module resolver, this variant should only really be used if we think it's more-likely-than-not that a path represents a directory that has been editably installed, or a path relative to a path representing a directory that has been editably installed. We should have done sufficient validation at this point that it's more than just "an arbitrary line in a .pth file".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with either name but it might be at the time that we add some documentation to each variant because I don't think the linked order still applies (it doesn't mention editable installs)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some more docs makes sense but I won't do it just now, since I may still refactor this pretty soon according to the idea Carl suggested on one of my earlier PRs

Copy link
Contributor

@carljm carljm Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly any path found in a .pth file will be added to sys.path, and particularly if that path is a directory, it will result in modules there being importable. I'm not sure if there is any other validation you're referring to that we do.

The logical jump you're making that I don't quite follow is "if we add a path to sys.path that makes things importable, that's an 'editable install'." I don't think that follows. It's just importable Python source code. "Editable install" is a very specific user-facing feature that implies a certain use case. For example, there's no reason the path added by the pth file line couldn't point to some Python code on a read-only filesystem. We should still support that just fine, but it's not very editable! In that case the pth file entry is being used for some other use case that I wouldn't describe as an "editable install."

@MichaReiser
Copy link
Member

MichaReiser commented Jul 12, 2024

Dude, stop working 😂 I thought you needed some rest

@AlexWaygood
Copy link
Member Author

Dude, stop working 😂 I thought you needed some rest

I have an addiction :(

@AlexWaygood AlexWaygood marked this pull request as ready for review July 13, 2024 13:01
Copy link

codspeed-hq bot commented Jul 13, 2024

CodSpeed Performance Report

Merging #12307 will improve performances by 4.81%

Comparing editables (0d64399) with editables (e29d3d4)

Summary

⚡ 1 improvements
✅ 32 untouched benchmarks

Benchmarks breakdown

Benchmark editables editables Change
red_knot_check_file[incremental] 90 µs 85.9 µs +4.81%

Comment on lines 1491 to 1516
#[test]
fn deleting_pth_file_on_which_module_resolution_depends_invalidates_cache() {
const SITE_PACKAGES: &[FileSpec] = &[("_foo.pth", "/x/src")];
const X_DIRECTORY: &[FileSpec] = &[("/x/src/foo.py", "")];

let TestCase {
mut db,
site_packages,
..
} = TestCaseBuilder::new()
.with_site_packages_files(SITE_PACKAGES)
.with_other_files(X_DIRECTORY)
.build();

let foo_module_name = ModuleName::new_static("foo").unwrap();
let foo_module = resolve_module(&db, foo_module_name.clone()).unwrap();
assert_eq!(
Some(foo_module.file()),
system_path_to_file(&db, SystemPathBuf::from("/x/src/foo.py"))
);

let pth_file = site_packages.join("_foo.pth");
db.memory_file_system().remove_file(&pth_file).unwrap();
File::touch_path(&mut db, &pth_file);
assert_eq!(resolve_module(&db, foo_module_name.clone()), None);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing, and I'm not sure why. To read the contents of the pth files, I'm using system_path_to_file in PthFileIterator::next(), and source_text() in PthFile::new(). IIUC, I think that should be sufficient to indicate to Salsa that the module_search_paths() query depends on the existence and contents of the _foo.pth file, and that therefore deleting the _foo.pth file should invalidate the previous query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look at the salsa events and this seems correct.

[crates/red_knot_module_resolver/src/resolver.rs:1555:9] db.take_salsa_events().debug(&db) = [
    Event {
        runtime_id: RuntimeId {
            counter: 0,
        },
        kind: WillCheckCancellation,
    },
    Event {
        runtime_id: RuntimeId {
            counter: 0,
        },
        kind: WillCheckCancellation,
    },
    Event {
        runtime_id: RuntimeId {
            counter: 0,
        },
        kind: WillCheckCancellation,
    },
    Event {
        runtime_id: RuntimeId {
            counter: 0,
        },
        kind: DidValidateMemoizedValue {
            database_key: source_text(2),
        },
    },
    Event {
        runtime_id: RuntimeId {
            counter: 0,
        },
        kind: DidValidateMemoizedValue {
            database_key: parse_typeshed_versions(2),
        },
    },
    Event {
        runtime_id: RuntimeId {
            counter: 0,
        },
        kind: WillCheckCancellation,
    },
    Event {
        runtime_id: RuntimeId {
            counter: 0,
        },
        kind: WillExecute {
            database_key: dynamic_module_resolution_paths(0),
        },
    },
    Event {
        runtime_id: RuntimeId {
            counter: 0,
        },
        kind: DidValidateMemoizedValue {
            database_key: resolve_module_query(0),
        },
    },
]

It re-runs the dynamic_module_resolution_paths query because the dynamic module resolution paths changed (you deleted a phf file). It then re-runs the resolve_module_query but evaluates to the same cached result. Not sure why this is the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that adding db.report_untracked_read() to the function body, like you suggested elsehwere, makes the test start passing. But then if I start using db.system().read_to_string() in the query to get the contents of the .pth file rather than source_text() (which you said should be okay if I added the report_untracked_read() call), it starts failing again.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

An unrelated comment for resolver (and some of them might be because how I structured the code). I find it difficult to read the code because I always have to jump up and down. The ModuleResolutionSettings aren't next to the ValidatedSearchPathSettings, instead, the Pth code becomes in between. We shouldn't refactor the code as part of this PR but maybe you can come up with an ordering that brings the code closer together.

crates/red_knot_module_resolver/src/testing.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
@AlexWaygood
Copy link
Member Author

An unrelated comment for resolver (and some of them might be because how I structured the code). I find it difficult to read the code because I always have to jump up and down. The ModuleResolutionSettings aren't next to the ValidatedSearchPathSettings, instead, the Pth code becomes in between. We shouldn't refactor the code as part of this PR but maybe you can come up with an ordering that brings the code closer together.

Yeah, I have the same opinion. I'm planning on doing a followup where I move some more code out into a separate submodule just for finding the search paths -- but although this PR makes the problem worse, I didn't want to do it in this PR, as it would have made the diff harder to read and review.

…ation rather than deletion of the `pth` file
AlexWaygood and others added 2 commits July 15, 2024 17:13
Co-authored-by: Carl Meyer <carl@astral.sh>
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thanks for addressing all feedback.

I think we should handle deduplication in a follow up PR. It's not specifically related to editable installs. It's just that editable installs make the problem worse.

My only ask before merging is to change PthFile to use a String instead of SourceText and to use IndexMap over OrderedSet.

@@ -73,6 +73,7 @@ enum ModuleResolutionPathBufInner {
FirstParty(SystemPathBuf),
StandardLibrary(FilePath),
SitePackages(SystemPathBuf),
EditableInstall(SystemPathBuf),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with either name but it might be at the time that we add some documentation to each variant because I don't think the linked order still applies (it doesn't mention editable installs)

crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
@AlexWaygood
Copy link
Member Author

to use IndexMap over OrderedSet.

I already changed the PR so that it uses indexmap::IndexSet internally rather than ordermap::OrderSet. But I think I have a confusingly named type alias in there. I'll change the name of that.

@AlexWaygood AlexWaygood enabled auto-merge (squash) July 16, 2024 18:14
@AlexWaygood AlexWaygood merged commit 9a2dafb into main Jul 16, 2024
19 checks passed
@AlexWaygood AlexWaygood deleted the editables branch July 16, 2024 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants