Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Add a read_directory() method to the ruff_db::system::System trait #12289

Merged
merged 9 commits into from
Jul 12, 2024

Conversation

AlexWaygood
Copy link
Member

Summary

Editable installations in Python are (usually1) implemented via static .pth files that are inserted into site-packages by the build backend. If the sole contents of a .pth file in the site-packages directory are an absolute path to a directory on disk, Python will consider that path to be an additional module-resolution search path that will be appended to sys.path on startup.

To support editable installations in the red-knot module resolver, we'll need to similarly search through the top level of site-packages searching for .pth files. In order to achieve this, this PR adds a read_dir() method to the ruff_db::system::System trait.

A rough indication of the functionality we'll need for editable support can be seen in the following function, which is part of the port @charliermarsh did of pyright's module resolver:

fn find_paths_from_pth_files(parent_dir: &Path) -> io::Result<impl Iterator<Item = PathBuf> + '_> {
Ok(parent_dir
.read_dir()?
.flatten()
.filter(|entry| {
// Collect all *.pth files.
let Ok(file_type) = entry.file_type() else {
return false;
};
file_type.is_file() || file_type.is_symlink()
})
.map(|entry| entry.path())
.filter(|path| path.extension() == Some(OsStr::new("pth")))
.filter(|path| {
// Skip all files that are much larger than expected.
let Ok(metadata) = path.metadata() else {
return false;
};
let file_len = metadata.len();
file_len > 0 && file_len < 64 * 1024
})
.filter_map(|path| {
let data = fs::read_to_string(path).ok()?;
for line in data.lines() {
let trimmed_line = line.trim();
if !trimmed_line.is_empty()
&& !trimmed_line.starts_with('#')
&& !trimmed_line.starts_with("import")
{
let pth_path = parent_dir.join(trimmed_line);
if pth_path.is_dir() {
return Some(pth_path);
}
}
}
None
}))
}

Test Plan

cargo test -p ruff_db

Footnotes

  1. Newer versions of setuptools use a more dynamic approach for editable installations, where the .pth file contains Python code which, when executed, dynamically computes the path which should be appended to sys.path. Setuptools is alone in using this approach for editable installations, and no other type checkers support editable installations produced via this method. Setuptools also provides a way of switching to the old approach for editable installations, where the .pth file simply contains a static path. As such, I do not intend to attempt to support editable installations created using this approach.

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Jul 11, 2024
Copy link
Contributor

github-actions bot commented Jul 11, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/chatgpt/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/chatgpt/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me, but I'd rather have Micha look at this.

crates/ruff_db/src/system.rs Outdated Show resolved Hide resolved
crates/ruff_db/src/system/memory_fs.rs Outdated Show resolved Hide resolved
crates/ruff_db/src/system/memory_fs.rs Outdated Show resolved Hide resolved
crates/ruff_db/src/system/os.rs Outdated Show resolved Hide resolved
crates/ruff_db/src/system/os.rs Outdated Show resolved Hide resolved
@Hnasar
Copy link
Contributor

Hnasar commented Jul 11, 2024

If the sole contents of a .pth file in the site-packages directory are an absolute path to a directory on disk

Just noting for the future that pth files may contain multiple paths, and the paths need not be absolute.

https://docs.python.org/3/library/site.html

@AlexWaygood AlexWaygood changed the title [red-knot] Add a read_dir() method to the ruff_db::system::System trait [red-knot] Add a read_directory() method to the ruff_db::system::System trait Jul 12, 2024
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat

pub(crate) fn read_directory(
&self,
path: impl AsRef<SystemPath>,
) -> Result<impl Iterator<Item = Result<DirectoryEntry>> + '_> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to have named return types (over impl) because callers can then name those types (and e.g. store it in a Struct).

I would suggest introducing a ReadDirectory struct similar to std::fs::read_dir that is a thin wrapper around std::vec::IntoIter (or whatever the type of your wild iter chain below is)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the value in doing that generally, but here I wonder if an opaque return type might actually be better? If you're e.g. relying on the exact type being returned from this method in a test (for example), you might get into difficulties if you later switch the test to using the OsSystem and find that a different type is returned from that struct's read_directory() method. I think there's some value in saying that the only API guarantee we give here is that some kind of iterator is returned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the argument is API compatibility with System than the method should also return a Box<dyn...> (and accept &SystemPath instead of AsRef<SystemPath> as the argument). I think it's rare that we would switch between systems in tests and the change would be minimal. But having a concrete type can be helpful for a system implementation that's based on the memory file system (e.g. WASM).

Anyway, I don't feel strongly (except that we shouldn't change the return type to Box<dyn>)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly either. But I propose we add it when we need it

crates/ruff_db/src/system/os.rs Outdated Show resolved Hide resolved
crates/ruff_db/src/system/path.rs Outdated Show resolved Hide resolved
crates/ruff_db/src/system.rs Show resolved Hide resolved
@AlexWaygood AlexWaygood enabled auto-merge (squash) July 12, 2024 12:27
@AlexWaygood AlexWaygood merged commit 6febd96 into main Jul 12, 2024
34 of 38 checks passed
@AlexWaygood AlexWaygood deleted the readdir branch July 12, 2024 12:31
@MichaReiser
Copy link
Member

I just noticed one technical difference. The path returned on DirEntry bystd::fs::read_dir returns the joined path of the path passed to read_dir and the entry in the directory. That means, if you call read_dir("a"), you get a/bar.py and a/foo.py but not an absolute path. The MemoryFileSystem::read_directory always returns absolute paths.

This might be fine, but it could also be a real foot gun where tests pass because the paths are absolute and later fail in production.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants