Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Add basic workspace support #12318

Merged
merged 2 commits into from
Jul 17, 2024
Merged

[red-knot] Add basic workspace support #12318

merged 2 commits into from
Jul 17, 2024

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jul 14, 2024

Summary

This PR adds support for checking an entire workspace rather than just individual files. But it does more than that. It lays the foundation for how Red Knot integrates settings long-term.

The CLI changes from red_knot <files> to something more cargo like where running red_knot defaults to checking the entire workspace in the current directory. Right now, any directory is a valid workspace and each workspace always has exactly one package. The long-term goal is to find the enclosing workspace by reading Ruff's configuration (one with a [workspace] section?) and then finding all packages specified in [workspace.members].

New Workspace and Package ingredients

These ingredients represent the project's structure. They aren't used in any queries today, but I foresee the following usages:

  • package.check: Checks all files in the package. We could make this a query to short-circuit the analysis. This should be especially useful in projects with many packages.
  • package.files and package.settings: See below for the details on how we can compute the files and settings lazily.

This PR also introduces WorkspaceMetadata and PackageMetdata struct. These are the equivalent of Workspace and Package but as non-Salsa ingredient. The key motivation for these two is that we need to resolve the target-version and the module-resolution settings to decide on the persistent cache to load. This requires access to the workspace's and package's settings, and that's what these two structs will expose in the future.

Should Workspace be a singleton?

Possibly? I'm not sure. The idea is that a Database has exactly one Workspace. To me it is unclear if it may be necessary to create a second Workspace in response to file changes. E.g. what if the workspace's ruff.toml gets deleted? Maybe we should just create a new Database in that case but that complicates things in the CLI's main loop. My current thinking is to create a new Workspace ingredient in that case (which would fail if it is a singleton) and swap the Program's Workspace. This should also allow us to keep the existing Workspace if loading the new one, for whatever reason, fails.

New Program ingredient

The new Program ingredient (singleton) mainly generalizes the existing module resolution settings and moves them into ruff_db. The motivation is that these settings aren't just important for module resolution. They're fundamental to the entire analysis. Two analyses using different program settings aren't compatible and should be cached (persistent) separately.

@AlexWaygood we have to figure out how to solve our merge conflicts.

I decided to rename the root Database from Program to RootDatabase, mainly because I found it counterintuitive that a program contains a Workspace and to avoid a naming conflict with the new Program ingredient.

Should the Database support multiple open Workspaces?

My initial thinking was that it probably should, but I've concluded that it would complicate persistent caching by a lot.

Ideally, Ruff uses a different persistent cache for different workspaces. Storing multiple Workspaces in a single Database creates the challenge that it becomes necessary to decide for every Ingredient into which persistent cache it should be stored. Loading the persistent cache would require merging the Ingredients from different caches, which seems a headache.

Having a single Database per ProgramSettings also simplifies caching runs using different target-versions individually: ruff check --target-version 3.8 and ruff check --target-version 3.10 should each use their own persistent cache.

For now, I think it's best to have a 1:1 relationship between Database and Workspace. This doesn't mean Red Knot can't support multiple open Workspaces. It just means we must build this functionality around the Database instead of into the Database.

Checing individual files or folders

It remains possible to check a different folder using red_knot—-current-directory path. However, this PR removes the functionality for testing specific files or sub-folders. The challenge with explicit paths is that it isn't sufficient to detect the files in these paths once; instead, the paths must be monitored for newly added or removed files (and follow Ruff's exclude semantics). I'm not sure how to implement this best, but I think this is a CLI-specific problem and should be built on the added open_files feature.

How do settings work?

This is about the formatter or linter settings, not the workspace or package settings. Ruff supports that each directory can have its own linter and formatter settings. That means it is necessary to resolve the settings on a directory level.

The easiest solution is to define a linter_settings(File) query that returns the liner settings for a given file by searching the closest package and asking that package what its settings are. The main downside is that the query is more granular than what Ruff allows (it's a file-level query rather than a directory-level query). A possible solution is to create a new Directory ingredient (interned) and change the query to linter_settings(Directory). While this query now has the correct granularity, it still means we're creating too many ingredients if a project has only a custom setting in exactly two directories.

I think the best solution is to instead create a Settings input ingredient that has tracked linter_settings, formatter_settings, etc. methods. A package would create all the Settings the first time it is asked for a setting file (we can find all the settings while discovering all the package files). Resolving the linter settings for a file then simply becomes:

  • Find the closest package
  • Ask the package for the Settings (can use a router similar to Ruff today)
  • Call settings.linter_settings (salsa query)

Dependent queries would only re-run when:

  • A new package is added or an existing package is removed from a workspace
  • New settings have been added or removed from this package
  • The linter settings of this path have changed

That's pretty good.

Can packages.files and packages.settings be lazy?

In this PR, discovering the files of a package is done eagerly. I think it should be possible to make this lazily but i would prefer to do this in a separate PR. The overall idea is:

  • Introduce a new SourceRoot input ingredient. A source root is a directory, but it's not a 1:1 mapping between SourceRoot and a directory on the disk. A source root has two fields: Its a path (id) and a revision.
  • We create a SourceRoot for the Workspace and every Package
  • The semantic of revision depends on the specific SourceRoot
    • For packages: max:
      • the last modified time of any ignore file
      • the last deletion time of any ignore file
      • the time when the last file was added
    • For the workspace: Same as for the package but don't bump the revision if the file belongs to a package
    • For site packages: The last modified time of any root ph file ([red-knot] Add support for editable installs to the module resolver #12307)

Edit: I think we can simplify this by simple storing an incrementing counter on Package and Workspace rather then introducing a new ingredient.

Follow up work

  • File watching is a bit of a mess right now. I plan to tackle this next
  • We don't do any error handling yet. We should look at this soon (by creating diagnostics).

Test Plan

I ran red_knot --current-directory ../test and verified that the program is re-checked when adding, removing, or deleting files.

@MichaReiser MichaReiser changed the base branch from main to system-walk-directories July 14, 2024 09:47
@MichaReiser MichaReiser force-pushed the workspace-support branch 2 times, most recently from ec29832 to d0b9e16 Compare July 14, 2024 10:05
@MichaReiser MichaReiser added the red-knot Multi-file analysis & type inference label Jul 14, 2024
@MichaReiser MichaReiser changed the title [red-knot] Add workspace support [red-knot] Add basic workspace support Jul 14, 2024
@MichaReiser MichaReiser marked this pull request as ready for review July 14, 2024 13:34
Copy link
Contributor

github-actions bot commented Jul 14, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_outlook.ipynb:13:1:1: Expected an expression

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_outlook.ipynb:13:1:1: Expected an expression

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core abstractions here make a lot of sense to me! This is a great step forward in clarifying how everything will fit together.

crates/red_knot/src/main.rs Outdated Show resolved Hide resolved
crates/red_knot/src/workspace.rs Show resolved Hide resolved
crates/red_knot/src/workspace.rs Show resolved Hide resolved
crates/red_knot/src/workspace/metadata.rs Show resolved Hide resolved
crates/red_knot/src/workspace/metadata.rs Show resolved Hide resolved
crates/ruff_db/src/program.rs Show resolved Hide resolved
Base automatically changed from system-walk-directories to main July 16, 2024 06:40
@MichaReiser MichaReiser force-pushed the workspace-support branch 2 times, most recently from 7562aaf to 146a054 Compare July 17, 2024 09:02
@MichaReiser MichaReiser merged commit 91338ae into main Jul 17, 2024
20 checks passed
@MichaReiser MichaReiser deleted the workspace-support branch July 17, 2024 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants