This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feat(rome_service): introduce a cross-language Workspace abstraction #2593

Merged
merged 4 commits into from
May 24, 2022


leops
Contributor

@leops leops commented May 18, 2022

Summary

This PR renames the rome_core crate to rome_service and replaces the Features member of the App struct with a new Workspace object. This new object takes on most of the responsibilities previously handled by the LSP crate: managing a set of "open" documents, and allowing these to be queried for diagnostics and code actions, run through the formatter, or subjected to any other action the user may want to perform on a "file".

With this change, the Workspace is used as the backing implementation for most features in both the language server and the CLI (the LSP crate now mostly acts as an adapter, translating calls from the Language Server Protocol into calls to the Workspace service). Overall, this means that adding support for additional languages should be relatively straightforward and mostly limited to the rome_service crate, with few or no changes in the LSP and CLI consumer crates.

Workspace is declared as a trait with a single implementation (WorkspaceServer) because it is designed to eventually work over an optional transport layer: an additional WorkspaceClient implementation could, for instance, delegate calls to a remote instance of the workspace server over an IPC channel.
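A minimal, std-only sketch of the trait/implementation split described above. `Workspace`, `WorkspaceServer`, and `OpenFileParams` echo names from this PR, but their fields, the `file_content` method, and the one-variant `RomeError` are hypothetical simplifications, not the crate's actual API.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical, simplified stand-in for `RomeError`.
#[derive(Debug, PartialEq)]
pub enum RomeError {
    NotFound,
}

/// One params struct per call (field layout assumed for illustration).
pub struct OpenFileParams {
    pub path: String,
    pub content: String,
    pub version: i32,
}

/// Consumers (CLI, LSP) program against this trait, never a concrete type.
pub trait Workspace: Send + Sync {
    fn open_file(&self, params: OpenFileParams) -> Result<(), RomeError>;
    /// Hypothetical query method used only in this sketch.
    fn file_content(&self, path: &str) -> Result<String, RomeError>;
}

/// In-process implementation; a future `WorkspaceClient` could implement
/// the same trait by forwarding each call over an IPC channel.
#[derive(Default)]
pub struct WorkspaceServer {
    // path -> (content, version)
    documents: RwLock<HashMap<String, (String, i32)>>,
}

impl Workspace for WorkspaceServer {
    fn open_file(&self, params: OpenFileParams) -> Result<(), RomeError> {
        let mut docs = self.documents.write().unwrap();
        docs.insert(params.path, (params.content, params.version));
        Ok(())
    }

    fn file_content(&self, path: &str) -> Result<String, RomeError> {
        let docs = self.documents.read().unwrap();
        docs.get(path)
            .map(|(content, _version)| content.clone())
            .ok_or(RomeError::NotFound)
    }
}
```

Because callers only hold a `dyn Workspace`, a remote client could later be swapped in without touching the CLI or LSP call sites.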

The Workspace interface refers to files using the RomePath struct (which now unconditionally holds an interned FileId along with the textual path) and is designed to abstract away the language-specific processing that may happen for each call. Internally, the Features and Capabilities structs have been expanded into explicit vtables holding a set of optional function pointers, each representing the language-specific implementation of a feature supported by a given language. To handle a request like lint or format, the workspace implementation determines the language associated with the provided RomePath, looks up the corresponding Capabilities, and delegates to the language-specific implementation.
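The capability lookup can be sketched with plain function pointers. This assumes extension-based language detection and uses toy `js_format` / `js_lint` bodies as stand-ins; the struct layout and function names are illustrative, not Rome's actual `Capabilities` definition.

```rust
use std::path::Path;

/// Hypothetical simplified error type.
#[derive(Debug, PartialEq)]
pub enum RomeError {
    SourceFileNotSupported,
}

/// Language-specific hooks; `None` means the language lacks that feature.
pub struct Capabilities {
    pub format: Option<fn(&str) -> String>,
    pub lint: Option<fn(&str) -> Vec<String>>,
}

// Toy "JavaScript" formatter: trims trailing whitespace on each line.
fn js_format(src: &str) -> String {
    src.lines().map(str::trim_end).collect::<Vec<_>>().join("\n")
}

// Toy "JavaScript" linter: flags `debugger` statements.
fn js_lint(src: &str) -> Vec<String> {
    if src.contains("debugger") {
        vec!["unexpected debugger statement".to_string()]
    } else {
        Vec::new()
    }
}

static JS: Capabilities = Capabilities { format: Some(js_format), lint: Some(js_lint) };
// A language the workspace recognizes but cannot format or lint yet.
static JSON: Capabilities = Capabilities { format: None, lint: None };

/// Resolve the capability table from the file extension.
fn capabilities_for(path: &Path) -> Option<&'static Capabilities> {
    match path.extension()?.to_str()? {
        "js" | "mjs" | "cjs" => Some(&JS),
        "json" => Some(&JSON),
        _ => None,
    }
}

/// Workspace-side dispatch: find the language, then delegate if supported.
pub fn format_file(path: &Path, content: &str) -> Result<String, RomeError> {
    let format = capabilities_for(path)
        .and_then(|caps| caps.format)
        .ok_or(RomeError::SourceFileNotSupported)?;
    Ok(format(content))
}

pub fn lint_file(path: &Path, content: &str) -> Result<Vec<String>, RomeError> {
    let lint = capabilities_for(path)
        .and_then(|caps| caps.lint)
        .ok_or(RomeError::SourceFileNotSupported)?;
    Ok(lint(content))
}
```

A `None` entry, or an unknown extension, becomes a `SourceFileNotSupported` error rather than a panic, so callers can degrade gracefully.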

Internally, this PR also adds a new SendNode type to rome_rowan that a root SyntaxNode can be converted to and from. SendNode is a language-agnostic handle to the root (green) node of a syntax tree; it implements both Send and Sync and can thus be sent to or shared between threads. It is used to implement the "syntax cache" of the workspace server, which lets documents be parsed on demand by storing the parser's result in a shared, language-independent container.
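SendNode's actual API lives in rome_rowan and is not reproduced here. The sketch below only illustrates the underlying idea with std types: a language-agnostic, thread-safe container (`Arc<dyn Any + Send + Sync>`) holding a parsed root, plus a cache that parses on demand. `AnyRoot`, `SyntaxCache`, and `JsRoot` are hypothetical names.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical language-agnostic handle to a parsed root: cheap to clone and
/// shareable across threads because the payload is `Send + Sync`.
#[derive(Clone)]
pub struct AnyRoot(Arc<dyn Any + Send + Sync>);

impl AnyRoot {
    pub fn new<T: Any + Send + Sync>(root: T) -> Self {
        AnyRoot(Arc::new(root))
    }

    /// Recover the typed root; `None` if the caller asks for the wrong language.
    pub fn downcast<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.0.downcast_ref::<T>()
    }
}

/// Stand-in for a language-specific tree root.
#[derive(Debug, PartialEq)]
pub struct JsRoot {
    pub statements: usize,
}

/// Parse-on-demand cache shared by all languages.
#[derive(Default)]
pub struct SyntaxCache {
    entries: Mutex<HashMap<String, AnyRoot>>,
}

impl SyntaxCache {
    /// Return the cached root for `path`, or run `parse` and store the result.
    pub fn get_or_parse(&self, path: &str, parse: impl FnOnce() -> AnyRoot) -> AnyRoot {
        let mut entries = self.entries.lock().unwrap();
        entries.entry(path.to_string()).or_insert_with(parse).clone()
    }

    /// Drop a stale tree, e.g. when the document content changes.
    pub fn invalidate(&self, path: &str) {
        self.entries.lock().unwrap().remove(path);
    }
}
```

The cache never needs to know which language a tree belongs to; only the language-specific code that produced a root downcasts it back.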

Test Plan

I haven't written any tests for the WorkspaceServer itself yet, but it is already being indirectly checked through the existing test suite for the CLI. Additionally, I also made an attempt at adding a few basic tests for the language server implementation.

@leops leops temporarily deployed to aws May 18, 2022 13:41 Inactive
@github-actions

Parser conformance results on ubuntu-latest

js/262

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 45878 | 45878 | 0 |
| Passed | 44938 | 44938 | 0 |
| Failed | 940 | 940 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 97.95% | 97.95% | 0.00% |

jsx/babel

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 39 | 39 | 0 |
| Passed | 36 | 36 | 0 |
| Failed | 3 | 3 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 92.31% | 92.31% | 0.00% |

ts/babel

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 588 | 588 | 0 |
| Passed | 519 | 519 | 0 |
| Failed | 69 | 69 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 88.27% | 88.27% | 0.00% |

ts/microsoft

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 16257 | 16257 | 0 |
| Passed | 12391 | 12391 | 0 |
| Failed | 3866 | 3866 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 76.22% | 76.22% | 0.00% |

@github-actions

github-actions bot commented May 18, 2022

@leops leops linked an issue May 18, 2022 that may be closed by this pull request
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented May 18, 2022

Deploying with Cloudflare Pages

Latest commit: d089ea0
Status: ✅  Deploy successful!
Preview URL: https://b23b71dd.tools-8rn.pages.dev


@leops leops temporarily deployed to aws May 18, 2022 14:03 Inactive
Contributor

@MichaReiser MichaReiser left a comment


That's so cool!

About naming: "Rome service" might be ambiguous considering our plans. I would personally go with either:

  • rome_server, though I can see how this isn't ideal considering that some of the infrastructure might be unrelated to running in a server. However, we could still extract these types in the future.
  • rome_workspace, if it's really about workspaces, but my guess is that the crate contains more than that?

Roslyn calls this component a compiler server (https://github.com/dotnet/roslyn/blob/main/docs/compilers/Compiler%20Server.md), which would be another option for the crate name.

I'll now jump into the code :)

Contributor

@MichaReiser MichaReiser left a comment


I love it, but it's a lot to take in. I have a few comments around naming that might be worth looking into. It might also be good to give others who are more familiar with the LSP handling than I am some time to look at this PR and leave comments.

let mut options = JsFormatOptions::default();
/// Read the formatting options for the command line arguments and inject them
/// into the workspace settings
pub(crate) fn parse_format_options(session: &mut CliSession) -> Result<(), Termination> {
Contributor

Nit: should this function be renamed to parse_workspace_settings? For example, I could see a --project option where one can specify a workspace configuration file.

Comment on lines +54 to +56
/// Wrapper for an underlying `rome_service` error
#[error(transparent)]
WorkspaceError(#[from] RomeError),
Contributor

Nit: Not sure about the naming here. The documentation refers to rome_service but the error itself is called WorkspaceError.

crates/rome_cli/src/traversal.rs Outdated
@@ -10,7 +10,7 @@ use std::{fs::File, io, io::Write, ops::Deref, path::PathBuf};
#[derive(Debug, Clone, Eq, Hash, PartialEq)]
pub struct RomePath {
file: PathBuf,
-file_id: Option<usize>,
+file_id: usize,
Contributor

Might be good to introduce a FileId newtype.
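A sketch of what such a newtype could look like, paired with a hypothetical `PathInterner` that hands out ids. The `u32` representation and the interner API are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Opaque, cheaply copyable file identifier (a newtype instead of a bare `usize`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct FileId(u32);

/// Hypothetical interner: the same path always maps to the same `FileId`.
#[derive(Default)]
pub struct PathInterner {
    ids: HashMap<PathBuf, FileId>,
    paths: Vec<PathBuf>,
}

impl PathInterner {
    pub fn intern(&mut self, path: &Path) -> FileId {
        if let Some(&id) = self.ids.get(path) {
            return id;
        }
        // Ids are dense indices into `paths`.
        let id = FileId(self.paths.len() as u32);
        self.paths.push(path.to_path_buf());
        self.ids.insert(path.to_path_buf(), id);
        id
    }

    /// Only the interner can map an id back to a real path.
    pub fn path(&self, id: FileId) -> Option<&Path> {
        self.paths.get(id.0 as usize).map(PathBuf::as_path)
    }
}
```

Since `FileId` is `Copy` and carries no path, holding one does not by itself give access to the file system; only the interner can resolve it.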

crates/rome_fs/src/path.rs Outdated
green: GreenNode,
}

impl SendNode {
Contributor

Neat!

use rome_rowan::AstNode;

use crate::{
settings::{FormatSettings, Language, LanguageSettings, LanguagesSettings, SettingsHandle},
Contributor

Is this Language trait different from rome_rowan's Language trait? I'm a bit concerned that this will be confusing, because engineers will be exposed to both and it will be unclear when to use which.

It might help to rename this to SupportedLanguage or FileType (a file kind that the workspace supports).

Contributor Author

The rome_service::Language trait is actually a supertrait that inherits from rome_rowan::Language and adds some service-specific features (at the moment mostly related to resolving settings for that language, hence why it's declared in settings.rs).
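The supertrait relationship can be illustrated with stand-in traits. `SyntaxLanguage` below plays the role of `rome_rowan::Language` (whose real items are not reproduced here), and `ServiceLanguage`, `AllLanguagesSettings`, and `JsFormatSettings` are hypothetical simplifications of the types in settings.rs.

```rust
use std::fmt::Debug;

/// Stand-in for `rome_rowan::Language` (the real trait has different items).
pub trait SyntaxLanguage {
    const NAME: &'static str;
}

/// Service-level extension: every service language is also a syntax language.
pub trait ServiceLanguage: SyntaxLanguage {
    type FormatSettings: Default;
    fn lookup_settings(all: &AllLanguagesSettings) -> &Self::FormatSettings;
}

#[derive(Default, Debug, PartialEq)]
pub struct JsFormatSettings {
    pub quote_style_double: bool,
}

/// Settings for every supported language, one field per language.
#[derive(Default)]
pub struct AllLanguagesSettings {
    pub javascript: JsFormatSettings,
}

pub struct JsLanguage;

impl SyntaxLanguage for JsLanguage {
    const NAME: &'static str = "javascript";
}

impl ServiceLanguage for JsLanguage {
    type FormatSettings = JsFormatSettings;
    fn lookup_settings(all: &AllLanguagesSettings) -> &JsFormatSettings {
        &all.javascript
    }
}

/// Generic code can use both syntax-level and service-level items of `L`.
pub fn describe<L: ServiceLanguage>(all: &AllLanguagesSettings) -> String
where
    L::FormatSettings: Debug,
{
    format!("{}: {:?}", L::NAME, L::lookup_settings(all))
}
```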

type FormatSettings = JsFormatSettings;
type FormatOptions = JsFormatOptions;

fn lookup_settings(languages: &LanguagesSettings) -> &LanguageSettings<Self> {
Contributor

I had to stare at this line for 30 seconds to notice that one struct is LanguagesSettings and the other is just LanguageSettings.

line_width isn't a language-specific option. Are LanguagesSettings and LanguageSettings normalized structs where Rome fills in all options, even those inherited from another option?

Contributor Author

The concepts of "settings" and "options" differ on a few points. A significant one is that "language-specific settings" doesn't mean "settings that apply only to this language" but "settings that apply to all files of this language". For instance, a workspace may have a global line_width of 80 and a language-specific line_width of 120 for Markdown files.
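One way to model this distinction, assuming that "settings" are layered inputs where a language-level value may override a workspace-level one, and "options" are the fully resolved values handed to the formatter. The type names and the `resolve` function are hypothetical.

```rust
/// Workspace-wide formatter settings.
pub struct GlobalFormatSettings {
    pub line_width: u16,
}

/// Settings for all files of one language; `None` means "inherit the global value".
#[derive(Default)]
pub struct LanguageFormatSettings {
    pub line_width: Option<u16>,
}

/// Options actually passed to the formatter: every field is resolved.
#[derive(Debug, PartialEq)]
pub struct FormatOptions {
    pub line_width: u16,
}

/// The language-level value wins when present, otherwise the global one applies.
pub fn resolve(global: &GlobalFormatSettings, language: &LanguageFormatSettings) -> FormatOptions {
    FormatOptions {
        line_width: language.line_width.unwrap_or(global.line_width),
    }
}
```

With a global width of 80 and a Markdown override of 120, Markdown files resolve to 120 while every other language keeps 80.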

Comment on lines +93 to +108
impl<T> From<Parse<T>> for AnyParse
where
T: AstNode,
T::Language: 'static,
{
fn from(parse: Parse<T>) -> Self {
let root = parse.syntax();
let diagnostics = parse.into_diagnostics();

Self {
// SAFETY: the parser should always return a root node
root: root.as_send().unwrap(),
diagnostics,
}
}
}
Contributor

Should these be defined outside of the javascript.rs file? They don't seem to be specific to JavaScript. The same goes for some of the functions that follow.

Contributor Author

I put all these declarations in javascript.rs because I tried to make it the only file that imports rome_js_* crates: Parse<T>, for instance, is declared in rome_js_parser, and the following functions call into rome_js_parser, rome_js_syntax, or rome_js_formatter (and rome_analyzer, which is currently specialized for JS only).

Comment on lines +49 to +53
#[derive(Default)]
pub struct LanguageSettings<L: Language> {
/// Formatter settings for this language
pub format: L::FormatSettings,
}
Contributor

What's the benefit of this struct over just using L::FormatSettings?

Contributor Author

Eventually I expect this struct will get some additional fields, for instance to specify settings for the linter.

@MichaReiser
Contributor

What are your thoughts on whether it would be necessary to make any of the operations on the Workspace async and support cancelling operations (e.g. the workspace could cancel all pending lints for a file if the file changes)?

@leops
Contributor Author

leops commented May 20, 2022

On the naming issue, I think it would make sense to call the crate rome_workspace (or rome_app, as the App struct is currently the main entry point of the crate, though I'm not really sure whether it should be merged with the CliSession). rome_server could then be an alternate name for the current LSP crate (since "LSP" refers to the protocol and not the implementation) or for some other form of shared daemon binary.

On making the Workspace interface async: this is an idea I considered but decided not to go with, mainly because it would complicate the initial implementation (async traits aren't well supported, and it requires adding an async runtime to the CLI) and the backend code wouldn't make use of it at the moment (calls into the formatter and analyzer are synchronous and blocking).
The "scalability" aspect of async Rust doesn't really come into play here: calls from language clients into the workspace already take place in a thread pool (rayon in the CLI and tokio in the LSP), so using futures probably wouldn't improve parallelism much.
As for cancellation of analysis, this is currently handled implicitly: the resulting diagnostics are pushed to the editor along with the version number of the document that was analyzed, and if the document was modified in the meantime the editor simply discards them. It could be made explicit by discarding the result of the analysis before sending it to the language client if a document change notification or an explicit cancellation request was received during analysis.
Obviously this only holds for the current version: if we made the analyzer or formatter async, or implemented calls to a remote workspace server through an async-capable transport, it would certainly be worth considering making the workspace asynchronous.
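The explicit variant could look roughly like the following: every result carries the document version it was computed for, and the server drops it if a newer version has arrived in the meantime. `OpenDocuments` and `VersionedDiagnostics` are hypothetical names; the `i32` version mirrors the field in `OpenFileParams`.

```rust
use std::collections::HashMap;

/// Diagnostics tagged with the document version they were computed for.
pub struct VersionedDiagnostics {
    pub path: String,
    pub version: i32,
    pub messages: Vec<String>,
}

/// Hypothetical language-server state: latest known version per open document.
#[derive(Default)]
pub struct OpenDocuments {
    versions: HashMap<String, i32>,
}

impl OpenDocuments {
    /// Record an open or change notification from the editor.
    pub fn update(&mut self, path: &str, version: i32) {
        self.versions.insert(path.to_string(), version);
    }

    /// Explicit cancellation: only publish results that are still current.
    pub fn should_publish(&self, result: &VersionedDiagnostics) -> bool {
        self.versions.get(&result.path) == Some(&result.version)
    }
}
```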

@xunilrj
Contributor

xunilrj commented May 20, 2022

hum...

Workspace is declared as a trait with a single implementation (WorkspaceServer) as it is designed to eventually work over an optional transport layer (an additional WorkspaceClient implementation may then delegate calls to a remote instance of the workspace server over an IPC channel for instance)

On making the Workspace interface async this is an idea I considered but decided not to go with mainly because it would complicate the initial implementation (async traits aren't well supported, requires adding an async runtime to the CLI) and the backend code wouldn't make use of it at the moment (calls into the formatter and analyzer are synchronous and blocking).

Can't we model Workspace as an enum?

enum Workspace {
    Local { /* ... */ },
    Ssh { /* ... */ },
    Websocket { /* ... */ },
}

That would allow us to use async now without hacks.

Another possibility is to implement a Workspace façade that just sends messages to a channel (https://github.com/zesterer/flume or even Tokio's), and on the other end we create a WorkspaceLocal that runs your current implementation in an async loop.

loop {
    let msg = channel.recv_async().await;
    match msg {
        ...
    }
}

I grant that it is a little bit annoying to implement request/reply with channels.
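A synchronous, std-only sketch of the façade idea, using `std::sync::mpsc` and a worker thread in place of flume or tokio channels (with an async runtime, the worker would await `recv_async` instead). Each request carries its own reply `Sender`, which is a common way to get request/reply semantics out of one-way channels. `Request`, `WorkspaceHandle`, and `spawn_workspace` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical request messages; each carries its own reply channel.
pub enum Request {
    OpenFile { path: String, content: String, reply: Sender<()> },
    GetContent { path: String, reply: Sender<Option<String>> },
}

/// Façade: implements each call by sending a message to the worker thread.
#[derive(Clone)]
pub struct WorkspaceHandle {
    tx: Sender<Request>,
}

impl WorkspaceHandle {
    pub fn open_file(&self, path: &str, content: &str) {
        let (reply, rx) = channel();
        self.tx
            .send(Request::OpenFile { path: path.into(), content: content.into(), reply })
            .expect("workspace worker stopped");
        rx.recv().expect("no reply");
    }

    pub fn get_content(&self, path: &str) -> Option<String> {
        let (reply, rx) = channel();
        self.tx
            .send(Request::GetContent { path: path.into(), reply })
            .expect("workspace worker stopped");
        rx.recv().expect("no reply")
    }
}

/// Worker loop owning the state; it exits once every handle has been dropped.
pub fn spawn_workspace() -> (WorkspaceHandle, JoinHandle<()>) {
    let (tx, rx) = channel::<Request>();
    let worker = thread::spawn(move || {
        let mut documents: HashMap<String, String> = HashMap::new();
        for msg in rx {
            match msg {
                Request::OpenFile { path, content, reply } => {
                    documents.insert(path, content);
                    let _ = reply.send(());
                }
                Request::GetContent { path, reply } => {
                    let _ = reply.send(documents.get(&path).cloned());
                }
            }
        }
    });
    (WorkspaceHandle { tx }, worker)
}
```

The boilerplate grows with each new request type (one enum variant plus one façade method), which is the annoyance mentioned above.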

requires adding an async runtime to the CLI

Do you think this is a huge problem? I think we will end up using an async runtime, and there is a 99% chance it will be tokio. :D One good side effect is that we would probably benefit from https://tokio.rs/blog/2021-12-announcing-tokio-console.

and the backend code wouldn't make use of it at the moment (calls into the formatter and analyzer are synchronous and blocking).

I think this is because we are in a chicken-and-egg situation. I was discussing this with @MichaReiser: we don't have async at the lower levels because the higher levels don't have async.

If the façades start to be async and block when calling lower levels (not ideal, but fine for now), lower levels will soon see the benefit and start to migrate.

crates/rome_fs/src/path.rs Outdated
@yassere
Contributor

yassere commented May 21, 2022

This certainly doesn’t need to be addressed in this PR, but what’s the long-term plan for RomePath? If an interned FileId is now mandatory, will we eventually be able to eliminate the PathBuf from RomePath entirely?

We would need to change how we handle language capabilities, and it could be the responsibility of whatever calls Workspace::open_file to specify a language. Using the file extension isn’t foolproof anyway, so different consumers of a Workspace could each have their own best effort approach (such as using the languageId in LSP).

In addition to allowing cheap copies, an important aspect of a FileId is that you can’t use it to interact directly with the actual file system. I’m not sure whether or not it will be practical for a Workspace implementation to operate entirely with no knowledge of file paths, but it seems worth attempting to reduce the exposure of real paths as much as possible.

On naming: I do think that the name of the Workspace trait is a little confusing, considering how the term is used in editors and LSP. I’d prefer some variation of server or service, but I don’t have a blocking objection to the current naming.

@leops
Contributor Author

leops commented May 23, 2022

Do you think this is a huge problem? I think we will in the end use an async runtime

Absolutely. To be clear, none of the reasons I outlined for not using async right now are actual blockers, only minor inconveniences that led me not to include it in this initial implementation, but I also believe we will eventually end up using async code, at least for some parts of the toolchain.

will we eventually be able to eliminate the PathBuf from RomePath entirely?

This is also my intention. In the long term I think it would certainly make sense to merge FileId and RomePath so that the latter becomes a cheaply Copy-able opaque identifier. Injecting a client-provided language / source type when a document is opened is also how I think this could eventually work (each open document could then hold either a LanguageID enum or a &'static Capabilities directly). The main reason I kept the PathBuf around is to implement the supports_feature method on the workspace on top of the existing can_format / can_lint code.
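Attaching a client-provided language when a document is opened could look like this sketch, where an explicit hint (for example the LSP `languageId`) takes precedence and the file extension is only a best-effort fallback. `LanguageId`, `language_hint`, and `resolve_language` are hypothetical names.

```rust
use std::path::Path;

/// Hypothetical language identifier stored with each open document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LanguageId {
    JavaScript,
    TypeScript,
    Json,
    Unknown,
}

/// Assumed open-file parameters: the client may state the language explicitly.
pub struct OpenFileParams<'a> {
    pub path: &'a Path,
    pub language_hint: Option<LanguageId>,
}

/// Fallback when the client gives no hint: guess from the file extension.
fn language_from_extension(path: &Path) -> LanguageId {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("js") | Some("mjs") | Some("cjs") | Some("jsx") => LanguageId::JavaScript,
        Some("ts") | Some("tsx") => LanguageId::TypeScript,
        Some("json") => LanguageId::Json,
        _ => LanguageId::Unknown,
    }
}

/// The client-provided hint wins over the extension.
pub fn resolve_language(params: &OpenFileParams<'_>) -> LanguageId {
    params
        .language_hint
        .unwrap_or_else(|| language_from_extension(params.path))
}
```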

I’m not sure whether or not it will be practical for a Workspace implementation to operate entirely with no knowledge of file paths

I don't think it would be possible for the Workspace to be completely isolated from the underlying file system either. For instance, we will want to reach into the imports of an open document by resolving the "module source" literal to the content of an actual file, in order to verify that the imports are valid, load their typing information to provide auto-completion, or include that code in a compiled bundle.

On naming: I do think that the name of the Workspace trait is a little confusing, considering how the term is used in editors and LSP

This is somewhat intended: the idea is that, at least in the context of a language server, the state of the Workspace object should mirror the state of the editor (a certain root directory plus a set of open documents). The meaning of "open document" is obviously a little less clear in the context of the CLI, but conceptually it's still operating in a given "workspace".


/// Wrapper for an underlying `rome_service` error
#[error(transparent)]
WorkspaceError(#[from] RomeError),
Contributor

What does #[from] do?

Contributor Author

This is an attribute provided by thiserror; it automatically implements From<RomeError> for Termination.
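For readers unfamiliar with thiserror, the following hand-written code is roughly equivalent to what `#[from]` and `#[error(transparent)]` provide. `RomeError` is reduced to a one-field stand-in, and `workspace_call` / `run_command` are hypothetical.

```rust
use std::fmt;

/// Simplified stand-in for `RomeError`.
#[derive(Debug)]
pub struct RomeError(pub String);

impl fmt::Display for RomeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "workspace error: {}", self.0)
    }
}

impl std::error::Error for RomeError {}

#[derive(Debug)]
pub enum Termination {
    WorkspaceError(RomeError),
}

// Roughly what `#[from]` generates: a conversion the `?` operator can use.
impl From<RomeError> for Termination {
    fn from(source: RomeError) -> Self {
        Termination::WorkspaceError(source)
    }
}

// `#[error(transparent)]` makes Display (and `source()`) defer to the inner
// error; only the Display part is hand-written here.
impl fmt::Display for Termination {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Termination::WorkspaceError(err) => fmt::Display::fmt(err, f),
        }
    }
}

fn workspace_call() -> Result<(), RomeError> {
    Err(RomeError("file not open".into()))
}

/// `?` converts the `RomeError` into a `Termination` through `From`.
pub fn run_command() -> Result<(), Termination> {
    workspace_call()?;
    Ok(())
}
```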

crates/rome_service/src/workspace.rs Outdated
Comment on lines +27 to +94
pub struct OpenFileParams {
pub path: RomePath,
pub content: String,
pub version: i32,
}

pub struct GetSyntaxTreeParams {
pub path: RomePath,
}

pub struct ChangeFileParams {
pub path: RomePath,
pub content: String,
pub version: i32,
}
Contributor

version is a bit opaque and may need some documentation. Especially where it's used and 0 is passed, I can't understand its purpose.

Contributor Author

In the LSP, this version number is provided by the editor and incremented on each change. When the language server asynchronously sends diagnostics for a document, the version of the document the diagnostics were computed for is sent alongside, so the editor can automatically invalidate previous diagnostics if the document changes.
I carried this concept over to the workspace since I think it will eventually be useful to tag the result of various queries with the version of the corresponding document, especially if multiple processes are interacting concurrently with the workspace, but this isn't fully implemented yet.

Comment on lines +80 to +134
/// Add a new file to the workspace
fn open_file(&self, params: OpenFileParams) -> Result<(), RomeError>;
Contributor

The function name and the documentation have different meanings. Maybe we should add more description to the documentation?

Comment on lines +83 to +137
/// Return a textual, debug representation of the syntax tree for a given document
fn get_syntax_tree(&self, params: GetSyntaxTreeParams) -> Result<String, RomeError>;
Contributor

Does it work only on supported files (the ones that we are able to parse)?
If so, maybe the documentation should mention it.

Contributor Author

I think this is something I didn't really explain in the documentation, but most of the methods on the Workspace rely on the underlying language implementing the corresponding capability. For instance, get_syntax_tree will return a RomeError::SourceFileNotSupported if the language for this document does not implement the parse and debug_print capabilities.

crates/rome_service/src/workspace.rs Outdated
crates/rome_service/src/workspace/server.rs
crates/rome_service/src/workspace/server.rs Outdated
crates/rome_service/src/workspace/server.rs
Comment on lines +145 to +166
self.syntax.remove(&params.path);
self.documents.insert(
params.path,
Document {
content: params.content,
version: params.version,
},
);
Ok(())
Contributor

If Workspace is implemented only once, and here we don't handle any errors (no try operators), would it make sense to change the signature of the function to not return Result?

Contributor Author

Currently there is no reason for WorkspaceServer to return a Result since this operation cannot fail, but I made most of the methods on Workspace return a Result to ensure that callers have error recovery code in place in case errors need to be introduced in the future.
For instance, with a WorkspaceClient connected to a remote WorkspaceServer, the connection may fail at any moment and return an error. This is also why every method takes a single params struct, as that will make it easier to eventually derive Serialize and Deserialize on those structs.
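A sketch of why a `Result` on every method pays off once a transport is involved. `Transport`, `DisconnectedTransport`, `FormatFileParams`, and the `TransportError` variant are hypothetical; real serialization (e.g. via serde) is left out to keep the example dependency-free.

```rust
/// Simplified error; `TransportError` is a hypothetical variant for a broken connection.
#[derive(Debug, PartialEq)]
pub enum RomeError {
    TransportError(String),
}

/// A single params struct maps naturally onto one serialized message.
pub struct FormatFileParams {
    pub path: String,
}

pub trait Workspace {
    fn format_file(&self, params: FormatFileParams) -> Result<String, RomeError>;
}

/// Hypothetical transport abstraction (socket, named pipe, ...).
pub trait Transport {
    fn request(&self, method: &str, payload: &str) -> Result<String, String>;
}

/// Client that forwards every call; any transport failure surfaces as `Err`.
pub struct WorkspaceClient<T: Transport> {
    pub transport: T,
}

impl<T: Transport> Workspace for WorkspaceClient<T> {
    fn format_file(&self, params: FormatFileParams) -> Result<String, RomeError> {
        self.transport
            .request("format_file", &params.path)
            .map_err(RomeError::TransportError)
    }
}

/// Test double simulating a dropped connection.
pub struct DisconnectedTransport;

impl Transport for DisconnectedTransport {
    fn request(&self, _method: &str, _payload: &str) -> Result<String, String> {
        Err("connection reset".to_string())
    }
}
```

Callers written against `Workspace` already handle `Err`, so switching from the in-process server to such a client would not require new error paths.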

@ematipico
Contributor

This is somewhat intended, the idea is that at least in the context of a language server the state of the Workspace object should mirror the state of the editor (with a certain root directory + a set of open documents), obviously the meaning of "open document" is a little more unclear in the context of the CLI but conceptually it's still operating in a given "workspace"

I see your reasons, although "workspace" is also a way to refer to multiple projects opened in the same editor. At least, that's how I always understood (and used) it. With that premise, I find the term misleading, because it makes me think that the data structure handles multiple projects/libraries, which is not the case.

Could we rename WorkspaceServer to RomeWorkspace? "Server" reminds me of an actual server (HTTP requests). Unless that's the long-term plan.

@leops
Contributor Author

leops commented May 23, 2022

"Server" reminds of an actual server (http requests). Unless, that's the long-term plan.

Yes, that's the intent. Eventually I imagine there could be a shared daemon process running in the background and hosting one instance of the WorkspaceServer for each active project; the CLI and the language server would then connect to that server process through a socket or named pipe and interact with it transparently through a WorkspaceClient type (which would also implement the Workspace trait).

@leops leops temporarily deployed to aws May 23, 2022 12:27 Inactive
Contributor

@ematipico ematipico left a comment


I still have doubts about naming, but it should not be a show stopper. The changes make sense and we can revisit them in case there's something we want to change.

@leops leops temporarily deployed to aws May 24, 2022 13:22 Inactive
@leops leops merged commit a32588d into main May 24, 2022
@leops leops deleted the feature/rome-server branch May 24, 2022 13:43
Development

Successfully merging this pull request may close these issues.

Multi-Lang Formatter: Rome Multi-language architecture
5 participants