feat(rome_service): introduce a cross-language `Workspace` abstraction #2593

leops · 2022-05-18T13:41:43Z

Summary

This PR renames the rome_core crate to rome_service and replaces the Features member of the App struct with a new Workspace object. This new object takes on most of the responsibilities previously handled by the LSP crate: managing a set of "open" documents, allowing for these to be queried for diagnostics and code actions, run through the formatter, or any other action the user may want to perform on a "file"

With this change the Workspace is used as the backing implementation for most features in the language server (the LSP crate is now mostly acting as an adapter translating calls from the Language Server Protocol to the Workspace service) and the CLI. Overall this means adding support for additional languages should be relatively straightforward and mostly limited to the rome_service crate with little to no changes in the LSP and CLI consumer crates

Workspace is declared as a trait with a single implementation (WorkspaceServer) as it is designed to eventually work over an optional transport layer (an additional WorkspaceClient implementation may then delegate calls to a remote instance of the workspace server over an IPC channel for instance)

The Workspace interface refers to files using the RomePath struct (that now unconditionally holds an interned FileId along with the textual path) and is designed to abstract the underlying language-specific processing that may be happening for each call. Internally, the Features and Capabilities structs have been expanded into explicit vtables holding a set of (optional) function pointers representing the language-specific implementation of each feature supported by a given language. In order to handle a request like lint or format the workspace implementation determines the language associated with the provided RomePath, then looks up the corresponding Capabilities and delegates to the language-specific implementation
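The lookup-and-delegate flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual rome_service code: the fields of `Capabilities`, the helpers `capabilities_for` and `format_file`, and the toy `format_js` / `lint_js` functions are all assumptions made for the example, and the language is derived from the file extension only.

```rust
use std::path::Path;

/// Hypothetical, simplified stand-in for the per-language vtable.
/// Each entry is an optional function pointer: `None` means the
/// language does not support that feature.
pub struct Capabilities {
    pub format: Option<fn(&str) -> String>,
    pub lint: Option<fn(&str) -> Vec<String>>,
}

// Toy language-specific implementations (not Rome's real ones).
fn format_js(source: &str) -> String {
    source.trim().to_string() + "\n"
}

fn lint_js(source: &str) -> Vec<String> {
    if source.contains("debugger") {
        vec!["unexpected `debugger` statement".to_string()]
    } else {
        Vec::new()
    }
}

static JS_CAPABILITIES: Capabilities = Capabilities {
    format: Some(format_js),
    lint: Some(lint_js),
};

static UNKNOWN_CAPABILITIES: Capabilities = Capabilities {
    format: None,
    lint: None,
};

/// Determine the language from the path extension and return its vtable.
pub fn capabilities_for(path: &Path) -> &'static Capabilities {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("js" | "jsx" | "ts" | "tsx") => &JS_CAPABILITIES,
        _ => &UNKNOWN_CAPABILITIES,
    }
}

/// Delegate a `format` request to the language-specific implementation,
/// returning `None` when the language has no formatter.
pub fn format_file(path: &Path, source: &str) -> Option<String> {
    let format = capabilities_for(path).format?;
    Some(format(source))
}
```

With this shape, supporting another language would mostly mean adding one more static vtable and one more match arm, which is the property the summary is after.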

Internally, this PR also adds a new SendNode type to rome_rowan that a root SyntaxNode can be converted to and from. SendNode is a language-agnostic handle to the root (green) node of a syntax tree; it implements both Send and Sync, and can thus be sent or shared between threads. This is used to implement the "syntax cache" of the workspace server: the result of the parser can be stored in a shared, language-independent container, which allows documents to be parsed on demand.
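To make the idea concrete, here is a heavily simplified sketch of a type-erased, thread-safe root handle. All types below are stand-ins written for this example (the real rome_rowan types are richer, and its `as_send` returns an `Option` because only root nodes can be converted); the sketch assumes the language is recorded as a `TypeId` and checked again on the way back.

```rust
use std::any::TypeId;
use std::marker::PhantomData;
use std::sync::Arc;

/// Stand-in for a language marker trait (far smaller than the real one).
pub trait Language: 'static {}

pub struct JsLanguage;
impl Language for JsLanguage {}

pub struct JsonLanguage;
impl Language for JsonLanguage {}

/// Stand-in for an immutable green tree.
pub struct GreenNode {
    pub text: String,
}

/// Typed root node: the green tree tagged with its language.
pub struct SyntaxNode<L: Language> {
    green: Arc<GreenNode>,
    _language: PhantomData<L>,
}

/// Language-agnostic handle: remembers which language it came from
/// at runtime instead of in its type.
pub struct SendNode {
    language: TypeId,
    green: Arc<GreenNode>,
}

impl<L: Language> SyntaxNode<L> {
    pub fn new_root(text: &str) -> Self {
        let green = Arc::new(GreenNode { text: text.to_string() });
        Self { green, _language: PhantomData }
    }

    pub fn text(&self) -> &str {
        &self.green.text
    }

    /// Erase the language parameter. Every node is a root in this sketch,
    /// so the conversion cannot fail here.
    pub fn as_send(&self) -> SendNode {
        SendNode {
            language: TypeId::of::<L>(),
            green: Arc::clone(&self.green),
        }
    }
}

impl SendNode {
    /// Recover the typed node, or `None` if `L` is not the original language.
    pub fn into_node<L: Language>(self) -> Option<SyntaxNode<L>> {
        if self.language == TypeId::of::<L>() {
            Some(SyntaxNode { green: self.green, _language: PhantomData })
        } else {
            None
        }
    }
}
```

A cache can then store `SendNode` values for documents of any language in a single map and hand them to worker threads, converting back to a typed node only inside the language-specific handler.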

Test Plan

I haven't written any tests for the WorkspaceServer itself yet, but it is already being indirectly checked through the existing test suite for the CLI. Additionally, I also made an attempt at adding a few basic tests for the language server implementation.

github-actions · 2022-05-18T13:45:37Z

Parser conformance results on ubuntu-latest

js/262

Test result	`main` count	This PR count	Difference
Total	45878	45878	0
Passed	44938	44938	0
Failed	940	940	0
Panics	0	0	0
Coverage	97.95%	97.95%	0.00%

jsx/babel

Test result	`main` count	This PR count	Difference
Total	39	39	0
Passed	36	36	0
Failed	3	3	0
Panics	0	0	0
Coverage	92.31%	92.31%	0.00%

ts/babel

Test result	`main` count	This PR count	Difference
Total	588	588	0
Passed	519	519	0
Failed	69	69	0
Panics	0	0	0
Coverage	88.27%	88.27%	0.00%

ts/microsoft

Test result	`main` count	This PR count	Difference
Total	16257	16257	0
Passed	12391	12391	0
Failed	3866	3866	0
Panics	0	0	0
Coverage	76.22%	76.22%	0.00%

github-actions · 2022-05-18T13:46:04Z

Playground for commit 8baf304

cloudflare-workers-and-pages · 2022-05-18T14:03:36Z

Deploying with Cloudflare Pages

Latest commit:	`d089ea0`
Status:	✅ Deploy successful!
Preview URL:	https://b23b71dd.tools-8rn.pages.dev

MichaReiser

That's so cool!

About naming. Rome service might be ambiguous considering our plans. I personally would go either with:

rome_server but I can see how this isn't good considering that some of the infrastructure might be unrelated to running in a server. However, we could still extract these types in the future.
rome_workspace: If it's really about workspaces, but my guess is that the crate contains more than that?

Roslyn calls this component a [compiler server](https://github.com/dotnet/roslyn/blob/main/docs/compilers/Compiler%20Server.md), which would be another option as crate name.

I'll now jump into the code :)

MichaReiser

I love it, but it's a lot to take in. I have a few comments around naming that might be worth looking into. It might also be good to give others who are more familiar with the LSP handling than I am some time to look at this PR and leave comments.

MichaReiser · 2022-05-20T07:57:53Z

crates/rome_cli/src/commands/format.rs

-    let mut options = JsFormatOptions::default();
+/// Read the formatting options for the command line arguments and inject them
+/// into the workspace settings
+pub(crate) fn parse_format_options(session: &mut CliSession) -> Result<(), Termination> {


Nit: should this function be renamed to parse_workspace_settings? For example, I could see a --project option where one can specify a workspace configuration file.

MichaReiser · 2022-05-20T07:58:52Z

crates/rome_cli/src/termination.rs

+    /// Wrapper for an underlying `rome_service` error
+    #[error(transparent)]
+    WorkspaceError(#[from] RomeError),


Nit: Not sure about the naming here. The documentation refers to rome_service but the error itself is called WorkspaceError.

crates/rome_cli/src/traversal.rs

MichaReiser · 2022-05-20T09:16:02Z

crates/rome_fs/src/path.rs

@@ -10,7 +10,7 @@ use std::{fs::File, io, io::Write, ops::Deref, path::PathBuf};
 #[derive(Debug, Clone, Eq, Hash, PartialEq)]
 pub struct RomePath {
    file: PathBuf,
-    file_id: Option<usize>,
+    file_id: usize,


Might be good to introduce a FileId new type
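A possible shape for such a newtype, as a minimal sketch. The `FileId` wrapper and the `PathInterner` below are hypothetical names for illustration; nothing here is taken from the rome_fs crate.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Opaque, cheaply copyable identifier for an interned path.
/// The inner index is private so callers cannot forge or inspect it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FileId(usize);

/// Hypothetical interner mapping paths to stable ids and back.
#[derive(Default)]
pub struct PathInterner {
    ids: HashMap<PathBuf, FileId>,
    paths: Vec<PathBuf>,
}

impl PathInterner {
    /// Return the existing id for `path`, or allocate a new one.
    pub fn intern(&mut self, path: impl Into<PathBuf>) -> FileId {
        let path = path.into();
        if let Some(id) = self.ids.get(&path) {
            return *id;
        }
        let id = FileId(self.paths.len());
        self.paths.push(path.clone());
        self.ids.insert(path, id);
        id
    }

    /// Resolve an id back to the path it was created from.
    pub fn lookup(&self, id: FileId) -> Option<&Path> {
        self.paths.get(id.0).map(PathBuf::as_path)
    }
}
```

Compared to a bare `usize`, the wrapper prevents accidentally passing an unrelated integer where a file identifier is expected.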

MichaReiser · 2022-05-20T09:22:31Z

crates/rome_rowan/src/syntax/node.rs

+    green: GreenNode,
+}
+
+impl SendNode {


MichaReiser · 2022-05-20T09:24:06Z

crates/rome_service/src/file_handlers/javascript.rs

+use rome_rowan::AstNode;
+
+use crate::{
+    settings::{FormatSettings, Language, LanguageSettings, LanguagesSettings, SettingsHandle},


Is this Language trait different from rome_rowan's Language trait? I'm a bit concerned that this will be confusing because engineers will be exposed to both, and it is then unclear when to use which.

It might help to rename this to SupportedLanguage or FileType (a file kind that the workspace supports).

The rome_service::Language trait is actually a supertrait that inherits from rome_rowan::Language and adds some service-specific features (mostly related to resolving settings for that language at the moment hence why it's declared in settings.rs)
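The supertrait relationship can be sketched like this. The two traits are renamed `RowanLanguage` and `ServiceLanguage` here purely to keep the example in one file, and their members are assumptions chosen for illustration rather than the real trait definitions.

```rust
/// Stand-in for `rome_rowan::Language`: syntax-level information only.
pub trait RowanLanguage {
    type Kind: Copy;
}

/// Stand-in for `rome_service::Language`: everything the syntax-level
/// trait offers, plus service-specific items such as settings types.
pub trait ServiceLanguage: RowanLanguage {
    type FormatSettings: Default;
}

pub struct JsLanguage;

impl RowanLanguage for JsLanguage {
    type Kind = u16;
}

#[derive(Debug, Default, PartialEq)]
pub struct JsFormatSettings {
    pub line_width: Option<u16>,
}

impl ServiceLanguage for JsLanguage {
    type FormatSettings = JsFormatSettings;
}

/// Code bounded on the service trait can use the service-level items...
pub fn default_format_settings<L: ServiceLanguage>() -> L::FormatSettings {
    L::FormatSettings::default()
}

/// ...and also the items inherited from the syntax-level trait.
pub fn kind_size<L: ServiceLanguage>() -> usize {
    std::mem::size_of::<L::Kind>()
}
```

The naming concern still applies: the two traits share a name in the PR and differ only by crate path, which this sketch sidesteps by renaming them.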

MichaReiser · 2022-05-20T09:28:34Z

crates/rome_service/src/file_handlers/javascript.rs

+    type FormatSettings = JsFormatSettings;
+    type FormatOptions = JsFormatOptions;
+
+    fn lookup_settings(languages: &LanguagesSettings) -> &LanguageSettings<Self> {


I had to stare at this line for 30 seconds to notice that one struct is LanguagesSettings and the other is just LanguageSettings.

line_width isn't a language-specific option. Are LanguagesSettings and LanguageSettings normalized structs where Rome fills in all options, even those inherited from another option?

The concepts of "settings" and "options" differ on a few points, and a significant one is that "language specific settings" doesn't mean "settings that apply only to this language" but "settings that apply to all files of this language". So for instance a workspace may have a global line_width of 80, and a language-specific line_width of 120 for markdown files
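The line_width example above could be modeled as follows. This is only a sketch under assumed names (`WorkspaceSettings`, `FormatOverrides`, `LanguageKind`); the PR's actual settings structs are organized differently.

```rust
/// Per-language overrides: `None` means "inherit the workspace value".
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct FormatOverrides {
    pub line_width: Option<u16>,
}

#[derive(Debug, Clone, Copy)]
pub enum LanguageKind {
    JavaScript,
    Markdown,
}

/// Hypothetical workspace settings: one global value plus overrides
/// that apply to all files of a given language.
pub struct WorkspaceSettings {
    pub global_line_width: u16,
    pub javascript: FormatOverrides,
    pub markdown: FormatOverrides,
}

impl WorkspaceSettings {
    /// Resolve the effective line width for files of `language`.
    pub fn line_width(&self, language: LanguageKind) -> u16 {
        let overrides = match language {
            LanguageKind::JavaScript => self.javascript,
            LanguageKind::Markdown => self.markdown,
        };
        overrides.line_width.unwrap_or(self.global_line_width)
    }
}
```

Here line_width exists at both levels: it is not specific to any one language, but each language may carry its own value for it.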

MichaReiser · 2022-05-20T09:29:52Z

crates/rome_service/src/file_handlers/javascript.rs

+impl<T> From<Parse<T>> for AnyParse
+where
+    T: AstNode,
+    T::Language: 'static,
+{
+    fn from(parse: Parse<T>) -> Self {
+        let root = parse.syntax();
+        let diagnostics = parse.into_diagnostics();
+
+        Self {
+            // SAFETY: the parser should always return a root node
+            root: root.as_send().unwrap(),
+            diagnostics,
+        }
+    }
+}


Should these be defined outside of the javascript.rs file? They don't seem to be specific to JavaScript. The same goes for some of the functions that follow.

I put all these declarations in javascript.rs as I tried to make it the only file that imports rome_js_* crates: here Parse<T> is declared in rome_js_parser for instance, and the following functions call into functions from rome_js_parser, rome_js_syntax or rome_js_formatter for instance (and rome_analyzer that's currently specialized for JS only)

MichaReiser · 2022-05-20T09:32:13Z

crates/rome_service/src/settings.rs

+#[derive(Default)]
+pub struct LanguageSettings<L: Language> {
+    /// Formatter settings for this language
+    pub format: L::FormatSettings,
+}


What's the benefit of this struct over just using L::FormatSettings?

Eventually I expect this struct will get some additional fields, to specify settings for the linter for instance

MichaReiser · 2022-05-20T09:35:15Z

What are your thoughts on whether it would be necessary to make any of the operations on the Workspace async and to support cancelling operations (e.g. the workspace could cancel all pending lints for a file if the file changes)?

leops · 2022-05-20T12:52:48Z

On the naming issue, I think it would make sense to call the crate rome_workspace (or rome_app as the App struct is currently the main entry point of the crate but I'm not really sure if it shouldn't be merged with the CliSession), and rome_server could be an alternate name for the current LSP crate (since "LSP" refers to the protocol and not the implementation) or some other form of shared daemon binary.

On making the Workspace interface async, this is an idea I considered but decided not to go with, mainly because it would complicate the initial implementation (async traits aren't well supported, and it requires adding an async runtime to the CLI) and the backend code wouldn't make use of it at the moment (calls into the formatter and analyzer are synchronous and blocking).
The "scalability" aspect of async Rust doesn't really come into play here since calls from language clients into the workspace are already taking place in a threadpool (rayon in the CLI and tokio in the LSP), so using futures probably wouldn't give that much improvement in parallelism.
As for the cancellation aspect for analysis, this is currently handled implicitly (the resulting diagnostics are pushed to the editor with the version number of the document that was analyzed; if the document was modified in the meantime, the editor will simply discard them), but it could be made explicit by discarding the result of the analysis before sending it to the language client if a document change notification or an explicit cancellation request was received during analysis.
Obviously this only holds for the current version: if we made the analyzer or formatter async, or implemented calls to a remote workspace server through an async-capable transport, it would certainly be useful to consider making the workspace asynchronous
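The explicit variant of the version check mentioned above could look roughly like this. The `OpenDocuments` and `Diagnostics` types and their methods are hypothetical and only illustrate the idea of dropping stale results before they reach the client.

```rust
use std::collections::HashMap;

/// Result of analyzing one specific version of a document.
#[derive(Debug, PartialEq)]
pub struct Diagnostics {
    pub version: i32,
    pub messages: Vec<String>,
}

/// Hypothetical record of the latest known version of each open document.
#[derive(Default)]
pub struct OpenDocuments {
    versions: HashMap<String, i32>,
}

impl OpenDocuments {
    /// Record a document change notification.
    pub fn change(&mut self, uri: &str, version: i32) {
        self.versions.insert(uri.to_string(), version);
    }

    /// Keep `result` only if it was computed for the current version;
    /// results for closed or since-modified documents are discarded.
    pub fn accept(&self, uri: &str, result: Diagnostics) -> Option<Diagnostics> {
        match self.versions.get(uri) {
            Some(&current) if current == result.version => Some(result),
            _ => None,
        }
    }
}
```

This does not stop an analysis that is already running; it only avoids publishing its outdated result.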

xunilrj · 2022-05-20T16:57:22Z

hum...

Workspace is declared as a trait with a single implementation (WorkspaceServer) as it is designed to eventually work over an optional transport layer (an additional WorkspaceClient implementation may then delegate calls to a remote instance of the workspace server over an IPC channel for instance)

On making the Workspace interface async, this is an idea I considered but decided not to go with, mainly because it would complicate the initial implementation (async traits aren't well supported, and it requires adding an async runtime to the CLI) and the backend code wouldn't make use of it at the moment (calls into the formatter and analyzer are synchronous and blocking).

Can't we model Workspace as enum?

enum Workspace {
    Local { ... },
    Ssh { ... },
    Websocket { ... }
}

That would allow us to use async now without hacks.

Another possibility is to implement a Workspace façade that just sends messages to a channel (https://github.com/zesterer/flume or even Tokio's), and on the other end we create a WorkspaceLocal that is your current implementation running in an async loop.

loop {
    let msg = channel.recv_async().await;
    match msg {
        ...
    }
}

I grant that it is a little bit annoying to generate request/reply with channels.
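For reference, the request/reply plumbing can be sketched with the standard library alone. Note the substitution: this uses `std::sync::mpsc` and a plain thread instead of flume or Tokio and an async loop, so it shows the message shape rather than the async part. `WorkspaceHandle`, `Request`, and `spawn_workspace` are names made up for this example, and the "formatter" is a toy.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Messages sent to the workspace loop. Each request that expects an
/// answer carries its own one-off reply channel.
enum Request {
    Format { source: String, reply: Sender<String> },
    Shutdown,
}

/// Façade held by clients; cloning it gives another sender to the same loop.
#[derive(Clone)]
pub struct WorkspaceHandle {
    requests: Sender<Request>,
}

impl WorkspaceHandle {
    /// Send a format request and block until the loop answers.
    /// Returns `None` if the workspace loop is no longer running.
    pub fn format(&self, source: &str) -> Option<String> {
        let (reply, response) = mpsc::channel();
        let request = Request::Format { source: source.to_string(), reply };
        self.requests.send(request).ok()?;
        response.recv().ok()
    }

    pub fn shutdown(&self) {
        let _ = self.requests.send(Request::Shutdown);
    }
}

/// Start the loop that owns the actual workspace state.
pub fn spawn_workspace() -> (WorkspaceHandle, thread::JoinHandle<()>) {
    let (requests, inbox) = mpsc::channel();
    let worker = thread::spawn(move || {
        for request in inbox {
            match request {
                Request::Format { source, reply } => {
                    // Toy "formatter": trim and terminate with a newline.
                    let _ = reply.send(format!("{}\n", source.trim()));
                }
                Request::Shutdown => break,
            }
        }
    });
    (WorkspaceHandle { requests }, worker)
}
```

The annoyance is visible even at this size: every operation needs a request variant, a reply channel, and a wrapper method on the handle.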

requires adding an async runtime to the CLI

Do you think this is a huge problem? I think we will in the end use an async runtime, and there is 99% chance that will be tokio. :D One good side effect is that we will probably benefit from https://tokio.rs/blog/2021-12-announcing-tokio-console.

and the backend code wouldn't make use of it at the moment (calls into the formatter and analyzer are synchronous and blocking).

I think that's because we have a chicken-and-egg problem. I was discussing this with @MichaReiser: we don't have async on the lower levels because the higher levels don't have async.

If the façades start to be async and block when calling lower levels (not ideal, but fine for now), lower levels will soon see the benefit and start to migrate.

crates/rome_fs/src/path.rs

yassere · 2022-05-21T18:08:00Z

This certainly doesn’t need to be addressed in this PR, but what’s the long-term plan for RomePath? If an interned FileId is now mandatory, will we eventually be able to eliminate the PathBuf from RomePath entirely?

We would need to change how we handle language capabilities, and it could be the responsibility of whatever calls Workspace::open_file to specify a language. Using the file extension isn’t foolproof anyway, so different consumers of a Workspace could each have their own best effort approach (such as using the languageId in LSP).

In addition to allowing cheap copies, an important aspect of a FileId is that you can’t use it to interact directly with the actual file system. I’m not sure whether or not it will be practical for a Workspace implementation to operate entirely with no knowledge of file paths, but it seems worth attempting to reduce the exposure of real paths as much as possible.

On naming: I do think that the name of the Workspace trait is a little confusing, considering how the term is used in editors and LSP. I’d prefer some variation of server or service, but I don’t have a blocking objection to the current naming.

leops · 2022-05-23T09:24:56Z

Do you think this is a huge problem? I think we will in the end use an async runtime

Absolutely. To be clear, none of the reasons I outlined for not using async right now are actual blockers, only minor inconveniences that led me to not include it in this initial implementation, but I also believe we will eventually end up using async code at least for some parts of the toolchain

will we eventually be able to eliminate the PathBuf from RomePath entirely?

This is also my intention. I think in the long term it would certainly make sense to merge FileId and RomePath so that the latter is only a cheaply Copy-able opaque identifier. Injecting a client-provided language / source type when a document is opened is also how I think this could eventually work (each open document could then hold either a LanguageID enum or a &'static Capabilities directly). The main reason I kept the PathBuf around is actually mostly to implement the supports_feature method on the workspace on top of the existing can_format / can_lint code

I’m not sure whether or not it will be practical for a Workspace implementation to operate entirely with no knowledge of file paths

I don't think it would be entirely possible for the Workspace to be completely isolated from the underlying file system either. For instance we will want to reach into the imports of an open document by resolving the "module source" literal to the content of an actual file to verify they are valid, load their typing information to provide auto-completion, or include that code into a compiled bundle.

On naming: I do think that the name of the Workspace trait is a little confusing, considering how the term is used in editors and LSP

This is somewhat intended, the idea is that at least in the context of a language server the state of the Workspace object should mirror the state of the editor (with a certain root directory + a set of open documents), obviously the meaning of "open document" is a little more unclear in the context of the CLI but conceptually it's still operating in a given "workspace"

ematipico · 2022-05-23T09:02:32Z

crates/rome_cli/src/termination.rs

+
+    /// Wrapper for an underlying `rome_service` error
+    #[error(transparent)]
+    WorkspaceError(#[from] RomeError),


What does #[from] do?

This is an attribute provided by thiserror, it automatically implements From<RomeError> for Termination
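To show what that means without pulling in the thiserror crate, the following sketch writes the equivalent implementations by hand. The `RomeError` enum and its `NotFound` variant are stand-ins invented for this example, as are `open_file` and `run`; the generated code in the real crate is produced by the derive macro.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the service error type (the variant is hypothetical).
#[derive(Debug)]
pub enum RomeError {
    NotFound,
}

impl fmt::Display for RomeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RomeError::NotFound => write!(f, "file not found"),
        }
    }
}

impl Error for RomeError {}

#[derive(Debug)]
pub enum Termination {
    WorkspaceError(RomeError),
}

// Roughly what `#[from]` provides: a conversion into the wrapping variant.
impl From<RomeError> for Termination {
    fn from(source: RomeError) -> Self {
        Termination::WorkspaceError(source)
    }
}

// Roughly what `#[error(transparent)]` provides: `Display` and `source`
// are forwarded to the wrapped error.
impl fmt::Display for Termination {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Termination::WorkspaceError(inner) => fmt::Display::fmt(inner, f),
        }
    }
}

impl Error for Termination {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Termination::WorkspaceError(inner) => inner.source(),
        }
    }
}

fn open_file() -> Result<(), RomeError> {
    Err(RomeError::NotFound)
}

/// The `?` operator uses the `From` impl to convert the error type.
pub fn run() -> Result<(), Termination> {
    open_file()?;
    Ok(())
}
```

The practical effect is that CLI code returning `Result<_, Termination>` can use `?` directly on workspace calls.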