Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notebook "variable provider" API #166296

Open
roblourens opened this issue Nov 14, 2022 · 21 comments
Open

Notebook "variable provider" API #166296

roblourens opened this issue Nov 14, 2022 · 21 comments
Assignees
Labels
api-proposal feature-request Request for new features or functionality notebook-variables
Milestone

Comments

@roblourens
Copy link
Member

roblourens commented Nov 14, 2022

To enable notebook kernels to show variables in a builtin variable viewer for #165445, we need an API for variable values to be reported to vscode. There's a lot of similarity between debugger variables and notebook variables, so the API should be compatible with DAP. If debug variables get any exposure in our extension API, it would share types with notebook variables.

Notes

  • The UI for this is still under discussion Explore built-in "variable explorer" notebook functionality #165445
  • The current variables view only shows a single flat list of variables, no scopes, this API supports the same
  • Hover
    • The vscode debug support extracts the hovered expression in two ways: by using the EvaluatableExpressionProvider from an extension, or by using a slightly hacky regex
    • And it evaluates the result in two ways: by calling "evaluate" on the debugger with the expression if the DA supports this, or by searching scopes in the current stackFrame for a variable with a matching name, and following the path in the format a.b.c, which is hacky
    • Notebook support can use the EvaluatableExpressionProvider but calling a method like evaluate is tough because Jupyter can't respond to an evaluate while a cell is running except in a debug session, and we won't start a debug session unless the user is actually debugging
  • Inline Values
    • Have discussed showing variable values in a notebook as inline values, same as the debugger's inline values
    • This works via an extension's InlineValuesProvider (or again, hacky fallback), and again, looking up a variable with that name in the current stackFrame

Possible future work

export interface NotebookController {
	/** Set this to attach a variable provider to this controller. */
	variableProvider?: NotebookVariableProvider;
}

interface VariablesResult {
	namedVariableCount: number;
	indexedVariableCount: number;
}

interface NotebookVariableProvider {
	/** VS Code decides when to call the provider due to execution and variable view visibility. But if other things can change variables (interactive output widgets, background tasks...?) the provider needs to signal us. */
	onDidChangeVariables: Event<void>;

	/**
	 * Here's how this works in DAP:
	 * scopesRequest returns a Scope[].
	 * Each Scope has a variablesReference and named/indexed counts.
	 * variablesRequest is called with the Scope's variablesReference, a start index and a count.
	 */

	/** Returns the global list of variables, with count info for paging. VariablesResult is like the DAP Scope. */
	getVariableCounts(): Promise<VariablesResult>;

	/** When variablesReference is undefined, this is requesting global Variables. When a variableReference is passed, it's requesting child props of another Variable. provideVariables is like the DAP variables request. */
	provideVariables(variablesReference: number | undefined, start: number, count: number): Variable[];
}

// excludes memoryReference and evaluateName from DAP
// need "size" property, should that be part of DAP? Part of the type? In a generic "detail" field?
interface Variable {
	/** The variable's name. */
	name: string;
	/** The variable's value.
		This can be a multi-line text, e.g. for a function the body of a function.
		For structured variables (which do not have a simple value), it is recommended to provide a one-line representation of the structured object. This helps to identify the structured object in the collapsed state when its children are not yet visible.
		An empty string can be used if no value should be shown in the UI.
	*/
	value: string;
	/** The type of the variable's value. Shown in the UI when hovering over the value. */
	type?: string;
	/** Properties of a variable that can be used to determine how to render the variable in the UI. */
	presentationHint?: VariablePresentationHint;
	/** If `variablesReference` is > 0, the variable is structured and its children can be retrieved by calling provideVariables with the variablesReference. */
	variablesReference: number;
	/** The number of named child variables.
		This information is used to present the children in a paged UI and fetch them in chunks.
	*/
	namedVariables?: number;
	/** The number of indexed child variables.
		This information is used to present the children in a paged UI and fetch them in chunks.
	*/
	indexedVariables?: number;
}

// TODO- eliminate values not read by vscode, or include for future compat?
// eliminate values that mention breakpoints
interface VariablePresentationHint {
	// VS Code only supports virtual
	/** The kind of variable. Before introducing additional values, try to use the listed values.
		Values:
		'property': Indicates that the object is a property.
		'method': Indicates that the object is a method.
		'class': Indicates that the object is a class.
		'data': Indicates that the object is data.
		'event': Indicates that the object is an event.
		'baseClass': Indicates that the object is a base class.
		'innerClass': Indicates that the object is an inner class.
		'interface': Indicates that the object is an interface.
		'mostDerivedClass': Indicates that the object is the most derived class.
		'virtual': Indicates that the object is virtual, that means it is a synthetic object introduced by the adapter for rendering purposes, e.g. an index range for large arrays.
		'dataBreakpoint': Deprecated: Indicates that a data breakpoint is registered for the object. The `hasDataBreakpoint` attribute should generally be used instead.
		etc.
	*/
	kind?: 'property' | 'method' | 'class' | 'data' | 'event' | 'baseClass' | 'innerClass' | 'interface' | 'mostDerivedClass' | 'virtual' | 'dataBreakpoint' | string;

	// VS Code only supports readOnly
	/** Set of attributes represented as an array of strings. Before introducing additional values, try to use the listed values.
		Values:
		'static': Indicates that the object is static.
		'constant': Indicates that the object is a constant.
		'readOnly': Indicates that the object is read only.
		'rawString': Indicates that the object is a raw string.
		'hasObjectId': Indicates that the object can have an Object ID created for it.
		'canHaveObjectId': Indicates that the object has an Object ID associated with it.
		'hasSideEffects': Indicates that the evaluation had side effects.
		'hasDataBreakpoint': Indicates that the object has its value tracked by a data breakpoint.
		etc.
	*/
	attributes?: ('static' | 'constant' | 'readOnly' | 'rawString' | 'hasObjectId' | 'canHaveObjectId' | 'hasSideEffects' | 'hasDataBreakpoint' | string)[];

	// VS Code only supports internal
	/** Visibility of variable. Before introducing additional values, try to use the listed values.
		Values: 'public', 'private', 'protected', 'internal', 'final', etc.
	*/
	visibility?: 'public' | 'private' | 'protected' | 'internal' | 'final' | string;

	/** If true, clients can present the variable with a UI that supports a specific gesture to trigger its evaluation.
		This mechanism can be used for properties that require executing code when retrieving their value and where the code execution can be expensive and/or produce side-effects. A typical example are properties based on a getter function.
		Please note that in addition to the `lazy` flag, the variable's `variablesReference` is expected to refer to a variable that will provide the value through another `variable` request.
	*/
	lazy?: boolean;
}
@weinand
Copy link
Contributor

weinand commented Nov 15, 2022

Because I noticed above the Variable and VariablePresentationHint interfaces, here a few comments about DAP and extension API:

We always tried to avoid "polluting" the extension API (i.e. "vscode.d.ts") with DAP types. So we neither import DAP as a whole, nor do we copy specific DAP types into "vscode.d.ts".
If there is really a need for using a DAP type in extension API, then we introduce a new type "DebugXxxx" that functions as a "stand-in" for the DAP type. See these examples. Extension authors can then decide whether they want to use the opaque stand-in-type or whether they want to import DAP and cast the stand-in-type to the DAP type.

Since the Variable provider API must work outside of debug session, the DAP types are not that important.
Because of this I would try to design a Variable provider in more abstract terms: e.g. I would not surface variablesReference but use an accessor method instead. Maybe a Variable provider could be more aligned with the existing TreeItem (or some yet to be designed TableItem)...

@roblourens
Copy link
Member Author

I agree with that, I think that I should not worry about matching DAP types but design something that makes sense in our extension API. I also didn't realize that we have a way to map from the extension API breakpoint to a DAP breakpoint: https://github.com/w9jds/vscode/blob/e31458e5c673dcac1c6b4dcf1d80abda9544d458/src/vscode-dts/vscode.d.ts#L14567 so if there needs to be interop, I would be using a pattern like that.

Notes from the API call

  • Design something similar to TreeDataProvider
  • Experiment with returning an AsyncIterable to support paging
  • Keep the distinction between named/indexed properties, look at CodeActionKind for inspiration
  • Use an enum for presentation hint attributes
  • And keep trying to clarify what the UI will be, so I understand how that might impact API

@roblourens
Copy link
Member Author

Note for the future- microsoft/vscode-jupyter#5626 shows why scopes shouldn't be left out if users will look at this while debugging

@roblourens
Copy link
Member Author

roblourens commented Jan 19, 2023

Here's an updated sketch of a potential API after discussions way back in November. Things to point out

We discussed how you can know how many array items/child props/variables are in the current set, without totally exhausting the AsyncIterable, and we discussed having the iterator return variable counts along with the variable. That is shown below. It works but maybe it's odd that I can't know how many children a Variable has until I pull one thing off the iterator. An alternative is getting rid of VariablesResult and putting the child counts on Variable. Then you need another method to know the count of variables at the root.

We also have this UI for paging large arrays, which is driven by DAP asking for N variables at some index. I'd like to be able to support the same UI in the variables view.

image

The AsyncIterator doesn't let us do random access. An idea I have for this is to ask for an AsyncIterator that starts at a particular index, that is shown below. It turns out... debugpy doesn't actually implement this in DAP- microsoft/debugpy#1179 but I am optimistic that they will.

And another possible concern is that the AsyncIterator could lead to perf issues. To resolve one variable, I have to eval some python code which takes time. If we iterate 10 times, my obvious implementation might eval python code 10 separate times. I guess this is still similar to how resolve methods are called with one item at a time, the difference here again is just the scale. I don't know how bad this actually is for Jupyter, but I suspect that I might end up optimizing it by requesting batches of variables from python and just return one at a time. And if that's best, then it seems simplest to let the API ask for N variables at a time, like DAP. At least I'll probably be working with batches in the internal EH IPC interface.

export interface NotebookController {
	/** Set this to attach a variable provider to this controller. */
	variableProvider?: NotebookVariableProvider;
}

enum VariablesRequestKind {
	Named,
	Indexed
}

interface VariablesResult {
	variable: Variable;
	namedVariableCount: number;
	indexedVariableCount: number;
}

interface NotebookVariableProvider {
	/** VS Code decides when to call the provider due to execution and variable view visibility. But if other things can change variables (interactive output widgets, background tasks...?) the provider needs to signal a change. */
	onDidChangeVariables: Event<void>;

	/** When variablesReference is undefined, this is requesting global Variables. When a variable is passed, it's requesting child props of that Variable. */
	getChildren(variable: Variable | undefined, kind: VariablesRequestKind, start: number): AsyncIterable<VariablesResult>;
}

// This is the minimum, and omits stuff that could go on the variable, that I don't want to focus on now
interface Variable {
	/** The variable's name. */
	name: string;
	/** The variable's value.
		This can be a multi-line text, e.g. for a function the body of a function.
		For structured variables (which do not have a simple value), it is recommended to provide a one-line representation of the structured object. This helps to identify the structured object in the collapsed state when its children are not yet visible.
		An empty string can be used if no value should be shown in the UI.
	*/
	value: string;

	/** The type of the variable's value */
	type?: string;

	/** The variable size, if defined for this type (array length, dimensions of data frame) */
	size?: string;
}

@roblourens
Copy link
Member Author

roblourens commented Jan 19, 2023

For hover and inline values, my notes in the OP still hold. Will add that for hover, it would be better to be able to evaluate an expression rather than trying to look up an expression value in the variables list. I don't know where to fit in a new evaluate method into our controller API. It doesn't fit with the NotebookCellExecution flow. Maybe like

export interface NotebookController {
  evaluate?: (expression: string) => Promise<Variable | undefined>;
}

@roblourens
Copy link
Member Author

roblourens commented Jan 19, 2023

Notes from API call

  • Merge sizes into Variable
  • getChildren name isn't great, maybe getVariables
  • Variable should be a class and the same instance should be passed back into getVariables
  • getChildren could be two methods instead of one with the context enum
  • evaluate is a bit odd, we can do without for now
  • Maybe should be a provider pattern?

@roblourens roblourens modified the milestones: February 2023, March 2023 Feb 21, 2023
@roblourens roblourens modified the milestones: March 2023, April 2023 Mar 21, 2023
@roblourens roblourens modified the milestones: April 2023, May 2023 Apr 26, 2023
@roblourens roblourens modified the milestones: May 2023, June 2023 May 31, 2023
@roblourens roblourens modified the milestones: June 2023, Backlog Jun 28, 2023
@connor4312
Copy link
Member

My initial point of view is that we should try to consider sticking with DAP if possible. Namely, we could have a DAP 'connection' that acts like a debug session, except that there's no user control exposed to pause/set breakpoints/etc. I think this has several good benefits:

  • Notebooks gets all the features from DAP and VS Code debugging, and DAP would also benefit from additional capabilities we might need for notebooks. Contributions into debug, such as the existing hex editor or possible visualizations (Investigate debug visualization extension points #197287), would work
  • Users would get a consistent experience whether they're debugging or running a notebook cell, versus having two different ways to see variables, evaluate expressions, etc.

Having a tidy extension API for this is nice, but aside from that, is there anything we want to do for notebooks that DAP can't do? If so, are those things out of scope for DAP, or things it would benefit from having?

@amunger amunger self-assigned this Nov 7, 2023
@roblourens
Copy link
Member Author

I experimented with exactly this one year ago and built a hacked up DA that delivers kernel variables. I decided it wasn't the right direction, and here are some reasons from my notes

  • It pushes too much effort onto the extension to correctly implement a fake DA. Notebooks are a first-class concept in vscode, so you shouldn't have to implement a fundamental part of them by sort of abusing a different API, rather we should just have a nice API that I implement in a normal way
  • Variables only make sense in DAP when a debug session is paused. We would have to add a weird flag to hide the UI that says it's "paused", or support asking for variables when not paused, which isn't normal
  • A debug session is controlled by the user- they start and stop it. But a kernel variable session is owned by the extension, and will start and stop it as you open notebooks. Just seems like a different model. If we change that, it drastically changes the meaning of a debug session.

A notebook variables API is definitely going to have similarities to DAP, and from a UI perspective, I'm not opposed to something that uses the debug view. We can separate the UI discussion from the API side of it, because you could have vscode call a variables API and show this as if it were a special debug session internally. It's simpler to have vscode control the lifecycle too, rather than the extension figuring out that its notebook controller is active, so it should start a debug session, vscode can decide this and enforce it. And we won't be the only implementor of notebook variables and I think that having to implement a DA in this special way is just a big ask.

@connor4312
Copy link
Member

Yea, I we could 'translate' to DAP under the hood or something like that for the limited purpose of reusing the Variables view and other debug components.

I still wonder if there's a way we can let visualization extensions in on the action, as in my linked issue. With that issue, the hex editor would also move to using the standard visualization API -- but that API necessarily gives a handle to the DebugSession to the extension, which would not exist for a purely API-based notebook experience. It would be nice for, if/when equivalent read/writeMemory calls are available on the notebook API, the extension could plug into that.

@amunger
Copy link
Contributor

amunger commented Nov 8, 2023

After syncing with Connor, we'll plan on using the nice and simple provider API for extensions to implement, and the most likely use that within vscode core to create a partial implementation of a DebugSession that is specific to variables so that we can hopefully reuse the components that work with that class.

@amunger
Copy link
Contributor

amunger commented Nov 9, 2023

@roblourens - I want to make sure I'm not missing something, but with the current implementation, one NotebookVariableProvider will need to provide variables for multiple active kernel sessions.
The Variable provider is set on a NotebookController that an extension registers, but the jupyter extension will use the same registered controller for each notebook that starts a kernel with that controller selected. So, with the current implementation, we'd need to include a notebook URI parameter in the getChildren call.

@roblourens
Copy link
Member Author

Yeah, that is missing, I think taking a NotebookDocument would make sense to match executeHandler

@amunger
Copy link
Contributor

amunger commented Nov 15, 2023

Also, referring to a variable solely by name isn't going to cut it when asking for the children. Names won't be unique since they can just be the child property of an object, with a shared name for every instance of that object (e.g. childProp in this case):

image

We'll likely need an ID in the Variable interface, and the variable provider will need to ensure that IDs can uniquely identify each variable provided.

@roblourens
Copy link
Member Author

But you're only asking for the children of one variable, so those would be separate variables. They just need to be unique under one parent.

@amunger
Copy link
Contributor

amunger commented Nov 15, 2023

OK, I misunderstood the intent there - I wasn't thinking of actually holding references to those objects, but that makes more sense than creating the full variable object when only the name used to look up the variable.

@amunger
Copy link
Contributor

amunger commented Nov 16, 2023

I made a couple changes in the attached PR:

  • provideChildren -> provideVariables, and the variable parameter renamed to parent
  • namedVariableCount and indexedVariableCount -> namedChildrenCount and indexedChildrenCount
  • changed VariableRequestKind to a string union, rather than an enum.
    • The trigger for this was needing to copy the type to expose it in the renderer thread, it was easier to just copy strings, but I'm also just not a big fan of typescript enums. I'd be willing to change it back though if there's a reason to prefer them.
export interface NotebookController {
    /** Set this to attach a variable provider to this controller. */
    variableProvider?: NotebookVariableProvider;
}

type VariablesRequestKind = 'named' | 'indexed';

interface VariablesResult {
    variable: Variable;
    namedChildrenCount: number;
    indexedChildrenCount: number;
}

interface NotebookVariableProvider {
    onDidChangeVariables: Event<void>;

    /** When parent is undefined, this is requesting global Variables. When a variable is passed, it's requesting child props of that Variable. */
    provideVariables(notebook: NotebookDocument, parent: Variable | undefined, kind: VariablesRequestKind, start: number, token: CancellationToken): AsyncIterable<VariablesResult>;
}

interface Variable {
    /** The variable's name. */
    name: string;

    /** The variable's value.
        This can be a multi-line text, e.g. for a function the body of a function.
        For structured variables (which do not have a simple value), it is recommended to provide a one-line representation of the structured object.
        This helps to identify the structured object in the collapsed state when its children are not yet visible.
        An empty string can be used if no value should be shown in the UI.
    */
    value: string;
}

@roblourens
Copy link
Member Author

We will probably want to do an enum just for consistency in extension API. I think they are nice as an extension developer because you can just import and use this value rather than introducing a bunch of strings in your source that can't be minified.

@davidanthoff
Copy link
Contributor

Great to see progress here!

One question: for the Julia extension it is quite key that the variable explorer API is not exclusively tied to notebooks. We also show a variable explorer for REPL instances that are launched from the extension. If we were to end up with a variable explorer implementation that only works for notebook kernels, we would probably have to stick with our custom variable explorer implementation so that we can continue to support that scenario... The details of this are here.

As far as I can see, though, the current API seems to link this super tightly to notebooks? Is that right? Is that something that could still be changed?

@amunger
Copy link
Contributor

amunger commented Jan 22, 2024

Good to know, and thanks for bringing that up again. The python extension is also eager to use the variable view for their REPL as well, so I've been trying to think of how to alter things to fit that in.

We might be able to let extensions associate a variable provider for a particular file as well somehow...

@davidanthoff
Copy link
Contributor

And from my example that I showed in the linked issue, we also at some point might want to add a variable explorer to the processes that run tests.

I still kind of like the UI that we use in the Julia extension, where the top level node in the tree can be all sorts of things: notebooks, REPLs, test processes, or something entirely different. I think in my ideal world the API that VS Code has would support that kind of model.

@amunger
Copy link
Contributor

amunger commented Jan 23, 2024

yes, I think we'll have to have that top level grouping if variables could be provided for multiple components that could all be active.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-proposal feature-request Request for new features or functionality notebook-variables
Projects
None yet
Development

No branches or pull requests

5 participants