Support running DVC from a virtual environment, conda, etc #51
Comments
Currently, at least in the #28 and #26 some existing discussion on this. I've used the current implementation on both virtualenv and conda, and it works as long as the environment folder is
@ryanraposo has pointed out some APIs in the existing Python extension for VS Code that, if I understand correctly, we can lean on. That would let us focus more on running DVC than on building out a system to manage the user's Python environment.
hmm, I thought that you would need env to be activated (PYTHON PATH is set properly, etc) for this to work. I'll try, thanks Roger.
Good points! It should probably depend on those extensions/size vs. how much code it takes to provide some basic environment support.
Since the current implementation of DVC data pipelines requires Python scripts, I think it would make sense to list Python as a dependency for the DVC extension. This is done in https://code.visualstudio.com/api/references/extension-manifest#extension-packs
I don't think we should mock anything extra other than checking whether the dvc CLI, which the extension needs in order to function, is installed, and displaying an info/warning message with a link to the DVC installation docs. That can easily be done by the extension simply running
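The check described above could be sketched roughly as follows. This is a minimal TypeScript sketch, not the extension's actual code: `findCliVersion` and `missingCliMessage` are hypothetical names, the `--version` probe is an assumption about which command to run, and the docs URL is supplied by the caller. It uses Node's synchronous `spawnSync` for brevity; a real extension would likely prefer an asynchronous call.

```typescript
import { spawnSync } from "child_process";

// Hypothetical helper: returns the trimmed output of `<command> --version`,
// or undefined when the binary cannot be started or exits with an error.
export function findCliVersion(command: string): string | undefined {
  const result = spawnSync(command, ["--version"], { encoding: "utf8" });
  if (result.error || result.status !== 0) {
    return undefined;
  }
  return result.stdout.trim();
}

// Hypothetical helper: builds the text for an info/warning notification.
export function missingCliMessage(command: string, docsUrl: string): string {
  return `${command} was not found. See ${docsUrl} for installation instructions.`;
}
```

Inside an extension, the message could be shown with `vscode.window.showWarningMessage` whenever `findCliVersion("dvc")` returns `undefined`.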
... as for the required Python env config that a dvc command might need in order to run Python scripts, we should be able to get Python settings from
See the Python settings doc: https://code.visualstudio.com/docs/python/settings-reference
@shcheklein is that what you were looking for?
https://code.visualstudio.com/docs/python/settings-reference this link looks right to me (at least in the right direction). I'm still not sure how exactly the workflow/settings should look in our case (e.g. should we replicate some settings from the Python extension, or should we get it as a dependency? I guess it's easier to start with a dependency). Is there a settings/configuration page where I can pick the Python env I'd like to use (including DVC itself, which might be installed in multiple places)? To clarify, there are a few different cases to my mind:
Those Python settings are accessible via the File -> Preferences -> Settings menu in VS Code if you navigate to Extensions -> Python in that view. Regardless of what language DVC and the data pipeline config require, I think it should use the default or configured language settings from VS Code per user preference. If you think we should have a DVC setting for its path, we can add it to our extension. I just don't know if we need to, since DVC gets added to the environment path for running it on Windows and other platforms, and seems to work just fine.
DVC doesn't require any specific language. DVC is a command line tool (like Git). The only tricky part is that it could be installed as a global package (the Windows installer that you mention) or into a virtual env (like
To be clear: I'm not a VS Code expert, so I don't know what the exact solution should look like here. But it should cover all the possible scenarios in the most convenient and predictable way for VS Code users. E.g. if it's a Python project and a virtualenv is set, VS Code itself (or the Python extension) might detect it. We should be able to detect if DVC is installed with pip here.
Well, dvc.yaml has Python commands, so I think it's valid to say DVC requires Python. If you later add R or Node.js data pipeline support, I expect dvc.yaml to contain commands for those interpreters as well. However, as long as we can run
They are not specific to Python already. Those could be bash, R, or anything else even today. The whole project can be written in R. It's like Make in this case: it's language agnostic. The project itself defines the environment, requirements, etc.
It might get complicated if DVC is installed in
I'll see what @rogermparent does for
In any case, I want to create a base DVCCommand class we can all use that runs these checks and wires launching dvc in one place. Should be part of #40.
We might also consider creating a dialog to install DVC if we detect that it's not installed for the env config you are trying to run it in.
@shcheklein regarding: conda, python configs, the user flow, and our options. This might help connect some dots.
Like @RandomFractals mentioned, we can easily retrieve that pythonPath. I'm not sold on a particular approach either, though.
EDIT: something worth taking a second to highlight: configurations are often dynamic like this, and things like UI toggles, detections, etc. are often hard-wired to them; changing the
We don't ever need to look down on defining a DVC path in that .vscode/settings.json, for example, because it doesn't mean it's the whole story. We can have detection layers on top, and come away with users being able to override all of it.
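One way to picture the layering described above is an ordered list of candidates, where an explicit setting wins over detection and a bare `dvc` on `PATH` is the last resort. A rough TypeScript sketch follows. The names `DvcLookup`, `dvcPathSetting`, `candidateDvcPaths`, and `resolveDvc` are hypothetical, and the sketch assumes that a pip-installed `dvc` sits next to the interpreter, which matches typical virtualenv layouts but not necessarily every conda setup.

```typescript
import * as path from "path";

// Hypothetical input shape; `dvcPathSetting` is an invented name.
export interface DvcLookup {
  dvcPathSetting?: string; // explicit user override, e.g. from .vscode/settings.json
  pythonPath?: string; // interpreter path reported for the workspace
}

// Returns candidates ordered from most to least specific.
export function candidateDvcPaths(
  lookup: DvcLookup,
  platform: string = process.platform
): string[] {
  const candidates: string[] = [];
  if (lookup.dvcPathSetting) {
    candidates.push(lookup.dvcPathSetting);
  }
  if (lookup.pythonPath) {
    // Assumption: pip puts the `dvc` script in the interpreter's directory.
    const executable = platform === "win32" ? "dvc.exe" : "dvc";
    candidates.push(path.join(path.dirname(lookup.pythonPath), executable));
  }
  candidates.push("dvc"); // global install, found through PATH
  return candidates;
}

// `isRunnable` is injected so the caller decides how to probe a candidate.
export function resolveDvc(
  lookup: DvcLookup,
  isRunnable: (candidate: string) => boolean
): string | undefined {
  return candidateDvcPaths(lookup).find(isRunnable);
}
```

Keeping the probe as a parameter means the ordering can be tested without DVC installed, and a user-defined path always overrides whatever detection finds.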
@ryanraposo thanks! Let me ask a few questions to keep the discussion going and understand it better.
Does it happen only if the Python extension is installed?
I know about the conda shell side (it is similar to any other virtualenv in that sense), but what about VS Code? Does it detect it the same way?
Let's also keep in mind that this is a path for the project. It's useful to have access to it, but in general, DVC could be installed outside, e.g. globally, and we should be able to detect that.
Sounds good to me, and that's what I probably had in mind. The question here is in the details: can we start outlining the exact logic of that layer?
@shcheklein Of course! It'll help me check myself on my own assumptions/gaps, too.
Yeah, there won't be any detection of envs/interpreters, instead you'll get 1) a prompt to install it, which is enabled by 2) language detection for Python which is built-in.
Right. So in my overview up there, I left that critical part out. And I should say: that's how VS Code keeps track of envs per project.
So this is related to the part I left out. Right now, as far as I understand, the Python extension looks for:
It's very capable, but yeah, this is all a means to the end of detecting DVC, and it can be installed independent of python/pip (right?)
I'm not able to include it here right this second, but I'll do that. It would be nice to have something to poke holes in, especially with your input!
EDIT: I've pushed some code (#55 on
I ran head first into this problem when trying to set up the
The initial error was:
This error occurs because pandas is not a direct dependency of dvc but is a dependency of the demo project. The error persisted after enabling the python extension (
I confirmed this by adding
Sidebar: I then ran into another issue caused by dvc's Dulwich dependency (jelmer/dulwich#793) and contributed a fix, which is now in master but not yet released.
The above solution covers a very narrow use case within a huge range of possibilities for how the project could be set up. I investigated hooking into the Python extension and the functions that it exposes, but they are very limited. The most useful option that I could find is
Here is a screen recording of a working prototype: Screen.Recording.2021-01-22.at.10.29.04.am.mov
After re-reading the thread and understanding that the bulk of the dvc CLI functions are standalone, perhaps we could simply split out the ones that can be executed standalone from the ones that can't, and execute those behind the scenes. This would also mean that we could easily and quickly make all of the functions shown in #40 available in the command palette and have them passed to a terminal. They should work as expected, give the user a feel for what they would see in the terminal if using the CLI, and get them familiar with the mapping of "words to commands":
I do also understand that the underlying project could be in any language with its own environment, which pushes me towards thinking that this only solves part of a much larger problem. If we break the problem down, identify the core languages that we want to cater for, prioritise them, and then execute on each, we should be OK. Perhaps each language we want to support has a similar extension to
Is there a list of core languages that we want to support? Is the initial use case Python projects only?
One other option would be to open source / provide a VS Code devcontainer per language and install all dependencies globally within the container. That would make a lot of these problems go away, but it brings in Docker (and user knowledge of Docker) as a dependency. I do have experience with this, but I'm not sure how big the appetite for / adoption of such an involved solution would be in your wider community / target audience. Keen to know everyone's thoughts on this. We would be able to include the extension as part of the
Apologies for the long post and thanks for reading. Happy to answer any follow-up questions that anyone has.
Matt
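The pandas error reported above suggests that a stage was run outside the project's own environment. As a rough illustration of what activating an environment amounts to for a spawned process, the TypeScript sketch below prepends an environment's executable directory to `PATH`. `withEnvOnPath` is a hypothetical helper, not the fix used in the prototype, and it ignores details such as the case-insensitive `Path` key on Windows and conda's additional activation steps.

```typescript
import * as path from "path";

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns a copy of `baseEnv` with `binDir` first on PATH,
// so executables in that directory shadow globally installed ones.
export function withEnvOnPath(
  baseEnv: Env,
  binDir: string,
  delimiter: string = path.delimiter
): Env {
  const current = baseEnv["PATH"];
  return {
    ...baseEnv,
    PATH: current ? `${binDir}${delimiter}${current}` : binDir,
  };
}
```

The result could be passed as the `env` option of Node's `child_process.spawn`, for example `spawn("dvc", ["repro"], { env: withEnvOnPath(process.env, binDir) })`, so that both `dvc` and the `python` it launches resolve inside the environment.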
Here is a screen recording of a basic miniconda environment being activated throughout our test suite: Screen.Recording.2021-02-05.at.2.51.51.pm.mov
This shows that the built functionality holds true for both venv and conda environments. @shcheklein would you be happy for me to close this one off now?
This is perfect. Sure, let's close this. Thanks @mattseddon!
We probably should take a look at how Python plugin is implemented, or similar plugins for that sake.
It's important for simplifying debugging (no need to install DVC from master globally), and we'll need it anyway for a good user experience.
@rogermparent @RandomFractals @ryanraposo how do you guys run it now? What would it take to implement it? What should it look like in VS Code?