Potential naming conflicts with default pystow home directory name. #10

Open
steppi opened this issue May 4, 2021 · 8 comments

Comments

@steppi

steppi commented May 4, 2021

Pystow does not check whether there is an existing .data directory on the user's system and happily commandeers this folder even if it already exists. Since this is a very common and generic name, it is not unlikely that a user already has a .data directory in their home folder. It is also not unlikely that a user, or some software they have installed other than pystow, will later try to place a .data directory there. I suggest changing to a less generic name such as ".pystow_data" to avoid potential naming conflicts. I think just having the possibility of changing the default folder name with an environment variable is insufficient, because the direct users of pystow are Python package developers, not Python package users. We should seek to minimize any cognitive burden or sources of surprise for end users of Python packages that use pystow.
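
To make the concern concrete: one way a tool could avoid silently taking over a pre-existing directory is to mark the directories it creates and refuse any non-empty directory that lacks the marker. The sketch below is only an illustration of that idea. The marker name `.stow-owner` and the helper `claim_directory` are hypothetical and are not part of pystow.

```python
from pathlib import Path

MARKER = ".stow-owner"  # hypothetical marker file name, not part of pystow


def claim_directory(base: Path) -> Path:
    """Return base if it is safe to use, creating and marking it when new."""
    marker = base / MARKER
    if base.exists():
        if not base.is_dir():
            raise NotADirectoryError(f"{base} exists and is not a directory")
        # A non-empty directory without our marker was created by someone else.
        if not marker.exists() and any(base.iterdir()):
            raise FileExistsError(f"{base} already exists and is not ours")
    base.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return base
```

With a guard like this, a clash surfaces as an explicit error at first use instead of as two programs quietly sharing (or deleting) the same folder.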

@bgyori

bgyori commented May 11, 2021

I agree that if pystow is used as a dependency of some package that someone installs, it might not be clear what it is and what the .data folder is. I wonder if, instead of naming it .pystow_data, which assumes someone knows what pystow is, naming it simply .pydata would at least convey the fact that "this is a data folder for Python packages".

@cthoyt
Owner

cthoyt commented May 11, 2021

I've never used a package that created a .data folder, so I'm curious if you guys or anyone else has an example of another package that's doing this that would create the conflict.

I don't want to make the name python-specific nor pystow-specific because the concept transcends languages. I have actually been planning to write an R port of this that should have a similar interface (and potentially have a higher impact, since R users are terrible with reproducibility).

There are so many application-specific folders littering the home directory now that I'd rather keep this one generic as a reminder that it should supersede the other ones.

@steppi
Author

steppi commented May 11, 2021

I'm not aware of any such packages either. My thinking was based on the sheer number of software and data professionals in the world, as well as hobbyists, and the impossibility of knowing whether any of them are using a .data folder, especially in custom/bespoke workflows that wouldn't be public knowledge. That it is such a generic name, and that it seemed like an obvious choice to you, makes it seem at least plausible to me that it could also be an obvious choice for someone else.

You bring up a good point though. If you want to create something that is not pystow-specific or language-specific, then a somewhat generic name is in order. In this case, I think you need to start thinking about branding. If you want this thing in its language-agnostic form to gain mindshare, maybe it should have a pithy name to help brand it. In any case, I've updated adeft to use appdirs to place the models in the platform-specific user data location (which I think is a better fit for adeft), so I no longer feel personally responsible here.

@cmungall

I agree with having the layout transcend language or a particular implementation, but there is a set of conventions at play here, so I think having some more specific name is warranted, and I agree with @steppi that it's not unlikely other applications will choose .data.

@cthoyt
Owner

cthoyt commented Oct 22, 2022

Can someone point to a concrete example of another application (any platform/language) that’s using the .data directory in the home folder?

If that’s really an issue, there are several ways to configure where pystow puts its home directory, either by specifying it explicitly or by falling back to the XDG standard.
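
For readers unfamiliar with the XDG fallback mentioned here, the following is a minimal stdlib-only sketch of how a per-application data directory is resolved under the XDG Base Directory convention (`XDG_DATA_HOME`, defaulting to `~/.local/share`). The function name `xdg_data_dir` is hypothetical, and this is not pystow's own code; how pystow enables this fallback is not shown in this thread.

```python
import os
from pathlib import Path


def xdg_data_dir(app_name: str, env=None) -> Path:
    """Resolve a per-application data directory under XDG_DATA_HOME."""
    env = os.environ if env is None else env
    raw = env.get("XDG_DATA_HOME", "")
    # Unset, empty, or relative values fall back to the default location.
    if raw and os.path.isabs(raw):
        base = Path(raw)
    else:
        base = Path.home() / ".local" / "share"
    return base / app_name
```

Because the result is namespaced by application under a directory reserved for user data, it sidesteps the question of whether a top-level name in the home folder is already taken.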

@steppi
Author

steppi commented Oct 22, 2022

> Can someone point to a concrete example of another application (any platform/language) that’s using the .data directory in the home folder?

I’m not aware of any examples. To summarize my thoughts:

  1. .data is a generic name, and if you think it’s a natural choice, it’s not unlikely someone else will too.
  2. The consequences of a clash could be catastrophic. I misconfigured pystow before because of the problem described in "Confusing folder configuration" (#11), and it happily deleted all of the content in the clashing directory.
  3. The direct users of pystow are Python package developers. The users of these packages may not even know that pystow exists and shouldn’t have to worry about configuring it.

Even if the frequency of clashes is very small, that a clash could lead to catastrophic results for a user is enough to scare me away from using pystow in one of my own packages. Absence of evidence isn’t necessarily evidence of absence, and that I’m not aware of any applications using a .data directory doesn’t make me feel secure given my complete ignorance of the bespoke workflows that are used in different teams/groups/labs.

That said, I don’t intend to push any further on this and hope I don’t come off as too aggressive.

@matentzn

matentzn commented Dec 7, 2022

Whatever the outcome of this discussion, CLIs depending on pystow MUST provide a way to change the location. I have now had cases, for example using OAK with ODK, where everything written outside PWD is lost after the run (because the process, say an ontology query, is running inside the Docker container). In fact, it's possible that the caller does not have write permission outside PWD at all, which needs to be considered.
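
A CLI could handle both cases described here (an explicit override, and no write access outside PWD) by trying a preferred location first and falling back to the working directory. This is a rough sketch under those assumptions; `choose_storage_dir` is a hypothetical helper, not an API of pystow, OAK, or ODK.

```python
import os
from pathlib import Path
from typing import Optional


def choose_storage_dir(preferred: Path, fallback: Optional[Path] = None) -> Path:
    """Return the first candidate directory that can be created and written to."""
    fallback = Path.cwd() if fallback is None else fallback
    for candidate in (preferred, fallback):
        try:
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            # Covers permission errors and paths blocked by an existing file.
            continue
        if os.access(candidate, os.W_OK):
            return candidate
    raise PermissionError("no writable storage directory available")
```

A command-line tool could pass the value of a `--cache-dir` style option (also hypothetical) as `preferred`, so that a run inside a container keeps its output under the mounted working directory.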

@sierra-moxon

I noticed in the docs that the location of the data is configurable:

> If you want to use an alternate folder name to .data inside the home directory, you can set the PYSTOW_NAME environment variable. For example, if you set PYSTOW_NAME=mydata, then the following code for the pykeen app will create the $HOME/mydata/pykeen/ directory
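
The code the docs refer to is not reproduced in this thread. As a stand-in, here is a stdlib-only sketch that mirrors the path resolution the quoted passage describes. It does not call pystow and, unlike the behavior described in the docs, it only computes the path without creating it; `app_directory` is a hypothetical name.

```python
import os
from pathlib import Path

DEFAULT_NAME = ".data"  # default folder name discussed in this issue


def app_directory(app: str, env=None) -> Path:
    """Compute $HOME/<PYSTOW_NAME or .data>/<app> without touching the disk."""
    env = os.environ if env is None else env
    name = env.get("PYSTOW_NAME") or DEFAULT_NAME
    return Path.home() / name / app
```

With `PYSTOW_NAME=mydata` in the environment, `app_directory("pykeen")` resolves to `$HOME/mydata/pykeen`, matching the example in the quoted docs.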

6 participants