Improve documentation around index restrictions #5029

Merged: 4 commits merged into main on Apr 6, 2022

Member

@matteius matteius commented Apr 3, 2022

The primary goal of this PR is to improve the documentation around use of additional indexes and what it means to be the default index.

Additionally, since we are declining #5028 because of the logical complexity of supporting that approach and the security concerns, I have carried over one trivial refactor from that PR: re-using a util we already have for determining whether an index is PyPI.

The issue

Fixes #5028
Fixes #5022
Fixes #5021

Additional TODOs:

  • How do we deprecate something in the project?
  • Are there additional options we want to support to make life nicer?
  • Additional documentation that would be useful?

docs/advanced.rst (outdated review thread)
matteius and others added 2 commits April 4, 2022 21:21
Co-authored-by: Yusuke Nishioka <yusuke.nishioka.0713@gmail.com>
@oz123 oz123 merged commit 99cf729 into main Apr 6, 2022
@oz123 oz123 deleted the issue-5022-documentation branch April 6, 2022 09:16
@mungojam
Contributor

mungojam commented Apr 9, 2022

Have I understood correctly from this that there is no way we could have packages restorable from multiple package sources? We are migrating our builds from one server to another and the package sources available on them are mutually exclusive. It would be good if we could still specify multiple allowed sources in the pipfile.lock which is what we used to restore packages.

@matteius
Member Author

matteius commented Apr 9, 2022

@mungojam What you are describing is no longer possible due to security concerns. I understand that you want to install the already locked package versions/hashes from whichever source you have available, but that is non-trivial: locking and installing share similar logic, so index restrictions were more or less an all-or-nothing choice. The mechanism that allows searching multiple indexes at once was deemed insecure in pip because of the potential for package confusion attacks. There have been discussions around removing --extra-index-url from pip itself, or at a minimum making it much more restrictive (see pypa/pip#9715), so we have taken the stance of enforcing index-restricted packages.

What we needed most from a security perspective was to stop treating a package as something that could come from any index, without caring which one, during locking, since that can introduce a compromised package into the lock file due to the nature of package confusion attacks. Additionally, there were bugs in pipenv: it was possible to specify multiple --extra-index-url values, but pipenv would really take only the first one from the list and search it in addition to the default index, so the prior logic could never search more than 2 indexes.
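As a rough illustration of the package confusion risk described above, here is a minimal Python sketch. It is not pip's or pipenv's resolver; the index names, the `acme-utils` package, and both helper functions are hypothetical. It only shows why "highest version from any index" lets a public upload shadow an internal package, while an index-restricted lookup does not.

```python
# Hypothetical data: versions of one package name as seen on two indexes.
INDEXES = {
    "internal": {"acme-utils": ["1.2.0", "1.3.0"]},
    "public": {"acme-utils": ["99.0.0"]},  # same name, uploaded by an attacker
}


def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_any_index(name, indexes):
    """Pick the highest version found on ANY index (the unsafe behaviour)."""
    candidates = [
        (version_key(version), index_name, version)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found on any index")
    _, index_name, version = max(candidates)
    return index_name, version


def resolve_restricted(name, index_name, indexes):
    """Pick the highest version from one explicitly chosen index."""
    versions = indexes[index_name].get(name, [])
    if not versions:
        raise LookupError(f"{name} not found on {index_name}")
    return index_name, max(versions, key=version_key)
```

With this data, the unrestricted lookup selects the attacker's `99.0.0` from the public index, whereas restricting `acme-utils` to `internal` selects `1.3.0`.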

The way it works now is that unspecified packages get the default index (the first source in the Pipfile), and any other package can be pinned to whichever named index it requires. I appreciate that you are migrating packages from one server to the other, but I think there are plenty of ways to handle that migration without relying on pipenv to scan both places. For example, leave all the old packages on the old server and keep it as the default until all new packages are on the new server, then flip the defaults; this would require the least updating of named indexes. If you need more targeted granularity while migrating, you can adjust the named indexes in the Pipfile or the Pipfile.lock. The takeaway is that you would always know which server you expect to pull each package from. If the servers are essentially one and the same and you don't mind pulling from either, then I would recommend putting them behind a load balancer.
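The default-versus-named-index rule above can be sketched in Python. This is an illustration of the rule only, not pipenv's actual code: the `index_for` helper, the `internal` source, its URL, and the `acme-utils` package are assumptions made for the example. The dictionary mirrors the Pipfile shown in the comment.

```python
# Parsed form of a hypothetical Pipfile equivalent to:
#
#   [[source]]
#   name = "pypi"
#   url = "https://pypi.org/simple"
#   verify_ssl = true
#
#   [[source]]
#   name = "internal"
#   url = "https://internal.example.com/simple"
#   verify_ssl = true
#
#   [packages]
#   requests = "*"
#   acme-utils = {version = "*", index = "internal"}
PIPFILE = {
    "source": [
        {"name": "pypi", "url": "https://pypi.org/simple", "verify_ssl": True},
        {"name": "internal", "url": "https://internal.example.com/simple", "verify_ssl": True},
    ],
    "packages": {
        "requests": "*",
        "acme-utils": {"version": "*", "index": "internal"},
    },
}


def index_for(package, pipfile):
    """Return the URL of the single index a package is allowed to come from."""
    sources = pipfile["source"]
    spec = pipfile["packages"][package]
    # A plain version string carries no index, so the default applies.
    wanted = spec.get("index") if isinstance(spec, dict) else None
    if wanted is None:
        return sources[0]["url"]  # default index: first source in the Pipfile
    for source in sources:
        if source["name"] == wanted:
            return source["url"]
    raise ValueError(f"{package} refers to unknown index {wanted!r}")
```

Under this rule, "flipping defaults" during a migration amounts to reordering the `[[source]]` entries, while packages with an explicit `index` keep pointing at their named source.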

@mungojam
Contributor

mungojam commented Apr 9, 2022

@matteius thank you for taking the time to explain it so well, and for your thoughts on our dilemma.

Our dilemma is not quite that. We are migrating our many projects to different build servers, not migrating our Python packages. However, one build server has access to PyPI and the other does not, and one has access to a particular internal package repo while the other will need to use a different one because it cannot reach that internal repo.

@mungojam
Contributor

mungojam commented Apr 9, 2022

Sorry, didn't mean to click send, but I think I've conveyed it anyway 🙂. I suppose we could update the package sources as we come to migrate projects between build servers, but the other issue is that some Devs have access to pypi while others don't and can only access our internal one.

NuGet recently introduced a system to address package name confusion attacks, in which you specify which package repos (plural) may be used for particular name prefixes. So we can easily say that all packages starting with our company name must come from our internal sources, while anything else can come from anywhere.
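To make the prefix idea concrete, here is a small Python sketch of a prefix-to-sources mapping. It is not NuGet's implementation and not a pipenv feature: the `acme-` prefix, the source names, and the `allowed_sources` helper are hypothetical, and the sketch uses simple first-match ordering rather than reproducing NuGet's own pattern precedence rules.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping: name pattern -> sources allowed to serve it.
# Order matters here: the first matching pattern wins.
SOURCE_MAPPING = [
    ("acme-*", ["internal", "internal-mirror"]),  # company packages: internal only
    ("*", ["pypi", "internal"]),                  # everything else: any source
]


def allowed_sources(package, mapping):
    """Return the sources for the first pattern that matches the package name."""
    for pattern, sources in mapping:
        if fnmatchcase(package, pattern):
            return sources
    return []
```

A package named `acme-utils` would then only ever be looked up on the two internal sources, so a same-named upload on a public index is never considered.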

Ideally, it would be good if the pipenv sync command could be less strict and just check hashes.
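A hash-only check of this kind could look roughly like the following Python sketch. It assumes the `sha256:<hex digest>` string format used for hashes in Pipfile.lock; the `matches_locked_hash` helper and the sample payload are hypothetical and do not correspond to pipenv internals.

```python
import hashlib


def matches_locked_hash(artifact_bytes, locked_hashes):
    """Check a downloaded artifact against the hashes recorded in the lock file.

    locked_hashes is a list of strings such as "sha256:<hex digest>".
    Where the artifact was downloaded from plays no part in the check.
    """
    digest = "sha256:" + hashlib.sha256(artifact_bytes).hexdigest()
    return digest in locked_hashes


# Stand-in for a wheel or sdist fetched from any reachable source.
payload = b"example wheel contents"
locked = ["sha256:" + hashlib.sha256(payload).hexdigest()]
```

A check like this only protects installs from an existing lock file. It says nothing about whether the hashes were recorded from a trustworthy index in the first place, which is why locking itself still needs a known source.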

@matteius
Member Author

matteius commented Apr 10, 2022

@mungojam I gave your feedback some consideration, and I think it may be possible to support as a non-default option. Could you take a look at this branch and test out if it meets your use case: https://github.com/pypa/pipenv/pull/5039/files

I should be able to add to the documentation once we settle on a name for the option and agree that this is the direction to head in. Just to be clear, in that branch you still have to lock from a reliable source, but for an already locked file it will install by searching all sources, as before, if that option is enabled in your Pipfile.

That being said, with all due respect, could you describe more how this is a possible and preferred setup for a development team:

The other issue is that some Devs have access to pypi while others don't and can only access our internal one.

@matteius
Member Author

@mungojam Also if you could open an issue report about this issue for documentation, then I can link to that for the PR and news item.

@mungojam
Contributor

@mungojam Also if you could open an issue report about this issue for documentation, then I can link to that for the PR and news item.

thanks for this, I'll raise an issue and try to have an experiment later today

@mungojam
Contributor

That being said, with all due respect, could you describe more how this is a possible and preferred setup for a development team:

The other issue is that some Devs have access to pypi while others don't and can only access our internal one.

It's a very valid question. In our org there is a wide range of developers, from those who have not really programmed before but are keen to learn Python and use our packages as well as external ones, to experienced developers who understand the security risks around PyPI packages. We've got approval for the more experienced developers to bring PyPI packages into our internal package repository for others to consume.

It's not a perfect system, but it's been working reasonably well for a few years now. We have automation that helps us bring in new or updated packages. In reality, most of us use the internal package repo for syncing. I think the build server requirement is the much bigger one.

I'll raise a ticket now and try and test the PR
