Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove targets and sources and only take files from the src directory #317

Merged
merged 30 commits into from
Feb 23, 2022

Conversation

thomashoneyman
Copy link
Member

@thomashoneyman thomashoneyman commented Feb 1, 2022

Fixes #316, fixes #292, and fixes #164.

This PR removes the Target type altogether as described in #164. It also remove the source key altogether. It ensures in the API pipeline that when we create a tarball for a package we only include the src directory and some always-included files like the LICENSE or README or purs.json file.

@thomashoneyman thomashoneyman requested review from f-f and colinwahl and removed request for f-f February 1, 2022 23:22
@thomashoneyman thomashoneyman changed the title Only include required files and files from target directories Remove targets and only include files from the 'sources' key Feb 2, 2022
ci/src/Registry/API.purs Outdated Show resolved Hide resolved
ci/src/Registry/Scripts/LegacyImport/Bowerfile.purs Outdated Show resolved Hide resolved
@thomashoneyman thomashoneyman changed the title Remove targets and only include files from the 'sources' key Remove targets and sources and only take files from the src directory Feb 6, 2022
@thomashoneyman
Copy link
Member Author

With the patch as it stands it's impossible to include arbitrary files in the root of the repo, while anything goes in the src directory - this is inflexible, so I think we should offer some kind of escape hatch now that we don't have the one from targets.sources anymore

I think that we should choose one of:

  1. Take all files from the package, minus some files we know we don't want, like a .git directory
  2. Take only a set of files we know we want, like the manifest and src directory, and allow the user to specify extra files they want via an optional files key.

In other words, if we want to be permissive about files you can include in your package, then we should strip out only files we know for sure shouldn't be included. If we want to be restrictive, then we should only take the purs.json file and src directory and perhaps some other files as you mentioned, but we should leave an escape hatch open in case we are being too restrictive.

I'm not completely sure which one I prefer. My risk averse self feels like the safe option is to be permissive and only take out files we know for sure we can remove. But I see how removing files keeps package sizes down.

@f-f
Copy link
Member

f-f commented Feb 8, 2022

I'm for option (2): we can be restrictive if users have an escape hatch. This is also how npm does it, and it seems to work nicely.

Though I still think we should always pick up - even if we don't think we need them - at least the various files that we use in the legacy import (bower.json, spago.dhall, etc) and the ones specifying native dependencies (package.json is one, there might be more)

@f-f
Copy link
Member

f-f commented Feb 8, 2022

I'm for option (2): we can be restrictive if users have an escape hatch

And I think we can cut this corner for now since it's not strictly necessary to move forward: we could add the files key to the schema, and then just throw an exception in the API pipeline if anyone tries to put anything in there. This would allow us to have a look at the usecase at hand, and eventually backtrack on this decision.
(i.e. we can add things to the schema and then take them away later as long as no published Manifest contains them 🙂 )

@thomashoneyman
Copy link
Member Author

thomashoneyman commented Feb 8, 2022

I'm for option (2): we can be restrictive if users have an escape hatch. This is also how npm does it, and it seems to work nicely.

That's not quite how npm does it:
https://docs.npmjs.com/cli/v8/commands/npm-publish#files-included-in-package

Rather, they take all files except some guaranteed exclusions by default, but if you include a files key then they take only some guaranteed inclusions plus the files you provide in the files key. I would be happy to follow suit.

And I think we can cut this corner for now since it's not strictly necessary to move forward: we could add the files key to the schema, and then just throw an exception in the API pipeline if anyone tries to put anything in there. This would allow us to have a look at the usecase at hand, and eventually backtrack on this decision.

I'm OK with this, especially if we add the files key to the schema with a link to a discussion like this one. As we've found out recently, if something's there and I can't remember or figure out why, I'm gonna try and remove it!

I'm especially OK with this because it feels safer to take all files for legacy imports, rather than be restrictive, just in case someone was doing something funky in their setup.

@f-f
Copy link
Member

f-f commented Feb 12, 2022

As we discussed in the call, the way I'd like this to work would be:

  • for legacy packages: take all the things except some obvious things we exclude
  • for new packages: include specific things, plus a list of things specified in the future files field

@thomashoneyman
Copy link
Member Author

thomashoneyman commented Feb 15, 2022

plus a list of things specified in the future files field

And, to be clear, this will be a list of glob patterns, where allowable globs are up to us to define? (We could keep it easy on ourselves at first by deferring to fast-glob, and later decide what specifically we want to allow.)

@thomashoneyman
Copy link
Member Author

@f-f I have implemented the strategy we agreed on in this discussion:

  1. Legacy import packages have some files explicitly ignored but are otherwise packaged up with all files intact
  2. Normal packages have only the src folder and some explicit files copied over, but users can opt-in to additional files via the optional files key

I believe this is up to date with all our suggested changes at this point.

Comment on lines 408 to 409
filterUnsafeGlobs :: FilePath -> Array String -> Aff (Array String)
filterUnsafeGlobs path globs = case Array.uncons globs of
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we accept arbitrary globs then we can leak access to the file system, which would allow someone to publish a package that pulls in files from anywhere in the system and puts them in the tarball. We can take two approaches to this:

  1. We can reject a package altogether if we detect it uses an unsafe glob (one that accesses anything outside of the package directory itself)
  2. We can skip any globs that are unsafe, simply ignoring the user's input and taking files using safe globs only

I've taken the second path here. But I would understand if we want to throwWithComment instead if we can determine that the files key contains unsafe globs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use globs at all, as they feel more of a liability than anything else (hard to parse, and to validate the security of).
We should take this list of paths (which can be files or directories), canonicalize them, and only check that they only point to subdirectories and not parent directories (this can be done by checking that the current absolute path is a substring of the canonical path we're checking)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we can also just add a TODO in here and skip implementing the files field for now - we'd leave it in place and just blow up if anyone is trying to use it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Globs are undeniably useful, though, for example for matching only .purs files in another directory. And if someone is going through the effort to explicitly include files via the files key, I suspect they're going to want more fine-grained control than just "a directory".

It's hard to parse globs (hence why we defer to fast-glob), but is sanitizing them any harder than sanitizing a list of directory paths?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to parse globs (hence why we defer to fast-glob), but is sanitizing them any harder than sanitizing a list of directory paths?

Yes, unless the underlying library offers facilities for that. E.g. I couldn't find anything to canonicalize globs in the fast-glob library.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed all code related to the files key and we can punt on it.

@thomashoneyman
Copy link
Member Author

We had another chat in the registry call this morning and agreed to go back to the NPM style as described in my comment above 1. That means we take all files by default, but let you selectively choose files to include via a files key. The reason for this decision is to preserve things like test directories so that folks downstream can still execute your tests if they want to.

Footnotes

  1. https://github.com/purescript/registry/pull/317#issuecomment-1032819187

ci/test/Registry/Scripts/LegacyImport/Stats.purs Outdated Show resolved Hide resolved
ci/src/Registry/API.purs Outdated Show resolved Hide resolved
ci/src/Registry/API.purs Outdated Show resolved Hide resolved
@thomashoneyman thomashoneyman merged commit 8dcf5c7 into master Feb 23, 2022
@thomashoneyman thomashoneyman deleted the trh/target-dirs branch February 23, 2022 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants