Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import from Bower PoC, the Package type and Package Sets #4

Merged
merged 9 commits into from
Apr 25, 2020

Conversation

f-f
Copy link
Member

@f-f f-f commented Mar 16, 2020

This PR shows a PoC of importing all of Bower packages into this repo as sketched in the design document.

Some notes:

  • There are two commits in this PR: the first one adds the basic infrastructure/types, the second contains all the packages
  • We're encoding the package definitions as Dhall files, as it provides some very nice properties:
    • the type is self documenting: we don't need to keep an implementation around that reads the type, nor a spec
    • we can represent things with proper types, e.g. the license is a sum type of all the SPDX license ids
    • for how dependencies are represented, it's a type error to include the same dependency twice (which happens currently with some Bower packages
  • There are some (~50) versions that failed to import, mostly because:
    • they have a malformed Bowerfile (i.e. bower-json refuses to parse them)
    • they include dependencies that are on Bower but are not PureScript packages
  • I decided to strip the purescript- prefix, as it's lots of chars and we don't really need it if we're not on the Bower registry
  • The code that generated all the files is here, ~150 lines
  • I structured the Package type in such a way that it could be used to replace the spago.dhall and maybe to be the universal format that it's needed to make the compiler package aware

cc @hdgarrood

@joneshf
Copy link
Member

joneshf commented Mar 16, 2020

This is so awesome!

I know a non-goal of this project is making a usable frontend. Does that include the format of the data, or is it possible to also store the JSON representation of the data here? I can explain more if you'd like, but don't need to if the answer is "we're only going to store Dhall."

@f-f
Copy link
Member Author

f-f commented Mar 16, 2020

@joneshf what use-case do you have in mind for versioning the JSON too?

We are doing this (i.e. versioning both the Dhall and JSON render) in package-sets - the use-case there is that psc-package wants a JSON file - and I find it annoying because we have to ask contributors to run dhall-to-json to keep the JSON updated anyways. I'd much prefer to free contributors from that hassle and only ask consumers of the registry data (i.e. people making frontends, which is a smaller number) to run dhall-to-json.
If there's demand we can easily host a rendered version somewhere (e.g. on GitHub pages), but I wouldn't like to keep a render on master

@f-f f-f changed the title Import from bower WIP: Import from bower Mar 16, 2020
@hdgarrood
Copy link
Contributor

Nice! Are there really over 10,000 package versions???

Can I request that we keep the registry data separate from everything else, (i.e. the readme, the dhall type definitions)? Perhaps by putting it into a separate branch or something? I was looking for the Dhall type definition for Registry but I haven't managed to find it in the github web UI.

the type is self documenting: we don't need to keep an implementation around that reads the type, nor a spec

I am not totally convinced 😅 I think some prose is probably needed somewhere to help first-time authors understand the meanings of all of the fields. Also, can Dhall implement the package name constraints I mentioned in #3?

@hdgarrood
Copy link
Contributor

Okay, found it. https://github.com/purescript/registry/blob/f582adf2167b6fb6f898a27b3292c646d657c9b1/v1/Package.dhall#L1-L13

Some thoughts:

  • On backend: We aren't storing info about compatible backends anywhere at the moment, are we? Pure PS packages and packages which only depend on the core libraries will generally be able to run on any backend, so having backend as Optional Text probably isn't going to be able to cut it, I think it at least needs to be List Text. But that also raises the questions of which Text values correspond to which backends, and who decides what those are. Personally I think it's probably a bit too early to be thinking about alternate backends.
  • Are sources and devSources necessary? So far, build tools have operated on the assumption that within a published library, the sources are always src/**/*.purs. This assumption is built in to the ecosystem in many places. In particular, spago sources. It's not just build tools which rely on this though, I think basically any build script which calls purs compile or purs repl will too. Allowing published libraries to deviate from this standard (that source files are always in src/**/*.purs) risks significantly complicating lots of tooling. I'd much rather continue to keep this non-configurable for published libraries (of course, apps should be allowed to do whatever they want).
  • What is output intended for? The output directory we tell purs compile to use? Does this need to be in the registry?

@f-f
Copy link
Member Author

f-f commented Apr 6, 2020

Sorry for delay on this, I'm currently working on improvements following the feedback from @hdgarrood, will post updates soon

@f-f
Copy link
Member Author

f-f commented Apr 18, 2020

@hdgarrood thanks for the thorough review!

Since the last activity here we had a pretty good iteration on the design with @reactormonk.
The basic ideas are the same, but many many rough edges and unspecified things were taken care of
and I'm pretty pleased with the elegance and simplicity of the whole structure at this point.

In the meanwhile I also realized that I didn't really manage to communicate why the
design and the use of Dhall made so much sense to me for this usecase, so in this latest
iteration I added a bunch of examples that hopefully better show how all the pieces fit together (see README)

I also removed the vast majority of the package manifests and only kept some of them, for ease of review.
In the beginning I was generating all of them to make sure that my Bower -> Registry conversion made sense, but now that's verified we can leave most of them out and then add them at a later time
once we settle the design.

I shuffled around something in the config based on the feedback, and I basically rewrote the README to clarify lots of aspects of the overall design and inner workings.

Also most importantly this iteration properly brings into the picture Package Sets and how they would fit into the overall design.

I feel like I took care of most of the things you mentioned in the new sections/examples, so I'd like to ask you to take a look at the PR again (mostly the README and the types in v1, and I'd recommend looking at the tree view), and in the rest of the comment I'll lightly get back to some of your points:



Can I request that we keep the registry data separate from everything else, (i.e. the readme, the dhall type definitions)? Perhaps by putting it into a separate branch or something?

I think this request came from being overwhelmed at the amount of generated package manifests, so it should not be a problem anymore. Let me know if it still is, and if so I'd be curious to know why.

Anyways, I'll expand a bit more on this: I was thinking about moving the packages folder under v1 too, but decided otherwise.
The reason is that when we change the Package type (or in general when the hash of Registry.dhall will change) we'll make a v2 folder - we wouldn't have to migrate all the files right away, so I think keeping some of them on old versions of the schema it's probably fine.
I don't have a strong opinion on this and I'm fine with either - there's value in having a packages folder for every version: it's more files, but consumers then have the assurance that all packages down that folder match the corresponding type.

As a note, migrating between versions of different schemas is usually possible in pure Dhall.
Fictional example: let's say in v2 we want to go from a name : Text to name : List Text.

Then we could write a migration function in Dhall:

let v1tov2 = \(pkg : ./v1/Package.dhall) ->  pkg // { name = [ pkg.name ] }

in v1tov2

..and if you'd like to migrate some old definition of a package, then it would just be a matter of applying the function to it:

./v1tov2.dhall ./some_v1_package_definition.dhall

This is nice because consumers can choose which version of the schema they want to work with, and migrate the data according to their needs, while at the same time we don't need to duplicate data here at all.

I am not totally convinced 😅 I think some prose is probably needed somewhere to help first-time authors understand the meanings of all of the fields

Oh yeah sorry, I meant to add documentation to all the types before opening the PR, but it slipped.

You should find the Package definitions and docs for that already in the README,
but otherwise now all Dhall types have documentation.

Also, can Dhall implement the package name constraints I mentioned in #3?

You are aware of the fact that Dhall doesn't allow Text manipulation for now, but that's being discussed as not everything can be done without it (personal feeling: I think we'll get some amount of text manipulation, but it will not be rushed)

However, we can enforce basic checks just in CI: since all packages have to pass
our CI, we can enforce all the checks we want there.
And perhaps most importantly, I think the constraints you mentioned there can (and should) be enforced with a Perl oneliner, or something like that.

However, since this design keeps package names in record labels, package names have to satisfy Dhall's constraints for labels, which is the reason why in #3 I asked if we could forbid the . from package names.

From Dhall's grammar:

simple-label-first-char = ALPHA / "_"
simple-label-next-char = ALPHANUM / "-" / "/" / "_"
simple-label = simple-label-first-char *simple-label-next-char

Some thoughts: ...

Before I get to the specific points, I'll note that I included in there some of the configuration options that you have in Spago config files - documentation for the format here.
The overall goal with this schema is - as mentioned in my first post - to unify the config files between:

  • Spago
  • the Registry
  • (possibly?) the format for the compiler's package-awareness

You can think of this in the same way of .cabal files: they do contain fields that are related to:

  • the registry (e.g. dependency bounds)
  • project config (e.g. compile flags)
  • package-config for the compiler (e.g. the list of source files)

Moreover, note that packages do not need to specify all fields. All of them except name and targets already have defaults. E.g. this is how the latest release of prelude looks like:
https://github.com/purescript/registry/blob/a24544601280611941c720058ea365272ee620d4/packages/prelude/v4.1.1.dhall#L1-L20

so having backend as Optional Text probably isn't going to be able to cut it, I think it at least needs to be List Text

See the Spago docs linked above: the backend key there is just the name of the command used to compile the project. When None then purs is used, when present then the string there is used.

At first I thought about a more structured form to encompass all the use cases, e.g.:

backend : { cmd : Text, compatibile : List Text }

In this way users could specify the default command to compile their projects with (backend.cmd), and a list of compatible backends (backend.compatible, which is what I think you've been trying to get at)

..but then I got inspired by @JordanMartinez in purescript/spago#416 (comment) and added the targets section. In there, I find the backend key to be a good fit.
In this way one could actually target several backends with just multiple targets

What is output intended for? The output directory we tell purs compile to use? Does this need to be in the registry?

Yes. This is another Spago option (that is not implemented yet, but I intend to), so we file this under the "project build" aspect of the Package type

If we move towards a shared schema for registry, build tools, and the compiler, then purs could potentially be able to pick up the output from here too.

This has been moved into a Target as well.

Are sources and devSources necessary? So far, build tools have operated on the assumption that within a published library, the sources are always src/**/*.purs. This assumption is built in to the ecosystem in many places. In particular, spago sources. It's not just build tools which rely on this though, I think basically any build script which calls purs compile or purs repl will too. Allowing published libraries to deviate from this standard (that source files are always in src/**/*.purs) risks significantly complicating lots of tooling. I'd much rather continue to keep this non-configurable for published libraries (of course, apps should be allowed to do whatever they want).

Context for other readers: purescript/spago#288

cc @joneshf

I put sources and devSources in first and foremost because they are Spago config options, and in addition to that I would also like to be able to publish packages from inside monorepos (disclaimer: I got personally annoyed by this limitation, as I wanted to do it and was forced to make more repos and do submodules, etc)

I think the main thing that was preventing us from doing that was the equivalence "1 package = 1 repo", which is going away here since packages will not be fetched from repos anymore, but from tarballs uploaded here.

If the worry is about build systems or users catching up with this, I don't think that's an issue at all:

  • spago sources can totally read package manifests instead of blindly assuming where source is
  • the average call to purs compile or purs repl is very long and I don't think people type that.
    I.e. they script it. Changing a script to call spago sources or look into package manifests rather than hardcoding something is not going to be a big deal IMHO.
    And of course we can enhance spago sources with things like a JSON mode and flags to filter things, etc etc if it makes people's life easier while allowing us to get rid of this limitation

I think a statement like "risks significantly complicating" is fairly strong, so I'd love a detailed example/explanation of what these complications could be in practice?

@f-f
Copy link
Member Author

f-f commented Apr 21, 2020

@hdgarrood I though about this PR a bit and it seems like the range of things we're discussing is getting very broad, so it might be hard to cook a response to the whole thing (e.g. it took me a while and I imagine getting back in detail to all of this might take a while too).

I was wondering about ways to streamline this process (my goal here: avoid getting stuck), and I'd like to propose that if there are no objections of the kind "this design is not viable at all because.." then we just merge to master (as I'm pretty happy myself with this design, and the PR is an extension of the design already existing in master), and then we take all the subthreads to dedicated issues, so we can keep every discussion on topic and easy to get back to instead of one giant thread with everything intertwined.

How does this sound?

@f-f f-f changed the title WIP: Import from bower Import from Bower, the Package type and Package Sets Apr 21, 2020
@hdgarrood
Copy link
Contributor

That sounds good to me. I'll try to take another look as soon as I can and I'll approve if I have no major objections like that.

Copy link
Contributor

@hdgarrood hdgarrood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really nice, thanks! There's nothing showstopping here so I'll approve merging.

I wasn't aware of the context re wanting to unify the spago config and the registry, but I agree that this makes sense. The issue I'm worried about with sources is basically for setups which don't use Spago; I want to avoid making life difficult for people who are still using other things like psc-package or pulp/bower. Yes, it's probably not too much effort to switch a build script of just a few lines in most cases, especially with spago init, but it's harder if the assumption is built into more complex tooling, such as, say, editor plugins. This assumption is built into purs publish right now too. It becomes significantly more complicated if these build scripts or tooling can't/don't want to assume the presence of Spago on the machine in use, or don't want to add Dhall as a dependency.

I would be happy if we could have the curator require that the sources for the lib target for any published package are just ["src/**/*.purs"], at least to start with; I could see us relaxing that later once most of the ecosystem is using this registry.

v1/Registry.dhall Show resolved Hide resolved
v1/Repo.dhall Show resolved Hide resolved
v1/Package.dhall Show resolved Hide resolved
README.md Show resolved Hide resolved
- ..as well as **overriding the package manifest by trustees**, so that e.g. version bounds
can be updated without having to ask authors to cut a new release.
- **ease of publishing a release** is optimized for and entirely automated.
- package manifests are **declarative**: authors need not to concern themselves with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think package manifests are just declarative by nature, so I'm not sure if this really counts as a feature of this design. Can you give an example of a package manifest which isn't declarative?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NPM's preinstall/postinstall hooks - they allow package maintainers to decide how their package is installed. I don't think they are a good idea, even though we use them to easily distribute spago and purs

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@f-f
Copy link
Member Author

f-f commented Apr 25, 2020

Thanks for the review @hdgarrood! I'll merge now, and open issues to track all the unresolved threads 😊

@f-f f-f changed the title Import from Bower, the Package type and Package Sets Import from Bower PoC, the Package type and Package Sets Apr 25, 2020
@f-f f-f merged commit e90a200 into master Apr 25, 2020
@f-f f-f deleted the import-from-bower branch April 25, 2020 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants