Problem: tarballs.nixos.org is too helpful #46432

Closed
clacke opened this issue Sep 9, 2018 · 9 comments

Comments

@clacke
Contributor

clacke commented Sep 9, 2018

Issue description

When a source URL changes, but the fixed-output hash remains constant, Nix would normally download the URL again, and verify it against the hash. But with tarballs.nixos.org in place, it will just download the file from the cache, and never validate that it corresponds to the real-world resource behind the URL.

Steps to reproduce

  1. Get a file into tarballs.nixos.org.
  2. Create a fetchurl derivation with the same hash, but a URL that doesn't have this content at all.
  3. See how Nix just gets the content from tarballs.nixos.org and never notices that the URL is wrong.
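The steps above can be modelled with a toy sketch in Python. This is not how Nix or tarballs.nixos.org is implemented: the `fetch` function, the `cache` and `upstream` dictionaries, and the example.invalid URLs are all hypothetical stand-ins. It only illustrates why a lookup keyed on the hash alone never consults the URL.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch(url, expected_hash, cache, upstream):
    """Toy fetcher: try a hash-keyed cache first, then the URL.

    Returns (content, source). The cache lookup ignores `url` entirely,
    which is the behaviour described in this issue.
    """
    if expected_hash in cache:
        return cache[expected_hash], "cache"
    content = upstream[url]
    if sha256_hex(content) != expected_hash:
        raise ValueError(f"hash mismatch for {url}")
    return content, "upstream"


# Hypothetical data: two releases behind two different URLs.
old_tarball = b"sources of the old release"
new_tarball = b"sources of the new release"
old_url = "https://example.invalid/old.tgz"
new_url = "https://example.invalid/new.tgz"
upstream = {old_url: old_tarball, new_url: new_tarball}

# Step 1: the old tarball is already in the hash-keyed cache.
cache = {sha256_hex(old_tarball): old_tarball}

# Step 2: the URL is bumped, but the hash is still the old one.
content, source = fetch(new_url, sha256_hex(old_tarball), cache, upstream)

# Step 3: the stale hash goes unnoticed because the URL was never contacted.
print(source, content == old_tarball)
```

With an empty cache, the same call raises the hash mismatch that would have flagged the mistake.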

Technical details

In #45952 we saw that the racket version was bumped, but the sha256 of the -minimal override wasn't, which produced a racket-minimal-7.0 derivation built from the 6.12 source.

If it weren't for tarballs.nixos.org providing the file based on hash alone, hydra would have discovered that the resource behind the new URL didn't match the hash and the mistake wouldn't have been merged.

Having a purely content-addressable cache is obviously an advantage, but how can we get that while still detecting when the cache is too helpful?

Should e.g. the Racket package verify which version it built, by querying racket --version as part of the derivation?
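A minimal sketch of what such a check could look like, written in Python rather than as a real nixpkgs check phase. The function name `version_matches` and the sample strings are made up for illustration; the exact output format of `racket --version` is an assumption here, so the check only looks for the expected version as a standalone token.

```python
import re


def version_matches(version_output: str, expected: str) -> bool:
    """True if `expected` appears in the tool's version output as a whole
    version token, so that "7.0" does not match "17.0" or "7.0.1"."""
    pattern = r"(?<![\d.])" + re.escape(expected) + r"(?!\.?\d)"
    return re.search(pattern, version_output) is not None


# Sample strings standing in for what the built binary might print.
print(version_matches("Welcome to Racket v7.0.", "7.0"))
print(version_matches("Welcome to Racket v6.12.", "7.0"))
```

A build step could run the freshly built binary, pass its output to a check like this, and fail the build when the version does not match the package's declared version.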

Or is the content-addressable cache a mistake, and getting the paths from fixed-output derivations from hydra is good enough?

@srhb
Contributor

srhb commented Sep 9, 2018

You seem to be describing exactly the desired operation of the fixed-output hash and the intended purpose of tarballs.nixos.org.

If the hash were available locally, we would not even ask tarballs.nixos.org but just conclude that we already have that src, so fetching must be a no-op.

@vcunat
Member

vcunat commented Sep 9, 2018

I believe the racket(-minimal) expression should be changed to avoid this; I added a comment at least: d0413d1.

IIRC we've had some discussions around suggestions like including the basename in the hash computation, but anything like this is rather hard to make happen, as compatibility is quite a headache and it would have slight downsides as well.

I don't think a similar change would happen in the near future, and it would belong to https://github.com/nixos/nix (but certainly feel free to continue some discussion in this thread).

vcunat closed this as completed Sep 9, 2018
@clacke
Contributor Author

clacke commented Sep 9, 2018

That comment will probably improve this specific case -- thanks.

I suspect I am missing some project history here, that there is some specific case or pattern that caused tarballs.* to come into existence. What is that?

I imagine that most derivation builds never go there, as the fetchurl derivation would usually be available on hydra. Am I wrong?

@vcunat
Member

vcunat commented Sep 9, 2018

tarballs.nixos.org isn't really the root of the problem here anyway. The fixed-output derivations are (i.e. content-addressed ones). The main motivation is that switching a fetcher's URL or even the protocol shouldn't cause a rebuild/refetch (which would cascade to dependants).

@clacke
Contributor Author

clacke commented Sep 10, 2018

Without t.n.o, #45650 would have had a build failure, because https://mirror.racket-lang.org/installers/7.0/racket-minimal-7.0-src.tgz is a different fixed-output derivation from https://mirror.racket-lang.org/installers/6.12/racket-minimal-6.12-src.tgz, and the former does not return content with the hash 0c565jy[...].

What's the compelling reason for t.n.o and why is it better than what hydra already caches?

@clacke
Contributor Author

clacke commented Sep 10, 2018

> The main motivation is that switching a fetcher's URL or even the protocol shouldn't cause a rebuild/refetch (which would cascade to dependants).

If you update the URL of the fixed-output derivation, the hash of the derivation will be changed regardless of t.n.o and all its dependents will be rebuilt, right? But the URL itself will only be queried until the derivation has run on hydra.

@clacke
Contributor Author

clacke commented Sep 10, 2018

It looks like the compelling reason is in bb67280: upstreams sometimes change URL structures, and that introduces fragility in nixpkgs unless the derivation is cached. I guess the issue is that it's currently hard to protect the derivation from getting gc'ed?

I'm just asking the stupid questions here, I know I'm probably missing some important point.

@srhb
Contributor

srhb commented Sep 10, 2018

> If you update the URL of the fixed-output derivation, the hash of the derivation will be changed regardless of t.n.o and all its dependents will be rebuilt, right?

No, fixed-output derivations have their hashes based on the name (which is usually simply "source") and the output, not the input.
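As a toy illustration of that point (not Nix's actual store path algorithm), the Python sketch below derives an identifier from the name and the declared output hash only. The function `toy_output_id`, the dictionary layout, the placeholder hash, and the example.invalid URLs are all hypothetical.

```python
import hashlib


def toy_output_id(spec: dict) -> str:
    """Toy identifier built from the name and declared output hash only.

    The "url" field is deliberately never read, mirroring the idea that
    the fetch location is not part of a fixed output's identity.
    """
    material = f"{spec['name']}:{spec['sha256']}".encode()
    return hashlib.sha256(material).hexdigest()[:16] + "-" + spec["name"]


placeholder_hash = "0" * 52  # stand-in value, not a real hash

old_mirror = {"url": "https://old.example.invalid/src.tgz",
              "name": "source", "sha256": placeholder_hash}
new_mirror = {"url": "https://new.example.invalid/src.tgz",
              "name": "source", "sha256": placeholder_hash}
renamed = dict(new_mirror, name="src-7.0.tgz")

# Same name and hash, different URL: the identifier is unchanged.
print(toy_output_id(old_mirror) == toy_output_id(new_mirror))  # True
# Changing the name changes the identifier, even with the same hash.
print(toy_output_id(new_mirror) == toy_output_id(renamed))     # False
```

In this model, moving to a new mirror leaves the identifier untouched, while any change to the name produces a different one.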

> I guess the issue is that it's currently hard to protect the derivation from getting gc'ed?

In a sense. Generally, we have gcroots for e.g. hello, not hello.src.

@clacke
Contributor Author

clacke commented Sep 10, 2018

> No, fixed-output derivations have their hashes based on the name (which is usually simply "source") and the output, not the input.

There are certainly source derivations just named source (e.g. fetchFromGitHub derivations without an explicit name), but in the case of e.g. fetchurl, and therefore in the case of e.g. the racket source, the name of the derivation comes from the basename of the URL:

$ nix-instantiate -E 'with import <nixpkgs> {}; fetchurl { url = http://example.com/; sha256="0000000000000000000000000000000000000000000000000000"; }' 2>/dev/null
/nix/store/n9pad0sw3xfycrqzcj1p2kngv3l7kqyw-example.com.drv

> we have gcroots for e.g. hello, not hello.src.

Yeah, I guess for a small project it's easy to say "just set keep-outputs to true" (which I think would preserve those source outputs, right?), but for nixpkgs that might mean huge disk space needs?

Anyway, I think my point was that t.n.o was unintuitive to me, because I expected fetchurl caching to work like any other fixed-output derivation. I even prepared a patch to make sure racket-minimal.src would change its name when the racket version was bumped, before I realized that the existing implementation already did that and it was something else that threw it off.


3 participants