
flake lock parsing performance scaling concerns #6626

Closed
nrdxp opened this issue Jun 8, 2022 · 7 comments
@nrdxp
Contributor

nrdxp commented Jun 8, 2022

Describe the bug
I have a flake.lock at work which has grown to 23742 lines. Early on, I tried to avoid this with explicit follows behavior, but because of a tedious UX #5576, coupled with the fact that it doesn't seem possible to follow more than a single level deep (i.e. foo.inputs.bar.inputs.baz.follows doesn't work) I sort of gave up and just let the thing go for the moment.

I started to experience slow evaluations on pretty much everything I called from the flake. At first I thought maybe there was some IFD somewhere I hadn't noticed, but when I actually went to profile it, I disabled IFD altogether and still saw the issue. I began to suspect the lock file when strace showed that the majority of the execution time was spent calling brk() on a fully eval-cached and prebuilt package; what else could it be doing but reading more data in from the lock, if the package and evaluation are already cached?

I then removed all outputs of my flake, keeping the inputs and the lock file intact, exported only a single value (outputs.foo = "bar"), and called nix eval .#foo: it took just over 4 seconds. This cost is incurred every time I reference the flake, since the parsing stage isn't cached in any way. It is particularly painful when coupled with #3913, which makes just entering a devshell annoying and time-consuming even when it is already built.
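The minimal reproduction described above can be sketched as a flake like the following (the actual input set is elided; the point is that the existing, large flake.lock is still parsed in full on every invocation, even though the only output is a string literal):

```nix
# flake.nix — minimal reproduction sketch.
# `nix eval .#foo` still pays the full lock-parsing cost
# because the (large) flake.lock is read on every invocation.
{
  inputs = {
    # ...the original large set of inputs stays as-is, so the
    # existing flake.lock is still read in full...
  };

  outputs = inputs: {
    foo = "bar";
  };
}
```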

Expected behavior

Given that we have gone to such great lengths to cache both evaluations and package builds, it would be a shame to have the CLI runtime bound solely by the size of the lock file, especially when there is no simple and straightforward way to tell Nix something like "make all transitive inputs named nixpkgs follow this one" to keep the file from growing in the first place.

Possible solutions

I still think auto-follows would be nice, but I can also see legitimate reasons not to use the feature. Ideally we'd want to optimize this parser as much as possible; I'm sure there is a way to avoid calling brk() so much. I measured 287 calls on initial analysis.

@nrdxp
Contributor Author

nrdxp commented Jun 28, 2022

Just took a brief look at the code today to see what can be done. I was hoping there would be some options or tunables (e.g. buffer size) in the JSON library currently in use, but no such luck. To improve parsing speed, it looks like the only option would be to switch to a different JSON library.

According to synthetic benchmarks (huge grain of salt) we could get the time down by a factor of roughly 9 this way. But as far as I can tell, the library currently in use makes the programmer's life much simpler, which should never be discounted either.

Another option might be some sort of flake lock parse cache, which stores an already-parsed lock in a more performance-friendly format and incorporates it into the evaluation cache, but this adds a fairly non-trivial amount of complexity.

So, all these things considered, and with my own use case in mind, I still think the best solution is some sort of auto-follows as outlined in #5576 and at least partially implemented in #6550. This doesn't account for large monorepos that have a huge flake.lock by necessity, but that is not really my situation. Though I am working in a monorepo, the majority of the lock inputs are just replicas of each other, and I don't foresee any major issues with reducing them to a handful, or even a single input, in most cases.

With all that said, I may still try swapping in a faster JSON parsing library at some point, just to see how much more difficult it would make things and whether I get any serious gains in my real-world project. On second thought, a better idea might be to read the lockfile into a buffer and simulate a call to fromJSON as if it were a Nix string. If that could be managed, it seems the result could be added to the eval cache reasonably trivially, at least in theory.
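The "simulate fromJSON" idea above can be sketched as an ordinary Nix expression; this is only an illustration of the shape of the approach, not how Nix's internal lock parsing actually works. The idea is that an evaluator-level fromJSON call is exactly the kind of pure computation the eval cache is built to memoize:

```nix
# Sketch: parse the lock file via the evaluator's own builtins, so the
# parsed result could (in principle) be memoized by the eval cache.
let
  lock = builtins.fromJSON (builtins.readFile ./flake.lock);
in
  # e.g. inspect the top-level inputs recorded in the lock
  lock.nodes.root.inputs
```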

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/parser-performance-json-vs-nix/20237/4

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/private-flake-inputs/21277/1

@roberth
Member

roberth commented Feb 1, 2023

If we stopped unnecessarily copying everything into the local flake lock, I think we'd solve this performance issue right away.

@domenkozar
Member

@nrdxp you'll be amazed by #10055

@nrdxp
Contributor Author

nrdxp commented Feb 21, 2024

Thanks! I'll go ahead and close this for now, then.

@nrdxp nrdxp closed this as completed Feb 21, 2024
@fzakaria
Contributor

fzakaria commented Aug 4, 2024

Maybe give https://github.com/fzakaria/nix-auto-follow a try.
