Bazel: allow LFS rules to use cached downloads without internet #16522

Merged (2 commits) on May 21, 2024

@redsun82 redsun82 commented May 17, 2024

Even when the repository cache was prefilled, the LFS rules still tried to query for LFS URLs.

The strategy now is to first try to fetch the files from the repository cache (which is possible by passing an empty URL list and `allow_fail = True` to `repository_ctx.download`), and to run the LFS protocol only if that fails. Technically, this is made possible by enhancing `git_lfs_probe.py` with a `--hash-only` flag.
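To illustrate what a hash-only mode needs, here is a minimal Python sketch that reads the SHA-256 out of a Git LFS pointer file without any network access. The pointer layout follows the Git LFS specification; the function names `pointer_sha256` and `hash_only` are hypothetical and are not taken from `git_lfs_probe.py`, whose actual implementation is not shown in this PR conversation.

```python
import re

# Pointer-file layout per the Git LFS specification:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:<64 hex digits>
#   size <bytes>
_OID_RE = re.compile(r"^oid sha256:([0-9a-f]{64})$", re.MULTILINE)


def pointer_sha256(pointer_text):
    """Return the sha256 recorded in an LFS pointer, or None if absent."""
    match = _OID_RE.search(pointer_text)
    return match.group(1) if match else None


def hash_only(pointer_texts):
    """Hypothetical stand-in for a --hash-only mode: hashes only, no network."""
    return [pointer_sha256(text) for text in pointer_texts]
```

The hashes are all the caller needs in order to ask the repository cache for a file, so the URL lookup can be deferred until a cache miss actually occurs.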

This is also an optimization: if the repository cache is warm, no unneeded internet access is made (including the slightly slow SSH call).
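The cache-first control flow can be modeled in plain Python as below. This is only a sketch of the idea, not the Starlark rule itself: `fetch`, `cache`, `resolve_url`, and `download` are hypothetical names, and a dict stands in for Bazel's repository cache.

```python
def fetch(sha256, cache, resolve_url, download):
    """Return file bytes, touching the network only on a cache miss.

    cache       -- dict mapping sha256 -> bytes (stand-in for the repository cache)
    resolve_url -- callable running the LFS protocol for one hash (slow path)
    download    -- callable fetching bytes from a URL
    """
    data = cache.get(sha256)
    if data is not None:
        return data  # warm cache: no SSH call, no HTTP request
    url = resolve_url(sha256)  # cold cache: only now query LFS
    data = download(url)
    cache[sha256] = data  # later fetches of this hash stay offline
    return data
```

With a warm cache, `resolve_url` is never invoked, which is the property that lets the build run without internet access.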

Additionally, `-oStrictHostKeyChecking=accept-new` is now passed to the SSH call for LFS authentication. This makes a difference when the build is executed within a container on an already checked-out repository.
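As a rough sketch, the SSH invocation for LFS authentication might be assembled like this. `git-lfs-authenticate` is the command defined by the Git LFS SSH authentication flow, and `accept-new` is an OpenSSH setting that records the key of a previously unknown host but still rejects a changed key. The helper name `lfs_auth_command` and the host and path values are hypothetical; the exact command line used by `git_lfs_probe.py` is not shown in this PR conversation.

```python
def lfs_auth_command(host, repo_path, operation="download"):
    """Build the argv for an LFS authentication call over SSH.

    host      -- SSH destination, e.g. "git@example.invalid" (illustrative)
    repo_path -- repository path on the server (illustrative)
    operation -- "download" or "upload", as defined by the Git LFS protocol
    """
    return [
        "ssh",
        # Accept and record the key of a host seen for the first time,
        # instead of prompting interactively.
        "-oStrictHostKeyChecking=accept-new",
        host,
        "git-lfs-authenticate",
        repo_path,
        operation,
    ]
```

In a fresh container there is typically no `known_hosts` entry for the server and no terminal to confirm the host key prompt, which is presumably why the option matters there.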

@redsun82 redsun82 requested a review from criemen May 17, 2024 15:31
@redsun82 redsun82 requested a review from a team as a code owner May 17, 2024 15:31
@criemen criemen left a comment

One question about performance, otherwise LGTM!

if extract:
    for src in srcs:
        repository_ctx.report_progress("extracting %s" % src.basename)
        repository_ctx.extract(src.basename, stripPrefix = stripPrefix)

Is this now slower on CI (with an empty repository cache), where we'll get separate download and extract steps, or does `download_and_extract` already run the download and the extraction one after the other, with no overlap, anyway?

@redsun82 redsun82 May 21, 2024

I didn't measure it rigorously, but it did not seem to make a difference locally.

@redsun82 redsun82 merged commit 9d21e2c into main May 21, 2024
16 checks passed
@redsun82 redsun82 deleted the redsun82/lfs branch May 21, 2024 06:56