Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't try to resolve pinned sha's to versions when updating docker images #6150

Merged

Conversation

deivid-rodriguez
Copy link
Contributor

@deivid-rodriguez deivid-rodriguez commented Nov 16, 2022

This may seem too aggressive, and I'm probably missing something, and I haven't yet run full tests BUT, I think this is incorrect and not necessary.

When a base image is pinned to a specific sha, Dependabot does a very heavy operation of requesting the list of tags from the registry, and then asking for the sha of each of them until one tag that matches the current sha is found.

However, this is not necessary, because Dependabot will end up updating to the latest sha anyways, since it defaults to always keeping the current style/precision.

Also, I believe this is incorrect, because docker tags are mutable, so the registry will only return the latest sha associated to each tag. So this loop is likely to run forever to end up not finding any matches if the current sha present in the Dockerfile has been overwritten.

Finally, this should be a big performance boost because this operation is very very slow.

Fixes #2997.
Fixes #4419.

Just opening a WIP PR to check CI for now.

@deivid-rodriguez deivid-rodriguez changed the title Don't try to resolving pinned sha's to version when updating docker Don't try to resolve pinned sha's to versions when updating docker images Nov 16, 2022
@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/docker-dont-fetch-all-the-things branch 2 times, most recently from 27e9d97 to b03c9a1 Compare December 7, 2022 13:11
@deivid-rodriguez deivid-rodriguez marked this pull request as ready for review December 7, 2022 13:15
@deivid-rodriguez deivid-rodriguez requested a review from a team as a code owner December 7, 2022 13:15
@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/docker-dont-fetch-all-the-things branch from b03c9a1 to 1cd6b45 Compare December 7, 2022 13:17
@Nishnha
Copy link
Member

Nishnha commented Dec 13, 2022

However, this is not necessary, because Dependabot will end up updating to the latest sha anyways, since it defaults to always keeping the current style/precision.

I think the reason we fetch versions and try to match them against the SHA is to determine the version precision. Otherwise, we can't know what version the image needs to be upgraded to

Not sure if I'm misunderstanding something here though

@Nishnha
Copy link
Member

Nishnha commented Dec 13, 2022

Oh I think I was missing something.... Docker doesn't support any other versioning strategies other than default https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#versioning-strategy

So we really might just be updating all Docker SHAs to latest. The feature seems to be redundant!

Can you confirm that matches what your reasoning for removing this feature? The only exception I can think of is security updates, but GitHub doesn't publish any Docker advisories 🤔

@deivid-rodriguez
Copy link
Contributor Author

Let me try explain what happens when Dependabot tries to update a Dockerfile using a sha256 reference before and after this PR through an example.

Let's take for example the declaration FROM ghcr.io/dependabot/dependabot-updater@sha256: 654abc3436d0c99896df14d3d6bb450774fd5639a97e8a2bf88bc813151ef7bd.

This is what happens without this PR:

  • Dependabot tries to find a tag with sha matching sha256: 654abc3436d0c99896df14d3d6bb450774fd5639a97e8a2bf88bc813151ef7bd.
  • This sometimes works and something don't because while today a specific SHA may be associated with a tag, docker tags are usually moved and previous SHAs pointed to the tag get "orphaned". Also, when this works, it's very slow because it needs to make an API request for every tag checked, and some images have a lot of tags.
  • Assuming the previous point works, Dependabot finds a tag associated to the SHA. In this case, it's v2.0.20221207081814.
  • Now it uses the that tag to find another tag to update. In this case, since the tag uses patch level precision, it will find an up to date patch level tag. In this case, it would figure out it needs to update to the latest tag: v2.0.20221213172033.
  • Finally it finds the SHA of the updated tag: sha256:e95a9f539fa58bc6a2e57fe6333cdcb08d4da9774bddc7d7b66e07bd748a89dd.
  • The final result is Dependabot would create a PR with title "Update ghcr.io/dependabot/dependabot-updater from v2.0.20221207081814 to v2.0.20221213172033" and that updates the Dockerfile from FROM ghcr.io/dependabot/dependabot-updater@sha256: 654abc3436d0c99896df14d3d6bb450774fd5639a97e8a2bf88bc813151ef7bd to FROM ghcr.io/dependabot/dependabot-updater@sha256:e95a9f539fa58bc6a2e57fe6333cdcb08d4da9774bddc7d7b66e07bd748a89dd.

This is the logic with this PR:

  • Dependabot sees that the Dockerfile is locked to a sha256 reference.
  • Dependabot finds the sha256 reference of the latest tag.
  • Dependabot creates a PR with title "Update ghcr.io/dependabot/dependabot-updater from 654abc3436d0c99896df14d3d6bb450774fd5639a97e8a2bf88bc813151ef7bd to e95a9f539fa58bc6a2e57fe6333cdcb08d4da9774bddc7d7b66e07bd748a89dd" and the same diff as with the previous logic.

@deivid-rodriguez
Copy link
Contributor Author

So we really might just be updating all Docker SHAs to latest. The feature seems to be redundant!

Can you confirm that matches what your reasoning for removing this feature? The only exception I can think of is security updates, but GitHub doesn't publish any Docker advisories 🤔

Yes, that's exactly right by the way! Indeed, if we wanted a "lowest fixed version" logic for Security Updates, we wouldn't have that, but that does not seem a concern now, and we don't have that logic in place at the moment anyways.

Copy link
Member

@Nishnha Nishnha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for walking me through the logic on this ❤️

Awesome change. I had one minor nit but this LGTM

@stevehipwell
Copy link

@deivid-rodriguez do you have a status on this?

@deivid-rodriguez
Copy link
Contributor Author

Hei! There's no ETA but the PR is already approved and just needs a rebase and a final look to get shipped. It should not take too long.

@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/docker-dont-fetch-all-the-things branch 3 times, most recently from 7b39e2f to 608e38b Compare April 3, 2023 21:12
@deivid-rodriguez
Copy link
Contributor Author

This should be now ready for another review unless CI proves me wrong. It's a big PR, but mostly deletes a lot of code!

@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/docker-dont-fetch-all-the-things branch from a5a9c9b to 00c1363 Compare April 5, 2023 16:28
Copy link
Member

@Nishnha Nishnha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice this can match tags with different prefixes. It seems like these are both valid?

  • sha265:18305429afa14ea462f810146ba44d4363ae76e4c8dfc38288cf73aa07485005
  • :sha265:18305429afa14ea462f810146ba44d4363ae76e4c8dfc38288cf73aa07485005

But most tags will be in the form of FROM ghcr.io/dependabot/dependabot-updater@sha256: 654abc3436d0c99896df14d3d6bb450774fd5639a97e8a2bf88bc813151ef7bd, right?

Is there a reason you support the other prefix with a leading :? Do they also appear in Dockerfiles?

Copy link
Contributor Author

@deivid-rodriguez deivid-rodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nishnha Regarding your comment about leading :, is it because the regex is now

IMAGE_SPEC = %r{^(#{REGISTRY}/)?#{IMAGE}#{TAG}?(?:@sha256:#{DIGEST})?#{NAME}?}x

If that's it, the ?: part before sha256 is meant to mean "match this group, but don't include it as a named match". So in other words, that leading : is not supposed to be matched, it's just a special character when preceded by ?.

See this: https://rubular.com/r/FXdZeMjcFcMH03.

@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/docker-dont-fetch-all-the-things branch from 00c1363 to 497cb4a Compare April 5, 2023 23:18
@Nishnha
Copy link
Member

Nishnha commented Apr 6, 2023

@Nishnha Regarding your comment about leading :, is it because the regex is now

IMAGE_SPEC = %r{^(#{REGISTRY}/)?#{IMAGE}#{TAG}?(?:@sha256:#{DIGEST})?#{NAME}?}x

If that's it, the ?: part before sha256 is meant to mean "match this group, but don't include it as a named match". So in other words, that leading : is not supposed to be matched, it's just a special character when preceded by ?.

See this: https://rubular.com/r/FXdZeMjcFcMH03.

TIL! 😮

The only other question I have is why remove the private registry tests in 666c100 ?

@deivid-rodriguez
Copy link
Contributor Author

Sorry @Nishnha, not sure what happened there, I was sure I had answered the question about the removed tests 😅.

I removed those because the logic they were testing has been removed.

Previously, if we found an image reference with only a SHA-version (no numeric version), we would talk to the registry to try figure out whether there's a numeric tag associated to that SHA. Those tests were testing that the registry requests to do this happen correctly.

With the new approach, we don't mess with trying to find a numeric version for a given SHA, and rely exclusively to the information present in the Dockerfile itself. So we don't talk to the registry anymore when parsing Dockerfile dependencies.

This is why those tests were removed, I hope it makes sense!

@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/docker-dont-fetch-all-the-things branch from 497cb4a to bd5b3c0 Compare April 10, 2023 15:42
* If the Dockerfile specifies a version, we don't need to do this,
  since we already have a previous version.

* If the Dockerfile only specifies a SHA, I don't think we need it
  either because we will be updating to another SHA, so the user seems
  uninterested in specific versions. And this case doesn't really work
  in most cases anyways because of how inefficient this is.

With this patch, we can find updates to Dockerfiles specifiying only
SHAs much faster.
Previously we would generate something like

```
bump ubuntu from sha256:817cfe4672284dcbfee885b1a66094fd907630d610cab329114d036716be49ba to sha256:67211c14fa74f070d27cc59d69a7fa9aeff8e28ea118ef3babc295a0428a6d21
```

when updating a Dockerfile like

```
FROM ubuntu@sha256:817cfe4672284dcbfee885b1a66094fd907630d610cab329114d036716be49ba
```

but something like

```
bump ubuntu from `817cfe4` to `67211c1`
```

when updating a Dockerfile like

```
FROM ubuntu:22.04@sha256:817cfe4672284dcbfee885b1a66094fd907630d610cab329114d036716be49ba
```

Now we generate the shortened version consistently.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L: docker Docker containers
Projects
Archived in project
4 participants