remote cache manifest to support oci mediatypes for spec compatibility #1550

Closed
Craga89 opened this issue Jun 30, 2020 · 12 comments · Fixed by #1746

@Craga89

Craga89 commented Jun 30, 2020

I'm currently using the Quay registry and attempting to use the --export-cache feature, i.e. --export-cache type=registry,ref=quay.io/hivehr/app,push=true. However, I consistently get the following error:

ERROR: error writing manifest blob: failed commit on ref "sha256:<SHA>": unexpected status: 400 BAD REQUEST

I contacted Quay's support team; they looked into this and determined that BuildKit's --export-cache seems to be creating a malformed schema v2 manifest list. This is the exact error:

[ERROR] [endpoints.v2.manifest] failed to parse manifest when writing by tagname
Traceback (most recent call last):
  File "/quay-registry/endpoints/v2/manifest.py", line 246, in _parse_manifest
    return parse_manifest_from_bytes(Bytes.for_string_or_unicode(request.data), content_type)
  File "/quay-registry/image/docker/schemas.py", line 22, in parse_manifest_from_bytes
    return DockerSchema2ManifestList(manifest_bytes)
  File "/quay-registry/image/docker/schema2/list.py", line 210, in __init__
    raise MalformedSchema2ManifestList("manifest data does not match schema: %s" % ve)
MalformedSchema2ManifestList: manifest data does not match schema: 'platform' is a required property

Quay validates a pushed manifest before accepting it, and in this instance it fails to parse the manifest output by BuildKit.
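The failing check can be approximated with a short Python sketch. This is not Quay's code: the function name `entries_missing_platform` and the placeholder digests are assumptions made for illustration. It only shows the kind of rule the traceback reports, namely that every entry of a manifest list is expected to carry a `platform` object.

```python
import json


def entries_missing_platform(manifest_list: dict) -> list:
    """Return the digests of all `manifests` entries that lack a `platform` object."""
    return [
        entry.get("digest", "<no digest>")
        for entry in manifest_list.get("manifests", [])
        if "platform" not in entry
    ]


# Illustrative input only: two descriptors without `platform`.
# The digests are placeholders, not real content addresses.
cache_manifest = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "digest": "sha256:aaaa", "size": 1},
    {"mediaType": "application/vnd.buildkit.cacheconfig.v0",
     "digest": "sha256:bbbb", "size": 1}
  ]
}
""")

missing = entries_missing_platform(cache_manifest)
print(missing)
```

A validator that treats `platform` as required would reject any list for which this function returns a non-empty result.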

To rule out it being a Quay-specific issue, I've also tried multiple times to push to Docker Hub with the following command:

./buildctl build --frontend dockerfile.v0 --local dockerfile=. --output type=image,name=hivehr/app:latest,push=true --export-cache type=registry,ref=ibazulic/centos-build:latest,push=true

There appear to be no errors, but the image is not available in the web UI, and if I try to get the manifest of the image with the Docker CLI, I get a 500 error. The image cannot be pulled from Docker Hub either.

Can you give any guidance as to what the cause might be here? Apologies if this issue is already logged. I attempted to find a similar issue but none appeared to be quite the same.

@Craga89 Craga89 changed the title from "Pushing to the registry--export-cache" to "[Bug] --export-cache is generating a malformed v2 schema manifest. Missing platform" Jun 30, 2020
@Craga89 Craga89 changed the title from "[Bug] --export-cache is generating a malformed v2 schema manifest. Missing platform" to "[Bug] --export-cache is generating a malformed v2 schema manifest. Missing platform property" Jun 30, 2020
@tonistiigi
Member

Platform is not a required property: https://github.com/opencontainers/image-spec/blob/master/image-index.md#image-index-property-descriptions

platform object

This OPTIONAL property describes the minimum runtime requirements of the image. This property SHOULD be present if its target is platform-specific.

@Craga89
Author

Craga89 commented Jul 1, 2020

I've passed that information on to Ivan at Quay.io support and received the following response, which you may be able to shed some light on, @tonistiigi:


Hi Craig,

The second sentence at that link suggests the property should indeed be present in some cases. I changed the cache output to a local directory and managed to get the manifest. The error could be a red herring, given the content of the manifests:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "digest": "sha256:b606632bbc6dfb0797c52a09ac3e89370a6aa94ff52b2c3c4492296b50059eab",
      "size": 1284,
      "annotations": {
        "org.opencontainers.image.ref.name": "latest"
      }
    }
  ]
}

This one looks fine. However, the SHA it references is this:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:6910e5a164f725142d77994b247ba20040477fbab49a721bdbe8d61cf855ac23",
      "size": 74866818,
      "annotations": {
        "buildkit/createdat": "2020-06-30T16:01:07.973893119+02:00",
        "containerd.io/uncompressed": "sha256:eb29745b8228e1e97c01b1d5c2554a319c00a94d8dd5746a3904222ad65a13f8"
      }
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:999b175092e6cb543ca714d498b5f679e0d18b5b1dc263ef8986bd632b8f9281",
      "size": 19770948,
      "annotations": {
        "buildkit/createdat": "2020-06-30T16:01:13.703627202+02:00",
        "containerd.io/uncompressed": "sha256:dbf1b1707df8f154e6a369f5ddb3ae5a912a6fcbb033369d2a382c096041a8cc"
      }
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:a58491d2c839b2cae946673c81043e036cd34eb25a5f378c63454a582fd42978",
      "size": 22127824,
      "annotations": {
        "buildkit/createdat": "2020-06-30T16:01:18.173803177+02:00",
        "containerd.io/uncompressed": "sha256:7326c4a5ae270945e15200eb087d4d1736f78e327110c94a7c35de57ceb47002"
      }
    },
    {
      "mediaType": "application/vnd.buildkit.cacheconfig.v0",
      "digest": "sha256:35ae7bb4b38b03d69b5bf59d996a9ac84c4a7584a61fcfdd990009820d6830a0",
      "size": 841
    }
  ]
}

It could be a problem with the media type of the last entry. Content types for Docker v2 schema 2 in Quay are defined here:

https://github.com/quay/quay/blob/312717c7891b56ce5433dd573d06d3f579bce944/image/docker/schema2/__init__.py

OCI content types are defined here:

https://github.com/quay/quay/blob/312717c7891b56ce5433dd573d06d3f579bce944/image/oci/__init__.py

It could be that this is why Quay refuses to validate the manifest. I can open an internal bug tracker for this issue to see what our engineers will say. Unfortunately, for bugs and RFEs I can't give you any ETA as to when the issue will be fixed.

@josephschorr

@tonistiigi Engineer of Quay here

platform is not a required property in the OCI image format, but this is being pushed in Docker Manifest 2, Schema 2 format, which, based on my reading of https://docs.docker.com/registry/spec/manifest-v2-2/, does not seem to allow it to be optional.

Further, even if we did allow it to be optional, it appears BuildKit is adding a layer of type "mediaType": "application/vnd.buildkit.cacheconfig.v0", which is also unsupported in Docker Manifest 2 Schema 2.

@tonistiigi
Member

If you want we could add the oci-mediatypes=true option for cache export as we do for images. Not ready to have it default yet as older registries don't support them.

@Craga89
Author

Craga89 commented Jul 1, 2020

If I understand correctly, the problem is that Quay doesn't currently support the OCI format, which is why the manifest upload is failing.

If you want we could add the oci-mediatypes=true option for cache export as we do for images. Not ready to have it default yet as older registries don't support them.

Would this setting solve anything given the above issue? What would setting it to true do in this context?

@josephschorr

josephschorr commented Jul 1, 2020

@Craga89 Well, and the media type of the produced list is "application/vnd.docker.distribution.manifest.list.v2+json", but it contains tar-gzipped layers (instead of manifests) as well as a cache config, neither of which is (AFAIK) valid in a Schema 2 Manifest List.

Edit: Oh, and even if this was an OCI index, there is (as of yet) no defined means for storing custom layer types in OCI indexes, as the artifacts spec does not yet support it; the expected design would be to store the cache as a manifest with a custom schema type in OCI, not as an index/list.
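To make the objection concrete, here is a minimal Python sketch that flags entries of a manifest list whose media type is not an image manifest. The helper name `non_manifest_media_types` is hypothetical, and the set of accepted types is an assumption based on a reading of the Docker Manifest V2 Schema 2 document, not a statement of what any registry actually enforces. The input reuses the media types from the cache manifest quoted earlier in this thread, with the other fields omitted.

```python
# Media types that the Docker manifest list document describes for entries
# of `manifests` (an assumption drawn from the spec text, not registry code).
DOCKER_MANIFEST_TYPES = {
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.v1+json",
}


def non_manifest_media_types(manifest_list: dict) -> list:
    """Return the media types of entries that are not image manifests."""
    return [
        entry.get("mediaType")
        for entry in manifest_list.get("manifests", [])
        if entry.get("mediaType") not in DOCKER_MANIFEST_TYPES
    ]


# Media types taken from the cache manifest quoted in this thread;
# digests, sizes, and annotations are left out for brevity.
cache_manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip"},
        {"mediaType": "application/vnd.buildkit.cacheconfig.v0"},
    ],
}

offenders = non_manifest_media_types(cache_manifest)
print(offenders)
```

Both entry types in the cache manifest are reported, which matches the point above: the list references layers and a cache config rather than image manifests.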

@tonistiigi
Member

tonistiigi commented Jul 1, 2020

What would setting it to true do in this context?

It would just replace the "docker" string in the mediatype values with "oci". No changes to the actual objects. We don't set it by default so that more registries that don't know about oci would be supported. This would allow us to be compatible with the spec (by switching the spec document). Looks like the spec docs from 2016 do not explicitly mark the platform field as optional, although all the implementations of docker/distribution and hub have always treated it as optional. The pattern of using descriptor lists like this (for non-manifests) is nothing novel to buildkit; the same thing is used by cnab-oci, contained snapshots etc.
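As a rough illustration of that media-type swap, here is a Python sketch. It is not BuildKit's implementation (BuildKit is written in Go), and the function name `to_oci_mediatypes` is hypothetical. The mapping uses the media type names from the Docker and OCI image specs. Note that the OCI names are not a literal substring replacement of "docker" with "oci", so a lookup table is used instead.

```python
# Docker media types and their OCI counterparts, as named in the two specs.
DOCKER_TO_OCI = {
    "application/vnd.docker.distribution.manifest.list.v2+json":
        "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json":
        "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.container.image.v1+json":
        "application/vnd.oci.image.config.v1+json",
    "application/vnd.docker.image.rootfs.diff.tar.gzip":
        "application/vnd.oci.image.layer.v1.tar+gzip",
}


def to_oci_mediatypes(obj):
    """Return a copy of `obj` with known Docker mediaType values rewritten.

    Everything else (digests, sizes, annotations, unknown media types)
    is copied unchanged.
    """
    if isinstance(obj, dict):
        return {
            key: DOCKER_TO_OCI.get(value, value)
            if key == "mediaType" and isinstance(value, str)
            else to_oci_mediatypes(value)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [to_oci_mediatypes(item) for item in obj]
    return obj


# Trimmed version of the cache manifest quoted in this thread.
docker_style = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "size": 74866818},
        {"mediaType": "application/vnd.buildkit.cacheconfig.v0", "size": 841},
    ],
}

oci_style = to_oci_mediatypes(docker_style)
print(oci_style["mediaType"])
```

The BuildKit-specific cache config type has no OCI counterpart in this table, so it passes through unchanged, consistent with the statement that the objects themselves would not change.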

@josephschorr
Copy link

@tonistiigi Given that the OCI artifacts spec is expressly designed to address these kinds of custom pushed resources, I'd suggest we circle back on whether this should be using an Index vs a Manifest, as Quay will likely be following the specification and only allow artifacts as manifests until such time as the spec is extended.

@tonistiigi tonistiigi changed the title from "[Bug] --export-cache is generating a malformed v2 schema manifest. Missing platform property" to "remote cache manifest to support oci mediatypes for spec compatibility" Jul 2, 2020
@jemcclin

I've noticed the same problem when pushing caches to Azure Container Registry--it throws a 400 Bad Request error (tested on Buildkit 0.6.2 and 0.7.1 with the option --export-cache type=registry,ref=$(params.image_repo)/$(params.image_name):buildcache with params substituted in by our CI system).

It happens intermittently, but often enough that we've had to switch to an alternate cache export for the time being because of the volume of builds that "failed" exclusively due to the cache export. I don't have Azure-side logs to provide more diagnostic information yet, though.

@tonistiigi
Member

It happens intermittently

If it happens intermittently, it is probably not related to the mediatypes issue here. @cpuguy83 have you seen something like this in ACR?

@cpuguy83
Member

Yep, let me bring this up with the ACR team again.

@tonistiigi tonistiigi added this to the v0.8.0 milestone Jul 21, 2020
@TBBle
Collaborator

TBBle commented Jul 26, 2020

Worth tracking that Amazon ECR has a concern similar to the one raised by @josephschorr:

There is an additional issue specifically related to using manifest lists as build caches (which reference layers, instead of referencing images). Due to this, buildkit caching will continue to return '405 Method Not Allowed'. We will track support for this as a separate feature in #876.
