Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make JSON schema available for verification under https:// URIs #739

Merged
merged 2 commits into from
Feb 8, 2018

Conversation

mtrmac
Copy link
Contributor

@mtrmac mtrmac commented Jan 10, 2018

tl;dr: After updating gojsonschema to include xeipuuv/gojsonschema#171 , tests fail with

unable to validate: Could not read schema from HTTP, response status is 404 Not Found

This fixes that, but the owner of the JSON schema URI namespace needs to revise the existing status.


Before that gojsonschema change, "$ref" links were interpreted by taking the current schema source file's URI as a base, and treating "$ref" as relative to this.

For example, starting with the [file://]/image-manifest-schema.json URI, as used by Validator.Validate (based on the specs map), the

"$ref": "content-descriptor.json"

reference used to evaluate to file:///content-descriptor.json. gojsonschema.jsonReferenceLoader would then load these file:///*.json URIs via _escFS.


After the gojsonschema change, "$ref" links are evaluated relative to a URI base specified by the "id" attribute inside the schema source, regardless of the “external” URI passed to the gojsonschema.JSONLoader.

This is consistent with http://json-schema.org/latest/json-schema-core.html#rfc.section.8 and http://json-schema.org/latest/json-schema-core.html#rfc.section.9.2 (apart from the "id" vs. "$id" attribute name).

In the same example, the [file://]/image-manifest-schema.json URI contains

"id": "https://opencontainers.org/schema/image/manifest",

so the same

"$ref": "content-descriptor.json"

now evaluates to https://opencontainers.org/schema/image/content-descriptor.json, which is not found by gojsonschema.jsonReferenceLoader (it uses _escFS only for file:/// URIs), resulting in the 404 quoted above.


This is a minimal fix, making the schema files available to gojsonschema at the https:// URIs, while continuing to read them from _escFS.

Because gojsonschema.jsonReferenceLoader can only use the provided fs for file:/// URIs, we are forced to implement our own gojsonschema.JSONLoaderFactory and gojsonschema.JSONLoader; something like this might be more generally useful and should therefore instead be provided by the gojsonschema library.

This particular JSONLoader{Factory,} implementation, though, is image-spec specific because it locally works around various inconsistencies in the image-spec JSON schemas, and thus is not suitable for gojsonschema as is.

Namely, the specs/*.json schema files use URIs with two URI path prefixes, https://opencontainers.org/schema/{,image/}, in the top-level "id" attributes, and the nested "id" attributes along with "$ref" references use several more URI path prefixes, e.g.

"id": "https://opencontainers.org/schema/image/manifest/annotations",
"$ref": "defs-descriptor.json#/definitions/annotations"

in image-manifest-schema.json specifies the https://opencontainers.org/schema/image/manifest/defs-descriptor.json URI.

In fact, defs-descriptor.json references use all of the following URIs:

https://opencontainers.org/schema/defs-descriptor.json
https://opencontainers.org/schema/image/defs-descriptor.json
https://opencontainers.org/schema/image/descriptor/defs-descriptor.json
https://opencontainers.org/schema/image/index/defs-descriptor.json
https://opencontainers.org/schema/image/manifest/defs-descriptor.json

So, this PR introduces a loader which preserves the original _escFS layout by recognizing and stripping all of these prefixes, and using the same /*.json paths for _escFS lookups as before; this is clearly unsuitable for gojsonschema inclusion.


Finally, the reason this PR only uses such a fairly hacky loader is that merely changing the _escFS structure is still not sufficient to get consistent schema: the schema/*.json paths in this repository, and the "$ref" values, do not match the "id" values inside the schemas at all. E.g. image-manifest-schema.json refers to https://opencontainers.org/schema/image/manifest/content-descriptor.json, while content-descriptor.json identifies itself as https://opencontainers.org/schema/descriptor, matching neither the path prefix nor the file name.

Overall, it is completely unclear to me which of the URIs is the canonical URI of the “content descriptor” schema, and other schemas in this repo, and the owner of the URI namespace needs to decide on the canonical schema URIs. Only afterwards can the code be cleanly modified to match the specification.

Until then, this PR at least keeps the tests passing, and the validator usable
by external callers who want to use the public image-spec/schema.ValidateMediaType*.Validate() API.

This only commits the result of (make schema-fs) and is otherwise
unrelated to the rest of the PR.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@wking
Copy link
Contributor

wking commented Jan 10, 2018 via email

mtrmac added a commit to mtrmac/skopeo that referenced this pull request Jan 10, 2018
Anyone running (vndr) currently ends up with failing tests in OCI schema
validation because gojsonschema has fixed its "$ref" interpretation, exposing
inconsistent URI usage inside image-spec/schema.

So, this runs (vndr), and uses mtrmac/image-spec:id-based-loader
( opencontainers/image-spec#739 ) to make the tests pass
again.  As soon as that PR is merged we should revert to using the upstream
image-spec repo again.
mtrmac added a commit to mtrmac/skopeo that referenced this pull request Jan 18, 2018
Anyone running (vndr) currently ends up with failing tests in OCI schema
validation because gojsonschema has fixed its "$ref" interpretation, exposing
inconsistent URI usage inside image-spec/schema.

So, this runs (vndr), and uses mtrmac/image-spec:id-based-loader
( opencontainers/image-spec#739 ) to make the tests pass
again.  As soon as that PR is merged we should revert to using the upstream
image-spec repo again.
mtrmac added a commit to mtrmac/skopeo that referenced this pull request Jan 22, 2018
Anyone running (vndr) currently ends up with failing tests in OCI schema
validation because gojsonschema has fixed its "$ref" interpretation, exposing
inconsistent URI usage inside image-spec/schema.

So, this runs (vndr), and uses mtrmac/image-spec:id-based-loader
( opencontainers/image-spec#739 ) to make the tests pass
again.  As soon as that PR is merged we should revert to using the upstream
image-spec repo again.
}

// JsonReference implements gojsonschema.JSONLoader.JsonReference. The "Json" capitalization needs to be maintained to conform to the interface.
func (l *fsLoader) JsonReference() (gojsonreference.JsonReference, error) { // nolint: golint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSONReference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comment says,

The "Json" capitalization needs to be maintained to conform to the interface.

@stevvooe
Copy link
Contributor

stevvooe commented Feb 1, 2018

REJECT

I don't want to see the further adoption of gojsonschema in this project. The complexity of the validation outweigh its usefulness and it doesn't even meet basic needs. I don't see how adding network calls to validation makes this situation any better.

Even worse, if this schema is further published for use in other languages, the assumption will be made that it is sufficient to have just this and not higher-level validation that is required to be compliant with the specification.

Sorry to be harsh here, but this is a huge waste of time. There are problems with the existing validator, such as printing to stdout, that would be a much better use of resources.

@mtrmac
Copy link
Contributor Author

mtrmac commented Feb 2, 2018

REJECT

I don't want to see the further adoption of gojsonschema in this project.

I’m just a consumer of another library which calls public API of this package like schema.ValidatorMediaTypeImageIndex.Validate, and that public API fails.

*shrug*. I ultimately don’t care one iota whether it is fixed by this PR, by modifying the schema as @wking suggests, or by rewriting schema.Validator.Validate to be just a return nil.

The one thing that does not work for me is doing nothing and keeping the self-tests in this package broken.

There are problems with the existing validator, such as printing to stdout, that would be a much better use of resources.

My only concern is that the validator does not work at all. Whether it prints unwanted output somewhere can not even begin to be a concern until that if fixed AFAICT.


I don't see how adding network calls to validation makes this situation any better.

This is removing network calls. The current implementation is trying to really access URLs like https://opencontainers.org/schema/image/image-layout-schema.json; this PR modifies the validator to instead use the .json files in this repo.

@stevvooe
Copy link
Contributor

stevvooe commented Feb 2, 2018

@mtrmac I am sorry if I came across as not wanting to accept a fix here. We really do want the same result: a working schema.ValidatorMediaTypeImageIndex.Validate call. The gojsonschema and JSON Schema is impeding that.

This is removing network calls. The current implementation is trying to really access URLs like https://opencontainers.org/schema/image/image-layout-schema.json; this PR modifies the validator to instead use the .json files in this repo.

I apologize if I'm confused here: this gojsonschema package is a nightmare. I really can't tell what it is and it not doing. This PR is adding urls to the validator layout which looks like it is creating the assumption that these will be hosted on opencontainers.org.

My only concern is that the validator does not work at all. Whether it prints unwanted output somewhere can not even begin to be a concern until that if fixed AFAICT.

I would love to see this package fixed, as well, but doubling down on json-schema, after it has been so problematic, is not the solution. This PR just adds more complexity that will have to be removed when we have a real solution. I am even more worried that we are creating more dependencies on, as well as the effect of removing it.

Let's cut our losses here and get working validation.

@mtrmac
Copy link
Contributor Author

mtrmac commented Feb 2, 2018

I’m sorry; apparently there is some history to this code and the discussion, about which I know nothing. I didn’t mean to take sides in any such discussion, or really to revive it at all. The PR is, from the start, a minimal hack which completely defers to the maintainers of the schema for actually cleaning up the inconsistencies.

I’m happy to work on implementing any other reasonably-sized fix, whether a “real solution” or a short-term one, on which there can be consensus. Completely dropping the schema files and reimplementing the equivalent in Go is probably more than I can commit to at the moment, though.


This PR is adding urls to the validator layout which looks like it is creating the assumption that these will be hosted on opencontainers.org.

The "id" values inside the schema files claim that this is the way to refer to them. gojsonschema used to incorrectly ignore that and use file://$relative paths. Hence the code before this PR implements a fs which loads file:// URIs from the fs.go file, compiled from the .schema files in the repo.

This PR changes nothing fundamental about the existence or purpose of fs.go.

Now that gojsonschema has been fixed, because of the "id" values, it uses https://opencontainers.org/* URIs to read referenced subschemas. So, the PR only instructs gojsonschema to treat these https://* URIs the same way, by loading the data from fs.go.

Regardless of the https schema, the loading is done by the new fsLoader from the data in fs.go, with no network interaction.

It was marginally more consistent to stop mapping file:// URIs, and to modify the specs map to use https:// URIs, which is why the PR seems to add https references. It would also be possible to add file:/// to the schemaNamespaces array, and prefix all values of specs with file:/// — but the schema files already contain https:// URIs anyway.

schema/schema.go Outdated
// specs maps OCI schema media types to schema files.
// schemaNamespaces is a set of URI prefixes which contain the schema files of fs (used in JSON schema "id" and "$ref" attributes).
// fs stores all of its files in the root of the filesystem tree.
// (Note that this must contain subdirectories before its parent directories for fsLoaderFactory.refContents to work.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs stronger wording to indicate that these aren't actually hosted at the URIs provided and this is requirement of the fix to avoid remote IO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs stronger wording to indicate that these aren't actually hosted at the URIs provided and this is requirement of the fix to avoid remote IO.

If there is no intention of hosting the files at the URIs listed in the id properties, why don't we just drop the id properties from the JSON Schema (like opencontainers/runtime-spec#945 is doing for runtime-spec)? Then image-spec won't need to fake the presence of resources at those URIs with loader hoop-jumping and other JSON Schema consumers which ingest this JSON Schema without going through the gojsonschema loader will work too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment to hopefully explain the situation, while still being concise enough.

I have left the more long-winded explanation of what the commit does, and why, and what it doesn’t do, to the commit message and the PR discussion. If you’d rather have more of the rationale in code comments, please say so.

@stevvooe
Copy link
Contributor

stevvooe commented Feb 2, 2018

@mtrmac I think we go ahead with this hack, since it removes the IO and unblocks your activity. I have one request about adding a stronger comment around the URIs.

Please feel free to hit me up on a more synchronous method of communication so we can work out the next steps.

@mtrmac mtrmac force-pushed the id-based-loader branch 2 times, most recently from 244d16b to 1ed8b65 Compare February 6, 2018 15:50
After updating gojsonschema to include
xeipuuv/gojsonschema#171 , tests fail with
> unable to validate: Could not read schema from HTTP, response status is 404 Not Found

Before that gojsonschema change, "$ref" links were interpreted by taking
the current schema source file's URI as a base, and treating "$ref"
as relative to this.

For example, starting with the [file://]/image-manifest-schema.json
URI, as used by Validator.Validate (based on the "specs" map), the
>  "$ref": "content-descriptor.json"
reference used to evaluate to file:///content-descriptor.json.
gojsonschema.jsonReferenceLoader would then load these file:///*.json
URIs via _escFS.

After the gojsonschema change, "$ref" links are evaluated relative to
a URI base specified by the "id" attribute inside the schema source,
regardless of the "external" URI passed to the gojsonschema.JSONLoader.

This is consistent with
http://json-schema.org/latest/json-schema-core.html#rfc.section.8 and
http://json-schema.org/latest/json-schema-core.html#rfc.section.9.2
(apart from the "id" vs. "$id" attribute name).

In the same example, [file://]/image-manifest-schema.json URI contains
>  "id": "https://opencontainers.org/schema/image/manifest",
so the same
>  "$ref": "content-descriptor.json"
now evaluates to
"https://opencontainers.org/schema/image/content-descriptor.json",
which is not found by gojsonschema.jsonReferenceLoader (it uses
_escFS only for file:/// URIs), resulting in the 404 quoted above.

This is a minimal fix, making the schema files available to
gojsonschema at the https:// URIs, while continuing to read them from
_escFS.

Because gojsonschema.jsonReferenceLoader can only use the provided fs
for file:/// URIs, we are forced to implement our own
gojsonschema.JSONLoaderFactory and gojsonschema.JSONLoader; something
like this might be more generally useful and should therefore instead
be provided by the gojsonschema library.

This particular JSONLoader{Factory,} implementation, though, is
image-spec specific because it locally works around various
inconsistencies in the image-spec JSON schemas, and thus is not suitable
for gojsonschema as is.

Namely, the specs/*.json schema files use URIs with two URI path prefixes,
https://opencontainers.org/schema/{,image/}
in the top-level "id" attributes, and the nested "id" attributes along
with "$ref" references use _several more_ URI path prefixes, e.g.
>       "id": "https://opencontainers.org/schema/image/manifest/annotations",
>      "$ref": "defs-descriptor.json#/definitions/annotations"
in image-manifest-schema.json specifies the
https://opencontainers.org/schema/image/manifest/defs-descriptor.json
URI.

In fact, defs-descriptor.json references use all of the following URIs:
> https://opencontainers.org/schema/defs-descriptor.json
> https://opencontainers.org/schema/image/defs-descriptor.json
> https://opencontainers.org/schema/image/descriptor/defs-descriptor.json
> https://opencontainers.org/schema/image/index/defs-descriptor.json
> https://opencontainers.org/schema/image/manifest/defs-descriptor.json

So, this commit introduces a loader which preserves the original _escFS
layout by recognizing and stripping all of these prefixes, and using
the same /*.json paths for _escFS lookups as before; this is clearly
unsuitable for gojsonschema inclusion.

Finally, the reason this commit uses such a fairly hacky loader is that merely
changing the _escFS structure is still not sufficient to get consistent
schema: the schema/*.json paths in this repository, and the "$ref" values,
do not match the "id" values inside the schemas at all.  E.g.
image-manifest-schema.json refers to
https://opencontainers.org/schema/image/manifest/content-descriptor.json ,
while content-descriptor.json identifies itself as
https://opencontainers.org/schema/descriptor , matching neither the path prefix
nor the file name.

Overall, it is completely unclear to me which of the URIs is the canonical URI
of the "content descriptor" schema, and the owner of the URI namespace
needs to decide on the canonical schema URIs.  Only afterwards can the
code be cleanly modified to match the specification; until then, this
commit at least keeps the tests passing, and the validator usable
by external callers who want to use the public
image-spec/schema.ValidateMediaType*.Validate() API.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@cyphar
Copy link
Member

cyphar commented Feb 6, 2018

I'm 👍 on this hack for unblocking the general validation issue (it's causing problems with umoci to switch to the newest image-spec because we use the validation suite to make sure that we are compliant). I also would like to come up with a much more sane way of validating the JSON schema we have at the moment -- and hopefully we can come up with a way of doing this that doesn't also have the drawbacks of the runtime-spec.

@stevvooe stevvooe merged commit 1492521 into opencontainers:master Feb 8, 2018
@mtrmac mtrmac deleted the id-based-loader branch February 9, 2018 17:04
@mtrmac
Copy link
Contributor Author

mtrmac commented Feb 9, 2018

Thanks!

mtrmac added a commit to mtrmac/skopeo that referenced this pull request Feb 9, 2018
Anyone running (vndr) currently ends up with failing tests in OCI schema
validation because gojsonschema has fixed its "$ref" interpretation, exposing
inconsistent URI usage inside image-spec/schema.

So, this runs (vndr), and uses mtrmac/image-spec:id-based-loader
( opencontainers/image-spec#739 ) to make the tests pass
again.  As soon as that PR is merged we should revert to using the upstream
image-spec repo again.
mtrmac added a commit to mtrmac/skopeo that referenced this pull request Feb 9, 2018
Anyone running (vndr) currently ends up with failing tests in OCI schema
validation because gojsonschema has fixed its "$ref" interpretation, exposing
inconsistent URI usage inside image-spec/schema.

So, this runs (vndr), and uses mtrmac/image-spec:id-based-loader
( opencontainers/image-spec#739 ) to make the tests pass
again.  As soon as that PR is merged we should revert to using the upstream
image-spec repo again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants