sample-apiserver returns 200 with error marshalling protobuf response #50342

cheftako · 2017-08-08T23:00:32Z

/kind bug

What happened:
Sent a request for an aggregated resource (sample-api wardle/flunders). Response was a 200 status code and an error message saying the response could not be given.

What you expected to happen:
We should have an error status code.

How to reproduce it (as minimally and precisely as possible):

Set up sample-apiserver.
curl -k -v -XGET -H "User-Agent: kubectl/v1.8.0 (linux/amd64) kubernetes/49ae996" -H "Accept: application/vnd.kubernetes.protobuf" -H "Authorization: Bearer " https:///apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders
Response
< HTTP/1.1 200 OK
< Content-Length: 173
< Content-Type: application/vnd.kubernetes.protobuf
< Date: Tue, 08 Aug 2017 22:57:37 GMT
<
k8s
�v1��Status��
�

Connection #0 to host 35.193.112.40 left intact
��Failure�~object *v1alpha1.FlunderList does not implement the protobuf marshalling interface and cannot be encoded to a protobuf message"0��"

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): 1.8
Cloud provider or hardware configuration**: GCE
OS (e.g. from /etc/os-release): Debian GNU/Linux 7 (wheezy)
Kernel (e.g. uname -a): 3.16.0-4-amd64
Install tools:
Others:

The text was updated successfully, but these errors were encountered:

cheftako · 2017-08-08T23:01:00Z

/sig api-machinery

liggitt · 2017-08-09T03:16:00Z

I expect that's a shortcoming of the example server, not the aggregator itself

fejta-bot · 2018-01-11T06:25:36Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

nikhita · 2018-01-11T06:40:17Z

/remove-lifecycle stale

fejta-bot · 2018-05-30T09:51:21Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

nikhita · 2018-06-01T09:16:06Z

/remove-lifecycle stale
/lifecycle frozen

tristanburgess · 2018-08-01T16:38:14Z

Hello, I'd like to take a look into this as my first contribution.

cheftako · 2018-08-01T20:45:42Z

/assign @tristanburgess

k8s-ci-robot · 2018-08-01T20:45:43Z

@cheftako: GitHub didn't allow me to assign the following users: tristanburgess.

Note that only kubernetes members and repo collaborators can be assigned.
For more information please see the contributor guide

In response to this:

/assign @tristanburgess

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tristanburgess · 2018-08-03T02:15:44Z

Looking into this so far, I'm able to reproduce the issue, and also that using -H "Accept: application/json" does serialize correctly:

~/go/src/k8s.io/sample-apiserver$ curl -k -v  -H "Accept: application/json" -H "Authorization: Bearer " http://localhost:8080/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders --output -
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.58.0
> Accept: application/json
> Authorization: Bearer 
> 
< HTTP/1.1 200 OK
< Content-Length: 673
< Content-Type: application/json
< Date: Fri, 03 Aug 2018 01:51:15 GMT
< 
{
  "kind": "FlunderList",
  "apiVersion": "wardle.k8s.io/v1alpha1",
  "metadata": {
    "selfLink": "/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders",
    "resourceVersion": "5"
  },
  "items": [
    {
      "metadata": {
        "name": "my-first-flunder",
        "namespace": "default",
        "selfLink": "/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders/my-first-flunder",
        "uid": "3286f86a-96bd-11e8-bad0-0242ac110003",
        "resourceVersion": "2",
        "creationTimestamp": "2018-08-03T01:33:17Z",
        "labels": {
          "sample-label": "true"
        }
      },
      "spec": {
        
      },
      "status": {}
    }
  ]
}

There are potentially two separate considerations out of this as far as I can tell (in addition to the original issue):

sample-apiserver's example resource types do serialize JSON correctly, but does not appear implement protobuf serialization. I still need to confirm that it definitely isn't implemented (the error conditions at play may be too general to pick up a more specific error in play). There's a Fischer example resource type in addition to Flunder. Would be worth confirming the behavior of that type as well. If desired, I can also file a feature request to implement protobuf serialization for the example resource types.
I will investigate separately if disabling the JSON serial/deserial impl has a similar effect as this bug, or a more desirable effect; either way, would be good to unify the two behaviors. Lumped in with this part of the investigation will be to figure out what's handling the protobuf unmarshalable error from apimachinery, and ultimately giving the 200.

In terms of the actual status code we want to return, would 406 Not Acceptable make sense for this?
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406
Looking at the doc above it seems to fit the situation that the requestor requests a response format that isn't servable. It looks like completely invalid Accept: formats to the API do serve a 406. Any thoughts?

curl -k -v  -H "Accept: application/invalid" -H "Authorization: Bearer " http://localhost:8080/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders --output -
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.58.0
> Accept: application/invalid
> Authorization: Bearer 
> 
< HTTP/1.1 406 Not Acceptable
< Content-Length: 182
< Content-Type: application/json
< Date: Fri, 03 Aug 2018 02:12:26 GMT
< 
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "406: Not Acceptable",
  "reason": "NotAcceptable",
  "details": {},
  "code": 406
}

tristanburgess · 2018-08-03T18:06:41Z

Pertaining to 2), JSON serialization appears to be baked in to the API as the default, so there really isn't a similar unmarshalable error for it. A JSON marshalization fault at the apimachinery level currently would prevent the apiserver from starting up successfully. Therefore, only have to worry about protobuf for this.

tristanburgess · 2018-08-04T00:16:36Z

For 1), .proto definitions definitely weren't created for the extension API resource types. I generated them and confirmed that API response was successful after that. I'll file a separate ticket/PR for the addition to the sample APIs after this issue is fixed.

~/go/src/k8s.io/sample-apiserver$ curl -k -v  -H "Accept: application/vnd.kubernetes.protobuf" -H "Authorization: Bearer " http://localhost:8080/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders/my-first-flunder --output -
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders/my-first-flunder HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.58.0
> Accept: application/vnd.kubernetes.protobuf
> Authorization: Bearer 
> 
< HTTP/1.1 200 OK
< Content-Length: 236
< Content-Type: application/vnd.kubernetes.protobuf
< Date: Sat, 04 Aug 2018 00:13:30 GMT
< 
k8s
!
�wardle.k8s.io/v1alpha1�Flunder���
��
�my-first-flunder��default"I/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders/my-first-flunder*$314ac05b-977b-11e8-a525-0242ac1100032�28�ߓ��Z�

sample-label��truez�

tristanburgess · 2018-08-05T17:34:51Z

I had a starting implementation that successfully returned a 406 on protobuf marshalling definition not implemented (errNotMarshalable), but on running unit tests I stumbled upon a larger problem.

There's an order of operations issue which is preventing http status code from being handled properly with both serialization encode success and serialization encode failure between apiserver and apimachinery.

http.ResponseWriter::WriteHeader(statusCode) can only be called once.
http.ResponseWriter::Write([]byte) will call WriteHeader(200) if WriteHeader(statusCode) is not called explicitly beforehand.

In k8s.io/apiserver/pkg/endpoints/responsewriters/writers::SerializeObject(), WriteHeader() is called with the statusCode passed in before calling Encode(),which generates a serialized response body with machinery in k8s.io/apimachinery/pkg/runtime/serializer.

The problem in SerializeObject() and child methods is if encoding fails, regardless of the failure reason. We have already called WriteHeader() before Encode(), so the Encode() error handler in SerializeObject(), errorJSONFatal(), has no net effect on the status code in the response, even though it generates an appropriate error and tries to call WriteHeader() with the new status code.

The specific case of encode failing can be resolved by calling WriteHeader() only after Encode(). If there's an error during encoding, no full Write() will occur such that the default 200 status code will be set in place. Thus, errorJSONFatal() would successfully write the new error code resultant from encode failure, which includes handling of the errNotMarshalable protobuf error enhancements I have added.

The catch is, this change would now cause successful encode to return the wrong status code, because our Encode(), and even Golangs built in encoding/json package (which is used for our JSON encoding), will both translate and Write() the data, which in turn will call WriteHeader(200), because we haven't called WriteHeader() ourselves yet.

The solution here is likely to move header modifications closer to Write() time, to allow for this differentiation between original statusCode and a potential statusCode resultant from an encode failure.

Two questions I'm currently weighing are:

If the original status code is failure (4xx, 5xx), and encoding also fails, what's the right status code and response?
If we move WriteHeader() closer to Write(), such that pre-Write() errors can be handled with the correct status, but Write() itself fails, we still have a problem where we've already called WriteHeader(), so we can't further change the status code. What's the correct behavior here? Possibly to return the original status code but generate a raw text response body instead.

…andling_refinement_for_serialization_encode Automatic merge from submit-queue (batch tested with PRs 67041, 66948). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. 50342: Establish '406 Not Acceptable' response for protobuf serializa… …tion 'errNotMarshalable' - Added metav1.Status() that enforces '406 Not Acceptable' response if protobuf serialization is not fully supported for the API resource type. - JSON and YAML serialization are supposed to be more completely baked in, so serialization involving those, and general errors with seralizing protobuf, will return '500 Internal Server Error'. - If serialization failure occurs and original HTTP status code is error, use the original status code, else use the serialization failure status code. - Write encoded API responses to intermediate buffer - Use apimachinery/runtime::Encode() instead of apimachinery/runtime/protocol::Encode() in apiserver/endpoints/handlers/responsewriters/writers::SerializeObject() - This allows for intended encoder error handling to fully work, facilitated by apiserver/endpoints/handlers/responsewriters/status::ErrorToAPIResponse() before officially writing to the http.ResponseWriter - The specific part that wasn't working by ErrorToAPIResponse() was the HTTP status code set. A direct call to http.ResponseWriter::WriteHeader(statusCode) was made in SerializeObject() with the original response status code, before performing the encode. Once this method is called, it can not again update the status code at a later time, with say, an erro status code due to encode failure. - Updated relevant apiserver unit test to reflect the new behavior (TestWriteJSONDecodeError()) - Add build deps from make update for protobuf serializer **What this PR does / why we need it**: This PR fixes a bug that was blocking extensible error handling in the case that serializing response data fails, and implements a '406 Not Acceptable' status code response if protobuf marshal definitions are not implemented for an API resource type. See commit message for further details. **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes kubernetes#50342 **Special notes for your reviewer**: **Release note**: ```release-note Fixed a bug that was blocking extensible error handling when serializing API responses error out. Previously, serialization failures always resulted in the status code of the original response being returned. Now, the following behavior occurs: - If the serialization type is application/vnd.kubernetes.protobuf, and protobuf marshaling is not implemented for the requested API resource type, a '406 Not Acceptable is returned'. - If the serialization type is 'application/json': - If serialization fails, and the original status code was an failure (e.g. 4xx or 5xx), the original status code will be returned. - If serialization fails, and the original status code was not a failure (e.g. 2xx), the status code of the serialization failure will be returned. By default, this is '500 Internal Server Error', because JSON serialization is our default, and not supposed to be implemented on a type-by-type basis. ```

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 8, 2017

k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 8, 2017

k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Aug 8, 2017

k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 8, 2017

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018

sttts added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 1, 2018

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2018

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 1, 2018

This was referenced Aug 6, 2018

50342: Establish '406 Not Acceptable' response for protobuf serializa… #67041

Merged

sample-apiserver API resource types don't have protobuf definitions, requests respond with errNotMarshalable #67047

Closed

k8s-github-robot closed this as completed in #67041 Aug 18, 2018

smarterclayton mentioned this issue May 4, 2019

Replace HTTP compression with a more scoped impl, only use on responses > 128KB #77449

Merged

liggitt mentioned this issue Dec 13, 2019

sample-apiserver returns "$type does not implement the protobuf marshalling interface and cannot be encoded to a protobuf message" error instead of falling back to second Accept type #86253

Open

liggitt changed the title ~~Aggregator returns 200 with error marshalling protobuf response~~ sample-apiserver returns 200 with error marshalling protobuf response Dec 13, 2019

liggitt mentioned this issue Dec 13, 2019

wardle example e2e #84735

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sample-apiserver returns 200 with error marshalling protobuf response #50342

sample-apiserver returns 200 with error marshalling protobuf response #50342

cheftako commented Aug 8, 2017

cheftako commented Aug 8, 2017

liggitt commented Aug 9, 2017

fejta-bot commented Jan 11, 2018

nikhita commented Jan 11, 2018

fejta-bot commented May 30, 2018

nikhita commented Jun 1, 2018

tristanburgess commented Aug 1, 2018

cheftako commented Aug 1, 2018

k8s-ci-robot commented Aug 1, 2018

tristanburgess commented Aug 3, 2018 •

edited

Loading

tristanburgess commented Aug 3, 2018 •

edited

Loading

tristanburgess commented Aug 4, 2018 •

edited

Loading

tristanburgess commented Aug 5, 2018 •

edited

Loading

sample-apiserver returns 200 with error marshalling protobuf response #50342

sample-apiserver returns 200 with error marshalling protobuf response #50342

Comments

cheftako commented Aug 8, 2017

cheftako commented Aug 8, 2017

liggitt commented Aug 9, 2017

fejta-bot commented Jan 11, 2018

nikhita commented Jan 11, 2018

fejta-bot commented May 30, 2018

nikhita commented Jun 1, 2018

tristanburgess commented Aug 1, 2018

cheftako commented Aug 1, 2018

k8s-ci-robot commented Aug 1, 2018

tristanburgess commented Aug 3, 2018 • edited Loading

tristanburgess commented Aug 3, 2018 • edited Loading

tristanburgess commented Aug 4, 2018 • edited Loading

tristanburgess commented Aug 5, 2018 • edited Loading

tristanburgess commented Aug 3, 2018 •

edited

Loading

tristanburgess commented Aug 3, 2018 •

edited

Loading

tristanburgess commented Aug 4, 2018 •

edited

Loading

tristanburgess commented Aug 5, 2018 •

edited

Loading