Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when etcd3 watch finds delete event with nil prevKV #76675

Merged
merged 1 commit into from
Apr 29, 2019

Conversation

ryanmcnamara
Copy link

@ryanmcnamara ryanmcnamara commented Apr 16, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind bug

What this PR does / why we need it:

See #76624 (comment), taking half of that suggestion (for now), which is to end the watch when a delete event does not have a prevKV

Does this PR introduce a user-facing change?:

Fixes an error with stuck informers when an etcd watch receives update or delete events with missing data

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 16, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @ryanmcnamara. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 16, 2019
@ryanmcnamara
Copy link
Author

ryanmcnamara commented Apr 16, 2019

@liggitt, should we do the same thing for "update" events, I believe in this part of the code that would be events that !isDeleted && !isCreated?

@ryanmcnamara
Copy link
Author

I'll be testing this change out on our stack that initially ran into #76624 using this branch: https://github.com/ryanmcnamara/kubernetes/tree/rm/error-on-nil-event-demo

@liggitt
Copy link
Member

liggitt commented Apr 16, 2019

@liggitt, should we do the same thing for "update" events, I believe in this part of the code that would be events that !isDeleted && !isCreated?

I believe so, yes

@@ -53,5 +54,9 @@ func parseEvent(e *clientv3.Event) *event {
if e.PrevKv != nil {
ret.prevValue = e.PrevKv.Value
}
return ret
if ret.isDeleted && ret.prevValue == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the previous value existed, but was a zero-length byte array, does this show up as nil here? do we know if e.PrevKv as nil as well when we encountered this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the previous value existed, but was a zero-length byte array, does this show up as nil here?

I expect it would show up as the zero length byte array

do we know if e.PrevKv as nil as well when we encountered this?

All I can be certain is that when I encountered this either e.PrevKv was nil or e.PrevKv.Value was nil. I expect that e.PrevKv was nil though, given the info I had on compaction taking place and this comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect it would show up as the zero length byte array

expectations are funny things... if it uses protobuf, zero-length and nil arrays can be serialized identically

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was mostly trying to fix the specific problem I encountered, to error here if and only if the prev value was nil, and not erroring on the empty array as I'm not sure we have that guarantee here.

Do you think this should error for both? error if len(ret.preValue)==0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if e.PrevKv is nil, that seems like a much clearer indicator of a problem

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 16, 2019
@ryanmcnamara
Copy link
Author

Will deploy this change as is now, wait for the repro (which could take several days) and confirm that it handles it better

@fedebongio
Copy link
Contributor

/assign @jpbetz
/cc @wenjiaswe @jingyih

@k8s-ci-robot
Copy link
Contributor

@fedebongio: GitHub didn't allow me to request PR reviews from the following users: jingyih.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @jpbetz
/cc @wenjiaswe @jingyih

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ryanmcnamara
Copy link
Author

The conditions of the issue repro'd and this change did the correct thing, spotting the nil prevKV and closing the watch.

Full log line:

2019-04-24 02:03:49.757 watch chan error: etcd event was not a create and has a nil prevKV, key: /registry/pods/ns1/pod1, modRevision: 4842145275, isDelete false k8s.io/kubernetes/kube-apiserver/watcher.go:215

Here is a graph of the number of pods an informer is holding
image
The metrics were obtained by making a separate informer per apiserver. As you can see, updates seem to halt for some time, but recovers immediately when the watch is closed.

While this graph is secondary (shows the view of the problem from a client of the apiserver's perspective) it's still useful in understanding the timeline.

@liggitt
Copy link
Member

liggitt commented Apr 25, 2019

nice graph! do you have any insight as to what happened between 1:55 and 2:00 (based on apiserver or etcd logs)? do you know which apiserver/etcd member was affected? anything suspect in the logs for that member?

@ryanmcnamara
Copy link
Author

The event started exactly when the kube controller manager leader switched (this happens at 1:55). The apiserver it's now hitting is the impacted one. That's how this issue has always reproduced.

Looking at etcd logs and metrics I don't see anything interesting.

Each time this happens I notice load going quite high on the master host, but only around .6 or so normalized at the start of the incident, going to 1 briefly, here's a screenshot (note the 32 vcpu machines being used)

Screen Shot 2019-04-25 at 1 47 41 PM

I don't really think this is the issue though, because the load stays well below 1 normalized

I can take another look at the logs and see if I find anything

Do you want to continue debugging on the issue? Seems like this pr is useful regardless of what's causing the delay in events.

func parseEvent(e *clientv3.Event) (*event, error) {
if !e.IsCreate() && e.PrevKv == nil {
// If the previous value is nil, error. One example of how this is possible is if the previous value has been compacted already.
return nil, fmt.Errorf("etcd event was not a create and has a nil prevKV, key: %s, modRevision: %d, isDelete %v", string(e.Kv.Key), e.Kv.ModRevision, e.Type == clientv3.EventTypeDelete)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz can you sanity check that all non-create events should have a PrevKv set?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we only use this method in one place, on a watch, started with the PrevKv option)

opts := []clientv3.OpOption{clientv3.WithRev(wc.initialRev + 1), clientv3.WithPrevKV()}

Copy link
Contributor

@jpbetz jpbetz Apr 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like etcd not providing a prevKV if it was compacted was intentional. See my comment here: #76624 (comment)

I'm betting this really is due to compaction. It looks like we're using clientv3.WithPrevKV correctly.

@ryanmcnamara
Copy link
Author

@liggitt done

@liggitt
Copy link
Member

liggitt commented Apr 26, 2019

/priority critical-urgent
/ok-to-test

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 26, 2019
@liggitt
Copy link
Member

liggitt commented Apr 26, 2019

looks like hack/update-bazel.sh is needed

@jpbetz
Copy link
Contributor

jpbetz commented Apr 26, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 26, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 26, 2019
@liggitt
Copy link
Member

liggitt commented Apr 29, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 29, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, ryanmcnamara

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2019
@liggitt
Copy link
Member

liggitt commented Apr 29, 2019

once this merges, can you pick it to 1.12, 1.13, and 1.14 (using hack/cherry_pick_pull.sh)?

@liggitt
Copy link
Member

liggitt commented May 3, 2019

picks opened

k8s-ci-robot added a commit that referenced this pull request May 4, 2019
…5-upstream-release-1.12

Automated cherry pick of #76675: Error when etcd3 watch finds delete event with nil prevKV
k8s-ci-robot added a commit that referenced this pull request May 6, 2019
…5-upstream-release-1.14

Automated cherry pick of #76675: Error when etcd3 watch finds delete event with nil prevKV
k8s-ci-robot added a commit that referenced this pull request May 7, 2019
…5-upstream-release-1.13

Automated cherry pick of #76675: Error when etcd3 watch finds delete event with nil prevKV
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants