Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make resourceVersion parameter semantics consistent across all storage.Interface implementations #72170

Merged
merged 1 commit into from
Oct 3, 2019

Conversation

jpbetz
Copy link
Contributor

@jpbetz jpbetz commented Dec 18, 2018

[This is part of effort to improve Apiserver Resource Version Semantics (as previously circulated with sig-apimachinery) and fix the "stale read" issue]

Make the minimum resourceVersion semantics consistent across all storage.Interface implementations by:

  • enforcing "minimum RV" constraints in the etcd3 storage implementation per the storage.Interface documentation
  • modifying the etcd3 List() implementation to use "minimum RV" semantics instead of "exact RV" semantics when RV>0

Important: the watch cache is enabled by default on the kube-apiserver for all but a select few types (e.g. Events, CRDs) and so while API clients do not get to decide when they make requests what semantics they get, they typically will get the watch cache semantics.

Before:

Watch Cache (RV=0 & RV>0) etcd3 RV=0 etcd RV>0
Get "min RV" quorum read, RV ignored quorum read, RV ignored
List "min RV" quorum read, RV ignored "exact RV"
GetToList "min RV" quorum read, RV ignored quorum read, RV ignored
Watch "min RV" "min RV" "min RV"

Note that "quorum read, RV ignored" behaves like "min RV" for all RVs that were retrieved from some previous operation on the store because quorum reads are guaranteed to be processed with the latest RV. So it appears from these semantics that "min RV" semantics are pervasive across both implementations and what we should converge to.

After:

Watch Cache (RV=0 & RV>0) etcd3 (RV=0 & RV>0)
Get "min RV" "min RV"
List "min RV" "min RV"
GetToList "min RV" "min RV"
Watch "min RV" "min RV"

The change from "exact RV" to "min RV" for the List operation is the most notable change here. But given that the watch cache is enabled by default for all but a select few types, and that the semantics change here would only be visible if providing a non-zero RV, we don't expect this change to be particularly risky/visible. And we believe it to be lower risk than leaving the semantic inconsistent across implementations, esp. given that clients don't necessarily have any way of knowing if the watch cache is enabled and what semantics to expect.

As part of this change we're introducing a ResourceVersionTooLargeError type to the storage interface so that both the watch cache and etcd3 return the same error type if unable to serve a requested resource version because it's too high. Previously the watch cache would return a timeout error in the case because it attempts to wait for 3 seconds to see if the "too high" resource version might appear if it waits a bit. I've done a quick review and it looks like this change is low risk, but if reviewers know of reasons why this might be breaking, please let me know.

The resource version option, when passed to a list call, is now consistently interpreted as the minimum allowed resource version.  Previously when listing resources that had the watch cache disabled clients could retrieve a snapshot at that exact resource version.  If the client requests a resource version newer than the current state, a TimeoutError is returned suggesting the client retry in a few seconds.  This behavior is now consistent for both single item retrieval and list calls, and for when the watch cache is enabled or disabled.

cc @cheftako @lavalamp @wojtek-t @liggitt @smarterclayton

@jpbetz jpbetz added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Dec 18, 2018
@jpbetz jpbetz self-assigned this Dec 18, 2018
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 18, 2018
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 18, 2018
@jpbetz jpbetz added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Dec 18, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Dec 18, 2018
@jpbetz jpbetz added this to the v1.14 milestone Dec 18, 2018
@jpbetz
Copy link
Contributor Author

jpbetz commented Dec 19, 2018

/test pull-kubernetes-integration

1 similar comment
@jpbetz
Copy link
Contributor Author

jpbetz commented Dec 19, 2018

/test pull-kubernetes-integration

@jpbetz
Copy link
Contributor Author

jpbetz commented Dec 19, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@jpbetz jpbetz changed the title [WIP] Make resourceVersion parameter semantics consistent across all storage.Interface implementations Make resourceVersion parameter semantics consistent across all storage.Interface implementations Dec 19, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 19, 2018
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me - I just left one comment.

But I would like at least someone else to take a look into that too.
@liggitt @smarterclayton

staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go Outdated Show resolved Hide resolved
@jpbetz jpbetz force-pushed the stale-read-2 branch 3 times, most recently from c9a3909 to 1b2c3b7 Compare September 20, 2019 18:20
@jpbetz
Copy link
Contributor Author

jpbetz commented Sep 20, 2019

This is ready for another review pass.

@jpbetz
Copy link
Contributor Author

jpbetz commented Sep 23, 2019

friendly nudge

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor nits - other than that LGTM.
Though I would also wait for @liggitt to make a final look too.

@jpbetz
Copy link
Contributor Author

jpbetz commented Sep 24, 2019

/test pull-kubernetes-dependencies

@jpbetz
Copy link
Contributor Author

jpbetz commented Sep 27, 2019

Friendly nudge

@smarterclayton
Copy link
Contributor

smarterclayton commented Oct 1, 2019

But given that the watch cache is enabled by default for all but a select few types, and that the semantics change here would only be visible if providing a non-zero RV, we don't expect this change to be particularly risky/visible

I'm really hoping this is true. Is watch cache enabled for CRDs by default?

Edit: looks like not - so are we really sure no one is relying on this?

@liggitt
Copy link
Member

liggitt commented Oct 1, 2019

I'm really hoping this is true. Is watch cache enabled for CRDs by default?

Edit: looks like not - so are we really sure no one is relying on this?

Yes, it is. I know because we found several watch-cache-related flakes testing custom resources.

#46967, #64796, #78713

@jpbetz
Copy link
Contributor Author

jpbetz commented Oct 1, 2019

I'm really hoping this is true. Is watch cache enabled for CRDs by default?
Edit: looks like not - so are we really sure no one is relying on this?

There is no in-tree code relying on it. We can't entirely eliminate the possibility there is out-of-tree code relying on it. The conditions they'd have to meet are quite specific: (1) they've disabled the watch cache, or are using events, and (2) they're performing a List(... ResourceVersion: SomeSpecificRevision) request and (3) they cannot tolerate receiving a more recent revision than the one they requested.

@jpbetz
Copy link
Contributor Author

jpbetz commented Oct 1, 2019

/retest

@smarterclayton
Copy link
Contributor

So if we had to bring back exact semantics? Say we were wrong, we break a large chunk of people, they need a fix. How do we make them whole?

@jpbetz
Copy link
Contributor Author

jpbetz commented Oct 2, 2019

So if we had to bring back exact semantics? Say we were wrong, we break a large chunk of people, they need a fix. How do we make them whole?

Worst case it that it's widespread (i.e. we really misread the situation), in which case we could revert the part of this change that applies to lists. We'd then have to regroup and figure out how to deprecate the old behavior and migrate.

Slightly less improbable is that there are a limited set of use cases, but they're important and cannot easily migrate on to "minimum RV" semantics. In this case I suppose we could introduce a separate param to GetOptions to support exact RV semantics. I'd really like to avoid doing that, but it would be an option..

@smarterclayton
Copy link
Contributor

I updated the release note, please double check it for me to see if you agree.

@jpbetz
Copy link
Contributor Author

jpbetz commented Oct 2, 2019

I updated the release note, please double check it for me to see if you agree.

Release note looks great. Thanks!

@smarterclayton
Copy link
Contributor

/lgtm
/approve

I hope I don't regret this.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 2, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 2, 2019
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.