Update announcing-etcd-3.4.md #16205

Merged · 1 commit · Sep 13, 2019
2 changes (1 addition, 1 deletion) in content/en/blog/_posts/2019-08-30-announcing-etcd-3.4.md

@@ -18,7 +18,7 @@ etcd v3.4 includes a number of performance improvements for large scale Kubernet

In particular, etcd experienced performance issues with a large number of concurrent read transactions even when there was no write (e.g. `“read-only range request ... took too long to execute”`). Previously, the storage backend commit operation on pending writes blocked incoming read transactions, even when there was no pending write. Now, the commit [does not block reads](https://github.com/etcd-io/etcd/pull/9296), which improves long-running read transaction performance.

- We further made [backend read transactions fully concurrent](https://github.com/etcd-io/etcd/pull/10523). Previously, ongoing long-running read transactions blocked writes and upcoming reads. With this change, write throughput is increased by 70% and P99 write latency is reduced by 90% in the presence of long-running reads. We also ran the [Kubernetes 5000-node scalability test on GCE](https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1130745634945503235) with this change and observed similar improvements. For example, at the very beginning of the test, where there are a lot of long-running “LIST pods” requests, the P99 latency of “POST clusterrolebindings” is [reduced by 97.4%](https://github.com/etcd-io/etcd/pull/10523#issuecomment-499262001). This non-blocking read transaction is now [used for compaction](https://github.com/etcd-io/etcd/pull/11034), which, combined with the reduced compaction batch size, reduces the P99 server request latency during compaction.
+ We further made [backend read transactions fully concurrent](https://github.com/etcd-io/etcd/pull/10523). Previously, ongoing long-running read transactions blocked writes and upcoming reads. With this change, write throughput is increased by 70% and P99 write latency is reduced by 90% in the presence of long-running reads. We also ran the [Kubernetes 5000-node scalability test on GCE](https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1130745634945503235) with this change and observed similar improvements. For example, at the very beginning of the test, where there are a lot of long-running “LIST pods” requests, the P99 latency of “POST clusterrolebindings” is [reduced by 97.4%](https://github.com/etcd-io/etcd/pull/10523#issuecomment-499262001).

More improvements have been made to lease storage. We enhanced [lease expire/revoke performance](https://github.com/etcd-io/etcd/pull/9418) by storing lease objects more efficiently, and made the [lease look-up operation non-blocking](https://github.com/etcd-io/etcd/pull/9229) with concurrent lease grant/revoke operations. etcd v3.4 also introduces [lease checkpoint](https://github.com/etcd-io/etcd/pull/9924) as an experimental feature to persist remaining time-to-live values through consensus. This ensures short-lived lease objects are not auto-renewed after leadership election. This also prevents lease object pile-up when the time-to-live value is relatively large (e.g. a [1-hour TTL never expired in Kubernetes use case](https://github.com/kubernetes/kubernetes/issues/65497)).
