Early exit auth check on lease puts #16005

tjungblu · 2023-06-05T06:54:14Z

Mitigates #15993 by not checking each key individually for permission when auth is entirely disabled or admin user is calling the method.
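To illustrate the idea, here is a minimal, self-contained sketch of the early exit. It is not the etcd code: `toyAuth`, its fields, and the `earlyExit` flag are hypothetical names made up for this example, and the per-key counter exists only to make the saved work visible.

```go
package main

import (
	"errors"
	"fmt"
)

// errPermissionDenied is a stand-in for the auth package's denial error.
var errPermissionDenied = errors.New("permission denied")

// toyAuth is a hypothetical, simplified auth store used only for illustration.
type toyAuth struct {
	enabled bool
	admin   bool
	allowed map[string]bool // keys the non-admin caller may write
	checks  int             // number of per-key permission checks performed
}

// isAdminPermitted returns nil when auth is disabled or the caller is an admin.
func (a *toyAuth) isAdminPermitted() error {
	if !a.enabled || a.admin {
		return nil
	}
	return errPermissionDenied
}

// isPutPermitted checks a single key and records that a check happened.
func (a *toyAuth) isPutPermitted(key string) error {
	a.checks++
	if !a.enabled || a.admin || a.allowed[key] {
		return nil
	}
	return errPermissionDenied
}

// checkLeasePuts verifies the caller may write every key attached to a lease.
// With earlyExit set, a single admin check can replace the per-key loop.
func checkLeasePuts(a *toyAuth, leaseKeys []string, earlyExit bool) error {
	if earlyExit {
		if err := a.isAdminPermitted(); err == nil {
			return nil
		}
	}
	for _, k := range leaseKeys {
		if err := a.isPutPermitted(k); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	keys := make([]string, 10000)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%d", i)
	}

	slow := &toyAuth{enabled: false}
	_ = checkLeasePuts(slow, keys, false)

	fast := &toyAuth{enabled: false}
	_ = checkLeasePuts(fast, keys, true)

	fmt.Println("per-key checks without early exit:", slow.checks)
	fmt.Println("per-key checks with early exit:", fast.checks)
}
```

Without the early exit, the number of checks grows with the number of keys already attached to the lease, which matches the linear apply time reported in the issue. A non-admin user with auth enabled still goes through the per-key loop in both variants.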


jmhbnz

Hey @tjungblu nice work looking into this!

Any chance you could replicate the metrics in the linked issue and post here so we could compare before and after?

serathius · 2023-06-05T08:39:45Z

server/etcdserver/apply/apply_auth.go

@@ -125,6 +125,11 @@ func (aa *authApplierV3) LeaseRevoke(lc *pb.LeaseRevokeRequest) (*pb.LeaseRevoke
 func (aa *authApplierV3) checkLeasePuts(leaseID lease.LeaseID) error {
 	l := aa.lessor.Lookup(leaseID)
 	if l != nil {
+		// early return for most-common scenario of either disabled auth or admin user
+		if aa.as.IsAdminPermitted(&aa.authInfo) == nil {
+			return nil


Do we have a test covering this case?

Neither apply.go nor apply_auth.go has unit test coverage; there are no unit tests as far as I can tell.

My suggestion would be to add an ArePutsPermitted method that will do the check for the range, and add tests for it.

serathius · 2023-06-05T08:41:07Z

server/etcdserver/apply/apply_auth.go

@@ -125,6 +125,11 @@ func (aa *authApplierV3) LeaseRevoke(lc *pb.LeaseRevokeRequest) (*pb.LeaseRevoke
 func (aa *authApplierV3) checkLeasePuts(leaseID lease.LeaseID) error {
 	l := aa.lessor.Lookup(leaseID)
 	if l != nil {
+		// early return for most-common scenario of either disabled auth or admin user
+		if aa.as.IsAdminPermitted(&aa.authInfo) == nil {


It's not obvious from the proposed check that it covers the case where auth is disabled. From a code design perspective, it seems strange to me that authApplierV3 is used at all if auth is disabled.

I was surprised too, it even goes through an RW mutex every time it checks...

#15993 (comment)

I reckon we should discuss a bigger refactoring for the auth system, potentially?

Maybe, depends on whether we want to backport this fix. This seems like a trivial but important change. Appliers changed a lot between v3.5 and main.

From a code design perspective, it seems strange to me that authApplierV3 is used at all if auth is disabled.

I reckon we should discuss a bigger refactoring for the auth system, potentially?

Because the existing chain of applier is static. If we want to get rid of authApplier completely when auth isn't enabled, then we need to make the chain dynamic. We don't have to do it, please feel free to evaluate the effort and impact separately if anyone is interested.

It's not obvious from the proposed check that it covers the case where auth is disabled.

It's a common "issue" in all existing (*authStore) IsXXXPermitted(...) error methods, when auth isn't enabled, they return nil.
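A small sketch of that convention, and of one way to make the two reasons for skipping explicit. Everything here is hypothetical: `toyStore` is not etcd's auth store, and `canSkipPerKeyChecks` is an illustrative helper, not something proposed in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

var errPermissionDenied = errors.New("permission denied")

// toyStore is a hypothetical stand-in for an auth store; it is not etcd's type.
type toyStore struct {
	enabled bool
	admins  map[string]bool
}

// IsAdminPermitted follows the convention described above: nil when auth is
// disabled, nil for admins, and an error otherwise.
func (s *toyStore) IsAdminPermitted(user string) error {
	if !s.enabled {
		return nil
	}
	if s.admins[user] {
		return nil
	}
	return errPermissionDenied
}

// canSkipPerKeyChecks is a hypothetical wrapper that spells out both reasons
// a nil result can occur, so the caller does not have to know the convention.
func canSkipPerKeyChecks(s *toyStore, user string) (bool, string) {
	if !s.enabled {
		return true, "auth disabled"
	}
	if err := s.IsAdminPermitted(user); err == nil {
		return true, "admin user"
	}
	return false, "per-key check required"
}

func main() {
	disabled := &toyStore{enabled: false}
	enabled := &toyStore{enabled: true, admins: map[string]bool{"root": true}}

	for _, tc := range []struct {
		store *toyStore
		user  string
	}{
		{disabled, "alice"},
		{enabled, "root"},
		{enabled, "alice"},
	} {
		ok, why := canSkipPerKeyChecks(tc.store, tc.user)
		fmt.Printf("user=%s skip=%v reason=%s\n", tc.user, ok, why)
	}
}
```

The wrapper costs one extra branch but documents the intent at the call site. Whether that is worth it compared to a code comment is a judgment call.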

depends on whether we want to backport this fix

I think we need to backport the fix.

we need to make the chain dynamic.

I haven't researched too deeply yet, but the whole system is already dynamic because the whole auth goes through raft (even enable/disable). I would've expected the enable/disable methods to be a configuration for startup, not a dynamic apply request. Hence the static implementation, I guess.

I'll open a discussion issue up as a feature request, will do some benchmarks in the meantime around the impact of either approach.

turns out we're already swapping out the applier for corruption and nospace alerts:
https://github.com/etcd-io/etcd/blob/release-3.5/server/etcdserver/apply.go#L752-L760

ofc the whole thing isn't locked 🗡️

The link you provide just replaces the whole applier chain with applierV3Capped or applierV3Corrupt. But you can't change (swap out) part of the chain: authApplier -> quotaApplier -> backendApplier.

It's similar to a string in Go: it's immutable. You can't modify part of a string; instead, you can only replace the whole string.

Just as I mentioned previously, we don't have to make the applier chain dynamic. But of course, it's open to discussion in a separate session.
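The distinction between replacing the whole chain and swapping out one link can be sketched as follows. The types are hypothetical one-method stand-ins named after the appliers mentioned above; the real interfaces are much larger.

```go
package main

import "fmt"

// applier is a hypothetical one-method interface standing in for the real one.
type applier interface {
	Put(key string) string
}

// backendApplier is the innermost link of the chain.
type backendApplier struct{}

func (backendApplier) Put(key string) string { return "backend(" + key + ")" }

// quotaApplier wraps the next link; the link is fixed at construction time.
type quotaApplier struct{ next applier }

func (q quotaApplier) Put(key string) string { return "quota->" + q.next.Put(key) }

// authApplier wraps the next link in the same way.
type authApplier struct{ next applier }

func (a authApplier) Put(key string) string { return "auth->" + a.next.Put(key) }

// cappedApplier stands in for a whole-chain replacement used on alarms.
type cappedApplier struct{}

func (cappedApplier) Put(string) string { return "rejected: no space" }

// server holds a single reference to the head of the chain.
type server struct{ chain applier }

// newChain wires auth -> quota -> backend once.
func newChain() applier {
	return authApplier{next: quotaApplier{next: backendApplier{}}}
}

func main() {
	s := &server{chain: newChain()}
	fmt.Println(s.chain.Put("foo"))

	// Replacing the head swaps the whole chain in one assignment.
	s.chain = cappedApplier{}
	fmt.Println(s.chain.Put("foo"))
}
```

Dropping only the auth link would mean building a new chain without it and reassigning the head, because each wrapper captures its `next` when it is constructed.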

No doubt on the immutability of the chain, but the pointer to the applier struct needs to be guarded by a mutex when you swap it out from another goroutine?

https://github.com/etcd-io/etcd/blob/release-3.5/server/etcdserver/server.go#L252

let me try to tease the race detector on this case more specifically. Not that it matters much, if you're corrupted/OOS it's game over anyway, it just tickled my inner race detector.
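For reference, a generic sketch of guarding such a swap with a `sync.RWMutex`. This only illustrates the pattern under discussion; `guardedServer` and the applier types are hypothetical, and it makes no claim about whether the real server code has a race.

```go
package main

import (
	"fmt"
	"sync"
)

// applier is a hypothetical one-method interface used only for this sketch.
type applier interface {
	Apply(req string) string
}

type normalApplier struct{}

func (normalApplier) Apply(req string) string { return "applied " + req }

type corruptApplier struct{}

func (corruptApplier) Apply(string) string { return "rejected: corrupt" }

// guardedServer protects the applier reference with a read-write mutex.
type guardedServer struct {
	mu      sync.RWMutex
	applier applier
}

// current returns the applier under a read lock.
func (s *guardedServer) current() applier {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.applier
}

// swap replaces the applier under the write lock.
func (s *guardedServer) swap(a applier) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.applier = a
}

func main() {
	s := &guardedServer{applier: normalApplier{}}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				_ = s.current().Apply("put")
			}
		}()
	}

	// Swap from another goroutine while readers are running.
	s.swap(corruptApplier{})
	wg.Wait()

	fmt.Println(s.current().Apply("put"))
}
```

Running a program like this with `go run -race` is one way to check that the reads and the swap are properly synchronized.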

ahrtr · 2023-06-05T12:31:50Z

The PR looks good to me. @tjungblu did you compare the performance before and after applying this PR?

tjungblu · 2023-06-05T13:42:56Z

Running this entirely locally with the test supplied in the issue, I can see that the p99 latency is consistently good:

here's the same result without the patch right afterwards in comparison:

The p99 looks very low in absolute terms in Prometheus because of the scraping interval and the one-minute resolution; it actually degrades almost immediately into 100ms+ latency after 10k entries:

MAX latency: 103.208013ms, entries: 11405

which is basically a few seconds into the unit test.

Just for completeness, the same metric used in the issue report also stays flat over the whole execution time:

Even though I'm not entirely sure whether it makes sense to sum over the rate duration sum here.

ahrtr · 2023-06-06T01:54:25Z

server/etcdserver/apply/apply_auth.go

@@ -125,6 +125,11 @@ func (aa *authApplierV3) LeaseRevoke(lc *pb.LeaseRevokeRequest) (*pb.LeaseRevoke
 func (aa *authApplierV3) checkLeasePuts(leaseID lease.LeaseID) error {
 	l := aa.lessor.Lookup(leaseID)
 	if l != nil {
+		// early return for most-common scenario of either disabled auth or admin user
+		if aa.as.IsAdminPermitted(&aa.authInfo) == nil {


Make it explicit that we check the returned error

Suggested change

-		if aa.as.IsAdminPermitted(&aa.authInfo) == nil {
+		if err := aa.as.IsAdminPermitted(&aa.authInfo); err == nil {

yep, let me update the existing commit


ahrtr

LGTM

Thanks @tjungblu

It's a straightforward change, and also low-hanging fruit.

Could you please also backport this PR to 3.5 and 3.4 if needed, @tjungblu? Of course, that should wait until the other maintainers approve and merge this PR.

tjungblu · 2023-06-06T09:08:13Z

@marseel I assume you would need this at the very least in the next 3.5.x release?

marseel · 2023-06-06T09:17:02Z

3.5.x release would be great.

tjungblu · 2023-06-06T09:17:51Z

cool, then I'll get to write a backport. Thanks everyone!

serathius · 2023-06-06T09:43:21Z

One thing before merging. @tjungblu I know we don't have tests; however, I would feel much safer about the backports if we could add even a trivial test.

Mitigates etcd-io#15993 by not checking each key individually for permission when auth is entirely disabled or admin user is calling the method. Backport of etcd-io#16005 Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>

tjungblu · 2023-06-06T09:50:09Z

Totally makes sense, let me try to add one right here as a separate commit. I'll cherry-pick it down for the backport PRs then.

serathius

Can we add a trivial test to make backports safer?

tjungblu · 2023-06-06T13:35:36Z

Added a new test, it's a little more involved to mock out the required parts to test this properly. Anybody, if there's a slightly easier way to enable that test (while fixing the perf issue and not introducing cyclical dependencies) - please let me know.

tjungblu · 2023-06-15T08:37:42Z

@serathius can we move this forward? Is the testcase sufficient for you or too much refactoring already?

serathius · 2023-06-15T09:02:55Z

server/auth/store_mock.go

@@ -14,7 +14,39 @@



Don't move test file to normal file. Don't share mocks outside of package.

serathius · 2023-06-15T09:05:40Z

server/etcdserver/apply/apply_auth_test.go

+)
+
+func TestCheckLeasePutsKeys(t *testing.T) {
+	aa := authApplierV3{as: auth.NewAuthStore(zaptest.NewLogger(t), auth.NewBackendMock(), &auth.TokenNop{}, 10)}


Don't borrow mocks; just use a normal backend.

serathius · 2023-06-15T09:09:07Z

server/auth/nop.go

@@ -18,18 +18,18 @@ import (
 	"context"
 )

-type tokenNop struct{}
+type TokenNop struct{}


Don't expose internal testing code from package! Use proper TokenProvider in external tests.


serathius · 2023-06-15T13:15:31Z

Overall looks very good! Tests you added are awesome! Left some small comments.


tjungblu · 2023-06-15T13:58:25Z

fixed the goimport manually, I reckon make fix doesn't actually do anything?
edit: nvm, the linter just went out of memory 🤣

serathius · 2023-06-15T14:51:25Z

fixed the goimport manually, I reckon make fix doesn't actually do anything?

Not all make fix-* are implemented for all make verify-* so yea, make fix needs some attention. cc @jmhbnz


jmhbnz reviewed Jun 5, 2023


serathius reviewed Jun 5, 2023


tjungblu added the area/performance label Jun 5, 2023

ahrtr reviewed Jun 6, 2023


Early exit auth check on lease puts

dfbe203

Mitigates etcd-io#15993 by not checking each key individually for permission when auth is entirely disabled or admin user is calling the method. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>

tjungblu force-pushed the putauthshort branch from 014b17d to dfbe203 Compare June 6, 2023 08:24

ahrtr added backport/v3.4 backport/v3.5 labels Jun 6, 2023

ahrtr approved these changes Jun 6, 2023


serathius approved these changes Jun 6, 2023


jmhbnz approved these changes Jun 6, 2023


tjungblu mentioned this pull request Jun 6, 2023

[3.5] Early exit auth check on lease puts #16019

Merged

tjungblu mentioned this pull request Jun 6, 2023

[3.4] Early exit auth check on lease puts #16020

Merged

serathius requested changes Jun 6, 2023


tjungblu force-pushed the putauthshort branch 2 times, most recently from 7d43b78 to 1d473d5 Compare June 6, 2023 14:02

tjungblu mentioned this pull request Jun 8, 2023

Auth performance and maintenance improvements #16036

Open

serathius reviewed Jun 15, 2023


tjungblu force-pushed the putauthshort branch from 1d473d5 to 27a3e2a Compare June 15, 2023 13:00

serathius reviewed Jun 15, 2023


serathius reviewed Jun 15, 2023


tjungblu force-pushed the putauthshort branch from 27a3e2a to d53679c Compare June 15, 2023 13:48

Add first unit test for authApplierV3

84a9af1

This contains a slight refactoring to expose enough information to write meaningful tests for auth applier v3. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>

tjungblu force-pushed the putauthshort branch from d53679c to 84a9af1 Compare June 15, 2023 13:58

serathius approved these changes Jun 15, 2023


serathius added the stage/merge-when-tests-green label Jun 15, 2023

serathius merged commit cb3730a into etcd-io:main Jun 16, 2023

tjungblu deleted the putauthshort branch June 16, 2023 06:55

serathius mentioned this pull request Jun 16, 2023

etcdserver: add tests for apply_auth.go #16086

Merged

serathius mentioned this pull request Sep 18, 2023

Put operation with lease takes linear apply time depending on number of keys already attached to lease #15993

Closed

serathius mentioned this pull request Oct 12, 2023

Plan release v3.5.10 #16733

Closed
