Early exit auth check on lease puts #16005
Hey @tjungblu nice work looking into this!
Any chance you could replicate the metrics in the linked issue and post here so we could compare before and after?
@@ -125,6 +125,11 @@ func (aa *authApplierV3) LeaseRevoke(lc *pb.LeaseRevokeRequest) (*pb.LeaseRevoke
 func (aa *authApplierV3) checkLeasePuts(leaseID lease.LeaseID) error {
 	l := aa.lessor.Lookup(leaseID)
 	if l != nil {
+		// early return for most-common scenario of either disabled auth or admin user
+		if aa.as.IsAdminPermitted(&aa.authInfo) == nil {
+			return nil
Do we have a test covering this case?
Neither apply.go nor apply_auth.go has unit test coverage; there are no unit tests as far as I can tell.
My suggestion would be to add an ArePutsPermitted method that performs the check for the whole range, and to add tests for it.
@@ -125,6 +125,11 @@ func (aa *authApplierV3) LeaseRevoke(lc *pb.LeaseRevokeRequest) (*pb.LeaseRevoke
 func (aa *authApplierV3) checkLeasePuts(leaseID lease.LeaseID) error {
 	l := aa.lessor.Lookup(leaseID)
 	if l != nil {
+		// early return for most-common scenario of either disabled auth or admin user
+		if aa.as.IsAdminPermitted(&aa.authInfo) == nil {
It's not obvious from the proposed check that it covers the case where auth is disabled. From a code design perspective, it seems strange to me that authApplierV3 would be used at all if auth is disabled.
I was surprised too, it even goes through an RW mutex every time it checks...
I reckon we should discuss a bigger refactoring for the auth system, potentially?
Maybe, depends on whether we want to backport this fix. This seems like a trivial but important change. Appliers changed a lot between v3.5 and main.
> From code design it seems strange for me that authApplierV3 will be used at all if auth is disabled.
> I reckon we should discuss a bigger refactoring for the auth system, potentially?
Because the existing chain of appliers is static. If we want to get rid of authApplier completely when auth isn't enabled, then we need to make the chain dynamic. We don't have to do it; please feel free to evaluate the effort and impact separately if anyone is interested.
> It's not obvious from proposed check that it covers case that auth is disabled.
It's a common "issue" in all existing (*authStore) IsXXXPermitted(...) error methods: when auth isn't enabled, they return nil.
> depends on whether we want to backport this fix
I think we need to backport the fix.
> we need to make the chain dynamic.
I haven't researched too deeply yet, but the whole system is already dynamic because the whole auth goes through raft (even enable/disable). I would've expected the enable/disable methods to be a configuration for startup, not a dynamic apply request. Hence the static implementation, I guess.
I'll open a discussion issue up as a feature request, will do some benchmarks in the meantime around the impact of either approach.
turns out we're already swapping out the applier for corruption and nospace alerts:
https://github.com/etcd-io/etcd/blob/release-3.5/server/etcdserver/apply.go#L752-L760
ofc the whole thing isn't locked 🗡️
The link you provided just replaces the whole applier chain with applierV3Capped or applierV3Corrupt. But you can't change (swap out) part of the chain: authApplier -> quotaApplier -> backendApplier.
It's similar to string in golang: it's immutable. You can't modify part of a string; instead, you can only replace the whole string.
Just as I mentioned previously, we don't have to make the applier chain dynamic. But of course, it's open to discussion in a separate session.
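The distinction between replacing the whole chain and swapping out one link can be sketched roughly as follows. The types here (applier, backendApplier, quotaApplier, authApplier, cappedApplier, server) are hypothetical simplifications for illustration and are not etcd's actual definitions.

```go
package main

import "fmt"

// applier is a hypothetical, one-method stand-in for an applier interface.
type applier interface {
	Put(key string) string
}

type backendApplier struct{}

func (backendApplier) Put(key string) string { return "backend(" + key + ")" }

// quotaApplier and authApplier each wrap the next link when they are constructed.
type quotaApplier struct{ next applier }

func (q quotaApplier) Put(key string) string { return "quota->" + q.next.Put(key) }

type authApplier struct{ next applier }

func (a authApplier) Put(key string) string { return "auth->" + a.next.Put(key) }

// cappedApplier models an applier that rejects writes outright.
type cappedApplier struct{}

func (cappedApplier) Put(key string) string { return "rejected: no space" }

// server only holds the head of the chain; the links are fixed at construction.
type server struct{ chain applier }

func newServer() *server {
	return &server{
		chain: authApplier{next: quotaApplier{next: backendApplier{}}},
	}
}

func main() {
	s := newServer()
	fmt.Println(s.chain.Put("k"))

	// The head can be replaced as a whole, but there is no operation that
	// removes only authApplier from the middle of an already-built chain.
	s.chain = cappedApplier{}
	fmt.Println(s.chain.Put("k"))
}
```

Dropping authApplier when auth is disabled would mean rebuilding and replacing the entire chain in this model, which is the "dynamic chain" effort mentioned above.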
No doubt on the immutability of the chain, but the pointer to the applier struct needs to be guarded by a mutex when you swap it out from another goroutine?
https://github.com/etcd-io/etcd/blob/release-3.5/server/etcdserver/server.go#L252
let me try to tease the race detector on this case more specifically. Not that it matters much, if you're corrupted/OOS it's game over anyway, it just tickled my inner race detector.
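For illustration only, one way to guard such a swap is a read-write mutex around the field that holds the current applier. Whether etcd actually needs this is exactly the open question in this comment; guardedApplier, normalApplier and corruptApplier below are hypothetical names, not etcd code.

```go
package main

import (
	"fmt"
	"sync"
)

// applier is a hypothetical one-method interface for this sketch.
type applier interface {
	Apply(req string) string
}

type normalApplier struct{}

func (normalApplier) Apply(req string) string { return "applied " + req }

type corruptApplier struct{}

func (corruptApplier) Apply(req string) string { return "rejected " + req + ": corrupt" }

// guardedApplier protects the current applier with an RWMutex so that
// readers and the goroutine performing the swap never race on the field.
type guardedApplier struct {
	mu  sync.RWMutex
	cur applier
}

// swap replaces the whole applier under the write lock.
func (g *guardedApplier) swap(next applier) {
	g.mu.Lock()
	g.cur = next
	g.mu.Unlock()
}

// apply copies the current applier under the read lock, then calls it
// outside the lock so a slow request does not block a swap.
func (g *guardedApplier) apply(req string) string {
	g.mu.RLock()
	a := g.cur
	g.mu.RUnlock()
	return a.Apply(req)
}

func main() {
	g := &guardedApplier{cur: normalApplier{}}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = g.apply(fmt.Sprintf("req-%d", i))
		}(i)
	}
	g.swap(corruptApplier{}) // concurrent with the readers above
	wg.Wait()

	fmt.Println(g.apply("req-final"))
}
```

Running a variant of this without the lock under `go run -race` is one way to see the kind of report the race detector would produce for an unguarded swap.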
The PR looks good to me. @tjungblu did you compare the performance before and after applying this PR?
@@ -125,6 +125,11 @@ func (aa *authApplierV3) LeaseRevoke(lc *pb.LeaseRevokeRequest) (*pb.LeaseRevoke
 func (aa *authApplierV3) checkLeasePuts(leaseID lease.LeaseID) error {
 	l := aa.lessor.Lookup(leaseID)
 	if l != nil {
+		// early return for most-common scenario of either disabled auth or admin user
+		if aa.as.IsAdminPermitted(&aa.authInfo) == nil {
Make it explicit that we check the returned error:
- if aa.as.IsAdminPermitted(&aa.authInfo) == nil {
+ if err := aa.as.IsAdminPermitted(&aa.authInfo); err == nil {
yep, let me update the existing commit
updated
Mitigates etcd-io#15993 by not checking each key individually for permission when auth is entirely disabled or admin user is calling the method. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
@marseel I assume you would need this at the very least in the next 3.5.x release?
3.5.x release would be great.
cool, then I'll get to write a backport. Thanks everyone!
One thing before merging. @tjungblu I know we don't have tests; however, I would feel much safer about the backports if we could add some test, even a trivial one.
Mitigates etcd-io#15993 by not checking each key individually for permission when auth is entirely disabled or admin user is calling the method. Backport of etcd-io#16005 Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
Totally makes sense, let me try to add one right here as a separate commit. I'll cherry-pick it down for the backport PRs then.
Can we add a trivial test to make backports safer?
Added a new test; it's a little more involved to mock out the required parts to test this properly. If anybody knows a slightly easier way to enable that test (while fixing the perf issue and not introducing cyclical dependencies), please let me know.
7d43b78 to 1d473d5
@serathius can we move this forward? Is the testcase sufficient for you or too much refactoring already?
server/auth/store_mock.go (outdated)
@@ -14,7 +14,39 @@
Don't move a test file into a normal file. Don't share mocks outside of the package.
reverted
)

func TestCheckLeasePutsKeys(t *testing.T) {
	aa := authApplierV3{as: auth.NewAuthStore(zaptest.NewLogger(t), auth.NewBackendMock(), &auth.TokenNop{}, 10)}
Don't borrow mocks; just use a normal backend.
done.
server/auth/nop.go (outdated)
@@ -18,18 +18,18 @@ import (
 	"context"
 )

- type tokenNop struct{}
+ type TokenNop struct{}
Don't expose internal testing code from package! Use proper TokenProvider in external tests.
reverted
Overall looks very good! Tests you added are awesome! Left some small comments.
This contains a slight refactoring to expose enough information to write meaningful tests for auth applier v3. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
fixed the goimport manually, I reckon
Not all
Mitigates #15993 by not checking each key individually for permission when auth is entirely disabled or admin user is calling the method.