Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to cancel PKI tidy operations, pause between tidying certs #16958

Merged
merged 8 commits into from
Aug 31, 2022

Conversation

cipherboy
Copy link
Contributor

This adds two features to the PKI tidy operation that become useful with auto-tidy:

  1. The ability to cancel running tidy operations, freeing up resources on the cluster.
  2. The ability to pause between processing individual certificates. When handling the revoked certificates, this allows the operation to release the revoked-store lock, allowing new revocations to continue. This can reduce the resource utilization on a cluster during a tidy operation, but needs to be tuned carefully so not to have too much lock contention but also not cause the tidy to take too long.

When tidy operations take a long time to execute (and especially when
executing them automatically), having the ability to cancel them becomes
useful to reduce strain on Vault clusters (and let them be rescheduled
at a later time).

To this end, we add the /tidy-cancel write endpoint.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
By setting pause_duration, operators can have a little control over the
resource utilization of a tidy operation. While the list of certificates
remain in memory throughout the entire operation, a pause is added
between processing certificates and the revocation lock is released.
This allows other operations to occur during this gap and potentially
allows the tidy operation to consume less resources per unit of time
(due to the sleep -- though obviously consumes the same resources over
the time of the operation).

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
@cipherboy cipherboy marked this pull request as ready for review August 31, 2022 16:59
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
builtin/logical/pki/path_tidy.go Show resolved Hide resolved
builtin/logical/pki/path_tidy.go Outdated Show resolved Hide resolved
@@ -164,6 +194,11 @@ func (b *backend) startTidyOperation(req *logical.Request, config *tidyConfig) {
}
}

// Check for cancel before continuing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also schedule if case a cancel comes in after the doTidy func but before we release tidyCASGuard? -> defer atomic.StoreUint32(b.tidyCancelCAS, 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anything, I think we just need a atomic.CompareAndSwapUint32(b.tidyCancelCAS, 1, 0) at the top of this function before getting the tidyCASGuard. otherwise, I don't think it really matters, IMO?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that would work as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that's correct because it ensures that if a cancel somehow came in after we finished, we'd want any new ones to start fresh).

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Copy link
Contributor

@stevendpclark stevendpclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@cipherboy cipherboy enabled auto-merge (squash) August 31, 2022 18:35
@cipherboy cipherboy merged commit 76d89fd into main Aug 31, 2022
@cipherboy cipherboy deleted the cipherboy-add-tidy-cancel branch December 1, 2022 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants