-
Notifications
You must be signed in to change notification settings - Fork 574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
k8s: Put brokers in maintenance mode before deleting orphan's pod #7530
k8s: Put brokers in maintenance mode before deleting orphan's pod #7530
Conversation
1bd32aa
to
11470aa
Compare
11470aa
to
11db9ee
Compare
8fc043f
to
d6df2bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I wonder if we can cover this with some tests...
3ca5de5
to
86045b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left some small comments
return &RequeueAfterError{RequeueAfter: RequeueDuration, Msg: "wait for pod restart"} | ||
} | ||
|
||
//nolint:goerr113 // out of scope for this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated to this PR: let's finally disable this linter 🙏 :D
86045b8
to
13dd38f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice improvements!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment on the code, the rest looks good.
|
||
switch { | ||
case br.Maintenance.Draining: | ||
case br.Maintenance.Errors: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you intended to fallthrough
here...
13dd38f
to
8d89529
Compare
eae69eb
8d89529
to
eae69eb
Compare
During rolling update, before this change, Redpanda operator was calculating the difference between running pod specification and stateful set pod template. If the specification did not match the pod was deleted. From release v22.1.1 operator is configuring each broker with pod lifecycle hooks. In the PreStop hook the script will try to put broker into maintenance mode for 120 seconds before POD is terminated. Redpanda could not finish within 120 seconds to put one broker into maintenance mode. This PR improves the situation by putting maintenance mode before POD is deleted. The `EnableMaintanaceMode` function is called multiple times until `Broker` function returns correct status. The assumption is that REST admin API maintenance mode endpoint is idempotent. When pod is successfully deleted statefulset would reschedule the pod with correct pod specification. redpanda-data#4125 redpanda-data#3023
eae69eb
to
3c34855
Compare
During rolling update, before this change, Redpanda operator was calculating the difference between running pod specification and stateful set pod template. If the specification did not match the pod was deleted. From release v22.1.1 operator is configuring each broker with pod lifecycle hooks. In the PreStop hook the script will try to put broker into maintenance mode for 120 seconds before POD is terminated. Redpanda could not finish within 120 seconds to put one broker into maintenance mode.
This PR improves the situation by putting maintenance mode before POD is deleted. The
putInMaintenanceMode
function is called multiple times untilBroker
function returns correct status. The assumption is that REST admin API maintenance mode endpoint is idempotent.When pod is successfully deleted statefulset would reschedule the pod with correct pod specification.
Ref
#4125
#3023
On top of:
#7528
Backports Required
UX Changes
Before deleting the POD maintenance mode is configured (not within 120 second lifecycle hook).
Release Notes
Improvements