k8s: Put brokers in maintenance mode before deleting orphan's pod #7530

Contributor

@RafalKorepta RafalKorepta commented Nov 27, 2022

Before this change, during a rolling update the Redpanda operator compared each running pod's specification against the StatefulSet pod template and deleted the pod if they did not match. Since release v22.1.1 the operator configures each broker with pod lifecycle hooks. The PreStop hook script tries for up to 120 seconds to put the broker into maintenance mode before the pod is terminated. Redpanda could fail to finish putting a broker into maintenance mode within those 120 seconds.

This PR improves the situation by putting the broker into maintenance mode before the pod is deleted. The putInMaintenanceMode function is called repeatedly until the Broker function returns the correct status. This assumes that the admin REST API maintenance mode endpoint is idempotent.

Once the pod is successfully deleted, the StatefulSet reschedules it with the correct pod specification.

Ref

#4125
#3023

On top of:
#7528

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

Maintenance mode is now enabled before the pod is deleted, instead of only inside the 120-second PreStop lifecycle hook.

Release Notes

Improvements

  • Maintenance mode is now enabled before the pod is deleted, instead of only inside the 120-second PreStop lifecycle hook.

@RafalKorepta RafalKorepta requested a review from a team as a code owner November 27, 2022 11:00
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/put-in-maintanance-mode branch 3 times, most recently from 1bd32aa to 11470aa Compare November 29, 2022 23:30
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/put-in-maintanance-mode branch from 11470aa to 11db9ee Compare December 21, 2022 22:32
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/put-in-maintanance-mode branch 2 times, most recently from 8fc043f to d6df2bd Compare January 2, 2023 11:15
Contributor

@alenkacz alenkacz left a comment

Good catch! I wonder if we can cover this with some tests...

src/go/k8s/pkg/resources/statefulset_update.go (outdated, resolved)
src/go/k8s/pkg/resources/statefulset_update.go (outdated, resolved)
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/put-in-maintanance-mode branch 3 times, most recently from 3ca5de5 to 86045b8 Compare January 3, 2023 09:18
alenkacz previously approved these changes Jan 3, 2023
Contributor

@alenkacz alenkacz left a comment

LGTM, left some small comments

return &RequeueAfterError{RequeueAfter: RequeueDuration, Msg: "wait for pod restart"}
}

//nolint:goerr113 // out of scope for this PR
Contributor

unrelated to this PR: let's finally disable this linter 🙏 :D

src/go/k8s/pkg/resources/statefulset_update.go (outdated, resolved)
src/go/k8s/pkg/utils/kubernetes.go (outdated, resolved)
alenkacz previously approved these changes Jan 3, 2023
Contributor

@alenkacz alenkacz left a comment

thank you!

joejulian previously approved these changes Jan 4, 2023
Contributor

@joejulian joejulian left a comment

nice improvements!

Member

@nicolaferraro nicolaferraro left a comment

Just a comment on the code, the rest looks good.


switch {
case br.Maintenance.Draining:
case br.Maintenance.Errors:
Member

Maybe you intended to fallthrough here...
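
For context on the comment above: in Go, a `case` with an empty body does not fall through to the next case; the switch simply ends. The sketch below shows the difference, using hypothetical boolean parameters in place of `br.Maintenance.Draining` and `br.Maintenance.Errors` and made-up return values, since the body of the reviewed switch is not shown here.

```go
package main

import "fmt"

// withoutFallthrough mirrors the shape of the reviewed switch: the first
// case has an empty body, so when draining is true nothing in the second
// case runs and the switch ends.
func withoutFallthrough(draining, errs bool) string {
	switch {
	case draining:
		// Empty body: Go does not fall through implicitly.
	case errs:
		return "errors"
	}
	return "none"
}

// withFallthrough uses an explicit fallthrough, so the draining case
// continues into the body of the next case without evaluating its condition.
func withFallthrough(draining, errs bool) string {
	switch {
	case draining:
		fallthrough
	case errs:
		return "errors"
	}
	return "none"
}

func main() {
	fmt.Println(withoutFallthrough(true, false)) // prints "none"
	fmt.Println(withFallthrough(true, false))    // prints "errors"
}
```

If both conditions should share one body, `case draining, errs:` is the more common idiom than an explicit `fallthrough`.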

@RafalKorepta RafalKorepta dismissed stale reviews from joejulian and alenkacz via 8d89529 January 4, 2023 10:02
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/put-in-maintanance-mode branch from 13dd38f to 8d89529 Compare January 4, 2023 10:02
nicolaferraro previously approved these changes Jan 4, 2023
alenkacz previously approved these changes Jan 4, 2023
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/put-in-maintanance-mode branch from 8d89529 to eae69eb Compare January 4, 2023 17:30
Before this change, during a rolling update the Redpanda operator compared
each running pod's specification against the StatefulSet pod template and
deleted the pod if they did not match. Since release v22.1.1 the operator
configures each broker with pod lifecycle hooks. The PreStop hook script tries
for up to 120 seconds to put the broker into maintenance mode before the pod is
terminated. Redpanda could fail to finish putting a broker into maintenance
mode within those 120 seconds.

This PR improves the situation by putting the broker into maintenance mode
before the pod is deleted. The `EnableMaintanaceMode` function is called
repeatedly until the `Broker` function returns the correct status. This assumes
that the admin REST API maintenance mode endpoint is idempotent.

Once the pod is successfully deleted, the StatefulSet reschedules it with the
correct pod specification.

redpanda-data#4125
redpanda-data#3023
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/put-in-maintanance-mode branch from eae69eb to 3c34855 Compare January 5, 2023 09:02
4 participants