Now that we're building more complicated cluster management features like leadership rebalancing, unresponsive nodes in the system become increasingly troublesome:
- they disrupt or break collective algorithms, e.g. leadership rebalancing sees a down node as a node holding no leaderships, and therefore as an attractive target for leadership migration.
- they generate log noise from connection errors.
When the administrator is intentionally stopping nodes, we should let them inform the cluster and thereby avoid these issues.
When entering maintenance mode, nodes should give up leaderships (an abdicate admin API is added in #1936).
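The sequence above could look roughly like the following sketch. This is illustrative only, under assumed names: `ClusterState`, `Node`, `abdicate`, and `enter_maintenance` are hypothetical stand-ins, not the project's actual API.

```python
# Hypothetical sketch of entering maintenance mode: the node first records
# the flag in shared cluster state (so rebalancing stops targeting it),
# then gives up every leadership it holds.

class ClusterState:
    """Toy stand-in for shared cluster metadata."""
    def __init__(self):
        self.maintenance = set()  # node ids flagged as in maintenance

class Node:
    def __init__(self, node_id, cluster, leaderships):
        self.node_id = node_id
        self.cluster = cluster
        self.leaderships = list(leaderships)  # raft groups this node leads

    def abdicate(self, group):
        # In the real system this would hand leadership to a peer
        # (the admin API referenced above); here we simply drop it.
        self.leaderships.remove(group)

    def enter_maintenance(self):
        # 1. Inform the cluster first, so collective algorithms such as
        #    leadership rebalancing stop treating this node as a target.
        self.cluster.maintenance.add(self.node_id)
        # 2. Give up leaderships before stopping.
        for group in list(self.leaderships):
            self.abdicate(group)

cluster = ClusterState()
node = Node("n1", cluster, ["shard-a", "shard-b"])
node.enter_maintenance()
```

The ordering matters: flagging maintenance before abdicating means the migrated leaderships won't bounce straight back to the departing node.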
In raft, we should still send heartbeats to nodes in maintenance mode, to enable them to catch up before being brought back into normal service. However, it would be nice to avoid emitting connection errors to the log for nodes in maintenance mode -- this might be something to implement in the RPC layer.
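One possible shape for that RPC-layer behaviour, as a hedged sketch (none of these names come from the actual codebase): heartbeats still go to every peer, but connection errors for maintenance-mode nodes are demoted to debug level instead of error.

```python
# Illustrative sketch: keep heartbeating nodes in maintenance mode so they
# can catch up on the raft log when they return, but log their expected
# connection failures at debug level to avoid noise.

import logging

logger = logging.getLogger("rpc")

def report_conn_error(node_id, err, maintenance_set):
    """Route a connection error to the appropriate log level."""
    if node_id in maintenance_set:
        # Expected while the node is intentionally down: demote to debug.
        logger.debug("connect to %s failed (maintenance): %s", node_id, err)
    else:
        logger.error("connect to %s failed: %s", node_id, err)

def send_heartbeats(peers, maintenance_set, send):
    """Heartbeat every peer, including maintenance-mode nodes, swallowing
    connection errors so one down node doesn't abort the round."""
    for peer in peers:
        try:
            send(peer)
        except ConnectionError as err:
            report_conn_error(peer, err, maintenance_set)
```

Keeping the maintenance set alongside whatever membership state the RPC layer already tracks avoids an extra lookup on the hot path.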