tests: mitigate MaintenanceTest failures
The real fix will be to make the leader balancer aware
of maintenance mode, but the test has become much more
unstable since recent leader balancer changes to do
more movements concurrently, so it's worth mitigating
that.

The workaround is to set a short mute timeout so that
muting nodes has no real effect, and a short idle timeout
so that post-maintenance leader movements happen promptly.

Related: redpanda-data#4772
jcsp authored and BenPope committed Jul 13, 2022
1 parent 503ee29 commit dca8797
Showing 1 changed file with 16 additions and 1 deletion.
17 changes: 16 additions & 1 deletion tests/rptest/tests/maintenance_test.py
@@ -25,7 +25,22 @@ class MaintenanceTest(RedpandaTest):
               TopicSpec(partition_count=20, replication_factor=3))
 
     def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
+        super().__init__(
+            *args,
+            extra_rp_conf={
+                # Leader balancer configuration changes are a workaround
+                # to https://github.com/redpanda-data/redpanda/issues/4772
+
+                # Faster leader balancer iteration to get partitions moved
+                # back to nodes leaving maintenance mode promptly.
+                'leader_balancer_idle_timeout': 5000,
+
+                # Mute timeout shorter than idle timeout: effectively disable
+                # node muting. This enables nodes leaving maintenance mode
+                # to get leaderships moved to them promptly.
+                'leader_balancer_mute_timeout': 1000,
+            },
+            **kwargs)
         self.admin = Admin(self.redpanda)
         self.rpk = RpkTool(self.redpanda)
         self._use_rpk = True
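The reasoning behind "mute timeout shorter than idle timeout" can be sketched with a small model. This is only an illustration, not the leader balancer's actual logic: it assumes the balancer wakes once per idle timeout, that a node muted at one wakeup stays muted for exactly the mute timeout, and that both values are in milliseconds. The names `BalancerTimeouts` and `mute_survives_next_tick` are hypothetical and do not exist in Redpanda or rptest.

```python
from dataclasses import dataclass


@dataclass
class BalancerTimeouts:
    """Hypothetical holder for the two settings changed in this commit."""
    idle_timeout_ms: int  # assumed: delay between balancer iterations
    mute_timeout_ms: int  # assumed: how long a muted node stays muted


def mute_survives_next_tick(timeouts: BalancerTimeouts) -> bool:
    """Return True if a node muted at one tick is still muted at the next.

    Simplified model: a mute set at time t expires at t + mute_timeout_ms,
    and the next balancer iteration runs at t + idle_timeout_ms.
    """
    return timeouts.mute_timeout_ms > timeouts.idle_timeout_ms


# Values from the diff above: the mute expires before the next iteration,
# so under this model muting never affects a balancing decision.
workaround = BalancerTimeouts(idle_timeout_ms=5000, mute_timeout_ms=1000)
assert not mute_survives_next_tick(workaround)
```

Under this model, any configuration where the mute timeout exceeds the idle timeout would leave a node excluded for at least one further iteration, which is the delay the workaround is trying to avoid for nodes leaving maintenance mode.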
