tests: mitigate MaintenanceTest failures
The real fix will be to make the leader balancer aware
of maintenance mode, but the test has become much more
unstable since recent leader balancer changes to do
more movements concurrently, so it's worth mitigating
that.

The workaround is to set a short mute timeout so that
muting nodes has no real effect, and a short idle timeout
so that post-maintenance leader movements happen promptly.
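
To illustrate why a mute timeout shorter than the idle timeout makes muting a no-op, here is a toy model. It is only a sketch and not Redpanda's leader balancer: it assumes the balancer acts once per idle timeout, that a node muted at one tick stays muted for the mute timeout, and that both values are in milliseconds. The function name is hypothetical.

```python
def ticks_skipped_while_muted(mute_timeout_ms: int, idle_timeout_ms: int) -> int:
    """Count balancer ticks, after the tick that muted a node, at which
    the node is still muted (toy model, not the real balancer)."""
    if idle_timeout_ms <= 0:
        raise ValueError("idle_timeout_ms must be positive")

    skipped = 0
    # The node is muted at time 0; the next tick fires one idle timeout later.
    tick_time = idle_timeout_ms
    while tick_time < mute_timeout_ms:
        skipped += 1
        tick_time += idle_timeout_ms
    return skipped


# Values from this commit: the mute expires before the next tick.
print(ticks_skipped_while_muted(mute_timeout_ms=1000, idle_timeout_ms=5000))

# Illustrative longer values: the node misses some ticks while muted.
print(ticks_skipped_while_muted(mute_timeout_ms=300_000, idle_timeout_ms=120_000))
```

Under these assumptions the commit's values give zero skipped ticks, so a node leaving maintenance mode is eligible for leadership again at the very next balancer iteration.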

Related: redpanda-data#4772
jcsp committed Jul 8, 2022
1 parent acb4e2e commit 2c0ca27
Showing 1 changed file with 16 additions and 1 deletion.
17 changes: 16 additions & 1 deletion tests/rptest/tests/maintenance_test.py
@@ -25,7 +25,22 @@ class MaintenanceTest(RedpandaTest):
               TopicSpec(partition_count=20, replication_factor=3))
 
     def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
+        super().__init__(
+            *args,
+            extra_rp_conf={
+                # Leader balancer configuration changes are a workaround
+                # to https://github.com/redpanda-data/redpanda/issues/4772
+
+                # Faster leader balancer iteration to get partitions moved
+                # back to nodes leaving maintenance mode promptly.
+                'leader_balancer_idle_timeout': 5000,
+
+                # Mute timeout shorter than idle timeout: effectively disable
+                # node muting. This enables nodes leaving maintenance mode
+                # to get leaderships moved to them promptly.
+                'leader_balancer_mute_timeout': 1000,
+            },
+            **kwargs)
         self.admin = Admin(self.redpanda)
         self.rpk = RpkTool(self.redpanda)
         self._use_rpk = True
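
The workaround depends on the mute timeout staying below the idle timeout, so a later edit to either value could silently re-enable muting. One way to guard that relationship is sketched below. The helper and constant names are hypothetical and not part of rptest, and the check assumes both settings use the same unit.

```python
# Values copied from the diff above; the dict name is illustrative only.
LEADER_BALANCER_WORKAROUND = {
    'leader_balancer_idle_timeout': 5000,
    'leader_balancer_mute_timeout': 1000,
}


def check_muting_disabled(conf: dict) -> None:
    """Raise ValueError unless the mute timeout is shorter than the idle
    timeout, which is the condition the workaround relies on."""
    idle = conf['leader_balancer_idle_timeout']
    mute = conf['leader_balancer_mute_timeout']
    if mute >= idle:
        raise ValueError(
            f"mute timeout ({mute}) must be shorter than "
            f"idle timeout ({idle}) for the workaround to hold")


check_muting_disabled(LEADER_BALANCER_WORKAROUND)  # passes silently
```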