cluster: Stale leadership in partition_leaders_table on leader (failure in ClusterConfigTest.test_restart)
#3486
This manifests as a failure of ClusterConfigTest.test_restart because that test checks for convergence of config versions, and config versions only get updated if nodes can see a controller leader. It might be destabilizing other tests too, if they have timeouts that rely on a controller leader being available within a certain time.
https://buildkite.com/vectorized/redpanda/builds/6142#3eb5ad1e-c519-4c16-b39c-5355ff4cf590
After an election has succeeded, the metadata dissemination service's ticker still uses the content of node health reports to set leadership. If the last node in the list of node health reports says leader=null, this continuously overrides the local partition leader table until the next round of health reports comes in.
This behavior was introduced in #3355
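To illustrate the override described above, here is a toy Python model. It is not Redpanda code: `CONTROLLER`, `ticker_apply`, and `run` are hypothetical names, and each health report is reduced to the leader value it carries. It only shows why "last report wins" keeps clobbering a locally known leader while the cached reports are stale.

```python
from typing import Dict, List, Optional

# Hypothetical key standing in for the controller partition.
CONTROLLER = "controller"


def ticker_apply(table: Dict[str, Optional[int]],
                 reports: List[Optional[int]]) -> None:
    """Apply every cached report in order, so the last report wins."""
    for reported_leader in reports:
        table[CONTROLLER] = reported_leader


def run(ticks: int, reports: List[Optional[int]]) -> List[Optional[int]]:
    """Return the leader seen in the table after each ticker pass."""
    # Assumption: the election has just succeeded and node 1 is the leader.
    table: Dict[str, Optional[int]] = {CONTROLLER: 1}
    seen = []
    for _ in range(ticks):
        ticker_apply(table, reports)
        seen.append(table[CONTROLLER])
    return seen


stale_reports = [1, None]  # last report still says leader=null
fresh_reports = [1, 1]     # next round of reports agrees on node 1

print(run(3, stale_reports))
print(run(1, fresh_reports))
```

With the stale reports the table reads `None` after every tick, and it only recovers once a fresh round of reports replaces the cached ones.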