cluster: fix shutdown hang in health_monitor_backend
If refresh_cluster_health_cache was waiting on _refresh_mutex
while ::stop ran, and another fiber had a refresh in progress,
then ::stop would cancel the other fiber's refresh and the first
fiber would go on to attempt a refresh of its own, holding the
gate open while ::stop was waiting for it to close.

Break _refresh_mutex in ::stop so that waiters fail with
broken_semaphore and return errc::shutting_down instead.

Fixes redpanda-data#5178

(cherry picked from commit d32c9a0)
jcsp authored and vbotbuildovich committed Aug 1, 2022
1 parent 08e95c3 commit 6e75078
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions src/v/cluster/health_monitor_backend.cc
@@ -112,6 +112,7 @@ ss::future<> health_monitor_backend::stop() {
 _leadership_notification_handle);

 auto f = _gate.close();
+_refresh_mutex.broken();
 abort_current_refresh();
 _tick_timer.cancel();

@@ -426,6 +427,9 @@ health_monitor_backend::maybe_refresh_cluster_health(
 err.message());
 co_return err;
 }
+} catch (const ss::broken_semaphore&) {
+// Refresh was waiting on _refresh_mutex during shutdown
+co_return errc::shutting_down;
 } catch (const ss::timed_out_error&) {
 vlog(
 clusterlog.info,
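The pattern in this change (break the mutex during shutdown so that queued waiters fail fast instead of acquiring it) can be sketched outside Seastar. The following is a minimal illustration only, using standard C++ threads rather than Seastar fibers. The names `breakable_mutex`, `broken_mutex_error`, `try_refresh`, and `simulate_shutdown_with_waiter` are hypothetical and do not appear in the Redpanda source; the `errc` enum here is a local stand-in for the real error codes.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for ss::broken_semaphore.
struct broken_mutex_error : std::runtime_error {
    broken_mutex_error() : std::runtime_error("mutex broken") {}
};

// Hypothetical mutex that can be "broken": once broken, current waiters
// and any later lock() calls throw instead of acquiring the mutex.
class breakable_mutex {
public:
    void lock() {
        std::unique_lock<std::mutex> g(_m);
        _cv.wait(g, [this] { return _broken || !_held; });
        if (_broken) {
            throw broken_mutex_error();
        }
        _held = true;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> g(_m);
            assert(_held);
            _held = false;
        }
        _cv.notify_one();
    }

    // Analogous in spirit to calling broken() on the refresh mutex in stop().
    void broken() {
        {
            std::lock_guard<std::mutex> g(_m);
            _broken = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _m;
    std::condition_variable _cv;
    bool _held = false;
    bool _broken = false;
};

// Local stand-in for the error codes used in the diff.
enum class errc { success, shutting_down };

// Mirrors the shape of the new catch block: a waiter that finds the
// mutex broken reports shutting_down rather than starting a refresh.
errc try_refresh(breakable_mutex& m) {
    try {
        m.lock();
    } catch (const broken_mutex_error&) {
        return errc::shutting_down;
    }
    // A real refresh would run here while holding the mutex.
    m.unlock();
    return errc::success;
}

// One "refresh" holds the mutex, a second caller queues behind it,
// and shutdown breaks the mutex before the holder releases it.
errc simulate_shutdown_with_waiter() {
    breakable_mutex m;
    m.lock(); // refresh in progress
    errc result = errc::success;
    std::thread waiter([&] { result = try_refresh(m); });
    m.broken(); // shutdown: fail waiters instead of letting them proceed
    m.unlock(); // the in-progress refresh is cancelled and releases
    waiter.join();
    return result;
}
```

In this sketch the waiter observes the broken state whether it reached `lock()` before or after `broken()` was called, so it never starts a new refresh once shutdown has begun. Without the `broken()` call it would acquire the mutex after `unlock()` and proceed, which is the analogue of the hang described in the commit message.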