From e9903d51e996d443ff03c1e831278328d0a85bdf Mon Sep 17 00:00:00 2001
From: Andrew Wong
Date: Fri, 29 Jul 2022 15:29:46 -0700
Subject: [PATCH] heartbeat_manager: reduce log severity when missing partition

The heartbeat manager logs an error message when it attempts to parse a
bad heartbeat response that includes a partition it doesn't know about.
A similar message is logged at the debug level when handling a good
response that includes a partition it doesn't know about, since the
following is a valid series of events:

1. replica R becomes leader of topic partition P
2. R sends out heartbeat requests to P's followers
3. an admin deletes P
4. the controller leader sends the requests to delete P
5. R shuts down its consensus
6. the reply to P's heartbeat is received by R, but the partition no
   longer exists

This sequence is possible regardless of whether the request was
successfully sent out. With the current logging, we log an error if, for
example, the request timed out before step 6 and we then process the
failed response.

This commit makes this log line less severe, logging at the warn level
instead, and adds context explaining when we might expect the log.
---
 src/v/raft/heartbeat_manager.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/v/raft/heartbeat_manager.cc b/src/v/raft/heartbeat_manager.cc
index 266bab861278..b13435ef0216 100644
--- a/src/v/raft/heartbeat_manager.cc
+++ b/src/v/raft/heartbeat_manager.cc
@@ -281,7 +281,11 @@ void heartbeat_manager::process_reply(
         for (auto& [g, req_meta] : groups) {
             auto it = _consensus_groups.find(g);
             if (it == _consensus_groups.end()) {
-                vlog(hbeatlog.error, "cannot find consensus group:{}", g);
+                vlog(
+                  hbeatlog.warn,
+                  "cannot find consensus group:{}, may have been moved or "
+                  "deleted",
+                  g);
                 continue;
             }
 
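To illustrate steps 1-6, here is a minimal, self-contained sketch of the reply-processing loop. It is not the Redpanda implementation: `group_id`, `consensus`, `level`, `log_record`, and this `process_reply` signature are hypothetical simplifications, and log messages are collected into a vector instead of going through `vlog`. A group that is erased while its heartbeat is in flight is skipped with a warning, mirroring the change above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: these are simplified placeholders, not
// Redpanda's real raft types.
using group_id = std::int64_t;

struct consensus {
    // Hypothetical field: time of the last acknowledged heartbeat.
    std::int64_t last_heartbeat_ack = 0;
};

enum class level { debug, warn, error };

struct log_record {
    level lvl;
    std::string msg;
};

// Processes a heartbeat reply covering `groups`. A group that is no longer
// registered (e.g. its partition was moved or deleted while the request was
// in flight) is skipped with a warn-level record instead of an error.
std::vector<log_record> process_reply(
  std::unordered_map<group_id, consensus>& consensus_groups,
  const std::vector<group_id>& groups,
  std::int64_t now) {
    std::vector<log_record> logs;
    for (auto g : groups) {
        auto it = consensus_groups.find(g);
        if (it == consensus_groups.end()) {
            // Step 6: the reply arrives after the group was shut down.
            logs.push_back(
              {level::warn,
               "cannot find consensus group:" + std::to_string(g)
                 + ", may have been moved or deleted"});
            continue;
        }
        it->second.last_heartbeat_ack = now;
    }
    return logs;
}
```

For example, registering groups 1 and 2, sending a heartbeat covering both, and then erasing group 2 (steps 3 to 5) before calling `process_reply` (step 6) yields a single warn-level record for group 2, while group 1 is still updated normally.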