From e9903d51e996d443ff03c1e831278328d0a85bdf Mon Sep 17 00:00:00 2001
From: Andrew Wong <awong@redpanda.com>
Date: Fri, 29 Jul 2022 15:29:46 -0700
Subject: [PATCH] heartbeat_manager: reduce log severity when missing partition

The heartbeat manager logs an error message when it attempts to parse a
bad heartbeat response that includes a partition it doesn't know about.
A similar message is logged at the debug level when handling a good
response that includes a partition it doesn't know about, since the
following is a valid series of events:

1. replica R becomes leader of topic partition P
2. R sends out heartbeat requests to P's followers
3. an admin deletes P
4. the controller leader sends the requests to delete P
5. R shuts down its consensus
6. the reply to P's heartbeat is received by R, but the partition no
   longer exists

This sequence is possible regardless of whether the request was
successfully sent out. With the current logging, we log an error if, for
example, the request timed out before step 6 and we then process the
response.
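
As a rough illustration of the sequence above, here is a minimal sketch
of a reply that names a group removed in the meantime. The names
group_registry and count_missing are hypothetical and are not part of
heartbeat_manager; the sketch only assumes that groups are looked up by
id when a reply is processed.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the set of consensus groups known locally.
struct group_registry {
    std::unordered_set<int64_t> groups;

    void register_group(int64_t g) { groups.insert(g); }
    void deregister_group(int64_t g) { groups.erase(g); }

    // Counts the groups named in a reply that are no longer registered.
    // A non-zero count is expected after a delete or a move, which is
    // why such a lookup miss is not necessarily an error.
    std::size_t count_missing(const std::vector<int64_t>& reply) const {
        std::size_t missing = 0;
        for (auto g : reply) {
            if (groups.find(g) == groups.end()) {
                ++missing; // the real code would log and continue here
            }
        }
        return missing;
    }
};
```

Registering groups 1 and 2, deregistering 2, and then handling a reply
that still names both yields one missing group, matching step 6.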

This commit makes this log line less severe, instead logging at the warn
level but with some context explaining when we might expect the log.
---
 src/v/raft/heartbeat_manager.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/v/raft/heartbeat_manager.cc b/src/v/raft/heartbeat_manager.cc
index 266bab861278..b13435ef0216 100644
--- a/src/v/raft/heartbeat_manager.cc
+++ b/src/v/raft/heartbeat_manager.cc
@@ -281,7 +281,11 @@ void heartbeat_manager::process_reply(
         for (auto& [g, req_meta] : groups) {
             auto it = _consensus_groups.find(g);
             if (it == _consensus_groups.end()) {
-                vlog(hbeatlog.error, "cannot find consensus group:{}", g);
+                vlog(
+                  hbeatlog.warn,
+                  "cannot find consensus group:{}, may have been moved or "
+                  "deleted",
+                  g);
                 continue;
             }