
cluster: many redundant config status log messages may be written if follower is not seeing updates promptly #4923

Closed
jcsp opened this issue May 25, 2022 · 0 comments · Fixed by #4924
Labels
area/controller kind/bug Something isn't working

Comments

@jcsp
Contributor

jcsp commented May 25, 2022

reconcile_status emits a set_status RPC to the controller leader whenever it sees that the contents of its status table do not match the node's own self-described status. If the follower is not receiving updates from the leader promptly, its status table stays stale, so it may emit many such RPCs in a loop.

The leader should check if any set_status calls from a follower are no-ops, and drop them, rather than writing multiple identical messages to the controller log.
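A minimal sketch of the leader-side check proposed above, written in Python for brevity (Redpanda itself is C++). `ConfigStatus`, `ControllerLeader`, and their fields are hypothetical stand-ins, not Redpanda's actual types. The sketch also updates the status table immediately on each write, whereas a real controller would update it only when the log entry is applied.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical stand-in for a node's reported config status; the field
# names are illustrative assumptions, not Redpanda's actual schema.
@dataclass(frozen=True)
class ConfigStatus:
    node_id: int
    version: int
    restart: bool = False


class ControllerLeader:
    """Minimal model of a leader that records status updates in a log."""

    def __init__(self) -> None:
        # Leader's current view of each node's status (the status table).
        self.status_table: Dict[int, ConfigStatus] = {}
        # Stand-in for the replicated controller log.
        self.log: List[ConfigStatus] = []

    def set_status(self, status: ConfigStatus) -> bool:
        """Handle a set_status RPC; return True if a log write happened."""
        if self.status_table.get(status.node_id) == status:
            # No-op: the table already holds exactly this status, so
            # writing it again would only add a redundant log entry.
            return False
        self.log.append(status)
        # Simplification: apply to the table immediately instead of
        # waiting for the log entry to be applied.
        self.status_table[status.node_id] = status
        return True


leader = ControllerLeader()
for _ in range(3):
    # A stale follower repeats the same report three times.
    leader.set_status(ConfigStatus(node_id=1, version=7))
print(len(leader.log))  # 1
```

In this simplified model, comparing against the status table rather than against the previous RPC means any number of identical reports from a lagging follower collapse into a single log entry.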

@jcsp jcsp self-assigned this May 25, 2022
jcsp added a commit to jcsp/redpanda that referenced this issue May 25, 2022
If a follower isn't seeing controller log updates
promptly, it may issue many set_status RPCs while
it's waiting.  The controller leader should not
turn all of these into log writes: if the status
of the node already matches what it is reporting,
then do not write anything.

Fixes redpanda-data#4923
jcsp added a commit to jcsp/redpanda that referenced this issue May 25, 2022
On a healthy system, we do want to send set_status
RPCs as soon as we're ready.  However, if the controller
log updates are not being seen promptly, this would lead
to the follower spamming the controller leader with
very many set_status RPCs in a tight loop.

Nodes will still send their status immediately when
a config change occurs: this change only affects the
behaviour if _another_ config change occurs while
it is reporting status from the first change: in this
case the follower will wait 5 seconds before sending
its next status RPC.

Related redpanda-data#4923
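The follower-side throttle described in this commit can be approximated as follows. This is a hedged sketch, not the actual patch: `StatusReporter`, `maybe_send`, and the caller-supplied millisecond clock are hypothetical. The sketch also simplifies the rule to "after any set_status RPC, wait at least 5 seconds before the next one", so a report is still sent immediately whenever no RPC has gone out in the last 5 seconds.

```python
from typing import Optional


class StatusReporter:
    """Hypothetical follower-side throttle for set_status RPCs."""

    def __init__(self, backoff_ms: int = 5000) -> None:
        # Minimum gap between consecutive set_status RPCs (5 s per the commit).
        self.backoff_ms = backoff_ms
        # Timestamp (ms) of the last RPC sent, or None if nothing sent yet.
        self.last_sent_ms: Optional[int] = None

    def next_send_time(self, now_ms: int) -> int:
        """Earliest time (ms) at which the next RPC may be sent."""
        if self.last_sent_ms is None or now_ms - self.last_sent_ms >= self.backoff_ms:
            # Healthy path: nothing sent recently, so report immediately.
            return now_ms
        # A report went out recently: hold off until the backoff expires.
        return self.last_sent_ms + self.backoff_ms

    def maybe_send(self, now_ms: int) -> bool:
        """Record an RPC if one is allowed at now_ms; return whether it was sent."""
        if self.next_send_time(now_ms) > now_ms:
            return False
        self.last_sent_ms = now_ms
        return True


# Simulate a reconcile loop polling every 100 ms for 10 s while the
# follower's status table never catches up (leader updates delayed).
reporter = StatusReporter()
sent = [t for t in range(0, 10_000, 100) if reporter.maybe_send(t)]
print(sent)  # [0, 5000]
```

With the throttle, the stalled loop above issues two RPCs instead of one per 100 ms iteration, while the first report after a quiet period still goes out without delay.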
@jcsp jcsp added kind/bug Something isn't working area/controller labels May 26, 2022
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue May 27, 2022
(cherry picked from commit 0a2b991)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue May 27, 2022
(cherry picked from commit 3b1e56e)