[BUG] Cluster state stats should reset every time cluster manager node steps down as active leader #10928

Open
shwetathareja opened this issue Oct 25, 2023 · 3 comments
Labels
bug Something isn't working Cluster Manager

Comments

@shwetathareja
Member

Describe the bug
The cluster state stats were introduced as part of #10670. These stats are emitted only by the active cluster manager, i.e. the leader. Let's say there are 3 cluster manager nodes: n1, n2, n3.

Scenarios

  1. n1 is the leader, so it is emitting the stats. Let's say that at this point update count = 5 and failure count = 1.
  2. n1 steps down and n2 becomes the leader for the first time. Its update count starts from 1, which is expected as these are per-node stats.
  3. n2 steps down and n1 becomes the leader again. It keeps appending to its old update count of 5 instead of starting from 1.

Expected Behavior:
The cluster state stats should reset every time cluster manager node steps down as active leader.
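
The scenario can be reproduced with a minimal sketch. Everything here is hypothetical and for illustration only: `NodeStats`, `run_scenario`, and the `reset_on_step_down` flag are not OpenSearch classes or settings, and the sketch assumes successes and failures are tracked as two separate per-node counters.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node counters; not the actual OpenSearch stats classes."""
    update_count: int = 0
    failed_count: int = 0

    def record_success(self) -> None:
        self.update_count += 1

    def record_failure(self) -> None:
        self.failed_count += 1

    def reset(self) -> None:
        self.update_count = 0
        self.failed_count = 0


def run_scenario(reset_on_step_down: bool) -> tuple[int, int]:
    """Replay steps 1-3 and return n1's (update_count, failed_count)."""
    n1, n2 = NodeStats(), NodeStats()

    # 1. n1 is the leader: update count = 5, failure count = 1.
    for _ in range(5):
        n1.record_success()
    n1.record_failure()

    # 2. n1 steps down; n2 becomes the leader for the first time.
    if reset_on_step_down:
        n1.reset()
    n2.record_success()

    # 3. n2 steps down; n1 becomes the leader again and publishes one update.
    if reset_on_step_down:
        n2.reset()
    n1.record_success()

    return n1.update_count, n1.failed_count


print(run_scenario(reset_on_step_down=False))  # current behavior
print(run_scenario(reset_on_step_down=True))   # expected behavior
```

Without the reset, n1 reports `(6, 1)` after regaining leadership; with the reset it reports `(1, 0)`, so the failure recorded during its first term is no longer visible.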

@shwetathareja shwetathareja added bug Something isn't working untriaged Cluster Manager and removed untriaged labels Oct 25, 2023
@shwetathareja
Member Author

@amkhar to take a look.

@soosinha
Member

@shwetathareja
If we reset the stats when the active leader steps down, we will not have the previous success and failure stats. In particular, since the failure stat would be zero, we would not know whether there had been any failure.

The current behavior of the remote upload stats, which retain their values, is in line with the regular cluster state stats, which do the same.
See a sample cluster state stats object below:

"cluster_state_stats" : {
  "overall" : {
    "update_count" : 1285,
    "total_time_in_millis" : 175661,
    "failed_count" : 0
  }
}

The update count and failed count here also do not change when the active leader steps down.

Let us know if we should still track this as a bug.

@soosinha
Member

soosinha commented Aug 8, 2024

Since the behavior of the remote upload stats is the same as that of the cluster state stats, we should change the behavior for both, if needed, to keep the logic consistent.
