Do not restart cluster when scaling up #4964

Closed
alenkacz wants to merge 3 commits from the av/scale-up-restart branch

Conversation

@alenkacz (Contributor) commented on May 30, 2022

Cover letter

Currently, when scaling up we always update the seed servers, and that update triggers a restart of the whole cluster. That's unnecessary: it only matters that at least one of the listed seed nodes is up, not that every node is listed as a seed.

This PR changes the hash-computation algorithm to exclude seeds, so a change in the seed list never triggers a restart. Given how clusters are scaled up and down today, this is the correct behavior.

We cannot backport this change: because it alters the existing hashes, rolling it out triggers a restart of the whole cluster.
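
For illustration, here is a minimal, runnable sketch of the idea using simplified stand-in types (not the operator's real GlobalConfiguration): seed servers are blanked out before the node configuration is serialized and hashed, so adding or removing seeds leaves the hash, and therefore the rolling-restart decision, unchanged.

```go
package main

import (
	"crypto/md5"
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the operator's configuration types.
type seedServer struct{ Host string }

type nodeConfiguration struct {
	SeedServers []seedServer
	KafkaPort   int
}

// configurationHash hashes the node configuration with the seed servers
// removed, so a change in the seed list never changes the hash.
func configurationHash(cfg nodeConfiguration) (string, error) {
	clone := cfg // value copy; the SeedServers slice header is replaced below
	clone.SeedServers = nil
	serialized, err := json.Marshal(clone)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", md5.Sum(serialized)), nil
}

func main() {
	before := nodeConfiguration{SeedServers: []seedServer{{Host: "node-0"}}, KafkaPort: 9092}
	after := nodeConfiguration{SeedServers: []seedServer{{Host: "node-0"}, {Host: "node-1"}}, KafkaPort: 9092}

	h1, _ := configurationHash(before)
	h2, _ := configurationHash(after)
	fmt.Println(h1 == h2) // true: scaling up adds seeds but does not change the hash
}
```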

Fixes #ISSUE-NUMBER, Fixes #ISSUE-NUMBER, ...

Release notes

Improvements

  • Increasing the number of replicas no longer triggers a restart of the whole cluster. Updating to this version will trigger one restart, because the algorithm that computes the restart-triggering hash has changed.

@alenkacz alenkacz force-pushed the av/scale-up-restart branch 4 times, most recently from 29d33a5 to c0fe3fb on May 30, 2022 09:06
@alenkacz alenkacz marked this pull request as ready for review May 30, 2022 13:16
@alenkacz alenkacz requested a review from a team as a code owner May 30, 2022 13:16
@nicolaferraro (Member) left a comment

Approach looks good 👍
I think there are some other fields that change with the replica count that we also need to exclude from the hash.

@@ -120,6 +120,7 @@ func (c *GlobalConfiguration) GetNodeConfigurationHash() (string, error) {
clone := *c
// clean any cluster property from config before serializing
clone.ClusterConfiguration = nil
removeIgnoredFields(&clone)
@nicolaferraro (Member):
I think we probably don't need to clone the object at all. The code above suggests it's doing a deep copy, while in practice it's a shallow one, so the original object is probably being modified.
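
For illustration, a minimal sketch of the concern using hypothetical, simplified types: a value copy like `clone := *c` only copies the top-level struct, so if the nested node configuration is held through a pointer, clearing its seed servers on the clone also clears them on the original.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the operator's configuration types.
type redpandaConfig struct {
	SeedServers []string
}

type globalConfiguration struct {
	NodeConfiguration *redpandaConfig // held by pointer: shared after a value copy
}

func main() {
	original := globalConfiguration{
		NodeConfiguration: &redpandaConfig{SeedServers: []string{"node-0"}},
	}

	clone := original                         // shallow copy: both structs point at the same redpandaConfig
	clone.NodeConfiguration.SeedServers = nil // mutates the shared nested object

	// Prints "[]": the original was modified through the "clone".
	fmt.Println(original.NodeConfiguration.SeedServers)
}
```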

func removeIgnoredFields(clone *GlobalConfiguration) {
// ignore seeds for hash computation so that changes in this field don't
// trigger cluster restarts
clone.NodeConfiguration.Redpanda.SeedServers = []config.SeedServer{}
@nicolaferraro (Member):
I think there are other cases where we iterate over the replicas to create URLs, such as for Pandaproxy and Schema Registry.
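
To make the suggestion concrete, here is a hedged sketch with made-up field names (the real config structs differ): any field whose value is built by iterating over the replicas would have to be blanked before hashing as well, otherwise scaling would still change the hash.

```go
package main

// Hypothetical, simplified node configuration: only SeedServers mirrors the
// real code; the other two fields are stand-ins for replica-derived URLs.
type nodeConfig struct {
	SeedServers           []string
	PandaproxyClientAddrs []string // hypothetical: one advertised URL per replica
	SchemaRegistryAddrs   []string // hypothetical: one advertised URL per replica
}

// stripReplicaDerivedFields blanks every field whose value depends on the
// replica count, so that scaling up or down does not change the config hash.
func stripReplicaDerivedFields(c *nodeConfig) {
	c.SeedServers = nil
	c.PandaproxyClientAddrs = nil
	c.SchemaRegistryAddrs = nil
}

func main() {
	cfg := nodeConfig{
		SeedServers:           []string{"node-0", "node-1"},
		PandaproxyClientAddrs: []string{"node-0:8082", "node-1:8082"},
		SchemaRegistryAddrs:   []string{"node-0:8081", "node-1:8081"},
	}
	stripReplicaDerivedFields(&cfg)
	_ = cfg // cfg can now be serialized and hashed without replica-dependent fields
}
```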

existingCluster.Spec.Replicas = &repliasP1
Expect(k8sClient.Update(context.Background(), &existingCluster)).Should(Succeed())
newConfigMapHash := sts.Annotations[res.ConfigMapHashAnnotationKey]
Expect(newConfigMapHash).Should(Equal(configMapHash))
@nicolaferraro (Member):
Better to use Consistently(func ...) in this case, even if it takes more time.

@nicolaferraro (Member):
Also, the sts is not reloaded.
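
A sketch of the shape the test could take, combining both suggestions; the StatefulSet lookup key (stsKey here) is an assumption, not a name taken from this PR. The object is re-fetched inside the polled function, and the assertion is held over a time window with Consistently:

```go
// Sketch only: stsKey is a hypothetical types.NamespacedName for the StatefulSet.
Consistently(func() string {
	var sts appsv1.StatefulSet
	// Reload the StatefulSet on every poll so the latest annotations are observed.
	if err := k8sClient.Get(context.Background(), stsKey, &sts); err != nil {
		return ""
	}
	return sts.Annotations[res.ConfigMapHashAnnotationKey]
}, "5s", "100ms").Should(Equal(configMapHash))
```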

@joejulian (Contributor) commented:

Closing this as it will not be relevant for the v2 operator: https://github.com/redpanda-data/redpanda/tree/operator-v2

@joejulian joejulian closed this Mar 1, 2023
Labels: area/k8s, kind/enhance