
Track persistent volume in k8s environment #4965

Conversation

@RafalKorepta (Contributor) commented May 30, 2022

Cover letter

Change the Redpanda ID calculation to be based on the Redpanda data
folder mounted to the POD. If a user chooses faster local storage, the
Redpanda ID is regenerated every time the local disk is wiped due to a
node change.

This PR also changes the seed server list calculation: the seed server
list is cleared only when there is no data and no other seed server
node is running.

Release notes

  • The init container in the Kubernetes deployment will track the Redpanda ID
    using persistent volumes

Improvements

  • The seed server list is cleared only in the POD with ordinal 0, when the
    cluster has more than 1 replica and no other replica is running.

First FP

Fix linter issues with file permissions.

REF:
#333

Rafal Korepta added 5 commits May 30, 2022 09:50
For newly created Redpanda pods, the Redpanda ID will be randomly generated.
For Redpanda pods that have data in the persistent volume, the Redpanda ID
will be the statefulset ordinal number, so as not to introduce a breaking change.
For Redpanda pods that have a .redpanda_id file in the Redpanda data persistent
volume, the content of that file will be used for the node.ID configuration
(a sketch of this precedence follows the commit list below).
If the Redpanda data folder doesn't have any content and the pod ordinal
number is 0, then initialize the Redpanda seed server list to an empty list. In
any other case the seed server list should point to all brokers.
Because the configurator is used in a Kubernetes environment,
the host name reflects the POD name. As the Redpanda operator uses a
statefulset, all PODs will have an ordinal number as a suffix.
As the node with pod ordinal 0 might lose its data folder while the cluster
is bootstrapping, the seed server list should not be empty if the cluster was
already formed. There might be a race condition where the pod with ordinal 0
forms a one-node cluster and, as it becomes a leader, overwrites the other
nodes' ring0 raft group history. To prevent this, the seed server list will
be cleared only if no other node in the cluster is working (pod in the Running
phase) and none reports as live (v1/brokers list where is_alive is set to true).
In rare cases a random int64 Redpanda ID might collide with an already existing
ID. Currently an ID can not be reused.
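Taken together, the commits above describe a three-step ID-resolution precedence: reuse the .redpanda_id file if present, fall back to the statefulset ordinal when the data volume already has content, and otherwise generate a random ID. A minimal sketch of that precedence, assuming hypothetical helper names and the default /var/lib/redpanda/data path (the actual configurator code may differ):

package main

import (
	"fmt"
	"math/rand"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

// podOrdinal extracts the trailing ordinal from a statefulset pod
// hostname such as "redpanda-2".
func podOrdinal(hostname string) (int, error) {
	idx := strings.LastIndex(hostname, "-")
	if idx < 0 {
		return 0, fmt.Errorf("hostname %q has no ordinal suffix", hostname)
	}
	return strconv.Atoi(hostname[idx+1:])
}

func isDirEmpty(dir string) (bool, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return false, err
	}
	return len(entries) == 0, nil
}

// resolveRedpandaID applies the precedence from the commits:
// 1. reuse the ID stored in .redpanda_id inside the data volume,
// 2. fall back to the statefulset ordinal when the volume already has data,
// 3. otherwise generate a random ID for a brand-new pod.
func resolveRedpandaID(dataDir string, ordinal int) (int, error) {
	idFile := filepath.Join(dataDir, ".redpanda_id")
	if b, err := os.ReadFile(idFile); err == nil {
		return strconv.Atoi(strings.TrimSpace(string(b)))
	}
	empty, err := isDirEmpty(dataDir)
	if err != nil {
		return 0, err
	}
	if !empty {
		// Existing data but no ID file: keep the legacy ordinal-based ID
		// so the change is not breaking.
		return ordinal, nil
	}
	return int(rand.New(rand.NewSource(time.Now().UnixNano())).Int63()), nil
}

func main() {
	host, _ := os.Hostname()
	ord, err := podOrdinal(host)
	if err != nil {
		ord = 0
	}
	id, err := resolveRedpandaID("/var/lib/redpanda/data", ord)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("resolved Redpanda ID:", id)
}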
@RafalKorepta force-pushed the rk/gh-2096/track-persistent-volume branch from ede0168 to f7fc467 on May 30, 2022 07:52
@alenkacz (Contributor) left a comment:

just left some initial questions

@@ -42,6 +56,9 @@ const (
externalConnectivityAddressTypeEnvVar = "EXTERNAL_CONNECTIVITY_ADDRESS_TYPE"
hostPortEnvVar = "HOST_PORT"
proxyHostPortEnvVar = "PROXY_HOST_PORT"
dataDirPath = "DATA_DIR_PATH"

redpandaIDs = "redoanda-ids"
Contributor:

nit: typo

}

err = calculateRedpandaID(func() int {
return int(rand.NewSource(time.Now().Unix()).Int63())
Contributor:

So it's random and not incrementing, as we discussed somewhere on Slack? I would strongly prefer to do that :)

return nil
}

empty, err := IsRedpandaDataFolderEmpty(c.dataDirPath)
Contributor:

Could you add a comment explaining this change? So if the data folder is empty it gets an empty seed server list? So that it forms a new cluster, or ...? Also I think this would not work on 2-node clusters.

@nicolaferraro (Member) left a comment:

Great start! I'm a bit unsure about the last 2 commits that introduce calls to the k8s API and admin API from the configurator. Maybe this works, but there may be dragons (like deadlocks) if we go with this approach.

As far as I understand, those changes are needed to fix the following problems:

  • Node uniqueness: for which we could also consider stronger UIDs, like mixing ordinal + timestamp + entropy-based rng
  • Seed server on startup: for which we could try to find a solution with the core team

@@ -336,6 +340,10 @@ func checkEnvVars() (configuratorConfig, error) {
value: &hostPort,
name: hostPortEnvVar,
},
{
Member:

Nit: indentation seems wrong

@@ -153,6 +159,62 @@ func main() {
log.Printf("Configuration saved to: %s", c.configDestination)
}

func calculateRedpandaID(
cfg *config.Config, c configuratorConfig, initialID brokerID,
Member:

Nit: I'd call the argument podOrdinal for more clarity

return nil
}

redpandaID := int(rand.NewSource(time.Now().Unix()).Int63())
Member:

I'm not sure about this. .Unix() gives seconds elapsed since 1970, and there's a high chance that two pods will start within the same second when a fresh cluster with 3 or more instances is created. I'd also remove the randomization part if the uniqueness is based on the seed: IIRC the same seed leads to the same result, also on different machines. Maybe including the hostIndex helps with uniqueness, like newID := time.Now().UnixMilli() * 1e4 + podOrdinal

Contributor Author:

Agree with your proposal.
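A self-contained sketch of the scheme the reviewer proposes, under the assumption that the ordinal stays below 10000 so the timestamp and ordinal parts never overlap; two pods would then only collide if they started within the same millisecond with the same ordinal, which a statefulset rules out:

package main

import (
	"fmt"
	"time"
)

// proposedID combines the millisecond timestamp with the pod ordinal,
// as suggested in the review. Assumes podOrdinal < 10000.
func proposedID(podOrdinal int) int64 {
	return time.Now().UnixMilli()*1e4 + int64(podOrdinal)
}

func main() {
	fmt.Println(proposedID(0))
}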

return fmt.Errorf("checking Redpanda data folder content (%s): %w", c.dataDirPath, err)
}

if index == 0 && empty {
Member:

I'm thinking of this scenario:

  • Init container starts, configurator writes the .redpanda_id file
  • Main pod starts and crashes
  • The configurator is now restarted and finds .redpanda_id, so it uses the full seed_server list

Is the cluster able to form a quorum anyway?

Contributor Author:

If Redpanda crashes, only the container will be restarted, not the whole POD.

Your concern is still valid: someone could delete the pod before it formed a cluster.

I need to think more about it.

if pod.Name == c.hostName {
continue
}
if pod.Status.Phase == corev1.PodRunning {
Member:

I understand this is a problem we need to avoid, but I wonder if we can find a different way. I've seen e.g. cases where all nodes crash during an upgrade, which can still lead the first one to think it's alone. Isn't Redpanda supposed to handle these situations?

Contributor Author:

apparently Redpanda might form 2 clusters -> #333
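For context, a minimal sketch of the pod-phase half of the guard being discussed, using the same controller-runtime client the diff uses elsewhere; the function name and the namespace-only selector are assumptions for illustration:

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// anyOtherPodRunning reports whether any pod other than the one the
// configurator runs in is in the Running phase, in which case the seed
// server list must not be cleared.
func anyOtherPodRunning(ctx context.Context, c client.Client, ns, self string) (bool, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods, client.InNamespace(ns)); err != nil {
		return false, err
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		if pod.Name == self {
			continue // skip ourselves
		}
		if pod.Status.Phase == corev1.PodRunning {
			return true, nil // another replica is up: keep the seed list
		}
	}
	return false, nil
}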

}

for _, br := range brokers {
if br.NodeId == 0 {
Member:

nodeId is no longer the podOrdinal, so this check may never hold

Contributor Author:

You are right. Fixed in next push.
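One way the check could be corrected, since NodeId no longer matches the pod ordinal: treat the cluster as already formed when any broker reports itself alive. The Broker struct below mirrors the v1/brokers payload mentioned in the commits and is an assumption for illustration, not the actual admin API types:

// Broker is a hypothetical stand-in for the v1/brokers response entry.
type Broker struct {
	NodeID  int  `json:"node_id"`
	IsAlive bool `json:"is_alive"`
}

// clusterAlreadyFormed returns true if at least one broker is live,
// meaning the seed server list must never be cleared.
func clusterAlreadyFormed(brokers []Broker) bool {
	for _, br := range brokers {
		if br.IsAlive {
			return true
		}
	}
	return false
}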

@@ -56,6 +57,8 @@ const (
hostPortEnvVar = "HOST_PORT"
proxyHostPortEnvVar = "PROXY_HOST_PORT"
dataDirPath = "DATA_DIR_PATH"

redpandaIDs = "redoanda-ids"
Member:

typo

cm.Data = map[string]string{
redpandaIDs: "",
}
err = k8sClient.Create(ctx, cm)
Member:

Have you considered reversing the roles here, i.e. managing the -ids configmap from the operator and reading it in the configurator?

The IDs would be created uniquely by the operator itself, and the configurator could also use that cm to get info on other replicas. The nodes could still treat it as a "suggestion" and report a different nodeId in the node_config...

Just thinking out loud...

Contributor Author:

The operator would have to track every PV that was bound to each POD and map each Redpanda ID to a PV.

We don't have this information before the statefulset controller schedules a new POD. Only then could we update the configmap, and the configurator would need to wait for this ID.
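A loose sketch of the reversed flow suggested above, on the operator side: the operator owns the -ids ConfigMap and records one ID per pod, and the configurator only reads its own entry. The function name, key layout, and cmName are illustrative assumptions:

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// assignID records an ID for podName in the operator-managed ConfigMap,
// leaving any existing assignment untouched so IDs stay stable across
// restarts.
func assignID(ctx context.Context, c client.Client, ns, cmName, podName, id string) error {
	var cm corev1.ConfigMap
	if err := c.Get(ctx, types.NamespacedName{Namespace: ns, Name: cmName}, &cm); err != nil {
		return err
	}
	if cm.Data == nil {
		cm.Data = map[string]string{}
	}
	if _, ok := cm.Data[podName]; ok {
		return nil // ID already assigned; keep it
	}
	cm.Data[podName] = id
	return c.Update(ctx, &cm) // optimistic concurrency via resourceVersion
}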

@RafalKorepta (Contributor Author):

This PR isn't solving the seed server problem, and the Redpanda ID can be reused even if the data folder is lost.
