Skip to content
This repository has been archived by the owner on May 18, 2020. It is now read-only.

Nodetool drain for C* #21

Merged
merged 3 commits into from
Jul 26, 2017
Merged

Conversation

pavolloffay
Copy link
Member

Fixes part of https://github.com/uber/jaeger/issues/175.

@mwringe @jpkrohling could you please review?

Copy link
Collaborator

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather use a shell script similar to what Hawkular Metrics does.

@pavolloffay
Copy link
Member Author

@jpkrohling how to you want to get that shell script to the official c* docker image?

@jpkrohling
Copy link
Collaborator

I see a few ways, but you might find other (better?) ways as well. A few thoughts:

  1. Contribute it directly to the Cassandra project, so that this script is available to all on the base image
  2. Use it as CockroackDB uses, embedding the script within the command: https://git.io/v7Ork . Note that they are using it as the actual command, but the same idea could be applied to a preStop hook. The origin metrics script isn't that big, so, this is still doable.
  3. Create it as a secret or as entry of a config map and mount it on the image

The first option is preferable, but if you need this feature quick, then you might want to use one of the other ones.

@pavolloffay
Copy link
Member Author

Why is there this?
["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]
I see that k8s c* is using it, but why?
https://github.com/kubernetes/examples/blob/master/cassandra/cassandra-statefulset.yaml#L40

About 1. I don't think it would happen it's basically one liner and it depends on what you want to do with the output (or even if you want to do something). We currently do not define variable CASSANDRA_DATA_VOLUME as hawkular-metrics does so we cannot store the output of that command anywhere, we should just run it.

I'm more for the simplest solution.

@jpkrohling
Copy link
Collaborator

Why is there this?

It looks like you disconnected when I talked to @mwringe about this. It comes originally from the Kubernetes Cassandra StatefulSet example and we are not sure why it's there.

Even though nodetool drain is a one liner, the Origin Metrics code does have some logic for locking, so that no two nodetool drain would happen at the same time on the same node.

I'm also for the simplest solution :) Do you think just adding nodetool drain to the preStop would suffice?

@pavolloffay
Copy link
Member Author

the Origin Metrics code does have some logic for locking,

Where is this logic? Can we do the same locking?

@jpkrohling
Copy link
Collaborator

jpkrohling commented Jul 26, 2017

Where is this logic? Can we do the same locking?

https://github.com/openshift/origin-metrics/blob/master/cassandra/cassandra-prestop.sh

EDIT: sorry, I read the code too fast :) I thought it was a locking mechanism, but it clearly isn't

@pavolloffay
Copy link
Member Author

pavolloffay commented Jul 26, 2017

java kill hack was introduced here, I think if we do nodetool drain then it's not necessary
kubernetes/kubernetes#39199

Some other references about nodetool drain:

@@ -24,7 +24,7 @@ mkdir $HOME/.kube || true
touch $HOME/.kube/config

export KUBECONFIG=$HOME/.kube/config
sudo -E ./minikube start --vm-driver=none --use-vendored-driver
Copy link
Member Author

@pavolloffay pavolloffay Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They removed --use-vendored-driver in the last version which was released yesterday.

We can use a specific minikube version to avoid failures like this. (but I prefer to use latest for now).

@mwringe
Copy link

mwringe commented Jul 26, 2017

About:
["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]

I suspect this is here because in previous versions of Kubernetes, the prestop hook would block. So having this as part of the prestop hook would wait until Cassandra has fully exited before the pod exits. Without this, Cassandra would be killed after a timeout (default 10s?) which could be cause data corruption.

But this is no longer the case and it can no longer be assumed to work this way. Even with a prestop hook, your pod will be terminated after a certain timeout (default 30s?).

The way you handle it now is to specify a large terminationGracePeriodSeconds value, which will give your application more time to fully shutdown (which is what the pr adds).

I believe all you want now for a prestop hook is to just call nodetool drain. Having it in a separate script is nice and makes it easier to output information to file so that if there is a problem it can be debugged later.

@pavolloffay
Copy link
Member Author

@mwringe thanks for the explanation 👍

@jpkrohling what is missing to merge? I don't think that there is a benefit in having a separate script for this. We currently don't have a place to publish the result of it.

Copy link
Collaborator

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pavolloffay pavolloffay merged commit 65863f3 into jaegertracing:master Jul 26, 2017
@pavolloffay
Copy link
Member Author

@jpkrohling thanks I will open the same for OC

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants