
fixing Cassandra shutdown example to avoid data corruption #39199

Merged: 1 commit, Jan 23, 2017

Conversation

deimosfr
Contributor

@deimosfr commented Dec 23, 2016

Hi,

I was playing with the Cassandra example stored in the Kubernetes project and I occasionally hit issues on shutdown. After checking, it looks like the shutdown of a node is abrupt, and data corruption can occur while a flush to disk is in progress. To avoid that, I'm suggesting a preStop hook to gracefully shut down Cassandra before the container is stopped (sketched just below).
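For context, the change boils down to adding a preStop lifecycle hook to the Cassandra container in the example manifest. A minimal sketch of where it sits in the pod spec follows; the container name is assumed to be cassandra, and the command is the one quoted in the review thread further down:

containers:
- name: cassandra        # assumed container name from the example
  lifecycle:
    preStop:
      exec:
        # Send SIGTERM to the Cassandra JVM, then wait for the process to exit,
        # so the kubelet's SIGKILL never interrupts an in-flight flush or compaction.
        command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]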

Here are logs of corruption after a pod delete:

/10.2.76.4:[-8699848499000118463, -8567123670484406873, -8496767951391579058, -8426990834929543369, -7697118318683556771, -6942779781591907873, -6795880495022459877, -6496399078175245235, -5450122121479522544, -5002551029990001224, -4914532712178218138, -4884518674849288097, -3667338763252443465, -3316742521554936832, -2844544359955291760, -1291351295404368159, -794348397160283083, -705240847455001090, -652995206518489298, -284127251294286231, 173240967232234690, 616476682204879844, 826670457841382100, 1815369334084765465, 4431706613761077084, 4743606016174161647, 5637469692783959686, 5802957011124852712, 6759688243703331970, 7679657413128857702, 7713766696628426028, 9098158217036036188]

ERROR 16:23:06 Exception in thread Thread[CompactionExecutor:2,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /cassandra_data/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/mc-2-big-Data.db
	at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator.computeNext(BigTableScanner.java:351) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator.computeNext(BigTableScanner.java:265) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.io.sstable.format.big.BigTableScanner.hasNext(BigTableScanner.java:245) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:232) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:184) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:82) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) ~[apache-cassandra-3.9.jar:3.9]

It works well for me now and I no longer see data corruption.

@k8s-ci-robot
Contributor

Hi @deimosfr. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

If you have questions or suggestions related to this bot's behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-reviewable

This change is Reviewable

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 23, 2016
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Dec 23, 2016
@brendandburns
Contributor

@k8s-bot ok to test

LGTM.

Thanks!

@brendandburns brendandburns added release-note Denotes a PR that will be considered when it comes time to generate release notes. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed release-note-label-needed labels Jan 2, 2017
@k8s-ci-robot
Contributor

Jenkins verification failed for commit 22b936649d8acdfe9c9a42a07f81d07daed3445b. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@brendandburns
Contributor

@deimosfr ugh, looks like the switch from 2016 to 2017 confused some automation. Can you rebase?

thanks (and sorry!)

--brendan

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 4, 2017
@deimosfr
Contributor Author

deimosfr commented Jan 4, 2017

@brendandburns done, please let me know if it's ok for you now

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 4, 2017
@0xmichalis
Contributor

@deimosfr this needs another rebase. It seems you have pulled a bunch of preexisting commits somehow.

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 8, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 12, 2017
@0xmichalis
Contributor

@jonashuckestein you guys may be interested in this.

@deimosfr
Contributor Author

@Kargakis is anything missing before this can be merged?

@0xmichalis
Contributor

@chrislovecnm can you also have a look? I would expect termination signaling to be handled by the kubelet, and a higher terminationGracePeriod in the StatefulSet would make more sense to me, but I am not familiar with C*.
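For reference, the alternative being suggested here would rely on the kubelet's normal SIGTERM and simply give Cassandra more time before the SIGKILL. This is a sketch of where that field lives in the StatefulSet pod template, not the merged change, and the grace period value is only illustrative:

spec:
  template:
    spec:
      # Time the kubelet waits between SIGTERM and SIGKILL; the default is 30s.
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: cassandra   # placeholder; the example uses its own image and tag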

@brendandburns
Contributor

/lgtm

Sorry for the delay.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 21, 2017
@k8s-ci-robot
Contributor

Jenkins Kubemark GCE e2e failed for commit c165e90. Full PR test history.

cc @deimosfr, your PR dashboard

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@0xmichalis
Contributor

@k8s-bot kubemark e2e test this

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 39199, 37273, 29183, 39638, 40199)

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]
Contributor

This was merged ages ago, but isn't it better to call nodetool drain? It should stop accepting new data and flush memtables to disk.

cc @chrislovecnm @jondubois

Contributor Author

Good idea, nodetool drain is better to ensure all data has been flushed.
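A drain-based hook would look something like the sketch below; whether this matches exactly what the follow-up PR merged is not shown here, so treat the command as illustrative:

lifecycle:
  preStop:
    exec:
      # nodetool drain stops the node from accepting writes and flushes all
      # memtables to SSTables, so killing the JVM afterwards is safe.
      command: ["/bin/sh", "-c", "nodetool drain"]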

Contributor

I have opened #49618 to fix this

liggitt pushed a commit to liggitt/kubernetes that referenced this pull request Aug 12, 2017
…p-drain

Automatic merge from submit-queue (batch tested with PRs 47724, 49984, 49785, 49803, 49618)

Cassandra example, use nodetool drain in preStop

Related to kubernetes#39199 (comment)