tests: Deflake test_migrating_consume_offsets #5074

bharathv · 2022-06-08T19:37:33Z

Cover letter

Fixes synchronization around failure injection and node restarts.
Looped through the test 50 times locally and didn't run into any issues.

Fixes #5034

Release notes

none

force-push: amends the commit message to add more detail.

bharathv · 2022-06-08T19:38:51Z

@mmaslankaprv you added this test, you might have some context here, FYI.

dotnwat · 2022-06-08T22:08:14Z

tests/rptest/tests/consumer_offsets_migration_test.py

+        # or restarts. We only want to do one of those at a time to avoid conflicts.
+        busy_nodes = set()
+        # synchronize access to busy_nodes set.
+        busy_nodes_lock = threading.RLock()


Fixes synchronization around failure injection and node restarts.

could you add a little context to the commit message about what was wrong and how it was fixed?

Looped through the test 50 times locally and didn't run into any issues

this is great context to add to the PR cover letter, but doesn't add much value in the commit message.

I updated the commit message and did a force push.

Fixes synchronization around failure injection and node restarts.

The test is structured as follows:

1. Start a cluster without consumer_offsets support.
2. Run an async failure injection loop that can potentially SIGKILL a random redpanda process.
3. Run a consumer groups test on a sample topic.
4. Restart the cluster to enable the new __consumer_offsets topic feature.
5. Verify that __consumer_offsets functionality works as expected.

It turns out that (2) and (4) conflict with each other due to poor synchronization. Because of racy code, these operations can interleave, resulting in the failure injector attempting to kill a non-existent PID. This patch makes sure that only one of these two operations runs at any given time: a node is marked 'busy' in a thread-safe way while either operation is running on it, and the other thread has to wait.
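The busy-node idea in the commit message can be sketched as follows. This is a minimal, hypothetical sketch and not the code from this PR: `BusyNodes`, `claim`, `run_demo`, and the fake `touch` operation are illustrative names, nodes are plain strings rather than redpanda processes, and it assumes the waiting side blocks on a `threading.Condition` built on the `RLock` (the diff only shows a `busy_nodes` set guarded by `busy_nodes_lock`).

```python
import random
import threading
import time
from contextlib import contextmanager


class BusyNodes:
    """Hypothetical helper: a thread-safe set of nodes that are currently
    being restarted or having a failure injected."""

    def __init__(self):
        self._busy = set()
        # Condition built on an RLock, mirroring busy_nodes_lock in the diff.
        self._cond = threading.Condition(threading.RLock())

    @contextmanager
    def claim(self, nodes):
        """Mark `nodes` busy, blocking until none of them is busy elsewhere."""
        nodes = frozenset(nodes)
        with self._cond:
            self._cond.wait_for(lambda: not (self._busy & nodes))
            self._busy |= nodes
        try:
            yield
        finally:
            with self._cond:
                self._busy -= nodes
                self._cond.notify_all()


def run_demo(restarts=5, injections=20):
    """Run a fake failure injector thread against fake cluster restarts and
    return any (node, op_a, op_b) tuples that touched a node concurrently."""
    nodes = ["n1", "n2", "n3"]
    busy = BusyNodes()
    active = {}  # node -> operation currently touching it
    conflicts = []
    active_lock = threading.Lock()

    def touch(node, op):
        # Record an overlap if another operation is already on this node.
        with active_lock:
            if node in active:
                conflicts.append((node, active[node], op))
            active[node] = op
        time.sleep(0.001)  # stand-in for a SIGKILL or a stop/start
        with active_lock:
            active.pop(node, None)

    def failure_injector():
        rng = random.Random(0)
        for _ in range(injections):
            node = rng.choice(nodes)
            with busy.claim([node]):
                touch(node, "kill")
            time.sleep(0.002)  # pause between injections, outside the claim

    injector = threading.Thread(target=failure_injector)
    injector.start()
    for _ in range(restarts):
        # A cluster-wide restart claims every node at once.
        with busy.claim(nodes):
            for node in nodes:
                touch(node, "restart")
    injector.join()
    return conflicts


if __name__ == "__main__":
    print(run_demo())  # expected: [] (no overlapping operations)
```

Claiming the whole node set atomically for a restart, rather than node by node, is one way to keep the two threads from deadlocking on each other; the pause outside the claim in the injector gives a waiting restart a window to run.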

bharathv

Test failure seems related to #2501

bharathv · 2022-06-08T22:38:49Z


I updated the commit message and did a force push.

dotnwat · 2022-06-08T22:49:05Z

I updated the commit message and did a force push.

thanks! it's excellent ✏️

dotnwat

code looks good to me. started a 10x repeat on CI for a quick check. should wait to see how that turns out

mmaslankaprv · 2022-06-09T06:28:03Z

looks great

bharathv · 2022-06-09T23:27:57Z

1/10 runs failed with ci-repeat-10 and the failure is in an unrelated test. tracked in #5079

<testcase name="test_dead_group_recovery.static_members=False" classname="ConsumerGroupTest" time="46.33625674247742" status="fail" assertions="">

I think this is good to go.

bharathv · 2022-06-11T20:20:47Z

/backport v22.1.x

bharathv requested review from dotnwat and NyaliaLui as code owners June 8, 2022 19:37

bharathv requested review from rystsov and ztlpn June 8, 2022 19:39

bharathv force-pushed the test_migrating_consume_offsets branch from ab7297b to 68013af June 8, 2022 19:44

dotnwat reviewed Jun 8, 2022

bharathv force-pushed the test_migrating_consume_offsets branch from 68013af to 1fb9dec June 8, 2022 22:36

bharathv commented Jun 8, 2022

dotnwat added the ci-repeat-10 label (repeat tests 10x concurrently to check for flakey tests; self-cancelling) Jun 8, 2022

vbotbuildovich removed the ci-repeat-10 label Jun 8, 2022

dotnwat approved these changes Jun 8, 2022

bharathv mentioned this pull request Jun 8, 2022

Redpanda node failed to stop in ConsumerOffsetsMigrationTest.test_migrating_consume_offsets.failures=True.cpus=1 #4670

Closed

bharathv added the ci-repeat-10 label (repeat tests 10x concurrently to check for flakey tests; self-cancelling) Jun 9, 2022

vbotbuildovich removed the ci-repeat-10 label Jun 9, 2022

bharathv merged commit e52ce8d into redpanda-data:dev Jun 11, 2022

vbotbuildovich mentioned this pull request Jun 11, 2022

[v22.1.x] tests: Deflake test_migrating_consume_offsets #5095

Merged

bharathv mentioned this pull request Jun 11, 2022

No such process error in ConsumerOffsetsMigrationTest.test_migrating_consume_offsets.failures=True.cpus=1 #5034

Closed
