tests: Add random action injector #4404

abhijat · 2022-04-25T05:17:29Z

Cover letter

Adds random action injector utilities intended for use in e2e tests. The injected actions could be process failures, node decommissioning, leadership transfers, etc., and are controlled by a context manager.

The action injector runs on a thread and periodically introduces changes on randomly selected nodes in the cluster.

Features

The new context manager can be used as follows:

with random_process_kills(self.redpanda):
    do_something()
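
For readers unfamiliar with the pattern, here is a minimal sketch of how a context manager can drive a background thread that periodically applies an action to a randomly chosen node. It is not the code from this PR: the names `ActionInjectorThread`, `random_actions`, `interval_s` and the plain-callable `action` are assumptions made for illustration, and the nodes are ordinary Python objects rather than redpanda nodes.

```python
import random
import threading
from contextlib import contextmanager


class ActionInjectorThread(threading.Thread):
    """Hypothetical thread that applies `action` to a random node at intervals."""

    def __init__(self, nodes, action, interval_s=0.05, rng=None):
        super().__init__(daemon=True)
        self.nodes = list(nodes)
        self.action = action
        self.interval_s = interval_s
        self.rng = rng or random.Random()
        # Named to avoid clashing with Thread's private `_stop` method.
        self._stop_requested = threading.Event()

    def run(self):
        # Event.wait returns True once stop is requested, which ends the loop;
        # otherwise it times out after interval_s and the action fires.
        while not self._stop_requested.wait(self.interval_s):
            node = self.rng.choice(self.nodes)
            self.action(node)

    def stop(self):
        self._stop_requested.set()


@contextmanager
def random_actions(nodes, action, interval_s=0.05):
    """Run the injector for the duration of the `with` block (illustrative only)."""
    thread = ActionInjectorThread(nodes, action, interval_s)
    thread.start()
    try:
        yield thread
    finally:
        # Always stop and join, even if the body of the `with` block raises.
        thread.stop()
        thread.join()
```

Stopping and joining in the `finally` clause means the injector cannot outlive the test body, which is the main reason to wrap the thread in a context manager.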

Changes in force push

clean up unused code based on review comments
simplify test cases based on review comments

Changes in force push

clean up result handling

Changes in force push

resolve conflicts with upstream dev

Changes in force push

fold everything back into two logical commits
remove changes which were only import reordering

Changes in force push

system restoration code removed
fix assertion to only check if action was triggered by test

tests/rptest/tests/e2e_shadow_indexing_test.py

graphcareful

Really nice job, LGTM

jcsp

Structure looks great.

Some comments - I think I might have commented on code that isn't used yet (e.g. the decom/recom actions)

tests/rptest/services/redpanda.py

tests/rptest/tests/e2e_shadow_indexing_test.py

tests/rptest/services/action_injector.py

jcsp · 2022-04-26T17:03:23Z

Thank you for adding release notes, but those are usually used for customer-facing release documentation, so we generally don't mention changes to our internal test code.

tests/rptest/services/action_injector.py

abhijat · 2022-04-27T08:26:36Z

@jcsp @rystsov I will focus this PR on the action which kills processes. I have reduced the NodeDecommission and LeadershipTransfer actions to skeleton classes, and will focus on concrete implementation for them in subsequent PRs after more research based on the comments here.

If it would be better to remove the skeleton classes altogether, please let me know.

graphcareful · 2022-04-27T14:44:40Z

One comment about our PR flow, when iterating we don't push commits on top of already reviewed commits, we just edit/rebase existing commits. If you rebase with --keep-base reviewers will be able to see the diff of the PR between force pushes.

jcsp

LGTM when CI passes: I like the overall structure + am okay with having the not-yet-implemented actions in there as stubs.

Needs a re-check from @rystsov

rystsov

It's better to remove the code we don't use. Code has gravity, so the more we push, the harder it is to change later; it also creates more load on the reviewer, and if we never follow up it's instant dead code.

Also, please follow @graphcareful's advice and rewrite the history; we tend to care about the shape of the commit history - https://github.com/redpanda-data/redpanda/blob/dev/CONTRIBUTING.md#commit-history

tests/rptest/services/action_injector.py

tests/rptest/services/admin.py

tests/rptest/tests/e2e_shadow_indexing_test.py

abhijat · 2022-04-28T11:13:59Z

One comment about our PR flow, when iterating we don't push commits on top of already reviewed commits, we just edit/rebase existing commits. If you rebase with --keep-base reviewers will be able to see the diff of the PR between force pushes.

@graphcareful I have rebased with --keep-base and force-pushed, with the latest commit editing the last approved commit. Please let me know if the history looks more acceptable now.

I have also added links to the description of each force push in the cover letter, after discussing with Evgeny, so that it is easier for reviewers.

abhijat · 2022-05-06T04:24:56Z

@rystsov please review. I have cleaned up the restoration code and replaced the log with a boolean that the test can use to assert the internal state of the thread.
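
As a rough illustration of the boolean mentioned above, a background thread can expose a flag that flips once its action has run, and the test can assert on that flag after joining the thread. This is only a sketch under assumptions: the class name `InjectorThread`, the attribute `action_triggered` and the helper `run_and_check` are hypothetical and are not taken from the PR.

```python
import threading


class InjectorThread(threading.Thread):
    """Hypothetical one-shot injector that records whether its action ran."""

    def __init__(self, action):
        super().__init__(daemon=True)
        self.action = action
        # Hypothetical flag name; False until the action has completed.
        self.action_triggered = False

    def run(self):
        self.action()
        self.action_triggered = True


def run_and_check(action):
    """Start the thread, wait for it to finish, and report the flag."""
    thread = InjectorThread(action)
    thread.start()
    # Joining first ensures the flag is read only after run() has finished.
    thread.join()
    return thread.action_triggered
```

A test would then assert on the flag directly instead of parsing log output.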

rystsov

Looks good!

tests/rptest/services/action_injector.py

rystsov · 2022-05-06T08:39:36Z

Some of the tests are failing. For each failing test, please check whether it is a known flaky test; if it is, add a link to this build to the flaky issue and comment on this PR to confirm that the failing tests aren't related to it.

For newly failing tests, you should investigate whether they are related to this PR. If they are, fix the PR; if they aren't, open ci-failure tagged issues.

abhijat · 2022-05-06T11:01:55Z

Opened new issues #4601 and #4602 for the CI failures seen.

a new context manager is added which runs a background thread, injecting actions into a redpanda cluster, and optionally reversing them.

abhijat · 2022-05-06T11:05:34Z

@rystsov fixed the name and added links to CI failures

abhijat · 2022-05-06T14:45:03Z

The latest failure is an instance of #4373

https://buildkite.com/redpanda/redpanda/builds/9834#c0ff37ff-72d1-4d72-9411-a8da30c66c1e/1547-7658

jcsp

This is good to go from my pov

abhijat · 2022-05-07T07:36:10Z

/backport v22.1.x

vbotbuildovich · 2022-05-07T07:37:26Z

Failed to run cherry-pick command. see workflow
I executed the below command:

git cherry-pick -x 551045ec62d06c97825e804602aa2478f9d0f382 889747877f0cdc0c16ef82e9b5499a83e595c912

abhijat requested review from dotnwat and NyaliaLui as code owners April 25, 2022 05:17

abhijat marked this pull request as draft April 25, 2022 05:17

jcsp reviewed Apr 25, 2022

abhijat force-pushed the add-random-failure-injector branch 2 times, most recently from d95c822 to 4911872 Compare April 26, 2022 11:50

abhijat mentioned this pull request Apr 26, 2022

tests: context manager to add random failures #4441

Closed

abhijat marked this pull request as ready for review April 26, 2022 15:22

piyushredpanda requested a review from graphcareful April 26, 2022 15:48

abhijat added the ci-repeat-5 repeat tests 5x concurrently to check for flakey tests; self-cancelling label Apr 26, 2022

graphcareful previously approved these changes Apr 26, 2022

jcsp reviewed Apr 26, 2022

rystsov suggested changes Apr 26, 2022

abhijat dismissed graphcareful’s stale review via e966021 April 27, 2022 04:45

vbotbuildovich removed the ci-repeat-5 repeat tests 5x concurrently to check for flakey tests; self-cancelling label Apr 27, 2022

abhijat force-pushed the add-random-failure-injector branch from 1fa228d to bfff79c Compare April 27, 2022 07:06

abhijat force-pushed the add-random-failure-injector branch from 31df1a8 to eb9ecb7 Compare April 27, 2022 08:31

abhijat requested review from rystsov, jcsp and graphcareful April 27, 2022 09:23

abhijat added the ci-repeat-5 repeat tests 5x concurrently to check for flakey tests; self-cancelling label Apr 27, 2022

vbotbuildovich removed the ci-repeat-5 repeat tests 5x concurrently to check for flakey tests; self-cancelling label Apr 27, 2022

jcsp previously approved these changes Apr 27, 2022

rystsov suggested changes Apr 28, 2022

abhijat dismissed jcsp’s stale review via d5f909d April 28, 2022 11:05

abhijat force-pushed the add-random-failure-injector branch from 7a0ccb7 to d5f909d Compare April 28, 2022 11:05

abhijat force-pushed the add-random-failure-injector branch 2 times, most recently from 9f90dc6 to 3d95f85 Compare May 5, 2022 09:46

abhijat requested a review from rystsov May 5, 2022 09:47

abhijat force-pushed the add-random-failure-injector branch from 3d95f85 to 3d6686f Compare May 5, 2022 09:52

abhijat added the ci-repeat-5 repeat tests 5x concurrently to check for flakey tests; self-cancelling label May 5, 2022

vbotbuildovich removed the ci-repeat-5 repeat tests 5x concurrently to check for flakey tests; self-cancelling label May 6, 2022

rystsov previously approved these changes May 6, 2022

abhijat added 2 commits May 6, 2022 16:33

tests: adds action injection context manager

551045e

a new context manager is added which runs a background thread, injecting actions into a redpanda cluster, and optionally reversing them.

tests: uses action injection in e2e SI test

8897478

abhijat dismissed rystsov’s stale review via 8897478 May 6, 2022 11:03

abhijat force-pushed the add-random-failure-injector branch from 3d6686f to 8897478 Compare May 6, 2022 11:03

abhijat requested review from rystsov and jcsp May 6, 2022 11:05

abhijat mentioned this pull request May 6, 2022

tests: adds support for node decommission, leadership transfer to context managers #4610

Closed

jcsp approved these changes May 6, 2022

abhijat merged commit d7595a6 into redpanda-data:dev May 7, 2022

abhijat mentioned this pull request May 20, 2022

[v22.1.x] tests: Add random action injector #4836

Merged
