tests: Add random action injector #4404
d95c822
to
4911872
Compare
Really nice job, LGTM
Structure looks great.
Some comments - I think I might have commented on code that isn't used yet (e.g. the decom/recom actions)
Thank you for adding release notes, but those are usually used for customer-facing release documentation, so we generally don't mention changes to our internal test code.
1fa228d
to
bfff79c
Compare
@jcsp @rystsov I will focus this PR on the action that kills processes. I have reduced the NodeDecommission and LeadershipTransfer actions to skeleton classes, and will work on concrete implementations for them in subsequent PRs after more research based on the comments here. If it would be better to remove the skeleton classes altogether, please let me know.
31df1a8
to
eb9ecb7
Compare
One comment about our PR flow: when iterating, we don't push commits on top of already reviewed commits; we just edit/rebase the existing commits. If you rebase with
LGTM when CI passes: I like the overall structure + am okay with having the not-yet-implemented actions in there as stubs.
Needs a re-check from @rystsov
It's better to remove the code we don't use. Code has gravity, so the more we push, the harder it is to change later; it also creates more load on the reviewer, and if we never follow up it's instant dead code.
Also, please follow @graphcareful's advice and rewrite the history; we tend to care about the shape of the commit history - https://github.com/redpanda-data/redpanda/blob/dev/CONTRIBUTING.md#commit-history
7a0ccb7
to
d5f909d
Compare
@graphcareful I have used force push with I have also added links to description of force push in the cover letter after discussing with Evgeny so it is easier for reviewers. |
9f90dc6
to
3d95f85
Compare
3d95f85
to
3d6686f
Compare
@rystsov please review. I have cleaned up the restoration code and replaced the log with a boolean that the test can use to assert the internal state of the thread.
Looks good!
Some of the tests are failing. For each failing test, please check whether it's a known flaky test; if it is, add a link to this build to the flaky issue and comment on this PR to confirm that the failure isn't related to this PR. For new failing tests, you should investigate whether they are related to this PR. If they are, fix the PR; if they aren't, open ci-failure tagged issues.
A new context manager is added that runs a background thread, injecting actions into a redpanda cluster and optionally reversing them.
3d6686f
to
8897478
Compare
@rystsov fixed the name and added links to CI failures.
The latest failure is an instance of #4373: https://buildkite.com/redpanda/redpanda/builds/9834#c0ff37ff-72d1-4d72-9411-a8da30c66c1e/1547-7658
This is good to go from my pov
/backport v22.1.x
Failed to run cherry-pick command. See workflow.
Cover letter
Adds random action injector utilities, intended for use in e2e tests. The injected actions could be process failures, node decommission, leadership transfer, etc., controlled by a context manager.
The action injector runs on a thread and periodically introduces changes on randomly selected nodes in the cluster.
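For reviewers who want a feel for the shape of this, here is a minimal sketch of the pattern described above: a context manager that starts a background thread, applies an action to a randomly chosen node at a fixed interval, and reverses outstanding actions on exit. It is not the code in this PR. All names (`KillProcessAction`, `ActionInjectorThread`, `random_action_injector`) are hypothetical, and the "cluster" is just a dict mapping node names to a running flag, standing in for real redpanda nodes.

```python
import random
import threading
from contextlib import contextmanager


class KillProcessAction:
    """Hypothetical action: mark a node's process as stopped; reversal restarts it."""

    def __init__(self, cluster):
        # Assumption: cluster is a dict of node name -> True if the process is running.
        self.cluster = cluster

    def apply(self, node):
        self.cluster[node] = False

    def reverse(self, node):
        self.cluster[node] = True


class ActionInjectorThread(threading.Thread):
    """Applies `action` to a random node every `interval_s` seconds until stopped."""

    def __init__(self, action, nodes, interval_s, reverse=True, rng=None):
        super().__init__(daemon=True)
        self.action = action
        self.nodes = list(nodes)
        self.interval_s = interval_s
        self.reverse = reverse
        self.rng = rng or random.Random()
        self.applied = []        # nodes whose action has not been reversed yet
        self.action_count = 0    # state a test can assert on
        self._stop_requested = threading.Event()

    def run(self):
        # Event.wait returns False on timeout and True once stop() was called.
        while not self._stop_requested.wait(self.interval_s):
            if self.reverse and self.applied:
                # Undo the previous action before injecting the next one.
                self.action.reverse(self.applied.pop())
            node = self.rng.choice(self.nodes)
            self.action.apply(node)
            self.applied.append(node)
            self.action_count += 1

    def stop(self):
        self._stop_requested.set()
        self.join()
        if self.reverse:
            # Restore every node that is still affected.
            while self.applied:
                self.action.reverse(self.applied.pop())


@contextmanager
def random_action_injector(action, nodes, interval_s=0.01, reverse=True, rng=None):
    thread = ActionInjectorThread(action, nodes, interval_s, reverse, rng)
    thread.start()
    try:
        yield thread
    finally:
        # Runs even if the test body raises, so the cluster is always restored.
        thread.stop()
```

A test would wrap its workload in `with random_action_injector(KillProcessAction(cluster), cluster) as injector:` and afterwards assert on `injector.action_count` or on the restored cluster state. Putting the cleanup in `finally` is a design choice of this sketch so that a failing test body does not leave nodes down.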
Features
Changes in force push