Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction #14372

Merged
merged 1 commit into from
Jun 15, 2024

Conversation

andrross
Copy link
Member

The problem is that this test would decommission one of six nodes. The tear down logic of the test would attempt to assert on the health of the cluster by randomly selecting a node and requesting the cluster health. If this random check happened to select the node that was decommissioned, then the test would fail. The fix is to recommission the node at the end of the test.

Also, the "recommission node and assert cluster health" logic was used in multiple places and could be refactored out to a helper method.

Resolves #14290
Resolves #12197

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 77a9432: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Collaborator

@reta reta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @andrross !

The problem is that this test would decommission one of six nodes. The
tear down logic of the test would attempt to assert on the health of the
cluster by randomly selecting a node and requesting the cluster health.
If this random check happened to select the node that was
decommissioned, then the test would fail. The fix is to recommission
the node at the end of the test.

Also, the "recommission node and assert cluster health" logic was used
in multiple places and could be refactored out to a helper method.

Resolves opensearch-project#14290
Resolves opensearch-project#12197

Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross
Copy link
Member Author

Pushed an update to fix spotless errors

Copy link
Contributor

❕ Gradle check result for af6d75a: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.gateway.RecoveryFromGatewayIT.testMultipleReplicaShardAssignmentWithDelayedAllocationAndDifferentNodeStartTimeInBatchMode

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Jun 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.74%. Comparing base (b15cb0c) to head (af6d75a).
Report is 432 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #14372      +/-   ##
============================================
+ Coverage     71.42%   71.74%   +0.31%     
- Complexity    59978    62165    +2187     
============================================
  Files          4985     5118     +133     
  Lines        282275   291829    +9554     
  Branches      40946    42188    +1242     
============================================
+ Hits         201603   209359    +7756     
- Misses        63999    65233    +1234     
- Partials      16673    17237     +564     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@reta reta merged commit 0d38d14 into opensearch-project:main Jun 15, 2024
31 checks passed
@reta reta added the backport 2.x Backport to 2.x branch label Jun 15, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 15, 2024
…#14372)

The problem is that this test would decommission one of six nodes. The
tear down logic of the test would attempt to assert on the health of the
cluster by randomly selecting a node and requesting the cluster health.
If this random check happened to select the node that was
decommissioned, then the test would fail. The fix is to recommission
the node at the end of the test.

Also, the "recommission node and assert cluster health" logic was used
in multiple places and could be refactored out to a helper method.

Resolves #14290
Resolves #12197

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 0d38d14)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
reta pushed a commit that referenced this pull request Jun 15, 2024
…#14372) (#14376)

The problem is that this test would decommission one of six nodes. The
tear down logic of the test would attempt to assert on the health of the
cluster by randomly selecting a node and requesting the cluster health.
If this random check happened to select the node that was
decommissioned, then the test would fail. The fix is to recommission
the node at the end of the test.

Also, the "recommission node and assert cluster health" logic was used
in multiple places and could be refactored out to a helper method.

Resolves #14290
Resolves #12197


(cherry picked from commit 0d38d14)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@andrross andrross deleted the aad-test-fix branch June 15, 2024 17:38
harshavamsi pushed a commit to harshavamsi/OpenSearch that referenced this pull request Jul 12, 2024
…opensearch-project#14372)

The problem is that this test would decommission one of six nodes. The
tear down logic of the test would attempt to assert on the health of the
cluster by randomly selecting a node and requesting the cluster health.
If this random check happened to select the node that was
decommissioned, then the test would fail. The fix is to recommission
the node at the end of the test.

Also, the "recommission node and assert cluster health" logic was used
in multiple places and could be refactored out to a helper method.

Resolves opensearch-project#14290
Resolves opensearch-project#12197

Signed-off-by: Andrew Ross <andrross@amazon.com>
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Jul 24, 2024
…opensearch-project#14372) (opensearch-project#14376)

The problem is that this test would decommission one of six nodes. The
tear down logic of the test would attempt to assert on the health of the
cluster by randomly selecting a node and requesting the cluster health.
If this random check happened to select the node that was
decommissioned, then the test would fail. The fix is to recommission
the node at the end of the test.

Also, the "recommission node and assert cluster health" logic was used
in multiple places and could be refactored out to a helper method.

Resolves opensearch-project#14290
Resolves opensearch-project#12197

(cherry picked from commit 0d38d14)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: kkewwei <kkewwei@163.com>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
…opensearch-project#14372)

The problem is that this test would decommission one of six nodes. The
tear down logic of the test would attempt to assert on the health of the
cluster by randomly selecting a node and requesting the cluster health.
If this random check happened to select the node that was
decommissioned, then the test would fail. The fix is to recommission
the node at the end of the test.

Also, the "recommission node and assert cluster health" logic was used
in multiple places and could be refactored out to a helper method.

Resolves opensearch-project#14290
Resolves opensearch-project#12197

Signed-off-by: Andrew Ross <andrross@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autocut backport 2.x Backport to 2.x branch bug Something isn't working Cluster Manager flaky-test Random test failure that succeeds on second run skip-changelog >test-failure Test failure from CI, local build, etc.
Projects
Status: ✅ Done
2 participants