[Backport] Handle shard over allocation during partial zone/rack or independent … #1268

Merged: 7 commits, merged on Feb 7, 2022

Bukhtawar
Collaborator

…node failures (#1149)

The changes ensure that, in the event of a partial zone failure, the surviving nodes in the minority zone are not overloaded with shards; this is governed by a skewness limit.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
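To illustrate what a skewness limit can look like, here is a minimal sketch. It assumes that each node's cap is the even share of all shard copies over the node count the cluster is *provisioned* for, inflated by a skew percentage. The class, method names, and formula are hypothetical and are not the decider or settings this PR adds.

```java
// Hypothetical sketch: names and formula are illustrative assumptions,
// not the OpenSearch allocation decider introduced by this PR.
public final class SkewLimitSketch {

    /**
     * Maximum shard copies a single node may hold.
     *
     * @param totalShardCopies primaries plus replicas across the cluster
     * @param provisionedNodes node count the cluster is sized for (assumed known up front)
     * @param skewPercent      allowed overshoot above the even share, e.g. 50 means 50%
     */
    static int perNodeLimit(int totalShardCopies, int provisionedNodes, double skewPercent) {
        if (provisionedNodes <= 0) {
            throw new IllegalArgumentException("provisionedNodes must be positive");
        }
        double evenShare = (double) totalShardCopies / provisionedNodes;
        return (int) Math.ceil(evenShare * (1.0 + skewPercent / 100.0));
    }

    /** True if assigning one more shard copy keeps the node within its limit. */
    static boolean canAllocate(int currentShardsOnNode, int totalShardCopies,
                               int provisionedNodes, double skewPercent) {
        return currentShardsOnNode + 1 <= perNodeLimit(totalShardCopies, provisionedNodes, skewPercent);
    }

    public static void main(String[] args) {
        // Example cluster sized for 6 nodes (3 zones x 2 nodes) holding 60 shard copies.
        int total = 60;
        int provisioned = 6;
        double skew = 50.0;

        // Even share is 10 copies per node; a 50% skew raises the cap to 15.
        System.out.println("per-node limit = " + perNodeLimit(total, provisioned, skew));

        // A surviving node already at the cap refuses further copies, which then
        // stay unassigned instead of piling onto the degraded zone.
        System.out.println("node at 15 accepts another? " + canAllocate(15, total, provisioned, skew));
        System.out.println("node at 12 accepts another? " + canAllocate(12, total, provisioned, skew));
    }
}
```

In this sketch, dividing by the provisioned count rather than the live count is the key choice. If one zone is lost and the cap were recomputed from the 4 remaining nodes, it would rise to ceil(15 × 1.5) = 23. The survivors could then absorb most of the failed zone's shards, which is the over-allocation the limit is meant to prevent.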

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success 43052f4

@opensearch-ci-bot
Collaborator

✅   DCO Check Passed 43052f4

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success 43052f4

@dblock
Member

dblock commented Sep 28, 2021

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 43052f4
Log 585

Reports 585

@dblock
Member

dblock commented Sep 28, 2021

This needs to be rebased before retrying gradle check, thx.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@opensearch-ci-bot
Collaborator

✅   DCO Check Passed 0f12b99

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success 0f12b99

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success 0f12b99

@opensearch-ci-bot
Collaborator

Can one of the admins verify this patch?

@dblock
Member

dblock commented Jan 6, 2022

Sorry, needs another rebase. I'll merge right away.

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c8b89b1f787cd4be3551e56975ee72be94367df9
Log 2247

Reports 2247

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 898ceb8e97c4884f6eb2c0834db911a6ccf700a0
Log 2248

Reports 2248

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 0f12b99
Log 2249

Reports 2249

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure b8169f7
Log 2250

Reports 2250

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure b02104d
Log 2251

Reports 2251

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure ee8acdd
Log 2252

Reports 2252

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 69dc1b5
Log 2253

Reports 2253

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c57fd23
Log 2256

Reports 2256

@Bukhtawar
Collaborator Author

@dblock This looks unrelated; let me know if this PR needs more work.

Tests with failures:
 - org.opensearch.transport.netty4.OpenSearchLoggingHandlerIT.testLoggingHandler

13 tests completed, 1 failed, 1 skipped

@dblock
Member

dblock commented Feb 7, 2022

OpenSearchLoggingHandlerIT

I believe this was fixed in #1900; if it fails again in the same way, make sure to rebase.

@dblock
Member

dblock commented Feb 7, 2022

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c57fd23
Log 2261

Reports 2261

@Bukhtawar
Collaborator Author

Tests with failures:
 - org.opensearch.discovery.SnapshotDisruptionIT.testDisruptionAfterShardFinalization

2633 tests completed, 1 failed, 4 skipped

@dblock
Member

dblock commented Feb 7, 2022

For any random test failure:

  1. See if it has happened before by searching through issues, and add a link to that issue.
  2. If you find none, open an issue. I just did for this one: [BUG] org.opensearch.discovery.SnapshotDisruptionIT.testDisruptionAfterShardFinalization intermittent failure #2062
  3. Restart tests :(

@dblock
Member

dblock commented Feb 7, 2022

start gradle check

@dblock
Member

dblock commented Feb 7, 2022

We have 23 open issues on flaky integration tests (https://github.com/opensearch-project/OpenSearch/labels/Flakey%20Random%20Test%20Failure). A bunch have been fixed, but help is definitely wanted.

@opensearch-ci-bot
Collaborator

✅   Gradle Check success c57fd23
Log 2262

Reports 2262

3 participants