Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix boundary condition in indexing pressure test #5168

Merged

Conversation

andrross
Copy link
Member

@andrross andrross commented Nov 9, 2022

This updates the boundary condition in an assertion in two tests in ShardIndexingPressureConcurrentExecutionTests. I could reliably reproduce errors here by running:

./gradlew ':server:test' -Dtests.iters=10000 --tests "org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimits"

On every error the value that failed was exactly 0.95 and failed the less than check. The change here is to accept 0.95, and also refactor the test to give a better error message on failure.

Issues Resolved

Closes #2241
Closes #4215

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@andrross andrross marked this pull request as ready for review November 9, 2022 00:19
@andrross andrross requested review from a team and reta as code owners November 9, 2022 00:19
(double) (NUM_THREADS * 15) / shardIndexingPressure.shardStats()
.getIndexingPressureShardStats(shardId1)
.getCurrentPrimaryAndCoordinatingLimits() > 0.75
.getCurrentPrimaryAndCoordinatingLimits(),
allOf(greaterThan(0.75), lessThanOrEqualTo(0.95))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@getsaurabh02 To be honest I don't fully understand this assertion. Can you verify it is safe/correct to change this assertion from "less than 0.95" to "less than or equal to 0.95"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional memory allocation per shard is governed by its peak memory utilization threshold of the current allocation. This is configurable setting and defaults to 95%.

Once the current allocation is greater than this threshold, the new shard limits are calculated and allocated. Since this condition checks for current value to be greater than, it should be safe to assume that the current limit can reach until the 95% (0.95) utilization. Hence "less than or equal to 0.95" should be good.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

Gradle Check (Jenkins) Run Completed with:

(double) (NUM_THREADS * 15) / shardIndexingPressure.shardStats()
.getIndexingPressureShardStats(shardId1)
.getCurrentPrimaryAndCoordinatingLimits() > 0.75
.getCurrentPrimaryAndCoordinatingLimits(),
allOf(greaterThan(0.75), lessThanOrEqualTo(0.95))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional memory allocation per shard is governed by its peak memory utilization threshold of the current allocation. This is configurable setting and defaults to 95%.

Once the current allocation is greater than this threshold, the new shard limits are calculated and allocated. Since this condition checks for current value to be greater than, it should be safe to assume that the current limit can reach until the 95% (0.95) utilization. Hence "less than or equal to 0.95" should be good.

@andrross
Copy link
Member Author

andrross commented Nov 9, 2022

Ironically a different shard indexing pressure test failed:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.index.ShardIndexingPressureIT.testShardIndexingPressureTrackingDuringBulkWrites" -Dtests.seed=567CA44C57BDC19B -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sr-Latn-ME -Dtests.timezone=Pacific/Palau -Druntime.java=19

org.opensearch.index.ShardIndexingPressureIT > testShardIndexingPressureTrackingDuringBulkWrites FAILED
    java.lang.AssertionError: 
    Expected: <22616L>
         but: was <22954L>
        at __randomizedtesting.SeedInfo.seed([567CA44C57BDC19B:DC58B8B44B77E953]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.opensearch.index.ShardIndexingPressureIT.testShardIndexingPressureTrackingDuringBulkWrites(ShardIndexingPressureIT.java:209)

#5176

@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

Gradle Check (Jenkins) Run Completed with:

This updates the boundary condition in an assertion in two tests in
ShardIndexingPressureConcurrentExecutionTests. I could reliably
reproduce errors here by running:

```
./gradlew ':server:test' -Dtests.iters=10000 --tests "org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimits"
```

On every error the value that failed was exactly 0.95 and failed the
less than check. The change here is to accept 0.95, and also refactor
the test to give a better error message on failure.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

Gradle Check (Jenkins) Run Completed with:

@Poojita-Raj Poojita-Raj merged commit 3423f44 into opensearch-project:main Nov 9, 2022
@andrross andrross added backport 2.x Backport to 2.x branch backport 1.x labels Nov 9, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 9, 2022
This updates the boundary condition in an assertion in two tests in
ShardIndexingPressureConcurrentExecutionTests. I could reliably
reproduce errors here by running:

```
./gradlew ':server:test' -Dtests.iters=10000 --tests "org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimits"
```

On every error the value that failed was exactly 0.95 and failed the
less than check. The change here is to accept 0.95, and also refactor
the test to give a better error message on failure.

Signed-off-by: Andrew Ross <andrross@amazon.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 3423f44)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 9, 2022
This updates the boundary condition in an assertion in two tests in
ShardIndexingPressureConcurrentExecutionTests. I could reliably
reproduce errors here by running:

```
./gradlew ':server:test' -Dtests.iters=10000 --tests "org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimits"
```

On every error the value that failed was exactly 0.95 and failed the
less than check. The change here is to accept 0.95, and also refactor
the test to give a better error message on failure.

Signed-off-by: Andrew Ross <andrross@amazon.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 3423f44)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@andrross andrross added the backport 2.4 Backport to 2.4 branch label Nov 9, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 9, 2022
This updates the boundary condition in an assertion in two tests in
ShardIndexingPressureConcurrentExecutionTests. I could reliably
reproduce errors here by running:

```
./gradlew ':server:test' -Dtests.iters=10000 --tests "org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimits"
```

On every error the value that failed was exactly 0.95 and failed the
less than check. The change here is to accept 0.95, and also refactor
the test to give a better error message on failure.

Signed-off-by: Andrew Ross <andrross@amazon.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 3423f44)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross pushed a commit that referenced this pull request Nov 9, 2022
This updates the boundary condition in an assertion in two tests in
ShardIndexingPressureConcurrentExecutionTests. I could reliably
reproduce errors here by running:

```
./gradlew ':server:test' -Dtests.iters=10000 --tests "org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimits"
```

On every error the value that failed was exactly 0.95 and failed the
less than check. The change here is to accept 0.95, and also refactor
the test to give a better error message on failure.

Signed-off-by: Andrew Ross <andrross@amazon.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 3423f44)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Poojita-Raj pushed a commit that referenced this pull request Nov 9, 2022
This updates the boundary condition in an assertion in two tests in
ShardIndexingPressureConcurrentExecutionTests. I could reliably
reproduce errors here by running:

```
./gradlew ':server:test' -Dtests.iters=10000 --tests "org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimits"
```

On every error the value that failed was exactly 0.95 and failed the
less than check. The change here is to accept 0.95, and also refactor
the test to give a better error message on failure.

Signed-off-by: Andrew Ross <andrross@amazon.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 3423f44)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Poojita-Raj pushed a commit that referenced this pull request Nov 9, 2022
This updates the boundary condition in an assertion in two tests in
ShardIndexingPressureConcurrentExecutionTests. I could reliably
reproduce errors here by running:

```
./gradlew ':server:test' -Dtests.iters=10000 --tests "org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimits"
```

On every error the value that failed was exactly 0.95 and failed the
less than check. The change here is to accept 0.95, and also refactor
the test to give a better error message on failure.

Signed-off-by: Andrew Ross <andrross@amazon.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 3423f44)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@andrross andrross deleted the shard-indexing-pressure-test branch December 16, 2022 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.x backport 2.x Backport to 2.x branch backport 2.4 Backport to 2.4 branch skip-changelog
Projects
None yet
3 participants