Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport 2.x] [Segment Replication] Backport PR's : #3525 #3533 #3540 #3943 #3963 From main branch #4181

Conversation

Rishikesh1159
Copy link
Member

@Rishikesh1159 Rishikesh1159 commented Aug 9, 2022

mch2 and others added 4 commits August 9, 2022 22:51
…oject#3525.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@Rishikesh1159 Rishikesh1159 requested review from a team and reta as code owners August 9, 2022 23:28
@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2022

Gradle Check (Jenkins) Run Completed with:

Rishikesh1159 and others added 3 commits August 10, 2022 01:26
…rdAlreadyReplicating() (opensearch-project#3943)

* Fixing flaky test failure happening for testShardAlreadyReplicating()

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
…Replications() (opensearch-project#3963)

* Fixing flaky test failure happening for testShardAlreadyReplicating()

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
…ard class.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@Rishikesh1159 Rishikesh1159 changed the title [Backport 2.x] [Segment Replication] Backport PR's : #3525 #3533 #3540 From main branch [Backport 2.x] [Segment Replication] Backport PR's : #3525 #3533 #3540 #3943 #3963 From main branch Aug 10, 2022
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

codecov-commenter commented Aug 10, 2022

Codecov Report

Merging #4181 (7f6da3f) into 2.x (7062283) will increase coverage by 0.01%.
The diff coverage is 76.64%.

@@             Coverage Diff              @@
##                2.x    #4181      +/-   ##
============================================
+ Coverage     70.45%   70.46%   +0.01%     
- Complexity    56690    56769      +79     
============================================
  Files          4563     4569       +6     
  Lines        273024   273348     +324     
  Branches      40059    40083      +24     
============================================
+ Hits         192364   192628     +264     
- Misses        64521    64524       +3     
- Partials      16139    16196      +57     
Impacted Files Coverage Δ
.../main/java/org/opensearch/index/engine/Engine.java 73.70% <ø> (+0.48%) ⬆️
...va/org/opensearch/index/engine/InternalEngine.java 74.17% <ø> (+0.88%) ⬆️
...rg/opensearch/indices/recovery/RecoveryTarget.java 70.80% <ø> (-0.03%) ⬇️
...search/indices/recovery/RecoveryTargetHandler.java 100.00% <ø> (ø)
...rch/indices/recovery/RetryableTransportClient.java 81.81% <ø> (ø)
.../replication/checkpoint/ReplicationCheckpoint.java 42.42% <0.00%> (+6.06%) ⬆️
...ckpoint/SegmentReplicationCheckpointPublisher.java 100.00% <ø> (ø)
...replication/common/ReplicationFailedException.java 12.50% <12.50%> (ø)
...s/replication/SegmentReplicationSourceService.java 56.81% <44.00%> (-24.27%) ⬇️
...s/replication/SegmentReplicationTargetService.java 71.92% <60.00%> (-6.12%) ⬇️
... and 498 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@@ -2657,7 +2708,6 @@ public long getLocalCheckpoint() {
* Also see {@link #getLocalCheckpoint()}.
*/
public long getProcessedLocalCheckpoint() {
assert indexSettings.isSegRepEnabled();
Copy link
Member Author

@Rishikesh1159 Rishikesh1159 Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert has been removed because it is failing many unit tests in SegmentReplicationTargetServiceTests.java and SegmentReplicationIndexShardTests.java. The unit tests in these classes don't use index setting as SEGMENT, so when an assert is done on if segRepEnabled() all of the shards fails the assert and don't perform necessary unit tests.

This assert has been added to 2.x branch in this backport PR. As the original PR had method getProcessedLocalCheckpoint() in Engine.java as abstract which is a breaking. To avoid breaking breaking changes we made few changes to original PR while backporting. And this assert is part of those changes. It is not necessary to have this assert as it will any how be removed in 3.0 opensearch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the SEGMENT index setting in test classes. Can we backport #4163 if it helps ?

Copy link
Member Author

@Rishikesh1159 Rishikesh1159 Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might help actually, but let me check by backporting your PR locally and add back assert statement to see if all of the unit tests pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding back assert and use SEGMENT replication as index setting in unit tests. But got this error:

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.indices.replication.OngoingSegmentReplicationsTests.testMultipleReplicasUseSameCheckpoint" -Dtests.seed=A0971E691B2879E0 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ms-MY -Dtests.timezone=Asia/Kolkata -Druntime.java=17
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([A0971E691B2879E0:A4046FDAE5B7632]:0)
        at org.opensearch.index.shard.IndexShard.getProcessedLocalCheckpoint(IndexShard.java:2711)
        at org.opensearch.indices.replication.OngoingSegmentReplicationsTests.setUp(OngoingSegmentReplicationsTests.java:73)

so there are multiple places we have to makes changes to make this work and I don't think this PR is right place to do all changes. We can do them while backporting other PR's. So for now I am removing assert statement

…ation in SegmentReplicationSourceHandlerTests and SegmentReplicationTargetServiceTests.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@Rishikesh1159
Copy link
Member Author

Rishikesh1159 commented Aug 10, 2022

Flaky test:

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.indices.replication.OngoingSegmentReplicationsTests.testMultipleReplicasUseSameCheckpoint" -Dtests.seed=A0971E691B2879E0 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ms-MY -Dtests.timezone=Asia/Kolkata -Druntime.java=17
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([A0971E691B2879E0:A4046FDAE5B7632]:0)
        at org.opensearch.index.shard.IndexShard.getProcessedLocalCheckpoint(IndexShard.java:2711)
        at org.opensearch.indices.replication.OngoingSegmentReplicationsTests.setUp(OngoingSegmentReplicationsTests.java:73)

Actually this is not flaky test. this happened due to changes in recent commit. Will fix this in next commit

@dreamer-89
Copy link
Member

dreamer-89 commented Aug 10, 2022

Flaky test:

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.indices.replication.OngoingSegmentReplicationsTests.testMultipleReplicasUseSameCheckpoint" -Dtests.seed=A0971E691B2879E0 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ms-MY -Dtests.timezone=Asia/Kolkata -Druntime.java=17
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([A0971E691B2879E0:A4046FDAE5B7632]:0)
        at org.opensearch.index.shard.IndexShard.getProcessedLocalCheckpoint(IndexShard.java:2711)
        at org.opensearch.indices.replication.OngoingSegmentReplicationsTests.setUp(OngoingSegmentReplicationsTests.java:73)

Another appearance in #4041. Created #4184 to track it as it is failing more oftenly on CI.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@Rishikesh1159
Copy link
Member Author

Rishikesh1159 commented Aug 10, 2022

Flaky test:

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.indices.replication.OngoingSegmentReplicationsTests.testMultipleReplicasUseSameCheckpoint" -Dtests.seed=A0971E691B2879E0 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ms-MY -Dtests.timezone=Asia/Kolkata -Druntime.java=17
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([A0971E691B2879E0:A4046FDAE5B7632]:0)
        at org.opensearch.index.shard.IndexShard.getProcessedLocalCheckpoint(IndexShard.java:2711)
        at org.opensearch.indices.replication.OngoingSegmentReplicationsTests.setUp(OngoingSegmentReplicationsTests.java:73)

Another appearance in #4041. Created #4184 to track it as it is failing more oftenly on CI.

Sorry @dreamer-89 , I thought this is a flaky test but not. This happened because of adding "assert statement" back that we discussed in above comments. I will remove the assert to fix this

…t replication in SegmentReplicationSourceHandlerTests and SegmentReplicationTargetServiceTests."

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
This reverts commit 8c5753b.
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@tlfeng tlfeng added backport PRs or issues specific to backporting features or enhancments v2.3.0 'Issues and PRs related to version v2.3.0' feature New feature or request distributed framework labels Aug 11, 2022
@Rishikesh1159 Rishikesh1159 merged commit 97113b9 into opensearch-project:2.x Aug 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport PRs or issues specific to backporting features or enhancments distributed framework feature New feature or request v2.3.0 'Issues and PRs related to version v2.3.0'
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants