[BUG] o.o.i.s.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew is flaky #10303

Open
dblock opened this issue Oct 2, 2023 · 7 comments · Fixed by #11111
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run Storage:Remote

Comments

@dblock
Member

dblock commented Oct 2, 2023

Describe the bug

#10256 (comment)
https://build.ci.opensearch.org/job/gradle-check/26576/


REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew" -Dtests.seed=714C3129261F73E5 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=und -Dtests.timezone=Etc/GMT-3 -Druntime.java=20

org.opensearch.index.shard.RemoteIndexShardTests > testRepicaCleansUpOldCommitsWhenReceivingNew FAILED
    java.lang.AssertionError: 
    Expected: a collection with size <1>
         but: collection size was <0>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.opensearch.index.shard.IndexShardTestCase.assertDocCount(IndexShardTestCase.java:1227)
        at org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew(RemoteIndexShardTests.java:307)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=3503, name=opensearch[org.opensearch.index.shard.RemoteIndexShardTests][generic][T#3], state=RUNNABLE, group=TGRP-RemoteIndexShardTests]

        Caused by:
        java.lang.AssertionError
            at __randomizedtesting.SeedInfo.seed([714C3129261F73E5]:0)
            at org.junit.Assert.fail(Assert.java:87)
            at org.junit.Assert.assertTrue(Assert.java:42)
            at org.junit.Assert.assertTrue(Assert.java:53)
            at org.opensearch.index.shard.IndexShardTestCase$4.onReplicationDone(IndexShardTestCase.java:1558)
            at org.opensearch.indices.replication.SegmentReplicationTargetService$SegmentReplicationListener.onDone(SegmentReplicationTargetService.java:464)
            at org.opensearch.indices.replication.common.ReplicationTarget.markAsDone(ReplicationTarget.java:146)
            at org.opensearch.indices.replication.common.ReplicationCollection.markAsDone(ReplicationCollection.java:221)
            at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onResponse(SegmentReplicationTargetService.java:516)
            at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onResponse(SegmentReplicationTargetService.java:512)
            at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$startReplication$3(SegmentReplicationTarget.java:179)
            at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82)
            at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126)
            at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
            at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341)
            at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120)
            at org.opensearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:82)
            at org.opensearch.action.StepListener.whenComplete(StepListener.java:95)
            at org.opensearch.indices.replication.SegmentReplicationTarget.startReplication(SegmentReplicationTarget.java:177)
            at org.opensearch.indices.replication.SegmentReplicationTargetService.start(SegmentReplicationTargetService.java:512)
            at org.opensearch.indices.replication.SegmentReplicationTargetService$ReplicationRunner.doRun(SegmentReplicationTargetService.java:498)
            at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908)
            at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
            at java.****/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
            at java.****/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
            at java.****/java.lang.Thread.run(Thread.java:1623)
@ashking94
Member

#10939 (comment)

@peternied
Member

Impacted #11068, logs.

@reta
Collaborator

reta commented Jan 5, 2024

The issue is not fixed: https://build.ci.opensearch.org/job/gradle-check/31797/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testRepicaCleansUpOldCommitsWhenReceivingNew/

org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew

java.lang.AssertionError: Replication should complete successfully expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.opensearch.index.shard.IndexShardTestCase.replicateSegments(IndexShardTestCase.java:1669)
	at org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew(RemoteIndexShardTests.java:312)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1583)
	Suppressed: java.lang.AssertionError: 
Expected: <[doc{id='1 seqNo=0 primaryTerm=45 version=1 source= {}}]>
     but: was <[]>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:964)
		at org.junit.Assert.assertThat(Assert.java:930)
		at org.opensearch.index.replication.OpenSearchIndexLevelReplicationTestCase$ReplicationGroup.close(OpenSearchIndexLevelReplicationTestCase.java:607)
		at org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew(RemoteIndexShardTests.java:304)
		... 38 more

@BhumikaSaini-Amazon
Contributor

...

WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL in 22s
53 actionable tasks: 1 executed, 52 up-to-date
===================================================================
23530
===================================================================
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.7
  OS Info               : <redacted>
  JDK Version           : 11 (Amazon Corretto JDK)
  JAVA_HOME             : /usr/lib/jvm/java-11-amazon-corretto
  Random Testing Seed   : 9C9CDE0BBD909DF0
  In FIPS 140 mode      : false
=======================================

> Task :server:test
Apr 01, 2024 10:39:28 AM sun.util.locale.provider.LocaleProviderAdapter <clinit>
WARNING: COMPAT locale provider will be removed in a future release

...

Unable to repro this on the latest main branch even after 23k+ unique master seeds. I don't see any recent reports for this issue either.

I will check if this could be somehow related to concurrent runs in Jenkins.
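
A seed sweep like the one described above could be scripted roughly as follows. This is only a sketch, not the script actually used in this thread: the helper names (`random_seed`, `build_command`, `sweep`) are hypothetical, the seed is assumed to be a 64-bit value written as 16 uppercase hex digits (matching the `-Dtests.seed=714C3129261F73E5` form in the issue description), and the extra JVM, locale, and timezone flags from the original reproduce line are left out.

```python
import secrets
import subprocess

# Fully qualified test name, copied from the REPRODUCE WITH line in the issue.
TEST_NAME = (
    "org.opensearch.index.shard.RemoteIndexShardTests"
    ".testRepicaCleansUpOldCommitsWhenReceivingNew"
)


def random_seed() -> str:
    """Return a random 64-bit seed as 16 uppercase hex digits (assumed format)."""
    return f"{secrets.randbits(64):016X}"


def build_command(seed: str) -> list:
    """Build the Gradle invocation for a single run with the given seed."""
    return [
        "./gradlew",
        ":server:test",
        "--tests",
        TEST_NAME,
        f"-Dtests.seed={seed}",
    ]


def sweep(iterations: int, run=subprocess.run):
    """Run the test with a fresh seed per iteration.

    Returns the first seed whose run exits non-zero, or None if every run
    passed. `run` is injectable so the loop can be exercised without Gradle.
    """
    for _ in range(iterations):
        seed = random_seed()
        result = run(build_command(seed))
        if result.returncode != 0:
            return seed
    return None
```

Calling `sweep(1000)` from the repository root would run up to 1000 seeded iterations and stop at the first failure, returning the seed so the failing run can be replayed with `-Dtests.seed`.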

@BhumikaSaini-Amazon
Contributor

WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL in 14s
53 actionable tasks: 1 executed, 52 up-to-date
===================================================================
45825
===================================================================
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.7
  OS Info               : Linux 5.10.210-

No repro in 45k+ iters. Nothing unusual in Jenkins.

@BhumikaSaini-Amazon BhumikaSaini-Amazon closed this as not planned Apr 5, 2024
@chishui
Contributor

chishui commented Jul 25, 2024

Still seeing this issue:
https://build.ci.opensearch.org/job/gradle-check/42426/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testRepicaCleansUpOldCommitsWhenReceivingNew/

java.lang.AssertionError: Replication should complete successfully expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.opensearch.index.shard.IndexShardTestCase.replicateSegments(IndexShardTestCase.java:1702)
	at org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew(RemoteIndexShardTests.java:313)

@linuxpi
Collaborator

linuxpi commented Jul 25, 2024

[Storage Triage - attendees 1 2 3 4 5 6 7 8]

Thanks for filing this issue. Please feel free to raise a PR with a fix!

@linuxpi linuxpi removed the untriaged label Jul 25, 2024
Projects
Status: Ready To Be Picked
10 participants