Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract cluster management for integration tests into JUnit test rule out of OpenSearchIntegTestCase #11877

Merged
merged 5 commits into from
Jan 18, 2024

Conversation

reta
Copy link
Collaborator

@reta reta commented Jan 12, 2024

Description

Extract cluster management for integration tests into JUnit test rules our of OpenSearchIntegTestCase. @neetikasinghal this is follow up on the work you have done for parameterizing integration test cases. We now have removed the test cluster management logic from the OpenSearchIntegTestCase to dedicated JUnit rule, which gave us an opportunity implement two favours of cluster setting parametrization:

  • ParameterizedDynamicSettingsOpenSearchIntegTestCase (was implemented by ParameterizedOpenSearchIntegTestCase ): the test clusters settings are applied dynamically before each test method
  • ParameterizedStaticSettingsOpenSearchIntegTestCase (was not supported): the clusters settings are applied at startup time

CC @andrross should open up the road for parameterized segment replication test cases

Related Issues

N/A

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@reta reta changed the title Extract cluster management for integration tests into JUnit test rules our of OpenSearchIntegTestCase [WIP] Extract cluster management for integration tests into JUnit test rules our of OpenSearchIntegTestCase Jan 12, 2024
Copy link
Contributor

github-actions bot commented Jan 12, 2024

Compatibility status:

Checks if related components are compatible with change 359e5c6

Incompatible components

Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/cross-cluster-replication.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git]

Copy link
Contributor

❌ Gradle check result for 6912b3c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 97f4803: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 1739955: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 1739955: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Copy link
Collaborator Author

reta commented Jan 15, 2024

❌ Gradle check result for 1739955: FAILURE

#10006

@reta reta changed the title [WIP] Extract cluster management for integration tests into JUnit test rules our of OpenSearchIntegTestCase [WIP] Extract cluster management for integration tests into JUnit test rules out of OpenSearchIntegTestCase Jan 15, 2024
@reta reta changed the title [WIP] Extract cluster management for integration tests into JUnit test rules out of OpenSearchIntegTestCase [WIP] Extract cluster management for integration tests into JUnit test rule out of OpenSearchIntegTestCase Jan 15, 2024
Copy link
Contributor

❌ Gradle check result for 1739955: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Copy link
Collaborator Author

reta commented Jan 15, 2024

❌ Gradle check result for 1739955: FAILURE

#9191
#10749
#10755

Copy link
Contributor

❕ Gradle check result for 1739955: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.classMethod
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
      1 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testRefreshAfterCommit

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Jan 15, 2024

Codecov Report

Attention: 27 lines in your changes are missing coverage. Please review.

Comparison is base (5dd4b61) 71.38% compared to head (359e5c6) 71.36%.
Report is 9 commits behind head on main.

Files Patch % Lines
...h/aggregations/bucket/FastFilterRewriteHelper.java 87.40% 4 Missing and 13 partials ⚠️
...egations/bucket/composite/CompositeAggregator.java 88.37% 3 Missing and 2 partials ⚠️
...gregations/bucket/composite/InternalComposite.java 0.00% 2 Missing ⚠️
...ions/bucket/histogram/DateHistogramAggregator.java 84.61% 0 Missing and 2 partials ⚠️
.../bucket/histogram/AutoDateHistogramAggregator.java 93.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11877      +/-   ##
============================================
- Coverage     71.38%   71.36%   -0.03%     
- Complexity    59354    59425      +71     
============================================
  Files          4923     4923              
  Lines        279095   279178      +83     
  Branches      40559    40581      +22     
============================================
- Hits         199228   199222       -6     
- Misses        63292    63442     +150     
+ Partials      16575    16514      -61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

❌ Gradle check result for 37c6bcf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 22d3199: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for e428d4c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Copy link
Collaborator Author

reta commented Jan 16, 2024

❌ Gradle check result for e428d4c: FAILURE

#9191
#10983
#10154

Copy link
Contributor

❌ Gradle check result for e428d4c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Copy link
Collaborator Author

reta commented Jan 16, 2024

❌ Gradle check result for e428d4c: FAILURE

#9191

Copy link
Contributor

❌ Gradle check result for e428d4c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Copy link
Collaborator Author

reta commented Jan 17, 2024

❌ Gradle check result for e428d4c: FAILURE

#1703

Copy link
Contributor

❕ Gradle check result for e428d4c: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteStoreStatsIT.testStatsResponseAllShards
      1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@neetikasinghal
Copy link
Contributor

thanks for explaining, then for what use-cases should ParameterizedDynamicSettingsOpenSearchIntegTestCase be used as it can be confusing from its name.

There are javadocs for each class (and ParameterizedOpenSearchIntegTestCase), is it helpful or there are gaps that are unclear? Would appreciate your feedback, thank you.

done

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Copy link
Contributor

❕ Gradle check result for 2bef064: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testReadRange
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @reta, in general I think this is great and a better way to structure the test logic within the JUnit framework.

Copy link
Contributor

❕ Gradle check result for 359e5c6: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@reta
Copy link
Collaborator Author

reta commented Jan 18, 2024

@peternied how does it look to you?

@peternied
Copy link
Member

@peternied how does it look to you?

Thanks for checking in @reta I don't have time for a second pass today, but I'm not concerned about merging with @andrross 's approval. I'll create issues or a follow up PR if there is anything I'd like to see addressed.

@andrross andrross added the backport 2.x Backport to 2.x branch label Jan 18, 2024
@andrross andrross merged commit c267c5a into opensearch-project:main Jan 18, 2024
35 of 40 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-11877-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c267c5affa52bb3d176b81f87c9175090f7469e5
# Push it to GitHub
git push --set-upstream origin backport/backport-11877-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-11877-to-2.x.

reta added a commit to reta/OpenSearch that referenced this pull request Jan 18, 2024
… out of OpenSearchIntegTestCase (opensearch-project#11877)

* Extract cluster management for integration tests into JUnit test rule out of OpenSearchIntegTestCase

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Branched off parametrized tests into static / dynamic settings flavors

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Refactor internalCluster() method to return an Optional value

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit c267c5a)
reta added a commit that referenced this pull request Jan 19, 2024
… out of OpenSearchIntegTestCase (#11877) (#11941)

* Extract cluster management for integration tests into JUnit test rule out of OpenSearchIntegTestCase

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Branched off parametrized tests into static / dynamic settings flavors

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Refactor internalCluster() method to return an Optional value

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit c267c5a)
@Rishikesh1159
Copy link
Member

Rishikesh1159 commented Jan 23, 2024

@reta seems like we are not running all parameter cases of a parameter factory when cluster scope is default or SCOPE.SUITE. Instead we are only running the first case of parameter factory multiple times (second case is missing).

Example Use Case

In above example, when we run test class, we are only running the first case of CLUSTER_CONCURRENT_SEGMENT_SEARCH_SETTING.getKey(), false) twice. This is because we are setting the parameter values in @override nodeSettings(). But this method is called only once when the cluster is being build here. And in case of SCOPE.SUITE we only build cluster once here.

So when same test method tries to run with second/different parameters, the node settings are never overridden again with new parameter values because the cluster is already present. Node settings are overridden only when new cluster is built. So this way test is running again with same settings passed in first parameter values.

Current flow of test runs with multiple parameters and scope set to SCOPE.SUITE or default:

Build Cluster -> run testMethod1() with param1 -> run testMethod2() with param1 -> run testMethod1() with param1 -> run testMethod2() with param1 -> Remove cluster

Instead correct flow for using static settings should be :

Build Cluster -> run testMethod1() with param1 -> run testMethod2() with param1 -> Remove Cluster -> Build cluster again -> run testMethod1() with param2 -> run testMethod2() with param2-> Remove cluster

@reta
Copy link
Collaborator Author

reta commented Jan 23, 2024

@Rishikesh1159 thanks for bringing it up

So when same test method tries to run with second/different parameters, the node settings are never overridden again with new parameter values because the cluster is already present. Node settings are overridden only when new cluster is built. So this way test is running again with same settings passed in first parameter values.

The cluster should be shut down once the change of parameters is detected, so the new node settings should be picked up, I will double check shortly

@andrross
Copy link
Member

@Rishikesh1159 initializeSuiteScope() should be invoked before each method invocation, and when it detects that parameters have changed it should rebuild the cluster. Can you look into whether that is happening in the case you're testing?

@Rishikesh1159
Copy link
Member

Rishikesh1159 commented Jan 23, 2024

@andrross Yes initializeSuiteScope() is being invoked on every test method invocation. But the problem is the conditional check here if (isSuiteScopedTest(targetClass)).

We have three type of scopes:

  1. @ClusterScope(scope = Scope.SUITE) - example
  2. @ClusterScope(scope = Scope.TEST) - example
  3. @OpenSearchIntegTestCase.SuiteScopeTestCase - example

The conditional check was only detecting 3rd type of scope. But majority of tests are 1st type including this example (Use Case)

@reta
Copy link
Collaborator Author

reta commented Jan 23, 2024

@Rishikesh1159 initializeSuiteScope() should be invoked before each method invocation, and when it detects that parameters have changed it should rebuild the cluster. Can you look into whether that is happening in the case you're testing?

@andrross the @Rishikesh1159's explanation (#11877 (comment)) is correct, I was thinking exactly the way you described but it turned out initializeSuiteScope() is not the same as @ClusterScope(scope = Scope.SUITE), it should be fixed by #12000, thanks folks

@neetikasinghal
Copy link
Contributor

neetikasinghal commented Jan 23, 2024

@reta seems like we are not running all parameter cases of a parameter factory when cluster scope is default or SCOPE.SUITE. Instead we are only running the first case of parameter factory multiple times (second case is missing).

Example Use Case

In above example, when we run test class, we are only running the first case of CLUSTER_CONCURRENT_SEGMENT_SEARCH_SETTING.getKey(), false) twice. This is because we are setting the parameter values in @override nodeSettings(). But this method is called only once when the cluster is being build here. And in case of SCOPE.SUITE we only build cluster once here.

So when same test method tries to run with second/different parameters, the node settings are never overridden again with new parameter values because the cluster is already present. Node settings are overridden only when new cluster is built. So this way test is running again with same settings passed in first parameter values.

Current flow of test runs with multiple parameters and scope set to SCOPE.SUITE or default:

Build Cluster -> run testMethod1() with param1 -> run testMethod2() with param1 -> run testMethod1() with param1 -> run testMethod2() with param1 -> Remove cluster

Instead correct flow for using static settings should be :

Build Cluster -> run testMethod1() with param1 -> run testMethod2() with param1 -> Remove Cluster -> Build cluster again -> run testMethod1() with param2 -> run testMethod2() with param2-> Remove cluster

thanks @Rishikesh1159, when I was debugging one of the tests today I realized that even with concurrent search parameter set to true, it wasnt taking the concurrent search path. This explains on why it was the case.

@reta
Copy link
Collaborator Author

reta commented Jan 23, 2024

thanks @Rishikesh1159, when I was debugging one of the tests today I realized that even with concurrent search parameter set to true, it wasnt taking the concurrent search path. This explains on why it was the case.

Sorry about that @neetikasinghal , the issue slipped in

peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
… out of OpenSearchIntegTestCase (opensearch-project#11877)

* Extract cluster management for integration tests into JUnit test rule out of OpenSearchIntegTestCase

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Branched off parametrized tests into static / dynamic settings flavors

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Refactor internalCluster() method to return an Optional value

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
… out of OpenSearchIntegTestCase (opensearch-project#11877)

* Extract cluster management for integration tests into JUnit test rule out of OpenSearchIntegTestCase

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Branched off parametrized tests into static / dynamic settings flavors

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Refactor internalCluster() method to return an Optional value

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
… out of OpenSearchIntegTestCase (opensearch-project#11877)

* Extract cluster management for integration tests into JUnit test rule out of OpenSearchIntegTestCase

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Branched off parametrized tests into static / dynamic settings flavors

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Refactor internalCluster() method to return an Optional value

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants