Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify remote directory cleanup after snapshot delete to … #12672

Merged

Conversation

harishbhakuni
Copy link
Contributor

@harishbhakuni harishbhakuni commented Mar 14, 2024

…avoid concurrent cleanup task runs for same shard.

Description

During snapshot deletion, we need to trigger remote directory cleanup after every release lock operation for indices which are already deleted and removed from cluster state. We trigger cleanup task after every release lock operation.
if there are multiple shard blobs of same shard we will trigger multiple async cleanup task for same shard directory. since remote store directory cleanup logic is not thread safe. we might end up into situations where both cleanup tasks were not able to trigger cleanup successfully.
So as part of this PR, if there is already an ongoing cleanup task for a shard, and another thread also tries to trigger the cleanup, we will just skip the next task. This might end up into some zombie data in remote store, but since the cleanup itself happens asynchronously if the cleanup fails in between it will also end up into zombie data in remote store which is needed to be handled separately.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for f5d5d78: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Mar 14, 2024

Compatibility status:

Checks if related components are compatible with change 5d7ab9f

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer.git]

Copy link
Contributor

✅ Gradle check result for d0ae7d8: SUCCESS

Copy link

codecov bot commented Mar 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.40%. Comparing base (b15cb0c) to head (5d7ab9f).
Report is 31 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12672      +/-   ##
============================================
- Coverage     71.42%   71.40%   -0.02%     
- Complexity    59978    60051      +73     
============================================
  Files          4985     4989       +4     
  Lines        282275   282569     +294     
  Branches      40946    40982      +36     
============================================
+ Hits         201603   201775     +172     
- Misses        63999    64070      +71     
- Partials      16673    16724      +51     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

❌ Gradle check result for 787db62: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@harishbhakuni
Copy link
Contributor Author

Gradle Check is failing due to falky tests unrelated to this change mentioned here:
#12675

Copy link
Contributor

❌ Gradle check result for 787db62: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@harishbhakuni
Copy link
Contributor Author

failling test org.opensearch.ratelimitting.admissioncontrol.AdmissionControlSingleNodeTests.testAdmissionControlRejectionMonitorOnlyMode is flaky: #12664

Copy link
Contributor

❌ Gradle check result for 787db62: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Harish Bhakuni added 2 commits March 14, 2024 16:31
…urrent cleanup task runs for same shard.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
Copy link
Contributor

✅ Gradle check result for 5d7ab9f: SUCCESS

@sohami sohami merged commit d0467b3 into opensearch-project:main Mar 15, 2024
30 checks passed
@sohami sohami added the backport 2.x Backport to 2.x branch label Mar 15, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 15, 2024
* Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

* Address PR Comments.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

---------

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
Co-authored-by: Harish Bhakuni <hbhakuni@amazon.com>
(cherry picked from commit d0467b3)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sohami pushed a commit that referenced this pull request Mar 15, 2024
…#12682)

* Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard.



* Address PR Comments.



---------



(cherry picked from commit d0467b3)

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Harish Bhakuni <hbhakuni@amazon.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…rch-project#12672)

* Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

* Address PR Comments.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

---------

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
Co-authored-by: Harish Bhakuni <hbhakuni@amazon.com>
ruai0511 pushed a commit to ruai0511/OpenSearch that referenced this pull request Mar 19, 2024
…rch-project#12672)

* Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

* Address PR Comments.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

---------

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
Co-authored-by: Harish Bhakuni <hbhakuni@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…rch-project#12672)

* Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

* Address PR Comments.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

---------

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
Co-authored-by: Harish Bhakuni <hbhakuni@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

2 participants