[Remote Store] Add support to provide separate segment metadata repository #12993

Open: sachinpkale wants to merge 8 commits into main

Conversation

sachinpkale (Member) commented Apr 1, 2024

Description

  • Currently, when we configure remote store for a cluster, both data and metadata (of segments, translog and cluster state) are stored in the same store.
  • However, the access patterns and consistency requirements for data and metadata differ.
  • For example, data files are uploaded and downloaded by providing a filename, whereas metadata is fetched based on certain characteristics such as recency.
  • Because of these differences, metadata could be stored in a different kind of data store (for example, a key-value store) while the data files stay in a blob store.
  • In this PR, we add support for configuring a separate repository for segment metadata. Separation of the translog metadata repository will follow.
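
To make the difference in access patterns concrete, here is a minimal sketch. It is not code from this PR: `AccessPatternSketch`, `DataStore` and `MetadataStore` are hypothetical names, and the in-memory maps only stand in for a blob store and a key-value store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch: none of these types exist in OpenSearch.
final class AccessPatternSketch {

    /** Data files: written once and read back by exact file name. */
    static final class DataStore {
        private final Map<String, byte[]> blobs = new HashMap<>();

        void upload(String fileName, byte[] content) {
            blobs.put(fileName, content);
        }

        Optional<byte[]> download(String fileName) {
            return Optional.ofNullable(blobs.get(fileName));
        }
    }

    /** Metadata: the caller usually wants the most recent entry, not a known name. */
    static final class MetadataStore {
        // Keyed by an increasing generation, so the newest entry is the last key.
        private final TreeMap<Long, String> byGeneration = new TreeMap<>();

        void put(long generation, String metadata) {
            byGeneration.put(generation, metadata);
        }

        Optional<String> latest() {
            return byGeneration.isEmpty()
                ? Optional.empty()
                : Optional.of(byGeneration.lastEntry().getValue());
        }
    }
}
```

A store that can answer `latest()` directly does not need to list and sort file names, which is one reason a key-value store may suit metadata better than a blob store.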

Related Issues

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot (Contributor) commented Apr 1, 2024

Compatibility status:

Checks if related components are compatible with change 95d18c7

Incompatible components

Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/performance-analyzer.git]

github-actions bot (Contributor) commented Apr 1, 2024

❌ Gradle check result for 0dea8c1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot (Contributor) commented Apr 1, 2024

❌ Gradle check result for 940ce2e: FAILURE

github-actions bot (Contributor) commented Apr 1, 2024

❌ Gradle check result for eb0918f: ABORTED

github-actions bot (Contributor) commented Apr 3, 2024

❌ Gradle check result for 87ce493: FAILURE

github-actions bot (Contributor) commented Apr 3, 2024

❌ Gradle check result for a844c8a: FAILURE

github-actions bot (Contributor) commented Apr 4, 2024

❕ Gradle check result for d260b1c: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions bot (Contributor) commented Apr 4, 2024

✅ Gradle check result for 2852ca4: SUCCESS

codecov bot commented Apr 4, 2024

Codecov Report

Attention: Patch coverage is 73.91304% with 18 lines in your changes missing coverage. Please review.

Project coverage is 71.44%. Comparing base (b15cb0c) to head (2852ca4).
Report is 373 commits behind head on main.

Current head 2852ca4 differs from pull request most recent head 95d18c7

Please upload reports for the commit 95d18c7 to get more accurate results.

| Files | Patch % | Lines |
| --- | --- | --- |
| ...rch/node/remotestore/RemoteStoreNodeAttribute.java | 43.75% | 7 Missing and 2 partials ⚠️ |
| ...org/opensearch/cluster/metadata/IndexMetadata.java | 76.92% | 2 Missing and 1 partial ⚠️ |
| ...h/cluster/metadata/MetadataCreateIndexService.java | 77.77% | 1 Missing and 1 partial ⚠️ |
| ...c/main/java/org/opensearch/index/IndexService.java | 0.00% | 2 Missing ⚠️ |
| ...ndex/store/RemoteSegmentStoreDirectoryFactory.java | 95.00% | 0 Missing and 1 partial ⚠️ |
| ...in/java/org/opensearch/indices/IndicesService.java | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12993      +/-   ##
============================================
+ Coverage     71.42%   71.44%   +0.02%     
- Complexity    59978    60419     +441     
============================================
  Files          4985     5026      +41     
  Lines        282275   284475    +2200     
  Branches      40946    41202     +256     
============================================
+ Hits         201603   203237    +1634     
- Misses        63999    64389     +390     
- Partials      16673    16849     +176     

Bukhtawar (Collaborator) left a comment

Overall LGTM. With the increased number of repositories, what is the impact on node attributes and join validations?

Sachin Kale added 7 commits April 12, 2024 10:48
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Contributor

❌ Gradle check result for 1391737: FAILURE

Signed-off-by: Sachin Kale <kalsac@amazon.com>
Contributor

❌ Gradle check result for 95d18c7: FAILURE

- SETTING_REMOTE_SEGMENT_STORE_REPOSITORY,
- SETTING_REMOTE_TRANSLOG_STORE_REPOSITORY
+ SETTING_REMOTE_SEGMENT_STORE_DATA_REPOSITORY,
+ SETTING_REMOTE_TRANSLOG_STORE_DATA_REPOSITORY
Collaborator

Do we need to add the metadata repo to the unmodified settings during snapshot restore as well?

Collaborator

+1. We will also need corresponding ITs for restore.
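
For readers unfamiliar with the "unmodified settings" check discussed here, the following is a rough sketch of the idea only. `RestoreSettingsSketch` and the setting keys used with it are placeholders chosen for illustration; they are not the names used by OpenSearch or by this PR.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: class name and setting keys are placeholders, not OpenSearch code.
final class RestoreSettingsSketch {

    /** Rejects a restore request that tries to override any setting marked unmodifiable. */
    static void validateOverrides(Set<String> unmodifiableSettings, Map<String, String> requestedOverrides) {
        for (String key : requestedOverrides.keySet()) {
            if (unmodifiableSettings.contains(key)) {
                throw new IllegalArgumentException("cannot modify setting [" + key + "] on restore");
            }
        }
    }
}
```

Under this sketch, adding the metadata repository setting to the unmodifiable set would make a restore request that overrides it fail fast instead of silently pointing the index at a different repository.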

@@ -2702,7 +2702,7 @@ public void snapshotRemoteStoreIndexShard(
indexTotalNumberOfFiles,
indexTotalFileSize,
store.indexSettings().getUUID(),
- store.indexSettings().getRemoteStoreRepository(),
+ store.indexSettings().getRemoteSegmentStoreDataRepository(),
Collaborator

I think we need to add another parameter for the metadata repository as well, since lock files are in the metadata repo. Currently, remoteStoreRepository is used for both source data and lock files.

https://github.com/linuxpi/OpenSearch/blob/08e8325f03cd7f0c205c74bd9892677fb3cd4b59/server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java#L669-L675

Member Author

Right! Thanks for catching this, let me make the required change.

Collaborator

Lock files are stored separately and not in the metadata repo.

Member Author

The lock file path is generated with the segment repository when data and metadata are in the same repo.
Once we separate data and metadata, we need to associate lock files with one of these repos. As per the comment: #12993 (comment), it makes more sense to store lock files in the metadata repo.
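
The rule described in this reply could be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `LockRepositorySketch` and `lockFileRepository` are hypothetical names, and treating a missing or blank metadata repository as "same repo" is an assumption made for the example.

```java
import java.util.Objects;

// Hypothetical sketch; the class and parameter names are illustrative, not the PR's code.
final class LockRepositorySketch {

    /**
     * Returns the repository that should hold lock files.
     * If no separate metadata repository is configured (null or blank), data and
     * metadata share one repository, so the segment data repository is used.
     */
    static String lockFileRepository(String segmentDataRepository, String segmentMetadataRepository) {
        Objects.requireNonNull(segmentDataRepository, "segment data repository is required");
        if (segmentMetadataRepository == null || segmentMetadataRepository.trim().isEmpty()) {
            return segmentDataRepository;
        }
        return segmentMetadataRepository;
    }
}
```

For example, `lockFileRepository("seg-data", null)` returns `"seg-data"`, while `lockFileRepository("seg-data", "seg-meta")` returns `"seg-meta"`.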

Collaborator

We will need to add sourceRemoteStoreMetadataRepository to RestoreSnapshotRequest as well.

@gbbafna gbbafna requested review from harishbhakuni and removed request for adnapibar and abbashus April 22, 2024 12:16
@@ -407,6 +407,7 @@ void recoverFromSnapshotAndRemoteStore(
threadPool
);
RemoteSegmentStoreDirectory sourceRemoteDirectory = (RemoteSegmentStoreDirectory) directoryFactory.newDirectory(
remoteStoreRepository,
Contributor

Right now we pull this remote store repository value from the shard-level snapshot metadata, so we need to make the related changes there to update it with the metadata directory.

Contributor

Also, I don't like the idea of using the same repository name here for metadata and data. We should maybe create a remote directory instance from directoryFactory that has access to the metadata repository only.

Contributor

I checked more on this. We actually need the segments_N file to restore a snapshot, and it is in the data directory, so we will need to add both segment data and metadata repository info to the snapshot metadata, unless we keep the segments_N file in the metadata directory as well.
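
One possible shape for snapshot metadata that carries both repositories is sketched below. Everything here is hypothetical: `SnapshotRemoteStoreInfoSketch` is not a class in OpenSearch, and the fallback for entries that recorded only one repository is an assumption about how older snapshots might be handled.

```java
// Hypothetical sketch of shard-level snapshot info carrying both repositories.
final class SnapshotRemoteStoreInfoSketch {
    private final String dataRepository;
    private final String metadataRepository;

    SnapshotRemoteStoreInfoSketch(String dataRepository, String metadataRepository) {
        if (dataRepository == null) {
            throw new IllegalArgumentException("data repository must be recorded in the snapshot");
        }
        this.dataRepository = dataRepository;
        // Assumption: an entry with no metadata repository was written when data and
        // metadata shared one repository, so the data repository serves both roles.
        this.metadataRepository = metadataRepository != null ? metadataRepository : dataRepository;
    }

    /** Repository to read segments_N and other segment data files from. */
    String repositoryForDataFiles() {
        return dataRepository;
    }

    /** Repository to read segment metadata from. */
    String repositoryForMetadata() {
        return metadataRepository;
    }
}
```

Keeping the two accessors separate means restore code has to state which kind of file it is reading, which avoids passing one repository name for both purposes.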

@@ -903,6 +903,7 @@ public static void remoteDirectoryCleanup(
) {
try {
RemoteSegmentStoreDirectory remoteSegmentStoreDirectory = (RemoteSegmentStoreDirectory) remoteDirectoryFactory.newDirectory(
remoteStoreRepoForIndex,
Contributor

Here as well, as mentioned in the comment above, we can create the same remoteDirectory instance that has access only to the metadata repository.

@@ -407,6 +407,7 @@ void recoverFromSnapshotAndRemoteStore(
threadPool
);
Contributor

Along with the doc changes for this feature, we need to update the doc for source_remote_store_repository, which will now be used to override the remote segment metadata directory: https://opensearch.org/docs/latest/api-reference/snapshots/restore-snapshot/

opensearch-trigger-bot (Contributor)

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jun 8, 2024

7 participants