[META] Cluster Manager Async Shard Fetch Revamp #8098

Open
5 of 13 tasks
amkhar opened this issue Jun 16, 2023 · 1 comment
Labels
Cluster Manager, distributed framework, enhancement, Roadmap:Stability/Availability/Resiliency

Comments

amkhar commented Jun 16, 2023

Describe the revamp

Original issue / project goal: #5098

RFC/proposal: #5098 (comment)
Please read the issue above in detail before starting any discussion. Thanks.

Impact of the change

We'll track the following as subtasks of this goal. I'll keep adding new issues as we find more ways to improve this behaviour.

To Reproduce
The steps are given in draft PR #7269.

Expected behavior
The cluster manager should be resilient to a large number of node restarts.

amkhar commented Jul 31, 2023

PRs and draft PRs raised for the above subtasks:

  1. Transport action request/response class for primary: Added transport action for bulk async shard fetch for primary shards #8218
  2. Transport action request/response class for replica: Add batch async shard fetch transport action for replica #8218 #8356
  3. PrimaryShardAllocator refactoring to re-use existing code: PrimaryShardAllocator refactor to abstract out shard state and method calls #9760
  4. AsyncShardFetch class refactoring to re-use existing code: Batch Async Fetcher class changes #8742
  5. ReplicaShardAllocator refactoring to re-use existing code: Refactored the RSA to make it more extensible #10254
  6. BaseGatewayShardAllocator class: draft PR (BaseGatewayShardAllocator changes for Assigning the batch of shards #8776)
  7. PrimaryShardBatchAllocator class: draft PR (Add PrimaryShardBatchAllocator to take allocation decisions for a batch of shards #8916)
  8. ReplicaShardBatchAllocator class: draft PR (Created new ReplicaShardBatchAllocator #8992)
  9. GatewayAllocator class: part 1 draft PR (Async Batch shards changes for GatewayAllocator #8746), part 2 draft PR (FetchData changes for primaries and replicas #8865)
  10. AllocationService (the actual calls are made here): partial draft PR (Allocation service changes for batch assignment #8888)
  11. Allocation Explain API: Fixed Allocation Explain API in batch mode #10348
  12. Refactor AsyncShardFetch cache structure to allow batch mode cache restructuring: Abstract AsyncShardFetch cache to allow restructuring for other caching strategies #12441
  13. Add ShardBatchCache to handle batch transport action responses: Add ShardBatchCache to support caching for TransportNodesListGatewayStartedShardsBatch #12504

Projects
Status: Now(This Quarter)
Status: New