feat: add average replicas assign proposal #5225

ipsum-0320 · 2024-07-18T09:38:03Z

What type of PR is this?

/kind feature
/kind documentation

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes part of #5159

Special notes for your reviewer: @whitewindmills

Does this PR introduce a user-facing change?:

NONE

Other related issues: #4085

whitewindmills · 2024-07-18T11:48:50Z

/retest

codecov-commenter · 2024-07-18T12:02:11Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 98.24561% with 1 line in your changes missing coverage. Please review.

Project coverage is 34.22%. Comparing base (ef7d528) to head (68134e6).

Files with missing lines	Patch %	Lines
.../scheduler/core/spreadconstraint/group_clusters.go	98.24%	1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5225      +/-   ##
==========================================
+ Coverage   34.15%   34.22%   +0.06%     
==========================================
  Files         643      643              
  Lines       44503    44551      +48     
==========================================
+ Hits        15201    15248      +47     
- Misses      28145    28146       +1     
  Partials     1157     1157

Flag	Coverage Δ
unittests	`34.22% <98.24%> (+0.06%)`	⬆️

Flags with carried forward coverage won't be shown.


whitewindmills · 2024-07-19T06:08:28Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

@@ -0,0 +1,263 @@
+---
+title: Karmada distributes the replicas evenly based on spread constraint


Can you re-summarize the title? In fact, we did not consider spread constraints during the design.

whitewindmills · 2024-07-19T06:11:25Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+Therefore, we plan to introduce a new replica allocation strategy called AverageReplicas for Karmada's scheduler. This strategy will support evenly distributing the target replicas among the selected clusters, and it will:
+
+1. Adhere to spread constraints.
+2. Consider the available resources in the working clusters.


We usually call them member clusters.

whitewindmills · 2024-07-19T06:14:36Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+
+By adding the AverageReplicas replica allocation strategy, we ensure that replicas can be evenly distributed among the selected clusters as much as possible. This even distribution means:
+
+1. The allocation results adhere to the spread constraint (Selection phase).


I guess the point is to make sure that it's as evenly distributed as possible, not spread constraints.

@whitewindmills yeah, I only mentioned spread constraints in passing... I will remove this part.

whitewindmills · 2024-07-19T06:20:18Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+spec:
+	placement:
+		replicaScheduling: 
+			replicaDivisionPreference: Average


You need to fix the format of this YAML.
BTW, can you provide an example?

whitewindmills · 2024-07-19T06:24:49Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+As can be seen, to add an AverageReplicas replica allocation strategy, we can consider adding a new value, Average, to replicaDivisionPreference.
+-->
+
+Note that `replicaDivisionPreference == Average` will only be effective when `replicaSchedulingType == Divided`. Additionally, since `selectBestClustersByRegion` currently does not consider whether the selected clusters can accommodate the corresponding number of replica resources, **the AverageReplicas strategy is only applicable to `selectBestClustersByCluster`.**


I don't get it. We should only focus on the Assign phase; what the spread constraints are does not concern us here.

whitewindmills · 2024-07-19T06:32:30Z

Inviting these people to help with the review.
/cc @RainbowMango @XiShanYongYe-Chang @chaunceyjiang @chaosi-zju @zhzhuang-zju

ipsum-0320 · 2024-07-19T06:33:40Z

@whitewindmills yeah, thanks, I have received your comments and will make some changes.

ipsum-0320 · 2024-07-19T08:54:57Z

/retest

karmada-bot · 2024-07-19T08:55:18Z

@ipsum-0320: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

chaunceyjiang · 2024-07-25T03:07:41Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+
+## Motivation
+
+By adding the AverageReplicas replica allocation strategy, Karmada can distribute the target replicas as evenly as possible across the member clusters. At the same time, this even distribution will adhere to spread constraints and consider the available resources in the member clusters.


If I set it to StaticWeight, and it's 1:1:1:1:1, can it achieve the average behavior you mentioned?

So I suggest elaborating on the differences between your proposal and StaticWeight.

In fact, I have already considered this, as follows.

Therefore, we plan to introduce a new replica allocation strategy called AverageReplicas for Karmada's scheduler. This strategy will support evenly distributing the target replicas among the selected clusters, and it will:

1. Adhere to spread constraints.
2. Consider the available resources in the member clusters.

StaticWeight does not consider spread constraints.

How about adding that difference to the current proposal?


Yeah, I have added it.

Looking forward to your feedback on #4805 (comment).

For now, I can see two approaches:

1. Enhance the StaticWeight.
2. Introduce a new AverageReplicas strategy as proposed by this PR.

No matter which solution we are going to take, it's a great chance for us to revisit the StaticWeight feature, and make a clear behavior.

warjiang · 2024-07-30T06:42:17Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+The logic here is essentially to determine how many replicas should be allocated to each cluster. According to the project's objectives, in the Assign phase, replicas should be evenly distributed among the selected clusters as much as possible. First, we need to calculate the ideal number of replicas each cluster should be allocated:
+
+1. Define the number of selected clusters as L.
+2. Calculate the average number of replicas each cluster should be allocated, defined as avg_rep = Replica / L.


How do we handle the non-divisible case? For example, with L=3 and Replica=8, avg_rep = 2, so following the algorithm described below, only avg_rep * L = 6 replicas would be allocated.
I think we should initialize sumPendingReplica = Replica - L * avg_rep = 2.

I have refined the explanation of this issue, thanks.
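The rounding rule settled in this thread (round avg_rep down and seed sumPendingReplica with the remainder) can be illustrated with a small Go sketch; Go is used because Karmada itself is written in Go. The helper name `splitEvenly` is hypothetical and not part of Karmada.

```go
package main

import "fmt"

// splitEvenly is a hypothetical helper: it returns the per-cluster base
// share avg_rep (rounded down) and the replicas left over by rounding,
// which would seed sumPendingReplica.
func splitEvenly(replicas, clusters int32) (avgRep, remainder int32) {
	if clusters <= 0 {
		// No clusters: nothing can be placed, everything stays pending.
		return 0, replicas
	}
	// Go integer division truncates; for non-negative inputs that is floor.
	avgRep = replicas / clusters
	remainder = replicas - avgRep*clusters // same as replicas % clusters
	return avgRep, remainder
}

func main() {
	avg, rem := splitEvenly(8, 3)
	fmt.Println(avg, rem) // 2 2
}
```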

warjiang · 2024-07-30T06:46:33Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+   <!--
+   b. If AvailableReplica <= the cluster's avg_rep, allocate all AvailableReplica replicas to that cluster, and add the remaining pending replicas, i.e. avg_rep - AvailableReplica, to the variable sumPendingReplica (a global variable not bound to any single cluster), which starts at zero.
+   -->
+2. After completing the first round of traversal, first sort the unFullCluster by Cluster.Score, then start traversing the unFullCluster. For each cluster in it, allocate one replica, and decrement both the cluster's AvailableReplica and sumPendingReplica by 1. Then check if the cluster's AvailableReplica is zero, and if so, remove that cluster from the unFullCluster.


When I first read the proposal, I misunderstood it because I parsed the sentence the wrong way. Should we write `Cluster.Score` as inline code instead of plain Cluster.Score? Just a suggestion, not required.

Yes, I have noticed this issue. Thank you for your suggestion. @warjiang
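Putting the quoted steps together, the Assign phase could look roughly like the Go sketch below. It is an illustration under several assumptions, not the proposal's implementation: `Cluster`, `TargetCluster`, and `assignAverage` are hypothetical local names (`TargetCluster` only mirrors the Name/Replicas shape of Karmada's type of the same name); step 1a, where a cluster with more room than avg_rep receives exactly avg_rep and joins unFullCluster, is inferred because only step 1b is quoted here; clusters are visited in descending Score order; and the loop returns an error rather than spinning forever if capacity runs out.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Cluster is a hypothetical view of one selected member cluster.
type Cluster struct {
	Name             string
	Score            int64
	AvailableReplica int32
}

// TargetCluster mirrors the Name/Replicas shape of Karmada's TargetCluster.
type TargetCluster struct {
	Name     string
	Replicas int32
}

// assignAverage sketches steps 1-3 of the proposed AverageReplicas Assign phase.
func assignAverage(clusters []Cluster, replicas int32) ([]TargetCluster, error) {
	if len(clusters) == 0 {
		return nil, errors.New("no selected clusters")
	}
	avgRep := replicas / int32(len(clusters))
	// Replicas lost by rounding avg_rep down start out as pending.
	sumPending := replicas - avgRep*int32(len(clusters))

	result := make([]TargetCluster, len(clusters))
	remaining := make([]int32, len(clusters)) // spare capacity per cluster
	var unFull []int                          // indexes of clusters with spare capacity

	// Step 1: give every cluster up to avg_rep replicas.
	for i, c := range clusters {
		result[i].Name = c.Name
		if c.AvailableReplica > avgRep {
			// Step 1a (assumed): full share, capacity left over.
			result[i].Replicas = avgRep
			remaining[i] = c.AvailableReplica - avgRep
			unFull = append(unFull, i)
		} else {
			// Step 1b: take what fits and defer the shortfall.
			result[i].Replicas = c.AvailableReplica
			sumPending += avgRep - c.AvailableReplica
		}
	}

	// Step 2 ordering: higher Score first (direction assumed).
	sort.SliceStable(unFull, func(a, b int) bool {
		return clusters[unFull[a]].Score > clusters[unFull[b]].Score
	})

	// Steps 2-3: one replica per cluster per pass until nothing is pending.
	for sumPending > 0 {
		if len(unFull) == 0 {
			return nil, fmt.Errorf("insufficient capacity: %d replicas unassigned", sumPending)
		}
		next := unFull[:0] // in-place filter that keeps the sorted order
		for _, i := range unFull {
			if sumPending == 0 {
				next = append(next, i)
				continue
			}
			result[i].Replicas++
			remaining[i]--
			sumPending--
			if remaining[i] > 0 {
				next = append(next, i)
			}
		}
		unFull = next
	}
	return result, nil
}

func main() {
	clusters := []Cluster{
		{Name: "member1", Score: 100, AvailableReplica: 10},
		{Name: "member2", Score: 90, AvailableReplica: 1},
		{Name: "member3", Score: 80, AvailableReplica: 10},
	}
	targets, err := assignAverage(clusters, 8)
	fmt.Println(targets, err) // [{member1 4} {member2 1} {member3 3}] <nil>
}
```

In this run member2 can only hold 1 replica, so its shortfall joins the rounding remainder and is handed out one replica at a time, highest score first.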

ipsum-0320 · 2024-07-30T13:09:52Z

I have updated the proposal; everyone is welcome to give me feedback. /cc @whitewindmills @warjiang @RainbowMango @XiShanYongYe-Chang @chaunceyjiang @chaosi-zju @zhzhuang-zju

ipsum-0320 · 2024-07-30T13:10:11Z

/retest

karmada-bot · 2024-07-30T13:10:29Z

@ipsum-0320: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest


whitewindmills · 2024-07-31T02:14:50Z

/ok-to-test

ipsum-0320 · 2024-08-02T08:50:07Z

/retest

whitewindmills · 2024-08-05T01:50:08Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+spec:
+  placement:
+    replicaScheduling: 
+      replicaDivisionPreference: Average


AverageReplicas sounds more appropriate.

OK, I have completed the edit.
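To make the renaming concrete, the sketch below shows how an AverageReplicas value might sit next to the existing division preferences, together with the proposal's rule that it only takes effect when replicaSchedulingType is Divided. The type and constant names mirror Karmada's policy API as we understand it but are redeclared locally so the block compiles on its own; `ReplicaDivisionPreferenceAverageReplicas`, its string value, and `averageApplies` are hypothetical.

```go
package main

import "fmt"

// Local stand-ins for the policy API types.
type ReplicaSchedulingType string
type ReplicaDivisionPreference string

const (
	ReplicaSchedulingTypeDuplicated ReplicaSchedulingType = "Duplicated"
	ReplicaSchedulingTypeDivided    ReplicaSchedulingType = "Divided"

	ReplicaDivisionPreferenceAggregated ReplicaDivisionPreference = "Aggregated"
	ReplicaDivisionPreferenceWeighted   ReplicaDivisionPreference = "Weighted"
	// Hypothetical value discussed in this review.
	ReplicaDivisionPreferenceAverageReplicas ReplicaDivisionPreference = "AverageReplicas"
)

// averageApplies reports whether the AverageReplicas strategy would take
// effect: per the proposal, only when replicas are Divided.
func averageApplies(t ReplicaSchedulingType, p ReplicaDivisionPreference) bool {
	return t == ReplicaSchedulingTypeDivided && p == ReplicaDivisionPreferenceAverageReplicas
}

func main() {
	fmt.Println(averageApplies(ReplicaSchedulingTypeDivided, ReplicaDivisionPreferenceAverageReplicas))    // true
	fmt.Println(averageApplies(ReplicaSchedulingTypeDuplicated, ReplicaDivisionPreferenceAverageReplicas)) // false
}
```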

whitewindmills · 2024-08-05T02:05:10Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+   <!--
+   b. If AvailableReplica <= the cluster's avg_rep, allocate all AvailableReplica replicas to that cluster, and add the remaining pending replicas, i.e. avg_rep - AvailableReplica, to the variable sumPendingReplica (a global variable not bound to any single cluster), which starts at zero.
+   -->
+2. After completing the first round of traversal, first sort the unFullCluster by Cluster.Score, then start traversing the unFullCluster. For each cluster in it, allocate one replica, and decrement both the cluster's AvailableReplica and sumPendingReplica by 1. Then check if the cluster's AvailableReplica is zero, and if so, remove that cluster from the unFullCluster.


If two clusters have the same score, we can prioritize the one with the higher number of available replicas.

Yeah, I have added the relevant explanation.
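The tie-breaking rule suggested here amounts to a two-key sort of unFullCluster. A minimal Go sketch, assuming higher scores are preferred; `candidate` and `sortUnFull` are hypothetical names, not Karmada code.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical stand-in for an entry of unFullCluster.
type candidate struct {
	Name             string
	Score            int64
	AvailableReplica int32
}

// sortUnFull orders clusters by descending Score and, when scores tie,
// by descending AvailableReplica.
func sortUnFull(cs []candidate) {
	sort.SliceStable(cs, func(i, j int) bool {
		if cs[i].Score != cs[j].Score {
			return cs[i].Score > cs[j].Score
		}
		return cs[i].AvailableReplica > cs[j].AvailableReplica
	})
}

func main() {
	cs := []candidate{
		{"member1", 80, 5},
		{"member2", 90, 2},
		{"member3", 90, 7},
	}
	sortUnFull(cs)
	for _, c := range cs {
		fmt.Println(c.Name) // member3, then member2, then member1
	}
}
```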

XiShanYongYe-Chang

Thanks for your proposal!

XiShanYongYe-Chang · 2024-08-05T03:19:03Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

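+By adding the AverageReplicas replica allocation strategy, Karmada can distribute the target replicas as evenly as possible across the working clusters. At the same time, this even distribution adheres to spread constraints and takes the available resources in the working clusters into account.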
+-->
+
+Such an allocation strategy can significantly improve the system's disaster recovery capability. Even if one cluster fails, it will not have a major impact on the overall system. This is highly beneficial for ensuring high availability of the service.


How can it be significantly improved compared to the current strategy?

I modified the wording and gave relevant explanations.

XiShanYongYe-Chang · 2024-08-05T08:50:00Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+2. Based on the Replica to be allocated, calculate how many replicas each Cluster should receive on average, defined as avg_rep = Replica / L.
+-->
+
+After calculating the average number of replicas to be allocated to each cluster avg_rep, we start traversing the selected clusters and use the predefined TargetCluster struct to define the allocation results.


You can give the TargetCluster structure a code link, so we can easily understand what it looks like.

Yeah, I added the link to TargetCluster.

XiShanYongYe-Chang · 2024-08-05T09:03:31Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

+3. Repeat step 2 until sumPendingReplica reaches 0, at which point the allocation is complete.
+-->
+
+Since we have ensured in the Select phase that the total AvailableReplica of the selected clusters is greater than the number of replicas to be allocated, there is no need to worry about resource insufficiency.


This requirement is met in the current Select phase. However, it does not mean that this requirement is always met in the future evolution. Therefore, the resource insufficiency still needs to be considered during the design.

yeah, I have already pointed this out.
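Following this feedback, one defensive option is to verify capacity in the Assign phase itself rather than relying on the Select phase. A hypothetical Go guard (`ensureCapacity` is not a Karmada function) might look like this; treating negative availability as zero is an assumption.

```go
package main

import "fmt"

// ensureCapacity fails early when the selected clusters cannot hold the
// requested replicas, instead of assuming the Select phase guaranteed it.
func ensureCapacity(available []int32, replicas int32) error {
	var total int64
	for _, a := range available {
		if a > 0 { // ignore negative values (assumption)
			total += int64(a)
		}
	}
	if total < int64(replicas) {
		return fmt.Errorf("insufficient capacity: need %d replicas, clusters can hold %d", replicas, total)
	}
	return nil
}

func main() {
	fmt.Println(ensureCapacity([]int32{3, 1, 2}, 6)) // <nil>
	fmt.Println(ensureCapacity([]int32{3, 1, 2}, 7)) // error: need 7, can hold 6
}
```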

XiShanYongYe-Chang · 2024-08-05T09:05:08Z

docs/proposals/scheduling/average-replicas-assign/average-replicas-assign.md

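+Since the Select phase already ensures that the total AvailableReplica of the selected clusters is greater than the number of replicas to be allocated, there is no need to worry about resource insufficiency.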
+-->
+
+It should be clearly stated that if the result is not a whole number when calculating avg_rep (avg_rep = Replica / L), the avg_rep will be rounded down, and the remaining Replicas due to rounding will be added to the sumPendingReplica variable. For example, suppose Replica = 8 and L = 3, then avg_rep = 8 / 3 = 2.67. We would round down avg_rep to 2, meaning each Cluster is expected to be allocated 2 Replicas. Since 2 * 3 = 6, there are still 2 Replicas left unallocated. These two Replicas will then be added to sumPendingReplica, making sumPendingReplica = sumPendingReplica + 2.


This sentence can be explained in advance when the calculation method is described.

Yeah, I have explained it in advance.

whitewindmills · 2024-08-06T09:29:58Z

@RainbowMango @chaunceyjiang @warjiang
Still need your review. Do you have any comments?


whitewindmills · 2024-08-09T03:09:50Z

ping @RainbowMango @chaunceyjiang @warjiang @XiShanYongYe-Chang

ipsum-0320 · 2024-08-14T07:01:14Z

@RainbowMango @chaunceyjiang @warjiang @XiShanYongYe-Chang Does anyone have any questions? 😁

XiShanYongYe-Chang · 2024-08-28T07:54:30Z

LGTM

We are about to release a major version (v1.11) soon (by the end of this month), and it seems that this feature will not make it into this version. We can start to quickly advance this feature in early September.

chaunceyjiang · 2024-08-28T14:07:38Z

/assign

XiShanYongYe-Chang · 2024-09-02T07:15:15Z

Hi @RainbowMango @whitewindmills @chaunceyjiang can we get on with this mission?

whitewindmills · 2024-09-03T02:00:29Z

I'm OK with it, and it seems to look good to @XiShanYongYe-Chang.
ping @RainbowMango @chaunceyjiang

RainbowMango · 2024-09-03T02:40:44Z

I'll look at it today.

karmada-bot · 2024-09-19T05:53:13Z

Adding label do-not-merge/contains-merge-commits because PR contains merge commits, which are not allowed in this repository.
Use git rebase to reapply your commits on top of the target branch. Detailed instructions for doing so can be found here.


karmada-bot · 2024-09-19T05:53:17Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from chaunceyjiang. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot added kind/feature Categorizes issue or PR as related to a new feature. kind/documentation Categorizes issue or PR as related to documentation. labels Jul 18, 2024

karmada-bot requested review from Poor12 and Tingtal July 18, 2024 09:38

karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 18, 2024

ipsum-0320 force-pushed the master branch from 80c63a7 to 082bb24 July 18, 2024 09:40

whitewindmills reviewed Jul 19, 2024


karmada-bot requested review from chaosi-zju, chaunceyjiang, RainbowMango, XiShanYongYe-Chang and zhzhuang-zju July 19, 2024 06:32

ipsum-0320 force-pushed the master branch from 0f1b2c1 to aa35417 July 19, 2024 08:55

chaunceyjiang reviewed Jul 25, 2024


warjiang reviewed Jul 30, 2024


ipsum-0320 force-pushed the master branch from 128ab46 to 60e8288 July 30, 2024 13:06

whitewindmills reviewed Aug 5, 2024


XiShanYongYe-Chang reviewed Aug 5, 2024


ipsum-0320 and others added 4 commits August 6, 2024 17:57

feat: add average replicas assign proposal

9650926

Signed-off-by: ipsum <trueman.0320@zju.edu.cn> Signed-off-by: ipsum <ipsum@ipsumdeMacBook-Pro.local>

fix: update by comments

6969427

Signed-off-by: ipsum <trueman.0320@zju.edu.cn> Signed-off-by: ipsum <ipsum@ipsumdeMacBook-Pro.local>

feat: update comment

b73cb9b

Signed-off-by: ipsum <trueman.0320@zju.edu.cn> Signed-off-by: ipsum <ipsum@ipsumdeMacBook-Pro.local>

chore: update proposal

10637cb

Signed-off-by: ipsum <ipsum@ipsumdeMacBook-Pro.local>

ipsum-0320 force-pushed the master branch from 55a0df7 to 10637cb August 6, 2024 09:57

whitewindmills mentioned this pull request Aug 19, 2024

Scheduler: support the ability to automatically assign replicas evenly #4805

Open

ipsum-0320 requested review from warjiang, XiShanYongYe-Chang and chaunceyjiang August 27, 2024 06:56

karmada-bot assigned chaunceyjiang Aug 28, 2024

Merge branch 'karmada-io:master' into master

3f71a53

karmada-bot added the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Sep 19, 2024

feat: Implement group score calculation

68134e6

karmada-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 19, 2024

ipsum-0320 closed this Sep 19, 2024
