
support binpack policy #380

Merged
merged 3 commits into from
Jul 27, 2019

Conversation

lminzhw
Contributor

@lminzhw lminzhw commented Jul 23, 2019

Motivation

When running workloads on Kubernetes that use accelerator devices, the default scheduler spreads pods across nodes, resulting in fragmentation of extended resources. This fragmentation prevents scheduling pods with larger device resource requirements, leaving them stuck in the pending state.

Goals

  • Schedule pods using a best-fit policy based on the RequestedToCapacityRatio priority function for extended resources
  • Reduce fragmentation of scarce resources in the cluster

@volcano-sh-bot volcano-sh-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 23, 2019
@lminzhw
Contributor Author

lminzhw commented Jul 23, 2019

@hzxuzhonghu @TommyLike @k82cn

Now we have another XL PR; an e2e test case is included.

Please take a look at it.

@TravisBuddy

Hey @lminzhw,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 52b4ca80-ad5b-11e9-aa77-bf7f12915990

@k82cn
Member

k82cn commented Jul 24, 2019

/approve

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 24, 2019
@k82cn
Member

k82cn commented Jul 24, 2019

/cc @hex108 @kinglion811

// Every resource that has a weight must be included in the calculation,
// because weightSum depends on it, even if the node does not have that resource.
for name, weight := range weight.BinPackingResources {
	weightSum += weight
}
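For context, a binpack score of this shape normalizes a weighted requested-to-capacity sum by weightSum. A minimal, self-contained sketch of the idea (function name, signature, and maps are illustrative, not Volcano's actual plugin API):

```go
package main

import "fmt"

// binpackScore returns a 0-100 score from weighted requested-to-capacity
// ratios. Resources absent from the node contribute 0 to the numerator, but
// their weight is still added to weightSum, mirroring the loop above.
// All names here are illustrative only.
func binpackScore(request, used, capacity, weights map[string]float64) float64 {
	var sum, weightSum float64
	for name, w := range weights {
		weightSum += w
		total, ok := capacity[name]
		if !ok || total == 0 {
			continue // node lacks this resource: score 0, weight still counted
		}
		ratio := (used[name] + request[name]) / total
		if ratio > 1 {
			return 0 // request does not fit on this node
		}
		sum += ratio * w
	}
	if weightSum == 0 {
		return 0
	}
	return sum / weightSum * 100
}

func main() {
	score := binpackScore(
		map[string]float64{"cpu": 2}, // request
		map[string]float64{"cpu": 4}, // used
		map[string]float64{"cpu": 8}, // capacity
		map[string]float64{"cpu": 1}, // weights
	)
	fmt.Println(score) // (4+2)/8 * 100 = 75
}
```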
Contributor

If the score calculated at L202 is zero, do we still need to add the weight to weightSum?

Contributor Author

Zero is also a valid value for the score, so I think adding it to the sum is valid as well.

Contributor

Take GPU as an example:
Node A has no GPU, so its GPU score will be 0. Node B has a GPU capacity of 6 and uses 3, so it gets a score for the GPU resource. Suppose Node A's and Node B's scores for the other resources are the same. If we add the weight in this case, Node B's total score will be higher than Node A's, so the scheduler will choose Node B. Is that expected? Node A might be the better choice.

Member

If the resource does not exist on the node, we should also skip its weight.

Contributor Author

To make the scheduler choose Node A, maybe we should make the GPU score on Node B negative in this case?

For example, suppose the request needs resources Ra/Rb, Node A has resources Ra/Rb, and Node B has resources Ra/Rb/Rc.
The current policy is the score as computed now. The skip policy would ignore Sc/Wc on Node A, but could not guarantee that Node A gets a higher score than Node B. The negative policy would treat Sc on Node B as a negative score.

policy      NodeA              NodeB
current     Sa+Sb+0/Wa+Wb+Wc   Sa+Sb+Sc/Wa+Wb+Wc
skip        Sa+Sb/Wa+Wb        Sa+Sb+Sc/Wa+Wb+Wc
negative    Sa+Sb+0/Wa+Wb+Wc   Sa+Sb-Sc/Wa+Wb+Wc
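Plugging made-up numbers into the table makes the trade-off concrete. A toy sketch (all values invented; `policyScores` is not real Volcano code):

```go
package main

import "fmt"

// policyScores evaluates the table above for the three candidate policies.
// The request needs Ra/Rb; NodeA offers Ra/Rb, NodeB offers Ra/Rb/Rc.
// Sa..Sc are per-resource scores, Wa..Wc the weights; all values invented.
func policyScores(Sa, Sb, Sc, Wa, Wb, Wc float64) (nodeA, nodeB map[string]float64) {
	nodeA = map[string]float64{
		"current":  (Sa + Sb + 0) / (Wa + Wb + Wc),
		"skip":     (Sa + Sb) / (Wa + Wb),
		"negative": (Sa + Sb + 0) / (Wa + Wb + Wc),
	}
	nodeB = map[string]float64{
		"current":  (Sa + Sb + Sc) / (Wa + Wb + Wc),
		"skip":     (Sa + Sb + Sc) / (Wa + Wb + Wc),
		"negative": (Sa + Sb - Sc) / (Wa + Wb + Wc),
	}
	return
}

func main() {
	a, b := policyScores(50, 40, 30, 1, 1, 1)
	// current: B wins (40 > 30). skip: A happens to win here (45 > 40),
	// but that is not guaranteed in general. negative: A wins (30 > 20)
	// by penalizing B for its partially used, unrequested Rc.
	fmt.Println(a["current"], b["current"], a["skip"], b["skip"], a["negative"], b["negative"])
}
```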

Contributor

@hex108 hex108 Jul 25, 2019

I don't have any better idea for this now. Just some thoughts:

  • If the request does not need resource C, then we do not calculate the score for that resource; this means we compare Sa+Sb/Wa+Wb (for Node A) with Sa+Sb/Wa+Wb (for Node B).

or

  • If Node A does not have the resource, we regard its RequestedToCapacityRatio as 1 instead of 0; this means the resource is full, whereas 0 suggests it is empty.

or

We could solve it in a later PR after finding a better solution.

Member

Option 1 is better to me :)

Contributor Author

I refactored the code in BinPackingScore. Now, only resources that the pod requested affect the binpack score.

binpack.cpu: 5
binpack.memory: 1
binpack.resources: nvidia.com/gpu, example.com/foo
binpack.resources.nvidia.com/gpu: 2
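The configuration keys above could be parsed roughly as follows. A simplified sketch (the `parseWeights` helper and the default weight of 1 are assumptions, not Volcano's actual parsing code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseWeights reads binpack weights from plugin arguments using the key
// scheme shown above: extended resources are listed under
// "binpack.resources", and each one's weight lives under the
// "binpack.resources.<name>" key. Sketch only, not Volcano's code.
func parseWeights(args map[string]string) map[string]int {
	weights := map[string]int{}
	for _, key := range []string{"cpu", "memory"} {
		if v, ok := args["binpack."+key]; ok {
			if w, err := strconv.Atoi(v); err == nil {
				weights[key] = w
			}
		}
	}
	if list, ok := args["binpack.resources"]; ok {
		for _, name := range strings.Split(list, ",") {
			name = strings.TrimSpace(name)
			w := 1 // assumed default when no explicit weight is given
			if v, ok := args["binpack.resources."+name]; ok {
				if parsed, err := strconv.Atoi(v); err == nil {
					w = parsed
				}
			}
			weights[name] = w
		}
	}
	return weights
}

func main() {
	args := map[string]string{
		"binpack.cpu":                      "5",
		"binpack.memory":                   "1",
		"binpack.resources":                "nvidia.com/gpu, example.com/foo",
		"binpack.resources.nvidia.com/gpu": "2",
	}
	fmt.Println(parseWeights(args))
}
```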
Contributor

s/binpack.resources.nvidia.com/gpu/binpack.nvidia.com/gpu/? Then it would be the same as cpu/memory; it is just a suggestion.

Contributor Author

In my opinion, all additional resources are listed in binpack.resources, so the key for a resource's weight should carry that prefix.

pkg/scheduler/plugins/binpack/binpack.go (outdated, resolved)
@k82cn
Member

k82cn commented Jul 26, 2019

LGTM, thanks :)

@hex108
Contributor

hex108 commented Jul 27, 2019

Thanks! LGTM

@k82cn
Member

k82cn commented Jul 27, 2019

/lgtm
/approve

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 27, 2019
@volcano-sh-bot volcano-sh-bot merged commit 7e8fec6 into volcano-sh:master Jul 27, 2019
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: k82cn, lminzhw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants