
tidy cluster-autoscaler go modules #3593

Closed
RainbowMango wants to merge 2 commits

Conversation

RainbowMango
Member

What this PR does / why we need it:

This PR adjusts the CA dependency management mechanism to follow the standard Go modules workflow.
It includes:

  • Tidy the go.mod file and pin dependencies explicitly, instead of always deriving them from k/k master.
  • Update the vendor directory with the command go mod vendor.

Currently, we generate go.mod from the k/k go.mod plus go.mod-extra for the extra dependencies. This approach is a little unusual and sometimes causes confusion and inconvenience for users, e.g. #3146 and #2858.
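For illustration only (the module path is CA's real one, but the versions below are placeholders and not the pins actually made by this PR), a tidied go.mod pinned against a released Kubernetes version could look roughly like this:

```
module k8s.io/autoscaler/cluster-autoscaler

go 1.15

require (
	k8s.io/api v0.19.2
	k8s.io/apimachinery v0.19.2
	k8s.io/client-go v0.19.2
	k8s.io/kubernetes v1.19.2
)

// k8s.io/kubernetes declares its staging modules at v0.0.0, so every staging
// module it pulls in must be pinned via replace directives (only a few are
// shown here; the real list is much longer).
replace (
	k8s.io/api => k8s.io/api v0.19.2
	k8s.io/apimachinery => k8s.io/apimachinery v0.19.2
	k8s.io/client-go => k8s.io/client-go v0.19.2
)
```

go mod tidy and go mod vendor then keep go.sum and vendor/ consistent with those pins.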

In addition, I found that most CA providers have to commit their SDKs directly to the codebase, like this provider.
It's not graceful, as we need to add boilerplate to each Go file in the SDK, and it becomes painful when bumping the SDK version.

Special notes for your reviewer:
This PR is a first iteration towards improving the CA dependency management mechanism. After this PR we will also need to update docs such as the FAQ, as well as hack/update-vendor.sh.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 9, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign vivekbagade after the PR has been reviewed.
You can assign the PR to them by writing /assign @vivekbagade in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MaciekPytel
Contributor

This seems to duplicate #3566. The reason for the current complicated setup is:

  1. Historically we found that we often need to vendor a specific kubernetes commit rather than a tag. This is because changes in kubernetes have often broken CA in one way or another, and we needed to vendor a last-minute bugfix just before a kubernetes release.
  2. Kubernetes code is officially spread across multiple repos (k/k, k/client-go, k/api). The problem here is that vendoring the latest commits of those repos (when we need to vendor a commit, as per point 1 above) can lead to conflicts between them. Kubernetes itself deals with that as follows: the source of truth for all those repos is the https://github.com/kubernetes/kubernetes/tree/master/staging directory. All those repos are "vendored" by kubernetes by symlinking from the vendor/ directory to the staging/ directory (the GitHub repos are just mirrors of the staging/ directory).

The current setup with update-vendor.sh duplicates what kubernetes is doing: it uses the staging/ directory as the source of truth for all repos in the kubernetes org. So the problem we're facing is how to simplify the CA vendoring mechanism while keeping the ability to vendor a specific k8s commit (and not just a tagged version). Some ideas have been proposed in #3566.
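For reference, plain Go modules can still pin to an untagged commit via pseudo-versions in replace directives; a rough sketch (the timestamps and hashes below are made up):

```
// Illustrative only: these pseudo-versions are invented, not real commits.
replace (
	k8s.io/kubernetes => k8s.io/kubernetes v1.20.0-alpha.3.0.20201009123456-abcdef123456
	k8s.io/api => k8s.io/api v0.20.0-alpha.3.0.20201009123456-abcdef123456
	k8s.io/apimachinery => k8s.io/apimachinery v0.20.0-alpha.3.0.20201009123456-abcdef123456
	k8s.io/client-go => k8s.io/client-go v0.20.0-alpha.3.0.20201009123456-abcdef123456
)
```

The catch is that every staging mirror has to be pinned to a pseudo-version cut from the same k/k commit, which is exactly the consistency problem described in point 2 above.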

Note that this has little to do with the SDK problem - in principle, provider-specific dependencies could be vendored in the current system by using go.mod-extra. The question here is how to easily strip any provider-specific dependencies when compiling CA with just one provider or forking the CA repo.

@RainbowMango
Member Author

This seems to duplicate #3566.

Oh... I didn't notice that PR. Glad to see it; I'll review it then.

Historically we found that we often need to vendor a specific kubernetes commit rather than a tag. This is because changes in kubernetes have often broken CA in one way or another, and we needed to vendor a last-minute bugfix just before a kubernetes release.

I don't have much background here, so I'm still confused. How could Kubernetes changes break CA?
In my opinion, each release of CA should pin to a specific release of Kubernetes; let's say CA 1.16.6 is compatible with Kubernetes 1.16.6. Even CA master doesn't need to always follow Kubernetes master.

@RainbowMango
Member Author

RainbowMango commented Oct 9, 2020

The question here is how to easily strip any provider-specific dependencies when compiling CA with just one provider or forking the CA repo.

How about splitting CA into different repos, just like cloud-provider does? The current CA repo (or a new one) would focus on providing a common framework, and each provider would have its implementation in an independent repo.

@RainbowMango RainbowMango marked this pull request as ready for review October 10, 2020 03:20
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 10, 2020
@MaciekPytel
Contributor

How could Kubernetes changes break CA?

CA works by using vendored scheduler code to predict whether pending pods would become schedulable if a new node were added. This way we don't have to play a catch-up game: when a new scheduler feature is added or some behavior changes, we get it for free just by importing the right version of the scheduler. The downside is that we rely on the internal implementation of the scheduler and make assumptions about how it should behave. This is not a public API, and in principle any random commit to the scheduler may break CA. In fact, this has happened many times in the past. Sig-scheduling has always been very helpful and tries to avoid breaking CA, but there is always a risk of some minor change affecting CA in a subtle way.
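As a rough illustration of that idea (the types and function below are hypothetical, not the actual CA simulator API):

```go
// Hypothetical sketch of how CA's scale-up simulation works conceptually:
// clone a node template, pretend the node exists, and ask the vendored
// scheduler code whether a pending pod would fit on it.
package simulatorsketch

import (
	corev1 "k8s.io/api/core/v1"
)

// predicateChecker stands in for the vendored scheduler filter logic; the real
// checker calls into k8s.io/kubernetes scheduler internals, which is why any
// scheduler refactor can break CA.
type predicateChecker interface {
	FitsNode(pod *corev1.Pod, node *corev1.Node, podsOnNode []*corev1.Pod) bool
}

// wouldBecomeSchedulable reports whether adding one more node shaped like
// template would let the pending pod schedule.
func wouldBecomeSchedulable(checker predicateChecker, pending *corev1.Pod, template *corev1.Node) bool {
	hypothetical := template.DeepCopy()
	hypothetical.Name = hypothetical.Name + "-simulated"
	// A freshly added node starts empty; the real simulation also places
	// DaemonSet pods on it before running the filters.
	return checker.FitsNode(pending, hypothetical, nil)
}
```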

How about splitting CA into different repos, just like cloud-provider does?

There were multiple proposals for this in the past (and there are many providers implemented as CA forks). I don't think we could split existing providers without regressions:

  1. I don't think running in separate processes would work in very large clusters due to the volume of blocking calls to cloudprovider.
  2. The split between provider and core is not super clear. You can customize CA behavior by implementing provider-specific processors that execute whatever custom logic is needed.

One proposed compromise would be to implement an 'external' provider that proxies all calls over gRPC or similar. That should be enough to cover most use cases, while leaving an option to implement a "full" cloudprovider module if needed.
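Very roughly, such a proxy could be shaped like this (every type and method below is invented for illustration; this is not an actual CA or proto API proposal):

```go
// Hypothetical sketch of the "external" provider idea: the in-tree
// implementation holds no cloud SDK at all and just forwards each call to an
// out-of-process provider over gRPC.
package externalsketch

import (
	"context"

	"google.golang.org/grpc"
)

// ListNodeGroupsRequest / Response mimic what a generated provider.proto
// client would expose; real code would come from protoc-gen-go.
type ListNodeGroupsRequest struct{}
type ListNodeGroupsResponse struct{ Ids []string }

type providerClient interface {
	ListNodeGroups(ctx context.Context, in *ListNodeGroupsRequest, opts ...grpc.CallOption) (*ListNodeGroupsResponse, error)
}

// externalProvider would implement (a slice of) the cloudprovider surface by
// proxying, so provider SDKs never need to be vendored into the CA repo.
type externalProvider struct {
	client providerClient
}

func (e *externalProvider) Name() string { return "external" }

func (e *externalProvider) NodeGroupIDs(ctx context.Context) ([]string, error) {
	resp, err := e.client.ListNodeGroups(ctx, &ListNodeGroupsRequest{})
	if err != nil {
		return nil, err
	}
	return resp.Ids, nil
}
```

The actual CA cloudprovider interface is much larger, but the shape of the problem is the same: everything provider-specific would live behind the wire.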

It's not graceful, as we need to add boilerplate to each Go file in the SDK, and it becomes painful when bumping the SDK version.

No argument there. Would it help to temporarily add your SDK to https://github.com/kubernetes/autoscaler/blob/master/hack/boilerplate/boilerplate.py#L148 until we figure this out?

@RainbowMango
Member Author

Would it help to temporarily add your SDK to https://github.com/kubernetes/autoscaler/blob/master/hack/boilerplate/boilerplate.py#L148 until we figure this out?

Yes, this way I don't need to add boilerplate to each file, but I still need to change the import paths in it. Even though it's not perfect, it's much better.

@MaciekPytel
Contributor

I feel like we're closing in on a solution for vendoring kubernetes in a better way in #3566, but I don't think it really helps your case all that much - the kubernetes vendoring problem was not the reason for inlining the SDK (I also mentioned this in #3566 (comment)).

I'd be very interested to hear your ideas for improving that. My feeling is, however, that it may take some time to fix. If you want to remove your SDK from the boilerplate check in the meantime, just tag me on the PR.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 11, 2021
@k8s-ci-robot
Contributor

@RainbowMango: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mwielgus
Contributor

Closing due to inactivity. Feel free to reopen if needed.

@mwielgus mwielgus closed this Jan 25, 2021