
tidy cluster-autoscaler go modules #3593

Closed
RainbowMango wants to merge 2 commits

Conversation

RainbowMango
Member

What this PR does / why we need it:

This PR adjusts the CA dependency management mechanism to follow the standard Go modules workflow.
It includes:

  • Tidy the go.mod file and pin dependencies explicitly, instead of always deriving them from k/k master.
  • Update the vendor directory with the command go mod vendor.

Currently, we generate go.mod from the k/k go.mod plus go.mod-extra for the extra dependencies. This approach is a little unusual and sometimes causes confusion and inconvenience for users, e.g. #3146 and #2858.
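For illustration only (the module path is CA's real one, but the versions below are placeholders and not the pins actually made by this PR), a tidied go.mod pinned against a released Kubernetes version could look roughly like this:

```
module k8s.io/autoscaler/cluster-autoscaler

go 1.15

require (
	k8s.io/api v0.19.2
	k8s.io/apimachinery v0.19.2
	k8s.io/client-go v0.19.2
	k8s.io/kubernetes v1.19.2
)

// k8s.io/kubernetes declares its staging modules at v0.0.0, so every staging
// module it pulls in must be pinned via replace directives (only a few are
// shown here; the real list is much longer).
replace (
	k8s.io/api => k8s.io/api v0.19.2
	k8s.io/apimachinery => k8s.io/apimachinery v0.19.2
	k8s.io/client-go => k8s.io/client-go v0.19.2
)
```

go mod tidy and go mod vendor then keep go.sum and vendor/ consistent with those pins.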

In addition, I found that most CA providers have to commit their SDKs directly to the codebase, like this provider.
It's not graceful, as we need to add boilerplate to each Go file in the SDK, and it becomes painful when bumping the SDK version.

Special notes for your reviewer:
This PR is a first iteration towards improving the CA dependency management mechanism. After this PR we will also need to update docs such as the FAQ, as well as hack/update-vendor.sh.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 9, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign vivekbagade after the PR has been reviewed.
You can assign the PR to them by writing /assign @vivekbagade in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MaciekPytel
Contributor

This seems to duplicate #3566. The reason for the current complicated setup is:

  1. Historically we found that we often need to vendor a specific kubernetes commit rather than a tag. This is because changes in kubernetes have often broken CA in one way or another, and we needed to vendor a last-minute bugfix just before a kubernetes release.
  2. Kubernetes code is officially spread across multiple repos (k/k, k/client-go, k/api). The problem here is that vendoring the latest commits of those repos (when we need to vendor a commit, as per point 1 above) can lead to conflicts between them. Kubernetes itself deals with that as follows: the source of truth for all those repos is the https://github.com/kubernetes/kubernetes/tree/master/staging directory. All those repos are "vendored" by kubernetes by symlinking from the vendor/ directory to the staging/ directory (the GitHub repos are just mirrors of the staging/ directory).

The current setup with update-vendor.sh duplicates what kubernetes is doing: it uses the staging/ directory as the source of truth for all repos in the kubernetes org. So the problem we're facing is how to simplify the CA vendoring mechanism while keeping the ability to vendor a specific k8s commit (and not just a tagged version). Some ideas have been proposed in #3566.
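For reference, plain Go modules can still pin to an untagged commit via pseudo-versions in replace directives; a rough sketch (the timestamps and hashes below are made up):

```
// Illustrative only: these pseudo-versions are invented, not real commits.
replace (
	k8s.io/kubernetes => k8s.io/kubernetes v1.20.0-alpha.3.0.20201009123456-abcdef123456
	k8s.io/api => k8s.io/api v0.20.0-alpha.3.0.20201009123456-abcdef123456
	k8s.io/apimachinery => k8s.io/apimachinery v0.20.0-alpha.3.0.20201009123456-abcdef123456
	k8s.io/client-go => k8s.io/client-go v0.20.0-alpha.3.0.20201009123456-abcdef123456
)
```

The catch is that every staging mirror has to be pinned to a pseudo-version cut from the same k/k commit, which is exactly the consistency problem described in point 2 above.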

Note that this has little to do with the SDK problem - in principle, provider-specific dependencies could be vendored in the current system by using go.mod-extra. The question here is how to easily strip any provider-specific dependencies when compiling CA with just one provider or forking the CA repo.

@RainbowMango
Member Author

This seems to duplicate #3566.

Oh... I didn't notice that PR. Glad to see it; I'll review it then.

Historically we found that we often need to vendor a specific kubernetes commit rather than a tag. This is because changes in kubernetes have often broken CA in one way or another, and we needed to vendor a last-minute bugfix just before a kubernetes release.

I don't have much background here, so I'm still confused. How could Kubernetes changes break CA?
In my opinion, each release of CA should pin to a specific release of Kubernetes; let's say CA 1.16.6 is compatible with Kubernetes 1.16.6. Even CA master doesn't need to always follow Kubernetes master.

@RainbowMango
Member Author

RainbowMango commented Oct 9, 2020

The question here is how to easily strip any provider-specific dependencies when compiling CA with just one provider or forking the CA repo.

How about splitting CA into different repos, just like cloud-provider does? The current CA repo (or a new one) would focus on providing a common framework, and each provider would have its implementation in an independent repo.

@RainbowMango RainbowMango marked this pull request as ready for review October 10, 2020 03:20
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 10, 2020
@MaciekPytel
Contributor

How could Kubernetes changes break CA?

CA works by using vendored scheduler code to predict whether pending pods would become schedulable if a new node were added. This way we don't have to play a catch-up game: when a new scheduler feature is added or some behavior changes, we get it for free just by importing the right version of the scheduler. The downside is that we rely on the internal implementation of the scheduler and make assumptions about how it should behave. This is not a public API, and in principle any random commit to the scheduler may break CA. In fact, this has happened many times in the past. Sig-scheduling has always been very helpful and tries to avoid breaking CA, but there is always a risk of some minor change affecting CA in a subtle way.
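As a rough illustration of that idea (the types and function below are hypothetical, not the actual CA simulator API):

```go
// Hypothetical sketch of how CA's scale-up simulation works conceptually:
// clone a node template, pretend the node exists, and ask the vendored
// scheduler code whether a pending pod would fit on it.
package simulatorsketch

import (
	corev1 "k8s.io/api/core/v1"
)

// predicateChecker stands in for the vendored scheduler filter logic; the real
// checker calls into k8s.io/kubernetes scheduler internals, which is why any
// scheduler refactor can break CA.
type predicateChecker interface {
	FitsNode(pod *corev1.Pod, node *corev1.Node, podsOnNode []*corev1.Pod) bool
}

// wouldBecomeSchedulable reports whether adding one more node shaped like
// template would let the pending pod schedule.
func wouldBecomeSchedulable(checker predicateChecker, pending *corev1.Pod, template *corev1.Node) bool {
	hypothetical := template.DeepCopy()
	hypothetical.Name = hypothetical.Name + "-simulated"
	// A freshly added node starts empty; the real simulation also places
	// DaemonSet pods on it before running the filters.
	return checker.FitsNode(pending, hypothetical, nil)
}
```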

How about splitting CA into different repos, just like cloud-provider does?

There were multiple proposals for this in the past (and there are many providers implemented as CA forks). I don't think we could split existing providers without regressions:

  1. I don't think running in separate processes would work in very large clusters due to the volume of blocking calls to cloudprovider.
  2. The split between provider and core is not super clear. You can customize CA behavior by implementing provider-specific processors that execute whatever custom logic is needed.

One proposed compromise would be to implement an 'external' provider that proxies all calls over gRPC or similar. That should be enough to cover most use cases, while leaving an option to implement a "full" cloudprovider module if needed.
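Very roughly, such a proxy could be shaped like this (every type and method below is invented for illustration; this is not an actual CA or proto API proposal):

```go
// Hypothetical sketch of the "external" provider idea: the in-tree
// implementation holds no cloud SDK at all and just forwards each call to an
// out-of-process provider over gRPC.
package externalsketch

import (
	"context"

	"google.golang.org/grpc"
)

// ListNodeGroupsRequest / Response mimic what a generated provider.proto
// client would expose; real code would come from protoc-gen-go.
type ListNodeGroupsRequest struct{}
type ListNodeGroupsResponse struct{ Ids []string }

type providerClient interface {
	ListNodeGroups(ctx context.Context, in *ListNodeGroupsRequest, opts ...grpc.CallOption) (*ListNodeGroupsResponse, error)
}

// externalProvider would implement (a slice of) the cloudprovider surface by
// proxying, so provider SDKs never need to be vendored into the CA repo.
type externalProvider struct {
	client providerClient
}

func (e *externalProvider) Name() string { return "external" }

func (e *externalProvider) NodeGroupIDs(ctx context.Context) ([]string, error) {
	resp, err := e.client.ListNodeGroups(ctx, &ListNodeGroupsRequest{})
	if err != nil {
		return nil, err
	}
	return resp.Ids, nil
}
```

The actual CA cloudprovider interface is much larger, but the shape of the problem is the same: everything provider-specific would live behind the wire.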

It's not graceful, as we need to add boilerplate to each Go file in the SDK, and it becomes painful when bumping the SDK version.

No argument there. Would it help to temporarily add your SDK to https://github.com/kubernetes/autoscaler/blob/master/hack/boilerplate/boilerplate.py#L148 until we figure this out?

@RainbowMango
Member Author

Would it help to temporarily add your SDK to https://github.com/kubernetes/autoscaler/blob/master/hack/boilerplate/boilerplate.py#L148 until we figure this out?

Yes, this way I don't need to add boilerplate to each file, but I still need to change the import paths in it. Even though it's not perfect, it's much better.

@MaciekPytel
Contributor

I feel like we're closing in on a solution for vendoring kubernetes in a better way in #3566, but I don't think it really helps your case all that much - the kubernetes vendoring problem was not the reason for inlining the SDK (I also mentioned this in #3566 (comment)).

I'd be very interested to hear your ideas for improving that. My feeling is, however, that it may take some time to fix. If you want to remove your SDK from the boilerplate check in the meantime, just tag me on the PR.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 11, 2021
@k8s-ci-robot
Contributor

@RainbowMango: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mwielgus
Contributor

Closing due to inactivity. Feel free to reopen if needed.

@mwielgus mwielgus closed this Jan 25, 2021