
Add option to ignore namespaces #6788

Merged on Jul 15, 2024 (25 commits)

adrianmoisey
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is a replacement PR for #6428

It allows a user to specify a list of namespaces for the various VPA components to ignore.

Which issue(s) this PR fixes:

Fixes #6232

Special notes for your reviewer:

There are no tests at the moment. I'm just making the PR to get some early review to ensure that this is on the right track.

Does this PR introduce a user-facing change?

Add new `--ignored-vpa-object-namespaces` parameter to specify namespaces for the VPA to ignore

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

N/A

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 1, 2024
@k8s-ci-robot k8s-ci-robot requested a review from kgolab May 1, 2024 14:59
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 1, 2024
@k8s-ci-robot
Contributor

Welcome @adrianmoisey!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 1, 2024
@k8s-ci-robot
Contributor

Hi @adrianmoisey. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 1, 2024
In the unlikely case that a nil value is passed in, this code would have
failed with an error
@@ -96,6 +96,15 @@ func selfRegistration(clientset *kubernetes.Clientset, caCert []byte, namespace,
sideEffects := admissionregistration.SideEffectClassNone
failurePolicy := admissionregistration.Ignore
RegisterClientConfig.CABundle = caCert
namespaceSelector := metav1.LabelSelector{
Member

Is this changing any behavior if the list is empty? What about not setting any selector when no ignores are set?

Member

I wonder if just by having the all namespaces selector we impact cost or performance on the apiserver.

Member Author

Fixed in cef6dcb

)

func main() {
klog.InitFlags(nil)
kube_flag.InitFlags()

if len(*vpaObjectNamespace) > 0 && len(*ignoredVpaObjectNamespaces) > 0 {
Member

This should probably happen after the info log below to still give a log about the version used.

Member Author

Fixed in 590b4fc
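
For readers following this thread, the ordering being asked for can be sketched as below. This is a minimal illustration, not the merged code: `validateNamespaceFlags` is a hypothetical helper, the version line is a placeholder, and the assumption (inferred from the `if` condition in the hunk above) is that setting both flags at once is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// validateNamespaceFlags is a hypothetical helper. It assumes that
// --vpa-object-namespace and --ignored-vpa-object-namespaces are meant
// to be mutually exclusive, as the check in main() suggests.
func validateNamespaceFlags(vpaObjectNamespace, ignoredVpaObjectNamespaces string) error {
	if len(vpaObjectNamespace) > 0 && len(ignoredVpaObjectNamespaces) > 0 {
		return errors.New("--vpa-object-namespace and --ignored-vpa-object-namespaces cannot both be set")
	}
	return nil
}

func main() {
	// Log the version first, so it is recorded even when flag validation fails.
	fmt.Println("starting component, version: (placeholder)")

	// Only then validate the flag combination.
	if err := validateNamespaceFlags("default", "kube-system"); err != nil {
		fmt.Println("fatal:", err)
	}
}
```

Returning an error instead of exiting inside the helper keeps the check easy to unit test; the real components may well handle this differently.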

@@ -332,6 +344,12 @@ func filterVPAs(feeder *clusterStateFeeder, allVpaCRDs []*vpa_types.VerticalPodA
continue
}
}

if selectsNamespace(vpaCRD.ObjectMeta.Namespace, feeder.ignoredNamespaces) {
Member

Can the sourcing of VPAs already do the filtering? So the API server doesn't send namespaces we don't want?

Member Author

I did have a look at this, but I didn't find an easy solution to it.
The updater and recommender both seem to use a ListWatch in client-go (https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.1.1/vertical-pod-autoscaler/vendor/k8s.io/client-go/tools/cache/listwatch.go#L69-L100)
This seems to take either a single namespace or all namespaces. There doesn't seem to be a way to exclude any namespaces.

An alternative could be to set up a ListWatch per namespace, but then I imagine we'll need to do quite a big refactor and continuously watch for updates to namespaces, adding and removing a ListWatch as namespaces are added and removed. I also don't know if that sort of change would be better or worse, efficiency-wise, compared to this PR's current approach.

My current experience with VPA/client-go is very minimal, so I didn't want to take such a large task that could potentially be the wrong direction.

Contributor

I poked through client-go and couldn't find anything either.

Seems OK since the list of ignored namespaces is likely going to be pretty small?

Member Author

@adrianmoisey, Jun 18, 2024

> I poked through client-go and couldn't find anything either.
>
> Seems OK since the list of ignored namespaces is likely going to be pretty small?

I imagine that list will be small.
In my use-case it will be only kube-system (and possibly istio-system)
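
To make the client-side approach discussed in this thread concrete, here is a small sketch of filtering after the list call. The name `selectsNamespace` comes from the diff above, but its body here is a guess; the `vpa` struct and `filterIgnored` are hypothetical stand-ins for the real VPA types and loops.

```go
package main

import "fmt"

// selectsNamespace reports whether namespace appears in the ignored list.
// The name matches the helper used in the diff; this body is an assumption.
func selectsNamespace(namespace string, ignored []string) bool {
	for _, n := range ignored {
		if n == namespace {
			return true
		}
	}
	return false
}

// vpa is a hypothetical, trimmed-down stand-in for a VPA object.
type vpa struct {
	Namespace string
	Name      string
}

// filterIgnored drops every VPA that lives in an ignored namespace.
// The API server still returns those objects; they are skipped locally.
func filterIgnored(vpas []vpa, ignored []string) []vpa {
	kept := make([]vpa, 0, len(vpas))
	for _, v := range vpas {
		if selectsNamespace(v.Namespace, ignored) {
			continue
		}
		kept = append(kept, v)
	}
	return kept
}

func main() {
	all := []vpa{
		{Namespace: "kube-system", Name: "coredns"},
		{Namespace: "default", Name: "web"},
	}
	fmt.Println(filterIgnored(all, []string{"kube-system", "istio-system"}))
}
```

With a short ignore list such as `kube-system` and `istio-system`, the linear scan per object is cheap, which fits the reasoning above that the list is expected to stay small.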

@@ -108,6 +110,11 @@ const (
func main() {
klog.InitFlags(nil)
kube_flag.InitFlags()

if len(*vpaObjectNamespace) > 0 && len(*ignoredVpaObjectNamespaces) > 0 {
Member

Same here, I think the version log should come first.

Member Author

Fixed in 590b4fc

@@ -137,6 +149,10 @@ func (u *updater) RunOnce(ctx context.Context) {
vpas := make([]*vpa_api_util.VpaWithSelector, 0)

for _, vpa := range vpaList {
if selectsNamespace(vpa.Namespace, u.ignoredNamespaces) {
Member

Same here, why are we even getting those VPAs here?

vpaObjectNamespace = flag.String("vpa-object-namespace", apiv1.NamespaceAll, "Namespace to search for VPA objects. Empty means all namespaces will be used.")
namespace = os.Getenv("NAMESPACE")
vpaObjectNamespace = flag.String("vpa-object-namespace", apiv1.NamespaceAll, "Namespace to search for VPA objects. Empty means all namespaces will be used.")
ignoredVpaObjectNamespaces = flag.String("ignored-vpa-object-namespaces", "", "Comma separated list of namespaces to ignore.")
Member

Is there a shared library of sorts where we could put all this instead? It seems like there is no difference in code between the components.

Member Author

There doesn't seem to be a shared library for these flags. The existing VPA components all declare their own flags.

},
},
}
}

Shouldn't we also be setting a nameSpaceSelector for the case where vpa-object-namespace is used to cause the webhook to operate only for that namespace? This way the pods for all other namespaces do not have to wait for an answer from the admission controller before they start (it will also reduce the load on the admission controller)

Member Author

Good idea!
Fixed in 4073194
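
The selector behaviour discussed in the webhook threads can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: `requirement` and `selector` are simplified local stand-ins for `metav1.LabelSelectorRequirement` and `metav1.LabelSelector`, `buildNamespaceSelector` is a hypothetical name, and matching on the `kubernetes.io/metadata.name` label (which recent Kubernetes versions set on every namespace) is my assumption about how a namespace is identified.

```go
package main

import "fmt"

// requirement and selector are simplified, hypothetical stand-ins for
// metav1.LabelSelectorRequirement and metav1.LabelSelector.
type requirement struct {
	Key      string
	Operator string // "In" or "NotIn"
	Values   []string
}

type selector struct {
	MatchExpressions []requirement
}

// nameLabel holds the namespace's own name on every namespace object.
const nameLabel = "kubernetes.io/metadata.name"

// buildNamespaceSelector returns nil when neither option is set, so the
// webhook registration carries no selector at all in the default case.
func buildNamespaceSelector(selected string, ignored []string) *selector {
	switch {
	case len(ignored) > 0:
		// Call the webhook for every namespace except the ignored ones.
		return &selector{MatchExpressions: []requirement{
			{Key: nameLabel, Operator: "NotIn", Values: ignored},
		}}
	case selected != "":
		// Call the webhook only for the single selected namespace.
		return &selector{MatchExpressions: []requirement{
			{Key: nameLabel, Operator: "In", Values: []string{selected}},
		}}
	default:
		return nil
	}
}

func main() {
	fmt.Printf("%+v\n", buildNamespaceSelector("", []string{"kube-system"}))
	fmt.Printf("%+v\n", buildNamespaceSelector("team-a", nil))
	fmt.Println(buildNamespaceSelector("", nil) == nil)
}
```

Leaving the selector unset when both inputs are empty addresses the earlier question about changing behaviour for an empty list, and the `In` branch covers the case where pods in other namespaces should not wait on the admission controller.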

@adrianmoisey
Member Author

@kwiesmueller I've replied to your comments. I just want to check if this path is worth going down? And if I should spend the time to write tests for this PR.

@adrianmoisey
Member Author

@kwiesmueller any chance you have some time to look at this again?

@kwiesmueller
Member

/assign @raywainman
Ray, can you have a look?

@raywainman
Contributor

Looking... Apologies for the delay :)

@adrianmoisey
Member Author

Hey @raywainman
I think this is ready for your review again. I'm feeling more confident in it now that there are some tests. I've also done some manual testing locally and it's all working as expected.

cc @voelzmo

@raywainman
Contributor

Thanks Adrian! Will take a look today or tomorrow. Thanks for the fixes!

Contributor

@raywainman left a comment

Looking good :) Just a few nits from my end!

@@ -90,7 +90,7 @@ func configTLS(cfg certsConfig, minTlsVersion, ciphers string, stop <-chan struc

// register this webhook admission controller with the kube-apiserver
// by creating MutatingWebhookConfiguration.
-func selfRegistration(clientset kubernetes.Interface, caCert []byte, namespace, serviceName, url string, registerByURL bool, timeoutSeconds int32) {
+func selfRegistration(clientset kubernetes.Interface, caCert []byte, namespace, serviceName, url string, registerByURL bool, timeoutSeconds int32, selectedNamespaces string, ignoredNamespaces string) {
Contributor

I think we should do the split upstream, it will simplify testing and is a bit cleaner if we pass in the "final" data structure here.

Can we pass ignoredNamespaces []string here instead?

Member Author

I originally had it like that, but changed it due to your comment here: #6788 (comment)

You can see this commit where I moved the split into the functions called: 7a1aea1

Now I'm wondering if I originally misunderstood your original comment.

Member Author

Fixed in a7f2c93

@@ -103,6 +106,7 @@ func (m ClusterStateFeederFactory) Make() *clusterStateFeeder {
memorySaveMode: m.MemorySaveMode,
controllerFetcher: m.ControllerFetcher,
recommenderName: m.RecommenderName,
ignoredNamespaces: strings.Split(m.IgnoredNamespaces, ","),
Contributor

Same comment as above: if we change the IgnoredNamespaces field in ClusterStateFeederFactory to []string, this becomes a bit cleaner and we don't need to split here.

Member Author

Fixed in a7f2c93

@@ -107,6 +111,7 @@ func NewUpdater(
status.AdmissionControllerStatusName,
statusNamespace,
),
ignoredNamespaces: strings.Split(ignoredNamespaces, ","),
Contributor

Same here, let's split upstream in main.go?

Member Author

Fixed in a7f2c93

},
},
}
} else if len(selectedNamespaces) > 0 {
Contributor

Do we need this here? I thought this value was always going to contain a single namespace?

(I can see this is the assumption in the other places)

Member Author

Whoops, yes, you're right. Fixed in e14db24

This reverts commit 7a1aea1.

As per
kubernetes#6788 (comment)
and discussion in DM. The preference is to split the string inside the
main.go file.
@adrianmoisey
Member Author

Something worth noting about this change. There are tests in vertical-pod-autoscaler/pkg/admission-controller/config_test.go which call the selfRegistration function. That function starts with a 10 second sleep.
So these tests now take 60 seconds to run:

?   	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/logic	[no test files]
?   	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource	[no test files]
ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller	58.449s
ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/pod	1.475s
ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/pod/patch	2.680s
ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/pod/recommendation	2.272s
ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/vpa	3.090s

This is something that I think needs fixing at some stage, but I'm not sure on the best way to address it. Should the sleep time be passed into the function, allowing overriding the sleep time in tests? Or is there another way to override that call?

@adrianmoisey
Member Author

@raywainman I've made the latest set of suggestions. I think we're really close to getting this done. When you have a moment can you review?

Contributor

@raywainman left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 2, 2024
@raywainman
Contributor

> Something worth noting about this change. There are tests in vertical-pod-autoscaler/pkg/admission-controller/config_test.go which call the selfRegistration function. That function starts with a 10 second sleep. So these tests now take 60 seconds to run:
>
> ?   	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/logic	[no test files]
> ?   	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource	[no test files]
> ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller	58.449s
> ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/pod	1.475s
> ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/pod/patch	2.680s
> ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/pod/recommendation	2.272s
> ok  	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/vpa	3.090s
>
> This is something that I think needs fixing at some stage, but I'm not sure on the best way to address it. Should the sleep time be passed into the function, allowing overriding the sleep time in tests? Or is there another way to override that call?

Just chatted on Slack about this. Passing in a timeout via an argument is probably the easiest way to fix this.

I wonder why we did this in the first place though - maybe API Server needs a bit of time to catch up or something.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 2, 2024
Since tests will take 10 seconds on each pass to run
@raywainman
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 2, 2024
@adrianmoisey
Member Author

/assign voelzmo

pinging @voelzmo for approval

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2024
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 10, 2024
Contributor

@voelzmo left a comment

/lgtm

built the images and ran some manual tests, worked as expected. Thanks for following through!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 15, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianmoisey, voelzmo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 15, 2024
@k8s-ci-robot k8s-ci-robot merged commit 3a1c5b9 into kubernetes:master Jul 15, 2024
6 of 7 checks passed
@adrianmoisey adrianmoisey deleted the vpa-ignore-namespace branch July 15, 2024 12:21
@raywainman raywainman mentioned this pull request Aug 6, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Exclude some namespaces from VPA webhook
7 participants