Ability to customize security group rules #392
Comments
It's an interesting question, for sure. My interpretation of the landscape was that this project was a "reasonable default" implementation for a Kubernetes cluster on AWS, which in my mind implied picking a CNI provider & wiring it up. The other implementation strategy I considered was modifying the security group rules to allow "all traffic" between and within the controlplane and node SGs. IMO that's a different but still reasonable "default" spot to be, and it would leave the question of how to network largely up to the choices in the
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Some notes:
Security groups are just collections of rules. An instance can have multiple security groups; AWS evaluates the rules of all attached groups together, so the effective policy is the union of every group's rules, i.e. the most permissive combination. Currently, CAPA defines groups for specific machine roles (bastion host, control plane, node). The groups contain rules for the CNI plugin and stacked etcd. When an end user wants to use a CNI plugin other than Calico, or to use external etcd, those rules should not exist. If CAPA can define groups for components, instead of machine roles, then it should be possible for the groups to contain only the necessary set of rules. That implies that end users are free to add security groups with the guarantee that their groups will not break cluster functionality, because AWS' internal merge only ever adds permissions. I strongly prefer this to asking end users to modify CAPA-reconciled security groups. If end users are allowed to modify CAPA-reconciled security groups, CAPA itself will need to guarantee that end users are not breaking cluster functionality.
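To make the merge behaviour concrete, here is a minimal sketch that models per-component groups as sets of allow rules and the effective policy as their union. The `Rule` type, the group names, and the `sg-...` sources are hypothetical and only for illustration; sources are matched by exact string, which is a simplification of how AWS matches CIDRs and group references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One ingress allow rule (simplified; sources are matched by exact string)."""
    protocol: str
    from_port: int
    to_port: int
    source: str  # a CIDR block or a security group ID


def effective_rules(*groups):
    """Return the union of the rules of every attached group."""
    return frozenset().union(*groups)


def is_allowed(rules, protocol, port, source):
    """True if any rule in the merged set admits the given traffic."""
    return any(
        r.protocol == protocol
        and r.from_port <= port <= r.to_port
        and r.source == source
        for r in rules
    )


# Hypothetical per-component groups (names and sources are illustrative only).
cni_calico = frozenset({Rule("tcp", 179, 179, "sg-nodes")})  # BGP
stacked_etcd = frozenset({Rule("tcp", 2379, 2380, "sg-controlplane")})
user_monitoring = frozenset({Rule("tcp", 9100, 9100, "sg-nodes")})  # user-added

merged = effective_rules(cni_calico, stacked_etcd, user_monitoring)
```

Because the merge is a union, attaching `user_monitoring` can only add permissions; it cannot remove a rule contributed by `cni_calico` or `stacked_etcd`.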
@dlipovetsky One potential downside to separating out everything into separate groups is that we could start running into AWS limits on the number of Security Groups that can be attached to an instance. Based on the docs, it looks like we are limited to a maximum of 5 security groups per network interface, which means if we would exceed that, then we would have to start allocating multiple ENIs on instance creation as well as managing the security groups properly across the multiple ENIs.
@detiber Good point, thanks.
Per the docs linked, I want to note that while the default limit is 5 security groups per network interface, the limit can be increased on request, though there's no guarantee it would be. End users should not have to request limit increases just to use CAPA's default security group configuration. But can end users be asked to request limit increases to be able to use CAPA with custom security groups?
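As a rough illustration of the budget this leaves, the sketch below checks whether a set of component groups plus user-supplied groups still fits on a single network interface under the default limit of 5 quoted above. The component and user group names are hypothetical; CAPA does not define groups this way today.

```python
DEFAULT_SG_LIMIT_PER_ENI = 5  # default quota cited above; can be raised on request


def groups_for_instance(component_groups, user_groups):
    """Combine both lists, dropping duplicates while keeping order."""
    return list(dict.fromkeys([*component_groups, *user_groups]))


def fits_on_one_interface(groups, limit=DEFAULT_SG_LIMIT_PER_ENI):
    """True if every group can be attached to a single network interface."""
    return len(groups) <= limit


# Hypothetical component split for a control plane machine.
component_groups = ["apiserver", "etcd", "cni", "kubelet"]

one_custom = groups_for_instance(component_groups, ["monitoring"])
two_custom = groups_for_instance(component_groups, ["monitoring", "ingress"])
```

With four component groups, a single user group still fits, while a second one would require either a limit increase or an additional ENI.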
Another user story: Prometheus is a popular solution for monitoring. The Node Exporter is a popular way of collecting host-level metrics for Prometheus. Frequently, Prometheus "scrapes" (queries) Node Exporter endpoints. One Node Exporter runs on every host, and it runs in the host's network namespace. Therefore Node Exporter serves metrics on the host's IP, on some unprivileged port (typically 9100). With the default security group rules, a Prometheus Pod running on one host cannot reach a Node Exporter running on a different host. I'm reaching out to the Prometheus folks to understand the consequence of running Node Exporter in the Pod network namespace and will report back here. Update: I found some evidence in the Prometheus Helm chart for running Node Exporter in the Pod network namespace: helm/charts#12747.
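For reference, the extra rule this user story needs might look like the sketch below. It only builds the permission as a plain dictionary in the `IpPermissions` shape used by the EC2 `AuthorizeSecurityGroupIngress` API; it does not call AWS. The helper name and the security group ID are made up for illustration, and allowing traffic from the node group to itself is an assumption about how the cluster is laid out.

```python
NODE_EXPORTER_PORT = 9100  # typical Node Exporter port mentioned above


def node_exporter_ingress(source_group_id, port=NODE_EXPORTER_PORT):
    """Build an ingress permission that lets members of source_group_id
    reach Node Exporter over TCP on the given port."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [
            {
                "GroupId": source_group_id,
                "Description": "Prometheus scrape of Node Exporter",
            }
        ],
    }


# Placeholder ID for the node security group; not a real resource.
permission = node_exporter_ingress("sg-0123456789abcdef0")
```

Applied to the node security group with that same group as the source, a rule like this would let a Prometheus Pod on one host scrape Node Exporter on another.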
This will likely require API changes
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
This is a pain for us trying to add Windows support, since Calico only supports Windows in their proprietary version from Tigera. The main OSS CNI with good Windows support is Flannel, so we're stuck documenting how to create SGs manually. This will also likely break cluster deletion, since you won't be able to delete the VPC when security groups still exist inside of it. It feels like if CAPA is going to create any SGs it should be possible to control what they are. I'd be interested in working on adding support for this but it's not clear what the API would look like.
@sedefsavas thanks for sharing! Are there any short term plans to implement this issue then? e.g. by accepting #2876 ?
/remove-lifecycle frozen
/milestone clear
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
@belgaied2 - I think we can use this issue for our discussion. Would you be able to add your use case here?
Thanks @richardcase. Yes, this issue is of interest to us over at cluster-api-provider-rke2, since RKE2 relies on a node registration mechanism that uses a custom port on control plane machines. The principle is as follows:
This means that, if our provider needs to integrate with CAPA, it needs to customize the securityGroup for the controlplane (which is created by CAPA). The reason why the existing feature of So, at best, we need a way to add a |
This seems reasonable to me. I can work on this: /assign
@richardcase is there any ETA on the implementation? 🙏
/assign
@cablunar already working on it
/unassign
Should this be closed? From what I can tell, #3271 still hasn't been addressed. Of course feel free to correct me if I'm wrong!
/kind feature
Describe the solution you'd like
#341 adds Calico rules to the security groups we apply to the cluster.
We also include a bunch of "useful" defaults in the bootstrap CloudFormation.
How do we manage the lifecycle of AWS resources which are critical to cluster operation but are being delivered by pluggable components?
Should we extend ProviderConfig for security rules?
Anything else you would like to add:
Also wondering if the bundler @justinsb and others are working on includes the concept of cloud resources?
cc @sethp-nr