Ability to customize security group rules #392

randomvariable · 2018-11-20T11:29:16Z

/kind feature

Describe the solution you'd like
#341 adds Calico rules to the security groups we apply to the cluster.
We also include a number of "useful" defaults in the bootstrap CloudFormation.

How do we manage the lifecycle of AWS resources which are critical to cluster operation but are being delivered by pluggable components?

Should we extend ProviderConfig for security rules?

Anything else you would like to add:
Also wondering if the bundler that @justinsb and others are working on includes the concept of cloud resources?

cc @sethp-nr

sethp-nr · 2018-11-20T18:09:38Z

It's an interesting question, for sure. My interpretation of the landscape was that this project was a "reasonable default" implementation for a kubernetes cluster on AWS, which in my mind implied picking a CNI provider & wiring it up.

The other implementation strategy I considered was modifying the security group rules to allow "all traffic" between and within the controlplane and node SGs. IMO that's a different but still reasonable "default" spot to be, and it would leave the question of how to network largely up to the choices in the addons.yaml file.

fejta-bot · 2019-04-28T03:06:59Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

vincepri · 2019-04-28T03:20:47Z

/remove-lifecycle stale

dlipovetsky · 2019-08-01T18:32:58Z

Some notes:
There are two ways to allow end users to customize rules:

Modify rules in CAPA-reconciled security groups
Add security groups

Security groups are just collections of rules. An instance can have multiple security groups; AWS effectively merges them all into one set of rules, so the result is as permissive as the union of all the groups.

Currently, CAPA defines groups for specific machine roles (bastion host, control plane, node). The groups contain rules for the CNI plugin and stacked etcd. When an end user wants to use a CNI plugin other than Calico, or use external etcd, those rules should not exist.

If CAPA can define groups for components, instead of machine roles, then it should be possible for the groups to contain only the necessary set of rules. That implies that end users are free to add security groups with the guarantee that their groups will not break cluster functionality because of the way AWS' internal merge works.

I strongly prefer this to asking end users to modify CAPA-reconciled security groups. If end users are allowed to modify CAPA-reconciled security groups, CAPA itself will need to guarantee that end users are not breaking cluster functionality.
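To illustrate the merge argument above, here is a minimal sketch of how allow-only rules combine across groups. The `Rule` type, the `allowed` helper, and the example groups are all hypothetical and are not CAPA or AWS SDK code; the only fact assumed is that security group rules can allow traffic but never deny it, so attaching another group can only widen what is permitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified ingress rule (not a CAPA or AWS SDK type)."""
    protocol: str
    from_port: int
    to_port: int
    source: str  # illustrative label for the traffic source, e.g. a role name


def allowed(groups, protocol, port, source):
    """Traffic is allowed if any rule in any attached group matches it."""
    return any(
        rule.protocol == protocol
        and rule.from_port <= port <= rule.to_port
        and rule.source == source
        for group in groups
        for rule in group
    )


# Hypothetical component-scoped group, as proposed in the comment above.
etcd_group = [Rule("tcp", 2379, 2380, "controlplane")]
# Hypothetical group added by an end user (Node Exporter port as an example).
user_group = [Rule("tcp", 9100, 9100, "node")]

managed_only = [etcd_group]
with_user_group = managed_only + [user_group]

# Adding the user's group opens the new port without removing etcd access.
print(allowed(managed_only, "tcp", 9100, "node"))            # False
print(allowed(with_user_group, "tcp", 9100, "node"))         # True
print(allowed(with_user_group, "tcp", 2379, "controlplane"))  # True
```

The flip side is that an added group can never tighten a rule that a CAPA-reconciled group already grants, which is the gap raised later in this thread.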

detiber · 2019-08-01T18:56:01Z

@dlipovetsky One potential downside to separating out everything into separate groups is that we could start running into AWS limits on the number of Security Groups that can be attached to an instance.

Based on the docs, it looks like we are limited to a maximum of 5 security groups per network interface, which means if we would exceed that, then we would have to start allocating multiple ENIs on instance creation as well as managing the security groups properly across the multiple ENIs.

dlipovetsky · 2019-08-01T19:05:04Z

@detiber Good point, thanks.

dlipovetsky · 2019-08-14T22:28:26Z

To increase or decrease this limit, contact AWS Support. The maximum is 16. The limit for security groups per network interface multiplied by the limit for rules per security group cannot exceed 1000. For example, if you increase this limit to 10, we decrease the limit for your number of rules per security group to 100.

Per the docs linked, I want to note that while the default limit is 5 security groups per network interface, the limit can be increased on request, though there's no guarantee it would be.

End users should not have to request limit increases just to use CAPA's default security group configuration. But can end users be asked to request limit increases to be able to use CAPA with custom security groups?
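The arithmetic behind these limits can be sketched as follows. The constants come from the docs excerpt quoted above and the default of 5 mentioned earlier in the thread; they may have changed since. The function names are hypothetical, and the per-group figure is only the ceiling implied by the product constraint, not necessarily the quota AWS actually grants.

```python
import math

# Values taken from the docs excerpt quoted in this thread; treat as assumptions.
RULE_PRODUCT_CAP = 1000      # groups per interface * rules per group
DEFAULT_GROUPS_PER_ENI = 5
MAX_GROUPS_PER_ENI = 16


def max_rules_per_group(groups_per_eni):
    """Upper bound on rules per group implied by the product constraint."""
    if not 1 <= groups_per_eni <= MAX_GROUPS_PER_ENI:
        raise ValueError("groups_per_eni must be between 1 and 16")
    return RULE_PRODUCT_CAP // groups_per_eni


def enis_needed(num_groups, groups_per_eni=DEFAULT_GROUPS_PER_ENI):
    """Network interfaces required to attach num_groups security groups."""
    return max(1, math.ceil(num_groups / groups_per_eni))


print(max_rules_per_group(10))  # 100, matching the example in the quoted docs
print(enis_needed(5))           # 1: fits within the default limit
print(enis_needed(6))           # 2: one group over the default needs a second ENI
```

Under these assumptions, splitting rules into per-component groups gets costly quickly: a sixth group either needs a limit increase or a second network interface.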

dlipovetsky · 2019-08-14T22:35:59Z

Another user story: Prometheus is a popular solution for monitoring. The Node Exporter is a popular way of collecting host-level metrics for Prometheus. Frequently, Prometheus "scrapes" (queries) Node Exporter endpoints. One Node Exporter runs on every host, and it runs in the host's network namespace. Therefore Node Exporter serves metrics on the host's IP, on some unprivileged port (typically 9100).

With the default security group rules, a Prometheus Pod running on one host cannot reach a Node Exporter running on a different host.

I'm reaching out to the Prometheus folks to understand the consequence of running Node Exporter in the Pod network namespace and will report back here.

Update: I found some evidence in the Prometheus helm chart for running Node Exporter in the Pod network namespace: helm/charts#12747.

ncdc · 2019-08-26T17:30:23Z

This will likely require API changes

fejta-bot · 2019-11-24T17:40:44Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

vincepri · 2019-11-25T16:30:04Z

/remove-lifecycle stale
/lifecycle frozen

benmoss · 2019-12-18T17:12:34Z

This is a pain for us trying to add Windows support, since Calico only supports Windows in their proprietary version from Tigera. The main OSS CNI with good Windows support is Flannel, so we're stuck documenting how to create SGs manually. This will also likely break cluster deletion, since you won't be able to delete the VPC when security groups still exist inside of it. It feels like if CAPA is going to create any SGs it should be possible to control what they are.

I'd be interested in working on adding support for this but it's not clear what the API would look like.

bsideup · 2021-12-06T16:07:07Z

@sedefsavas thanks for sharing! Are there any short term plans to implement this issue then? e.g. by accepting #2876 ?

sedefsavas · 2022-03-17T02:12:23Z

Another customization to cover under this:
#3314

Also,
#3271

richardcase · 2022-07-12T15:28:36Z

/remove-lifecycle frozen

richardcase · 2022-07-25T16:59:46Z

/milestone clear

k8s-triage-robot · 2022-10-23T18:24:36Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rymancl · 2022-10-23T18:35:48Z

/remove-lifecycle stale

richardcase · 2023-01-18T12:18:25Z

@belgaied2 - I think we can use this issue for our discussion. Would you be able to add your use case here?

belgaied2 · 2023-01-18T12:28:20Z

Thanks @richardcase ,

Yes, this issue is of interest to us over at cluster-api-provider-rke2, since RKE2 relies on a node registration mechanism that uses a custom port on control plane machines.

The principle is as follows:

The first control plane machine initializes a cluster and opens port 9345.
Subsequent nodes (both control plane and workers) connect to that port to register themselves in the cluster.

This means that, if our provider needs to integrate with CAPA, it needs to customize the securityGroup for the controlplane (which is created by CAPA).

The existing additionalSecurityGroups feature on AWSMachineTemplate resources does not work here because these securityGroup(s) need to exist beforehand, at a point when the VPC has not yet been created by the AWSCluster controller.

So, at best, we need a way to add a securityRule to the securityGroups that are created by CAPA.
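As a rough illustration of the request, the sketch below models adding one extra ingress rule to a provider-managed control plane group. Everything here is hypothetical: the dictionary keys, the `add_ingress_rule` helper, and the existing API server rule are illustrative and are not the CAPA API. Only port 9345 and the need for both control plane and worker nodes to reach it come from the description above.

```python
def add_ingress_rule(group_rules, rule):
    """Return a new rule list with `rule` appended unless it is already present.

    Keeping this idempotent means repeated reconciliation does not duplicate rules.
    """
    if rule in group_rules:
        return list(group_rules)
    return [*group_rules, rule]


# Hypothetical rule already managed by the provider (illustrative only).
controlplane_rules = [
    {
        "description": "Kubernetes API",
        "protocol": "tcp",
        "from_port": 6443,
        "to_port": 6443,
        "source_roles": ("any",),
    },
]

# The extra rule RKE2 needs: registration port, reachable from all nodes.
registration_rule = {
    "description": "RKE2 node registration",
    "protocol": "tcp",
    "from_port": 9345,
    "to_port": 9345,
    "source_roles": ("controlplane", "node"),
}

updated = add_ingress_rule(controlplane_rules, registration_rule)
updated_again = add_ingress_rule(updated, registration_rule)

print(len(updated))        # 2
print(len(updated_again))  # 2, the second call is a no-op
```

Expressing the rule against the managed group, rather than as a separate pre-existing group, avoids the ordering problem with the VPC described above.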

richardcase · 2023-01-18T12:34:20Z

This seems reasonable to me. I can work on this:

/assign
/lifecycle active

cablunar · 2023-03-09T08:37:00Z

@richardcase is there any ETA on the implementation? 🙏

alexander-demicev · 2023-04-19T09:18:39Z

/assign

alexander-demicev · 2023-04-19T09:18:51Z

@cablunar already working on it

richardcase · 2023-04-19T13:29:49Z

/unassign

com6056 · 2023-08-25T20:01:11Z

Should this be closed? From what I can tell, #3271 still hasn't been addressed. Of course, feel free to correct me if I'm wrong!

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 20, 2018

detiber added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Nov 26, 2018

timothysc added this to the Next milestone Jan 4, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2019

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2019

ncdc changed the title ~~Handling AWS resources for cluster addons~~ Ability to customize security group rules Jul 1, 2019

ncdc modified the milestones: Next, v0.4 Jul 1, 2019

ncdc modified the milestones: v0.4.0 (v1alpha2), v0.4.x (v1alpha2), Next Aug 26, 2019

arzarif mentioned this issue Sep 18, 2019

BYO security groups #1137

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 24, 2019

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 25, 2019

ncdc mentioned this issue Jun 10, 2020

Ability to customize CNI ports #1744

Closed

bluejayA mentioned this issue Mar 16, 2022

[Dev] Normalize Prometheus targets (feat. dashboard check) openinfradev/decapod-issues#33

Closed

sedefsavas mentioned this issue Mar 17, 2022

Security group rule "Node Port Services" can be more restrictive #3314

Open

Ankitasw mentioned this issue Mar 23, 2022

<clustername>-node security group contains a rule that overshadows <clustername>-lb security group rules #3271

Closed

richardcase mentioned this issue Mar 29, 2022

allow specifying additional ingress rules #2876

Closed

2 tasks

k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 12, 2022

k8s-ci-robot removed this from the v1.5.0 milestone Jul 25, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 23, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 23, 2022

k8s-ci-robot assigned richardcase Jan 18, 2023

k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 18, 2023

k8s-ci-robot assigned alexander-demicev Apr 19, 2023

k8s-ci-robot unassigned richardcase Apr 19, 2023

richardcase removed the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 19, 2023

richardcase unassigned sedefsavas Apr 19, 2023

alexander-demicev mentioned this issue Apr 20, 2023

Additional ingress rules for control plane #4228

Merged

4 tasks

k8s-ci-robot closed this as completed in #4228 May 25, 2023
