NLB support #3088

Closed
carlosonunez opened this issue Jan 20, 2022 · 72 comments · Fixed by #3804
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@carlosonunez

/kind feature

Describe the solution you'd like

As an engineer looking to deploy Kubernetes with CAPA
I would like my control plane to be fronted by a Network Load Balancer instead of Classic ELBs
So that I can continue to use CAPA after AWS deprecates them

Anything else you would like to add:

Classic ELBs are getting deprecated in August 2022, and AWS SDK for Go supports NLBs. We should use them!

More info here
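
For context on the API surface involved: the AWS SDK for Go exposes NLBs through the same ELBv2 CreateLoadBalancer call used for ALBs, with Type set to "network". A minimal, illustrative sketch (the name, region, and subnet IDs below are placeholders, and this is not CAPA's actual implementation):

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elbv2"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("eu-central-1")))
	client := elbv2.New(sess)

	// Create an internet-facing NLB in front of the control plane subnets.
	out, err := client.CreateLoadBalancer(&elbv2.CreateLoadBalancerInput{
		Name:    aws.String("my-cluster-apiserver"),                      // placeholder
		Type:    aws.String(elbv2.LoadBalancerTypeEnumNetwork),           // "network"
		Scheme:  aws.String(elbv2.LoadBalancerSchemeEnumInternetFacing),
		Subnets: aws.StringSlice([]string{"subnet-aaaa", "subnet-bbbb"}), // placeholders
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("API server LB DNS name:", aws.StringValue(out.LoadBalancers[0].DNSName))
}
```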

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 20, 2022
@k8s-ci-robot
Contributor

@carlosonunez: This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sedefsavas
Contributor

sedefsavas commented Jan 20, 2022

Thanks for bringing this up @carlosonunez
I think we are tracking this somewhere else; I'll find out and link it here.

There is an ongoing test to see how to deal with the hairpinning issue that comes with NLBs (when the LB directs kubelet traffic to the api-server on the same node).

@carlosonunez
Author

That's great that this is being tracked elsewhere! I couldn't find any issues pertaining to it and thought "That can't be right!" :)

@cbusch-pivotal

Any update on whether this truly is being tracked? Thanks!

@sedefsavas sedefsavas added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2022
@sedefsavas sedefsavas added this to the v1.x milestone Jan 24, 2022
@sedefsavas sedefsavas self-assigned this Jan 24, 2022
@dlipovetsky
Contributor

I haven't seen an issue tracking NLB support, but we considered using NLBs back in 2018: #115 (comment)

@dlipovetsky
Contributor

There is an ongoing test to see how to deal with the hairpinning issue that comes with NLBs (when the LB directs kubelet traffic to the api-server on the same node).

For reference, CAPZ also has a hairpinning issue, although specifically when using an internal load balancer: kubernetes-sigs/cluster-api-provider-azure#1033 (comment). The current workaround is to prevent requests from kubelet on the control plane machines from going through the LB, by resolving the LB's DNS record to 127.0.0.1.

@sedefsavas
Contributor

I could not find a specific issue for this. I see it was discussed in the CAPA v1alpha4 API changes document, but during today's office hours we triaged this issue and decided to keep it.

@sedefsavas
Contributor

For reference, CAPZ also has a hairpinning issue, although specifically when using an internal load balancer: kubernetes-sigs/cluster-api-provider-azure#1033 (comment). The current workaround is to prevent requests from kubelet on the control plane machines from going through the LB, by resolving the LB's DNS record to 127.0.0.1.

Thanks @dlipovetsky, good to know there is an example workaround. If I can't make it work by disabling client IP preservation, this might be the way to go.

@lubronzhan

lubronzhan commented Jan 25, 2022

As suggested by @sedefsavas, changing the target group type from instance to IP can also work around this issue: https://aws.amazon.com/premiumsupport/knowledge-center/target-connection-fails-load-balancer/
The downside is that an instance's IP might change during its lifecycle, so the corresponding target group needs to be updated, and the NLB's listeners need to be updated as well. This could cause downtime.

Update
With an instance-type target, I can still curl the NLB endpoint from within the control plane, given an appropriate SecurityGroup.
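
For illustration only, the difference between the two target-group modes discussed above comes down to the TargetType passed to the ELBv2 API. This is not CAPA code; the name, port, and VPC ID are placeholders:

```go
package lbsketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/elbv2"
)

// createAPITargetGroup creates the control-plane target group with the given
// target type: elbv2.TargetTypeEnumInstance registers instance IDs (and keeps
// following the instance if its IP changes), while elbv2.TargetTypeEnumIp
// registers raw IPs (one way around hairpinning, at the cost of having to
// re-register targets whenever an instance IP changes).
func createAPITargetGroup(client *elbv2.ELBV2, targetType string) (*elbv2.TargetGroup, error) {
	out, err := client.CreateTargetGroup(&elbv2.CreateTargetGroupInput{
		Name:       aws.String("my-cluster-apiserver-tg"), // placeholder
		Protocol:   aws.String(elbv2.ProtocolEnumTcp),
		Port:       aws.Int64(6443),
		VpcId:      aws.String("vpc-0123456789abcdef0"), // placeholder
		TargetType: aws.String(targetType),
	})
	if err != nil {
		return nil, err
	}
	return out.TargetGroups[0], nil
}
```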

@sedefsavas
Contributor

Just a correction on the retirement of Classic Load Balancers mentioned in the issue:

EC2-Classic networking is being retired, not Classic Load Balancers.
Classic Load Balancers that are used in EC2-Classic networking will need to be migrated to a load balancer in a VPC. Classic Load Balancers already in a VPC won't be affected.

TL;DR: the August 2022 retirement date is not for Classic ELBs.

@carlosonunez
Author

carlosonunez commented Feb 17, 2022 via email

@anmolsachan

anmolsachan commented Mar 1, 2022

This feature (NLB for the control plane) is very much needed in the market, irrespective of CLBs still being around.
Please let me know if a contribution is needed for this.

@vincepri
Member

Are we planning to convert the existing Classic ELBs to NLBs?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 10, 2022
@voor
Member

voor commented Jul 10, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 10, 2022
@sedefsavas sedefsavas changed the title Use NLBs instead of Classic ELBs NLB support Jul 28, 2022
@sedefsavas
Contributor

While implementing this, we can avoid introducing the problem of creating LBs with identical names for clusters created with the same name in different namespaces: #969

NLB names need to be unique within the same AWS account and region.
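
A purely illustrative naming sketch (not the scheme the eventual PR uses): ELBv2 names are limited to 32 characters and must be unique per account and region, so deriving the name from namespace plus cluster name, rather than the cluster name alone, avoids the collision described in #969.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// nlbName builds a <=32-character load balancer name that differs for
// identically named clusters living in different namespaces.
func nlbName(namespace, clusterName string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + clusterName))
	hash := hex.EncodeToString(sum[:])[:8]
	base := clusterName
	if len(base) > 23 { // 23 + "-" + 8 = 32
		base = base[:23]
	}
	return fmt.Sprintf("%s-%s", base, hash)
}

func main() {
	// Same cluster name, different namespaces -> different LB names.
	fmt.Println(nlbName("team-a", "prod"))
	fmt.Println(nlbName("team-b", "prod"))
}
```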

@Skarlso
Contributor

Skarlso commented Nov 2, 2022

Hmm, I mean, generally having multiple clusters in the same VPC will end up with each cluster being able to access the other clusters on the same network anyway, sort of, right?

@voor
Member

voor commented Nov 2, 2022

Just wondering if you're over-rotating on something that shouldn't be considered a problem -- what is the concern with something going straight to the EC2 instance's port 6443? It's not being advertised, and even if something did hit it directly it's the same security as if it went through the NLB. There is nothing magical happening at the NLB to shape or modify that traffic, the NLB is effectively some iptables magic itself to handle load balancing and forwarding the proper traffic.

@Skarlso
Contributor

Skarlso commented Nov 2, 2022

@voor Right. However, opening up 6443 to the world is a bit of a stretch, isn't it? :) Even if it isn't advertised. And yes, an NLB is just some traffic-routing magic; it doesn't modify the request at all. But you would still want some amount of security for traffic coming in on a port. You generally don't open ports to the outside world.

@Skarlso
Contributor

Skarlso commented Nov 2, 2022

BTW, on it being public: with IPv6 there are no private addresses, so this becomes a moot point. If you use an IPv6 VPC, the port will "technically" be open to the world anyway.

@Skarlso
Contributor

Skarlso commented Nov 2, 2022

@voor I get what you are saying. The reason I need the port opened is that right now it is only open to a security group as the source, which doesn't work with the NLB. But yes, since the node is in a private network, opening it to 0.0.0.0/0 shouldn't be a problem; I would just like to be absolutely sure about that. :D

For example, I'm not sure whether you can have control plane nodes in public networks if you so choose.

@Skarlso
Contributor

Skarlso commented Nov 2, 2022

You can technically have control plane nodes in public subnets if you bring your own cluster. I'm saying technically only because it's not really advised... :D

@Skarlso
Contributor

Skarlso commented Nov 3, 2022

Successfully created an NLB. Now lots of testing remains, plus making it work with IPv6 (that shouldn't be a problem :)).

@johannesfrey

johannesfrey commented Nov 4, 2022

@voor I get what you are saying. The reason I need the port opened is that right now it is only open to a security group as the source, which doesn't work with the NLB. But yes, since the node is in a private network, opening it to 0.0.0.0/0 shouldn't be a problem; I would just like to be absolutely sure about that. :D

For example, I'm not sure whether you can have control plane nodes in public networks if you so choose.

I'm just wondering (if opening to 0.0.0.0/0 should be the default way to go here) if we still have a (configurable) way to restrict the access to the api-server for specific clients? Or is this a general "problem" of NLBs, as they don't work with SGs and the SG should rather be seen in conjunction with the actual EC2 instance of the control-plane? If so, in addition to the client IP that should be granted access, do you know if the internal IP of the NLB must also be added to the target security group of the instance?

@Skarlso
Contributor

Skarlso commented Nov 4, 2022

The NLB doesn't really have a stable internal IP, or rather it's constantly changing, so that would be a difficult matter. However, since the NLB is part of the VPC, it's enough to add the VPC CIDR to the security group. That way, any IP the NLB gets will be allowed, and at the very least it's not 0.0.0.0/0. :)

And yes, that seems to be the consensus when dealing with NLBs, since they don't have security group attachments, similar to NAT gateways. It's a Layer 4 load balancer. Historically though, as I've read in many forums from AWS employees, there is no REAL reason they couldn't make it work with security groups; it's a deliberate decision, which we are not privy to, sadly. :)
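
A minimal sketch of that workaround: instead of 0.0.0.0/0, port 6443 is opened to the whole VPC range so that whichever private IPs the NLB happens to use are allowed. The security group ID and CIDR below are placeholders:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	client := ec2.New(session.Must(session.NewSession()))

	// Allow the API server port from anywhere inside the VPC (covers the
	// NLB's private addresses) rather than from 0.0.0.0/0.
	_, err := client.AuthorizeSecurityGroupIngress(&ec2.AuthorizeSecurityGroupIngressInput{
		GroupId: aws.String("sg-0123456789abcdef0"), // placeholder control plane SG
		IpPermissions: []*ec2.IpPermission{{
			IpProtocol: aws.String("tcp"),
			FromPort:   aws.Int64(6443),
			ToPort:     aws.Int64(6443),
			IpRanges: []*ec2.IpRange{{
				CidrIp:      aws.String("10.0.0.0/16"), // placeholder VPC CIDR
				Description: aws.String("kube-apiserver via NLB"),
			}},
		}},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```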

@johannesfrey

Ah, sure. 0.0.0.0 is not necessary 🙂.
Yeah, really a bummer that NLBs do not work with security groups in the usual way.
But speaking of a SG in front of the EC2 instance of the control-plane, it would not suffice to add the client IP as source (with client IP preservation on) because in reality the NLB connects to that instance (with an unknown internal IP)?

On a side note: @Skarlso, thanks so much for your quick replies and that you are driving this forward. Awesome! 👍

@Skarlso
Contributor

Skarlso commented Nov 4, 2022

No problem. :) I like working on this issue. :D It's very complex. :)

But speaking of a SG in front of the EC2 instance of the control-plane, it would not suffice to add the client IP as source (with client IP preservation on) because in reality the NLB connects to that instance (with an unknown internal IP)?

I'm not 100% sure yet, to be honest. I can't see which IP address the NLB uses to contact the instance. If it's using preservation, I'm not sure which client IP it preserves.

@Skarlso
Contributor

Skarlso commented Nov 4, 2022

Incidentally, that might be the cause of my current problem. Namely that the instance is not joining the cluster. Or more precisely, it looks like kubelet is failing to start up.

@Skarlso
Contributor

Skarlso commented Nov 4, 2022

BOOM

➜  k get nodes --kubeconfig test.kubeconfig
NAME                                           STATUS   ROLES                  AGE     VERSION
ip-10-0-129-98.eu-central-1.compute.internal   Ready    control-plane,master   5m55s   v1.22.1


➜  cluster clusterctl describe cluster test-aws-cluster
NAME                                                                 READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/test-aws-cluster                                             True                     6m2s
├─ClusterInfrastructure - AWSCluster/test-aws-cluster                True                     8m30s
└─ControlPlane - KubeadmControlPlane/test-aws-cluster-control-plane  True                     6m2s
  └─Machine/test-aws-cluster-control-plane-pjnbs                     True                     8m10s

@Skarlso
Contributor

Skarlso commented Nov 4, 2022

Okay, so this works in one of two ways. Either:

  • Disable client IP preservation and only allow the VPC's private address range through, or
  • Enable client IP preservation, but in that case I have to open it up to 0.0.0.0/0 for it to work.

What I'm going to do is create a parameter/switch that lets the user control this behaviour. Is that okay?
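
Purely as an illustration of such a switch (the type and field names here are hypothetical, not the API that was eventually merged; see the linked PR for that):

```go
package v1sketch

// ControlPlaneNLBOptions sketches the toggle described above.
type ControlPlaneNLBOptions struct {
	// PreserveClientIP keeps the original client source IP on connections to
	// the API server. When true, the security group rule effectively has to
	// allow 0.0.0.0/0; when false, traffic appears to come from the NLB's
	// private addresses, so allowing the VPC CIDR is enough.
	// +optional
	PreserveClientIP bool `json:"preserveClientIP,omitempty"`
}
```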

@johannesfrey

johannesfrey commented Nov 4, 2022

Okay, so this works in one of two ways. Either:

* Disable client IP preservation and only allow the VPC's private address range through, or

* Enable client IP preservation, but in that case I have to open it up to `0.0.0.0/0` for it to work.

What I'm going to do is create a parameter/switch that lets the user control this behaviour. Is that okay?

Ah ok. So with client IP preservation on, you must open it up to 0.0.0.0/0 because you cannot know beforehand which client will try to connect (e.g. from another VPC or externally)? Whereas with client IP preservation off, the source will always be the NLB's IP, which resides in the same VPC?!

@Skarlso
Contributor

Skarlso commented Nov 4, 2022

Ah ok. So with client IP preservation on, you must open it up to 0.0.0.0/0 because you cannot know beforehand which client will try to connect (e.g. from another VPC or externally)? Whereas with client IP preservation off, the source will always be the NLB's IP, which resides in the same VPC?!

Correct. 👍

@johannesfrey

Ah ok. So with client IP preservation on, you must open it up to 0.0.0.0/0 because you cannot know beforehand which client will try to connect (e.g. from another VPC or externally)? Whereas with client IP preservation off, the source will always be the NLB's IP, which resides in the same VPC?!

Correct. +1

Ok, got it. One naive thought: instead of prematurely opening it up completely to 0.0.0.0/0 here in CAPA, couldn't we delegate the work to, or do something similar to, e.g. the aws-load-balancer-controller, which picks up the given CIDR from service.spec.loadBalancerSourceRanges and creates the necessary SGs/ingress rules only for that CIDR?

Though, this would depend on a Service of type LoadBalancer for the api-server... And this would also probably be too late, as the opening up is required during the Cluster creation phase?! Though kubelet should not rely on it, as it will talk to the local api-server because of the /etc/hosts rewrite?! But most probably I'm mixing up stuff here 😅

@Skarlso
Contributor

Skarlso commented Nov 4, 2022

And this would also probably be too late, as the opening up is required during the Cluster creation phase

This is correct. Without opening it up, the node will not join the cluster and the kubelet will fail with various errors about things it can't reach.

Some of which MIGHT be the hairpinning issue I'm investigating. For example, just this:

Nov 04 08:23:45 ip-10-0-118-234 kubelet[1810]: E1104 08:23:45.087874    1810 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Nov 04 08:23:45 ip-10-0-118-234 kubelet[1810]: E1104 08:23:45.156058    1810 kubelet.go:2407] "Error getting node" err="node \"ip-10-0-118-234.eu-central-1.compute.internal\" not found"
Nov 04 08:23:45 ip-10-0-118-234 kubelet[1810]: E1104 08:23:45.256934    1810 kubelet.go:2407] "Error getting node" err="node \"ip-10-0-118-234.eu-central-1.compute.internal\" not found"

Looks very suspiciously like a hairpinning issue where the kubelet can't reach the api-server through the load balancer's domain name.

@Skarlso
Contributor

Skarlso commented Nov 4, 2022

Okay, so I got something working for now. I'm going to roll with that, then iterate on the possible hairpinning issue.

@Skarlso
Contributor

Skarlso commented Nov 6, 2022

So I tried the hosts rewrite. Sadly, it didn't work. :(

Nov 06 06:22:43 ip-10-0-120-185 kubelet[2114]: E1106 06:22:43.455629    2114 kubelet.go:2407] "Error getting node" err="node \"ip-10-0-120-185.eu-central-1.compute.internal\" not found"
Nov 06 06:22:43 ip-10-0-120-185 kubelet[2114]: E1106 06:22:43.554024    2114 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://ypcqehss1nfq26i6gfooheufe2ok-k8s-3f5da43546a9b6c8.elb.eu-central-1.amazonaws.com:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-120-185.eu-central-1.compute.internal?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Nov 06 06:22:43 ip-10-0-120-185 kubelet[2114]: E1106 06:22:43.558110    2114 kubelet.go:2407] "Error getting node" err="node \"ip-10-0-120-185.eu-central-1.compute.internal\" not found"

So for now, I'm staying with my solution. FWIW, the Network Load Balancer troubleshooting doc also says to disable client IP preservation if hairpinning is an issue.
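
For reference, client IP preservation is a target-group attribute (`preserve_client_ip.enabled`). A minimal sketch of turning it off with the AWS SDK for Go, with a placeholder target group ARN:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elbv2"
)

func main() {
	client := elbv2.New(session.Must(session.NewSession()))

	// Disable client IP preservation on the API server target group so that
	// connections hairpinned back to a control plane node no longer carry the
	// node's own source IP.
	_, err := client.ModifyTargetGroupAttributes(&elbv2.ModifyTargetGroupAttributesInput{
		TargetGroupArn: aws.String("arn:aws:elasticloadbalancing:eu-central-1:111122223333:targetgroup/example/0123456789abcdef"), // placeholder
		Attributes: []*elbv2.TargetGroupAttribute{{
			Key:   aws.String("preserve_client_ip.enabled"),
			Value: aws.String("false"),
		}},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```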

@voor
Member

voor commented Nov 6, 2022

Does it set the X-Forwarded-For request header or the X-Real-Ip header? That way audit logs will still have the client IP somewhere.

This is extremely unlikely, since an NLB is layer 4 and headers are layer 7.

@Skarlso
Contributor

Skarlso commented Nov 6, 2022

It does, yes.

@Skarlso
Contributor

Skarlso commented Nov 6, 2022

Ah, yeah, I misread the doc. You're right @voor, it does not set that header. Something else does if client IP preservation is on.

@richardcase richardcase removed this from the v1.x milestone Nov 10, 2022
@johannesfrey

Just out of interest, as it seems not to be tracked in a milestone: I assume it won't be part of the v2.0.0 milestone?!

@Skarlso
Contributor

Skarlso commented Nov 16, 2022

Probably not, because the PR still needs reviews, tests, etc. That won't happen right now; the team is pushing for the 2.0.0 milestone and fixes. Sorry! :/

@johannesfrey

johannesfrey commented Nov 16, 2022

Probably not, because the PR still needs reviews, tests, etc. That won't happen right now; the team is pushing for the 2.0.0 milestone and fixes. Sorry! :/

Sure, no worries 🙂. Totally understandable. Just noticed that 2.0.0 is about to be released soon, so I thought I'd ask. Really nice that 2.0.0 is coming, even if without NLB 🙂. Unfortunately, I'm not too deep into the code base, but I will also take a look at the PR (this should not block any further changes etc. :)) next week at the latest.

@Skarlso
Contributor

Skarlso commented Dec 2, 2022

There are two hanging things:

But I'd consider this closed. Please test and see if anything is missing or not working. :)
