Add health-check support to loadbalancer #9757

Merged (1 commit) on Mar 27, 2024

@brandond brandond commented Mar 19, 2024

Proposed Changes

Add health-check support to loadbalancer

  • Adds support for health-checking loadbalancer servers. Servers are now health-checked periodically and before dialing; if the health-check fails, all existing connections to the server will be closed.
  • Wires up a remotedialer tunnel connectivity check as the health check for supervisor/apiserver connections.
  • Wires up a simple HTTP request to the supervisor as the health check for etcd connections.
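
The behavior in the list above can be illustrated with a minimal sketch. It is not the code from this PR: the `server` type, `dialFunc`, `errUnhealthy`, and `demoDialCheck` are hypothetical names chosen for illustration, and the address is a placeholder. It shows a health check that runs periodically and before each dial, and closes every tracked connection when it fails.

```go
package main

import (
	"context"
	"errors"
	"net"
	"sync"
	"time"
)

// errUnhealthy is returned when the health check refuses a dial.
var errUnhealthy = errors.New("server failed health check")

// dialFunc matches the signature of (*net.Dialer).DialContext.
type dialFunc func(ctx context.Context, network, address string) (net.Conn, error)

// server is a hypothetical backend entry: an address, a health check,
// and the set of connections currently open to that address.
type server struct {
	address string
	check   func() bool

	mu    sync.Mutex
	conns map[net.Conn]struct{}
}

func newServer(address string, check func() bool) *server {
	return &server{address: address, check: check, conns: map[net.Conn]struct{}{}}
}

// dial runs the health check before dialing. On failure it closes all
// existing connections and refuses the new one.
func (s *server) dial(ctx context.Context, dial dialFunc) (net.Conn, error) {
	if !s.check() {
		s.closeAll()
		return nil, errUnhealthy
	}
	conn, err := dial(ctx, "tcp", s.address)
	if err != nil {
		return nil, err
	}
	// A fuller implementation would also untrack a connection when the
	// client closes it; that is omitted to keep the sketch short.
	s.mu.Lock()
	s.conns[conn] = struct{}{}
	s.mu.Unlock()
	return conn, nil
}

// closeAll closes and forgets every tracked connection.
func (s *server) closeAll() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for c := range s.conns {
		c.Close()
		delete(s.conns, c)
	}
}

// open returns the number of tracked connections.
func (s *server) open() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.conns)
}

// runHealthChecks polls the health check until ctx is cancelled.
func (s *server) runHealthChecks(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if !s.check() {
				s.closeAll()
			}
		}
	}
}

// demoDialCheck opens two in-memory connections, fails the health check,
// and reports the tracked connection count before and after the next dial.
func demoDialCheck() (before, after int, err error) {
	healthy := true
	s := newServer("192.0.2.10:6443", func() bool { return healthy })
	// net.Pipe keeps the example off the network; the address is unused.
	pipeDial := func(ctx context.Context, network, address string) (net.Conn, error) {
		client, _ := net.Pipe()
		return client, nil
	}
	ctx := context.Background()
	for i := 0; i < 2; i++ {
		if _, err := s.dial(ctx, pipeDial); err != nil {
			return 0, 0, err
		}
	}
	before = s.open()
	healthy = false
	_, err = s.dial(ctx, pipeDial)
	return before, s.open(), err
}
```

In this sketch the health check is an opaque `func() bool`, so the same server type could be paired with a tunnel connectivity check or an HTTP request without changing the dial path.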

This ensures that any load-balanced connections to a node will be closed when the remotedialer tunnel to that node disconnects. This isn't much of an issue on K3s, where the apiserver always exits when the supervisor shuts down. On RKE2, however, the apiserver may continue running after the supervisor stops, though without its load-balanced connection to etcd. This change will ensure that agents and etcd-only nodes disconnect from an apiserver when its supervisor goes down. Similarly, apiservers will fail over to a different etcd server when an etcd node's supervisor goes down.
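
As a rough illustration of the etcd case, a supervisor-based health check might be a short HTTP request with a timeout. The sketch below is an assumption-laden example rather than the PR's implementation: `pingHealthCheck` and `demoPingCheck` are hypothetical names, the `/ping` path is assumed, and a local test server stands in for the supervisor.

```go
package main

import (
	"context"
	"net/http"
	"net/http/httptest"
	"time"
)

// pingHealthCheck returns a health check that issues GET <baseURL>/ping and
// reports healthy only if a 200 response arrives within the timeout.
// The /ping path is an assumption made for this sketch.
func pingHealthCheck(client *http.Client, baseURL string, timeout time.Duration) func() bool {
	return func() bool {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		defer cancel()
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/ping", nil)
		if err != nil {
			return false
		}
		resp, err := client.Do(req)
		if err != nil {
			// Connection refused, timeout, TLS failure: all count as unhealthy.
			return false
		}
		defer resp.Body.Close()
		return resp.StatusCode == http.StatusOK
	}
}

// demoPingCheck runs the check against a stand-in supervisor, once while it
// is serving and once after it has been shut down.
func demoPingCheck() (up, down bool) {
	mux := http.NewServeMux()
	mux.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("pong"))
	})
	srv := httptest.NewServer(mux)
	check := pingHealthCheck(srv.Client(), srv.URL, time.Second)

	up = check()
	srv.Close() // simulate the supervisor going down
	down = check()
	return up, down
}
```

Once the stand-in supervisor stops, the check reports unhealthy, which is the signal a load balancer would use to drop connections to that node and fail over to another etcd server.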

Types of Changes

enhancement / bugfix

Verification

See linked RKE2 issue

There is no effective change to K3s, as the apiserver and etcd always exit when the supervisor exits, which forces clients to disconnect. The change is only apparent on RKE2, where the pods can continue running in a degraded state after the supervisor exits.

Testing

Linked Issues

User-Facing Change


Further Comments

@brandond brandond requested a review from a team as a code owner March 19, 2024 22:04

codecov bot commented Mar 19, 2024

Codecov Report

Attention: Patch coverage is 47.20%, with 66 lines in your changes missing coverage. Please review.

Project coverage is 46.48%. Comparing base (8aecc26) to head (f4ff494).

| Files | Patch % | Lines |
|---|---|---|
| pkg/etcd/etcdproxy.go | 0.00% | 39 Missing ⚠️ |
| pkg/agent/loadbalancer/servers.go | 66.66% | 9 Missing and 4 partials ⚠️ |
| pkg/agent/proxy/apiproxy.go | 33.33% | 7 Missing and 1 partial ⚠️ |
| pkg/cluster/managed.go | 0.00% | 3 Missing ⚠️ |
| pkg/agent/config/config.go | 0.00% | 1 Missing ⚠️ |
| pkg/agent/loadbalancer/loadbalancer.go | 83.33% | 0 Missing and 1 partial ⚠️ |
| pkg/cluster/cluster.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9757      +/-   ##
==========================================
- Coverage   52.94%   46.48%   -6.47%     
==========================================
  Files         154      154              
  Lines       13601    13679      +78     
==========================================
- Hits         7201     6358     -843     
- Misses       5038     6100    +1062     
+ Partials     1362     1221     -141     

| Flag | Coverage Δ |
|---|---|
| e2etests | 39.21% <47.20%> (-10.25%) ⬇️ |
| inttests | 22.23% <12.00%> (-17.15%) ⬇️ |
| unittests | 16.17% <12.80%> (-0.03%) ⬇️ |


@brandond brandond force-pushed the loadbalancer-enhancements branch 11 times, most recently from 6db4ff3 to 07af6b7 Compare March 21, 2024 08:56
dereknola previously approved these changes Mar 21, 2024
@brandond brandond force-pushed the loadbalancer-enhancements branch 4 times, most recently from a5a3877 to e4e6c35 Compare March 21, 2024 22:51
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
@brandond brandond requested review from dereknola and a team March 22, 2024 20:41