net: add conn_quota for connection count limits #3901
Conversation
Force-pushed from 489b0f3 to 32ed576
Force-pushed from 32ed576 to a984b8c
Overall this is pretty great. I think there might be some memory safety issues in the conn_quota service, but that's about it.
Force-pushed from a984b8c to 67cc890
CI failure was known issue #3924
Changes for the semaphore lifetime look good. One question about a scenario where safety looks like it depends on capture-list destruction order.
The commit subject probably needs changing.
Force-pushed from 67cc890 to 456efba
This had a rather unlucky arm64 run with three (!) unrelated test failures:
src/v/net/conn_quota.cc (Outdated)

    })
    .finally([u = std::move(units)] {})
    // Keep allowance alive until after units are dropped
    .finally([allowance, u = std::move(units)] {});
Did you intend to remove the `units` capture in the second finally block? It should be a no-op, but clangd should also warn here about use-after-move.
Yes, this was a mistake. Fixed.
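For readers following along, here is a minimal standalone illustration of why the stray second move-capture was a no-op. This is not the PR's code: `units` is stood in for by a plain `unique_ptr`.

```cpp
#include <memory>
#include <utility>

// Hypothetical illustration: capturing the same object by move twice.
// The second capture moves from an already-moved-from value, so it
// holds an empty pointer and destroying it releases nothing -- a no-op,
// but exactly the pattern tools like clangd flag as use-after-move.
inline bool second_capture_is_empty() {
    auto units = std::make_unique<int>(42);
    auto first = [u = std::move(units)] {};  // takes ownership
    auto second = [u = std::move(units)] {   // moves from empty pointer
        return u == nullptr;                 // true: nothing to release
    };
    (void)first;
    return second();
}
```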
This helper was previously only used in a test that didn't set a non-empty default value, but it shouldn't have been ignoring the value passed to the constructor.
These settings will enable limiting the number of concurrent connections per broker, or per client IP per broker.
This is just a pass through to the equivalent semaphore method. It's useful if calling from a context where you know you won't sleep.
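As a sketch of that contract, here is an analogy built on an atomic counter. This is not Seastar's semaphore nor the PR's actual code; the point is only the "never sleeps" behavior of a try-style acquire.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a semaphore's non-blocking acquire: take n
// units if they are available right now, otherwise fail immediately.
// The caller is never put to sleep on either path.
class units_counter {
    std::atomic<int64_t> available_;
public:
    explicit units_counter(int64_t n) : available_(n) {}
    bool try_get_units(int64_t n) {
        int64_t cur = available_.load();
        while (cur >= n) {
            // CAS loop: retries only on contention, never blocks
            if (available_.compare_exchange_weak(cur, cur - n)) {
                return true;
            }
        }
        return false;
    }
    void put_units(int64_t n) { available_.fetch_add(n); }
};
```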
This is the logic for tracking node-wide allowances for how many connections may be opened, or how many connections may be opened per client IP.
This is the first unit test definition in src/v/net, so comes with a new CMakeLists etc.
This counts how many connections we dropped for hitting a connection count limit.
I'm about to use it in connection_limits_test.
This is a ducktape test for exercising the new configurable connection count limits.
This fixes the potential crash from clear() getting called in the background while reclaim_to or cancel_reclaim_to was holding the mutex in home_allowance. remote_allowance doesn't need this special treatment.
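A rough single-threaded sketch of the hazard being guarded against (names are hypothetical; the real code uses Seastar mutex units, not a `shared_ptr` guard): an object must outlive any fiber still holding its lock, even if a background `clear()` drops the owning reference.

```cpp
#include <memory>

// Hypothetical model: an allowance that can be locked, and a guard that
// keeps the allowance alive for as long as the lock is held, so that a
// concurrent clear() dropping the owning reference cannot destroy it
// out from under the lock holder.
struct allowance {
    bool locked = false;
};

struct locked_allowance {
    std::shared_ptr<allowance> owner; // shared ownership pins the object
    explicit locked_allowance(std::shared_ptr<allowance> a)
      : owner(std::move(a)) {
        owner->locked = true;
    }
    ~locked_allowance() { owner->locked = false; } // safe: owner alive
};
```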
Force-pushed from 456efba to cab9d6f
Cover letter
RFC: #3744
The main complexity here is the cross-shard coordination of limits that apply at a per-node level. Each client IP has a "home" shard where its `home_allowance` lives, and other shards can "borrow" some tokens into their `remote_allowance`. To avoid starvation when many tokens have been borrowed, there is a "reclaim" mechanism whereby the home shard reclaims all borrowed tokens and sets a flag on the remote shards to dispatch any freed tokens back to the home shard.

In normal operation, where the limit is comfortably high enough for the workload, all shards that are handling connections end up with a few tokens in their borrowed pool and service incoming requests without having to do cross-shard communication. Cross-shard communication only kicks in when there is some pressure for the tokens.
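The borrow/reclaim scheme can be sketched as follows. This is a much-simplified, single-threaded model: all names and the batch size are illustrative, and the real implementation coordinates actual shards rather than two structs.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative model of per-client-IP token accounting: a "home" pool,
// a per-shard borrowed pool, and a reclaim flag that redirects freed
// tokens back home instead of into the local borrowed pool.
struct home_allowance {
    int64_t tokens;              // tokens currently held at home
};

struct remote_allowance {
    int64_t borrowed = 0;        // spare tokens borrowed from home
    bool dispatch_home = false;  // set by reclaim: return freed tokens
};

constexpr int64_t borrow_batch = 2; // illustrative batch size

// Taking a token for a new connection: fast path from the local
// borrowed pool, otherwise borrow a small batch from home.
bool take_token(home_allowance& home, remote_allowance& remote) {
    if (remote.borrowed == 0 && home.tokens > 0) {
        int64_t n = std::min(home.tokens, borrow_batch);
        home.tokens -= n;
        remote.borrowed += n;
    }
    if (remote.borrowed > 0) {
        --remote.borrowed;
        return true;
    }
    return false; // limit hit: the connection would be dropped
}

// Releasing a closed connection's token: goes home during reclaim,
// otherwise back into the local borrowed pool.
void release_token(home_allowance& home, remote_allowance& remote) {
    if (remote.dispatch_home) {
        ++home.tokens;
    } else {
        ++remote.borrowed;
    }
}

// Reclaim: home pulls back all spare borrowed tokens and flags the
// remote shard so tokens freed later are dispatched home too.
void reclaim(home_allowance& home, remote_allowance& remote) {
    home.tokens += remote.borrowed;
    remote.borrowed = 0;
    remote.dispatch_home = true;
}
```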
This PR does not add the "overrides" variant of the config for setting limits on individual addresses; that is future work.

These settings are similar to Kafka's `max.connections.per.ip` and `max.connections` settings. Connection limits are only applied to the Kafka protocol server, not the internal RPC server.
Fixes #3698
Release notes
Features
`kafka_connections_max` and `kafka_connections_max_per_ip` are added to control this behavior. By default both properties are switched off (i.e. the connection count is unlimited). These limits apply on a per-node basis. Note that the number of connections is not exactly equal to the number of clients: clients typically open 2-3 connections each.