Make max client connections configurable and refactor rpc::connection_cache
#12906
Conversation
Could you maybe comment on why you switched away from your original hash ring approach to a simple shuffle-based one?
    };

    explicit backoff_policy(std::unique_ptr<impl> i)
      : _impl(std::move(i)) {}

    backoff_policy(backoff_policy&&) = default;

    backoff_policy(const backoff_policy& o) {
Rule of 3/5 something something? I have forgotten when things get implicitly constructed these days, but it might be cleaner to just default/delete.
Gotcha, will add a copy/move assignment operator.
src/v/rpc/connection_cache.h
Outdated
ss::future<>
remove_connection_location(ss::shard_id dest_shard, model::node_id node) {
    return container().invoke_on(dest_shard, [node](auto& cache) {
        auto conn_loc = cache._connection_map.find(node);
I think the find is redundant; you can just call erase, which is a no-op if the key is not found.
Nice find, switching to just erase
src/v/rpc/connection_cache.cc
Outdated
if (!_connection_map.contains(n)) {
    return {};
}
return {_connection_map.at(n)};
Use find for this pattern? It avoids an extra hash table lookup.
Right you are, switching over to find
Couple of reasons:

absl::flat_hash_map<model::node_id, absl::flat_hash_set<ss::shard_id>>
  _node_to_shards;
std::vector<std::pair<ss::shard_id, size_t>> _connections_per_shard;
Using a struct would be nice here as well for readability as it avoids the std::get stuff.
Force-pushed the rpc::connection_cache branch from af0da93 to a01d200
Force-pushed the rpc::connection_cache branch from a01d200 to 67b4a9a
cool stuff
        _connections.erase(connection);
    }

    auto cert_creds = co_await maybe_build_reloadable_certificate_credentials(
If we yield here then _connections now doesn't contain an entry for this connection which other fibers might rely on. Is that a problem?
E.g.: connection_set::get straight out does:
transport_ptr get(model::node_id n) const {
    return _connections.find(n)->second;
}
    }
    auto holder = _gate.hold();

    auto& alloc_strat = _coordinator_state->alloc_strat;
Potential access to _coordinator_state->alloc_strat without owning the mutex?
Tried fixing the things I pointed out above as per https://github.com/redpanda-data/redpanda/compare/stephan/connection-cache-fixes?expand=1, but it still seems to crash, so that's probably not the root cause. Maybe it can still serve as inspiration.
I don't yet have any specific feedback, but generally the …
The first change this PR makes is to allow the number of client connections to each broker to be user configurable.
The second change is to use a stateful connection allocation method that ensures the following:
Shard aware connections v2 (#8)
Fixes #12912
Backports Required
Release Notes
Features
rpc_client_connections_per_shard
cluster property that allows the number of client connections a broker opens to a given peer to be user configurable.
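For illustration, the property could be set with rpk's cluster config command (the property name comes from this PR; the value 8 is just an example, and the default may differ):

```shell
# Example only: configure how many RPC client connections each shard
# opens to a peer broker (8 is an illustrative value).
rpk cluster config set rpc_client_connections_per_shard 8
```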