-
Notifications
You must be signed in to change notification settings - Fork 579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: use segment fallocate size from config #3288
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs some unit tests that run and validate a workload against the segment appender while varying the fallocation step.
src/v/storage/segment_appender.h
Outdated
@@ -49,7 +51,8 @@ class segment_appender { | |||
|
|||
ss::io_priority_class priority; | |||
size_t number_of_chunks; | |||
size_t falloc_step{fallocation_step}; | |||
size_t falloc_step{ | |||
config::shard_local_cfg().segment_fallocation_step()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to stop using the thread-local for this stuff but pass it as an option on the caller, all the way to main.
testing core-local configs are super hard. generally not a good practice to use it for tests in particular.
to clarify @ZeDRoman we should run some workload (e.g. appends/reads in the simple case, or generally a simple fuzzer like opfuzz) that checks that the segment appender appears to be working ok. |
3b80e85
to
bff4782
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great thanks for adding the tests. let's expand the tests just a little bit.
@@ -31,7 +31,7 @@ SEASTAR_THREAD_TEST_CASE(test_can_append_multiple_flushes) { | |||
| ss::open_flags::truncate) | |||
.get0(); | |||
auto appender = segment_appender( | |||
f, segment_appender::options(ss::default_priority_class(), 1)); | |||
f, segment_appender::options(ss::default_priority_class(), 1, 16_KiB)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good, but can we parameter all the existing tests, not just a new test that is added, because presumably all of the existing functionality should continue to be preserved independent of the allocation size.
I think factoring out the body of the tests into helper functions will make this straight forward. Something like:
static void test_can_append_multiple_flushes(size_t size) {
...
}
SEASTAR_THREAD_TEST_CASE(test_can_append_multiple_flushes) {
run_test_can_append_multiple_flushes(16_KiB);
run_test_can_append_multiple_flushes(32_MiB);
@@ -217,3 +217,40 @@ SEASTAR_THREAD_TEST_CASE(test_can_append_little_data) { | |||
BOOST_REQUIRE_EQUAL(appender.file_byte_offset(), data.size()); | |||
appender.close().get(); | |||
} | |||
|
|||
SEASTAR_THREAD_TEST_CASE(test_fallocate_size) { | |||
for (const size_t fallocate_size : {1ul, 16_KiB, 32_MiB}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what validation is applied to this new parameter? can I use 0
here or non-power-of-two values? I think it is a good idea to have validation (if we don't already have that--perhaps I missed it), and make sure the tests here have a few edge cases in the space of allowed values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use any value > 0. But in segment_appender fallocation_offset will be aligned to 4096.
So I think that here will be better to have 4096 divisor in order to have page aligned fallocations
I'll add validator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may find the existing storage::internal::chunk_cache::validate_chunk_size
useful as an example (currently used for the append_chunk_size
setting).
It would be good to have some upper and lower bounds on this to prevent craziness if the user types it wrong: minimum should be the append_chunk_size (if we can't get at the live value from the validator then use the default 16kib) and max should be something arbitrary, maybe the default segment size (1gib). Arbitrary limits are kind of annoying, but it's better than enabling the user to instantly fill their disk by typing too many zeros in a config option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, just suggestions for making the setting adjustable without restart, and validation.
src/v/config/configuration.cc
Outdated
*this, | ||
"segment_fallocation_step", | ||
"Size for segments fallocation", | ||
{.example = "32768", .visibility = visibility::tunable}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be relatively easy to make live-adjustable:
- Consume it via the .bind() method and pass around a
binding
rather than an integer (https://vectorizedio.atlassian.net/wiki/spaces/CORE/pages/78676076/Configuration+Properties) - Once that's done, set ".needs_restart = needs_restart::no" here.
@@ -217,3 +217,40 @@ SEASTAR_THREAD_TEST_CASE(test_can_append_little_data) { | |||
BOOST_REQUIRE_EQUAL(appender.file_byte_offset(), data.size()); | |||
appender.close().get(); | |||
} | |||
|
|||
SEASTAR_THREAD_TEST_CASE(test_fallocate_size) { | |||
for (const size_t fallocate_size : {1ul, 16_KiB, 32_MiB}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may find the existing storage::internal::chunk_cache::validate_chunk_size
useful as an example (currently used for the append_chunk_size
setting).
It would be good to have some upper and lower bounds on this to prevent craziness if the user types it wrong: minimum should be the append_chunk_size (if we can't get at the live value from the validator then use the default 16kib) and max should be something arbitrary, maybe the default segment size (1gib). Arbitrary limits are kind of annoying, but it's better than enabling the user to instantly fill their disk by typing too many zeros in a config option.
Add segment_fallocate size to config properties and set segment_appender fallocate size from config
LGTM, needs Noah's earlier request for changes to be dismissed or replaced with a +1 before github will let us merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work!
@@ -77,6 +77,20 @@ class segment_appender { | |||
|
|||
void set_callbacks(callbacks* callbacks) { _callbacks = callbacks; } | |||
|
|||
/** Validator for fallocation step configuration setting */ | |||
static std::optional<ss::sstring> | |||
validate_fallocation_step(const size_t& value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: there is not any benefit to taking a size_t
by const&
. I wonder if config::
is making this requirement?
Cover letter
"Today, redpanda fallocates 32M whenever necessary. We currently inject one raft message into every partition when a topic is created. If I create a topic with 1000 partitions, 32G of my disk is eaten.
The fallocate was chosen to be 32M, and large sizes are beneficial, but it may be useful to have this size be a configuration option" (#2876)
Fixes #2876
Release notes
Features
segment_fallocation_step
property. The default is unchanged from the previous behavior (32MB). You may wish to decrease this property if creating large numbers of partitions on systems with limited disk space.