Allow rm_stm to control max compaction offset #5318
Conversation
- Still needs a hook into compaction, and an integration test to validate that the compaction offset limiting works as expected.
src/v/storage/log_manager.cc
Outdated
@@ -122,6 +122,7 @@ log_manager::housekeeping_scan(model::timestamp collection_threshold) {
      co_await current_log.handle.compact(compaction_config(
        collection_threshold,
        _config.retention_bytes(),
        current_log.handle.stm_manager()->max_compactible_offset(),
it kind of seems like rm_stm should update storage layer (as opposed to storage querying rm_stm). rm_stm is in a position to know when an offset changes, and having information flow downward there is less complex coupling. wdyt?
@mmaslankaprv suggested using polling; it seems like a more straightforward solution. Also, in the next part of the compaction / transaction integration, the compaction logic will need to query whether a given record is part of an aborted or committed transaction. Using an object to query, instead of passing a list of aborted records, gives more space for memory optimization.
@mmaslankaprv suggested using polling, it seems like a more straightforward solution
I'm not sure I agree it is more straightforward when we consider the lifetime of all the new objects involved, namely that the callback to storage is going to have a captured reference back to rm_stm or some other thing. It's not a serious issue and shouldn't block work on a fix.
I do agree for querying and memory optimization of aborted records it doesn't make sense to push that down. It seems like this suggests that either part of transaction management should be in storage or some of compaction should be lifted out. I think it's worth discussing this sort of thing even if we go with something quick so we stay aware of how much tech debt we might be accumulating.
I agree that the inverted dependencies are annoying, and it's a good idea to lift compaction logic above raft and let the stm directly control it, but that's out of scope for the compaction / transactions integration, and we can't condition the transactions release on a fundamental refactoring of the compaction logic.
Right now we compact adjacent segments, but when transactions are involved, in order to decide whether a record `b` overwrites a record `a` (even when the records are from the same segment) we may need to look ahead an unbounded number of segments to search for the corresponding abort or commit txn markers. So we either need to change compaction from working with adjacent segments to working with an unbounded number of segments, or rely on an external oracle who happens to know the future to guide the compaction logic. The oracle requires fewer changes.
But there is a silver lining - we are not increasing the tech debt balance. The storage layer already depends on stm_manager which contains a reference to rm_stm so things are not getting worse.
I think I agree with Noah here; it would be cleaner to reverse the dependency, but I also get Denis's point about the already-existing stm_manager.
Yep. I was thinking similar things as I wrote this PR; would like to chat about layering here.
But there is a silver lining - we are not increasing the tech debt balance. The storage layer already depends on stm_manager which contains a reference to rm_stm so things are not getting worse.
thanks for mentioning this observation
Looks good, but I suggest postponing merging until we start using max_compactable_offset to actually limit compaction, so we get the whole picture of the change.
src/v/storage/types.h
Outdated
@@ -340,11 +340,14 @@ struct compaction_config {
    explicit compaction_config(
      model::timestamp upper,
      std::optional<size_t> max_bytes_in_log,
      std::optional<model::offset> max_compact_offset,
we already use model::offset::max() as an unset value (in stm_manager), so it's better to stick with it and not introduce optional. Making a type optional adds another dimension to the value space which we should keep in mind when we use the type.
This particular API, compaction_config, uses optional, so I thought it was more consistent. I will change it to ...max().
we already use model::offset::max()

the intention of default initialization model::offset{} (aka ::min) is to indicate an unset state. unless using ::max is convenient because of a calculation that is based on min(a, b), it is probably worth sticking to default initialization? I think that is what most of storage:: uses?
😆 you guys are making an advertisement for std::optional. Joking aside, stm_manager does depend on being able to use min() to reduce multiple values.
Absolutely. I am pro-std::optional and would very much like to see model::offset{} as an unset value go away. But in the meantime I guess we should be consistent :(
src/v/storage/types.h
Outdated
std::vector<ss::shared_ptr<snapshotable_stm>> _stms;
ss::shared_ptr<compactable_stm> _comp_stm;
do we need a special case for this stm type? would it be enough to use max_collectible_offset as a compaction boundary?
This was one of my first ideas. @rystsov indicated that this compaction codepath should be distinct from the snapshot/eviction code. We should chat about the definition of "max collectible" and all the use cases.
I was saying that just setting max_collectible_offset to the LSO isn't enough, because it isn't used in compaction. We may update the compaction layer to use max_collectible_offset to postpone compaction, but the current approach has merit too, e.g. we may use max_compactable_offset & max_collectible_offset to make sure we never do log eviction before we compact. Thoughts?
Thoughts?
I'm not too sure what you are proposing/asking, @rystsov.
IIUC, he's saying the value of having them separate is that it allows stm to control compaction and eviction separately, and asking if we think it is worthwhile.
i would vote for using max_collectible_offset in compaction; there is no point in compacting a log if it is about to be deleted either way, and having a special case for a compaction stm makes the whole thing more complicated
As much as I'd love to get this merged, I tend to agree with @mmaslankaprv. At least this is what I wanted to try first, because we can always add separate compaction control in the future if needed. I didn't see a need for this currently, but maybe in the future we introduce a new compaction algorithm or something that needs it?
Unless there are any objections, I'll whip up a new version which eliminates the separate compactable offset plumbing, so we can see the difference in approaches.
src/v/storage/types.h
Outdated
template<
  typename T,
  typename = std::enable_if_t<std::is_base_of_v<snapshotable_stm, T>>>
void add_stm(ss::shared_ptr<T> stm) {
It might be a clearer api if you added an add_compactable_stm interface, since we already have a run-time requirement that two compactable stms cannot be added.
Agreed.
I ended up not doing this (separately-named method) because of the awkwardness of the class hierarchy; a compactable_stm is not a snapshotable_stm. Ended up using a dynamic cast. LMK what you think.
Force push:
Marking draft--will also include integration with compaction and a test in this PR. Stay tuned.
Added: integration test and hooking into actual compaction logic. Still a draft: rewriting the test code.
293f985 to ab3a858
Force-push:
(Sorry for the multiple pushes--I will debounce better in the future 😅)
- Use model::offset::max() as a default value; this means you may delete or compact up to any offset.
- Compaction now takes max_collectible_offset into account. No effect yet, since it is still set to max() in the compaction config.
Code is moved as-is, except where I cheated and moved a constant up to file scope.
This is a `generator` in the computer science sense; it generates a sequence, like an iterator. The set of values this generator will output is limited to a maximum size (cardinality). The distribution of values is uniform. They are a fixed length, and the simple algorithm values memory efficiency over actual entropy. Useful for randomly generating keys that can be compacted.
For testing compaction, we want a variant of append_random_batches() which limits the total number of distinct keys used. Unlike fully-random keys, this ensures that compaction can actually do some work.
Introduce a utility for compaction testing which helps detect which offset ranges were affected by compaction. This utility makes a single pass over all of the message offsets in a given log instance, and returns the total number of offset gaps and the min and max offsets that were missing.
Last Stable Offset (LSO) is the largest offset such that no messages, up to and including LSO, are part of an unresolved transaction. Use LSO to determine the maximum offset that may be affected by compaction. This ensures that messages for open transactions--which may be aborted or committed in the future--cannot affect compaction. For example, an aborted message cannot replace a previous message with the same key, during compaction.
Three integration tests that verify behavior of max_compactible_offset: - Randomized batch / message generation with a limited number of keys. Verify that compaction only deletes messages less than this max. - Single key #1: fixed message test that checks that the boundaries of the compacted area are precisely as expected. - Single key #2: same as above, but with max_compactible_offset unset; we validate that compaction works as before.
Thanks @dotnwat. My next force push should address all of these, except the question about dirty offset, and maybe the one on changing … For the compactable_stm interface stuff, the thread is here, and the bottom of the PR description covers the change. That commit was deleted--for now at least.
7902cf3 to ce45069
Force-push: (the compare link should show as empty since the end result is the same)
Current CI failures are unrelated: #6237.
good point, i didn't notice it was in this namespace!
lgtm.
we should have @rystsov (the author of the feature request), and/or @mmaslankaprv approve too.
@@ -26,7 +26,6 @@
#include <seastar/core/sstring.hh>
#include <seastar/util/bool_class.hh>

#include <optional>
nit: later in the file:
std::optional<size_t> max_partition_retention_size);
@@ -36,6 +36,9 @@ struct record_batch_spec {
    std::optional<model::timestamp> timestamp;
};

// Makes a random alphanumeric string, encoded in an iobuf.
iobuf make_iobuf(size_t);
nit: it looks to me like the fully qualified name is model::test::make_iobuf. i'd move it to random generators or bytes test helpers. can be moved later.
// If adjacent segment compaction worked in order from oldest to newest, we
// could use this assert.
#if 0
this seems reasonable, but using a multi-line comment instead of an if 0 block would be more idiomatic in the code base. this particular case seems very clear and the if 0/endif kind of highlights the aspect of it being code that the comment is referring to
great change! just small nits
@@ -251,6 +251,7 @@ rm_stm::rm_stm(
}

bool rm_stm::check_tx_permitted() {
    // TODO support compaction
can you please open a github issue with context on what exactly needs to be done, and put the link here?
It already exists: #4824
can you please open a github issue with context on what exactly needs to be done, and put the link here?

thanks @LenaAn. it is nice to have the link to the github issue in the code comment itself; it's usually a big lift to reverse index from a code comment back to a github review comment.
    return s;
}

} // namespace random_generators
nit: no newline at the end of the file
 * minimum and maximum offsets that were missing.
 * This is useful for observing which offsets were affected by compaction.
 */
struct log_gap_analysis {
nit: another useful thing would be the analysis of records read and unique keys read. For example, we produce 10000 records with 200 unique keys and consume 200 records with 200 unique keys. So we see how much we compact data.
Looks good! The cover letter, the commit messages and the comments make it so easy to understand.
ss::sstring gen_alphanum_string(size_t n);

static constexpr size_t alphanum_max_distinct_strlen = 32;
nit: as i understand, gen_alphanum_max_distinct returns a random string with fixed length equal to alphanum_max_distinct_strlen; because the string length is fixed, it's better to rename it to alphanum_distinct_strlen
// If adjacent segment compaction worked in order from oldest to newest, we
// could use this assert.
#if 0
#if false feels like a merge-broken version check or something; agree with Noah, it's better to use a multi-line comment
Thanks for the reviews! I will address the remaining nits (change the preprocessor conditional to a comment, etc.) in a followup PR coming shortly.
Allow rm_stm to limit the maximum offset that compaction may affect, so it can prevent unresolved transactional messages from affecting other records. Fixes #4823
Highlights:
storage: add max_collectible_offset to compaction_config
storage/tests: add key-limited random batch generator
For testing compaction, we want a variant of append_random_batches() which limits the total number of distinct keys used. Unlike fully-random keys, this ensures that compaction can actually do some work.
s/test/util: basic log gap analysis for compaction tests
Introduce a utility for compaction testing which helps detect which offset ranges were affected by compaction. This utility makes a single pass over all of the message offsets in a given log instance, and returns the total number of offset gaps and the min and max offsets that were missing.
storage/test: integration test for max compaction (max_collectible) offset
Three integration tests that verify desired behavior:
Randomized batch / message generation with a limited number of keys. Verify that compaction only deletes messages less than this max.
Single key with max compact offset (MCO): fixed message test that checks that the boundaries of the compacted area are precisely as expected.
Single key w/o MCO: same as above, but with max_collectible_offset unset; we validate that compaction works as before.
rm_stm: use last stable offset to limit compaction
Last Stable Offset (LSO) is the largest offset such that no messages, up to and including LSO, are part of an unresolved transaction. Use LSO to determine the maximum offset that may be affected by compaction. This ensures that messages for open transactions--which may be aborted or committed in the future--cannot affect compaction. For example, an aborted message cannot replace a previous message with the same key, during compaction.
v1 PR
included these changes to support separate control for deletion/eviction versus compaction, which we agreed can be eliminated for now:
storage: stm_manager support for compactable_stm
Make stm_manager aware of a new type of stm: compactable_stm. This allows STM implementations to control compaction policy without leaking details into the main compaction logic. For now, this just exposes max_compactable_offset(), which allows rm_stm to prevent unresolved transactional messages from affecting compaction. We also plan to use this plumbing for a per-message predicate which will allow compaction to query the STM, and thus handle things like aborted records properly.
Release notes