
Allow rm_stm to control max compaction offset #5318

Merged
merged 15 commits into from
Sep 6, 2022

Conversation

ajfabbri
Contributor

@ajfabbri ajfabbri commented Jul 2, 2022

Allow rm_stm to limit maximum offset that compaction may affect, so it can prevent unresolved transactional messages from affecting other records.

Fixes #4823

Highlights:

storage: add max_collectible_offset to compaction_config

  • Use model::offset::max() as a default value; this means you may compact up to any offset.
  • Compaction uses max_collectible_offset, which is set in a later patch in this PR.

storage/tests: add key-limited random batch generator

For testing compaction, we want a variant of append_random_batches() which limits the total number of distinct keys used. Unlike fully-random keys, this ensures that compaction can actually do some work.

s/test/util: basic log gap analysis for compaction tests

Introduce a utility for compaction testing that helps detect which offset ranges were affected by compaction. It makes a single pass over all of the message offsets in a given log instance and returns the total number of offset gaps, plus the minimum and maximum offsets that were missing.

storage/test: integration test for max compaction (max_collectible) offset

Three integration tests that verify desired behavior:

  • Randomized batch / message generation with a limited number of keys. Verify that compaction only deletes messages below this max.

  • Single key with max compact offset (MCO): fixed message test that checks that the boundaries of the compacted area are precisely as expected.

  • Single key w/o MCO: same as above, but with max_collectible_offset unset; we validate that compaction works as before.

rm_stm: use last stable offset to limit compaction

Last Stable Offset (LSO) is the largest offset such that no messages, up to and including LSO, are part of an unresolved transaction. Use LSO to determine the maximum offset that may be affected by compaction. This ensures that messages for open transactions--which may be aborted or committed in the future--cannot affect compaction. For example, an aborted message cannot replace a previous message with the same key, during compaction.

The v1 PR included these changes to support separate control for deletion/eviction versus compaction, which we agreed can be eliminated for now:
storage: stm_manager support for compactable_stm
Make stm_manager aware of a new type of stm: compactable_stm. This allows STM implementations to control compaction policy without leaking details into the main compaction logic.

For now, this just exposes max_compactable_offset(), which allows rm_stm to prevent unresolved transactional messages from affecting compaction. We also plan to use this plumbing for a per-message predicate which will allow compaction to query the STM, and thus handle things like aborted records properly.

Release notes

  • none

src/v/storage/segment.h
Contributor Author

@ajfabbri ajfabbri left a comment

  • Still needs hook into compaction, and an integration test; to validate the compaction offset limiting works as expected.

src/v/storage/types.h
src/v/storage/compaction_reducers.cc
@@ -122,6 +122,7 @@ log_manager::housekeeping_scan(model::timestamp collection_threshold) {
co_await current_log.handle.compact(compaction_config(
collection_threshold,
_config.retention_bytes(),
current_log.handle.stm_manager()->max_compactible_offset(),
Member

it kind of seems like rm_stm should update storage layer (as opposed to storage querying rm_stm). rm_stm is in a position to know when an offset changes, and having information flow downward there is less complex coupling. wdyt?

Contributor

@mmaslankaprv suggested using polling; it seems like a more straightforward solution. Also, in the next part of the compaction/transaction integration, the compaction logic will need to query whether a given record is part of an aborted or committed transaction. Using an object to query, instead of passing a list of aborted records, leaves more room for memory optimization.

Member

@dotnwat dotnwat Jul 3, 2022

@mmaslankaprv suggested using polling, it seems like a more straightforward solution

I'm not sure I agree it is more straightforward when we consider the lifetime of all the new objects involved, namely that the callback to storage is going to have a captured reference back to rm_stm or some other object. It's not a serious issue and shouldn't block work on a fix.

I do agree for querying and memory optimization of aborted records it doesn't make sense to push that down. It seems like this suggests that either part of transaction management should be in storage or some of compaction should be lifted out. I think it's worth discussing this sort of thing even if we go with something quick so we stay aware of how much tech debt we might be accumulating.

Contributor

@rystsov rystsov Jul 3, 2022

I agree that the inverted dependencies are annoying and it's a good idea to lift compaction logic above raft and let stm directly control it but it's out of scope of the compaction / transactions integration and we can't condition transactions release on the fundamental refactoring of the compaction logic.

Right now we compact adjacent segments, but when transactions are involved, in order to decide whether a record b overwrites a record a (even when the records are from the same segment), we may need to look ahead across an unbounded number of segments to search for the corresponding abort or commit txn markers. So we either need to change compaction from working with adjacent segments to working with unbounded segments, or rely on an external oracle that happens to know the future to guide the compaction logic. The oracle requires fewer changes.

But there is a silver lining - we are not increasing the tech debt balance. The storage layer already depends on stm_manager which contains a reference to rm_stm so things are not getting worse.

Member

I think I agree with Noah here; it would be cleaner to reverse the dependency, but I also get Denis's point about the already existing stm_manager.

Contributor Author

Yep. I was thinking similar things as I wrote this PR; would like to chat about layering here.

Member

But there is a silver lining - we are not increasing the tech debt balance. The storage layer already depends on stm_manager which contains a reference to rm_stm so things are not getting worse.

thanks for mentioning this observation

Contributor

@rystsov rystsov left a comment

Looks good, but I suggest postponing merging until we start using max_compactable_offset to actually limit compaction, to get the whole picture of the change.

src/v/storage/types.h
@@ -340,11 +340,14 @@ struct compaction_config {
explicit compaction_config(
model::timestamp upper,
std::optional<size_t> max_bytes_in_log,
std::optional<model::offset> max_compact_offset,
Contributor

we already use model::offset::max() as an unset value (in stm_manager), so it's better to stick with it and not introduce optional. Making a type optional adds another dimension to the value space, which we have to keep in mind whenever we use the type.

Contributor Author

@ajfabbri ajfabbri Jul 6, 2022

This particular API, compaction_config, uses optional, so I thought it was more consistent. I will change it to ...max().

Member

@dotnwat dotnwat Jul 6, 2022

we already use model::offset::max()

the intention of default initialization model::offset{} (aka ::min) is to indicate an unset state. Unless using ::max is convenient because of a calculation based on min(a, b), it is probably worth sticking to default initialization? I think that is what most of storage:: uses.

Contributor Author

@ajfabbri ajfabbri Jul 6, 2022

😆 you guys are making an advertisement for std::optional. Joking aside, stm_manager does depend on being able to use min() to reduce multiple values.

Member

Absolutely. I am pro-std::optional and would very much like to see model::offset{} as an unset value go away. But in the meantime I guess we should be consistent :(

src/v/storage/types.h
std::vector<ss::shared_ptr<snapshotable_stm>> _stms;
ss::shared_ptr<compactable_stm> _comp_stm;
Member

do we need a special case for this stm type? Would it be enough to use max_collectible_offset as a compaction boundary?

Contributor Author

This was one of my first ideas. @rystsov indicated that this compaction codepath should be distinct from the snapshot/eviction code. We should chat about the definition of "max collectible" and all the use cases.

Contributor

@rystsov rystsov Jul 15, 2022

I was saying that just setting max_collectible_offset to the LSO isn't enough, because it isn't used in compaction. We could update the compaction layer to use max_collectible_offset to postpone compaction, but the current approach has merit too; e.g. we could use max_compactable_offset & max_collectible_offset to make sure we never do log eviction before we compact. Thoughts?

Member

Thoughts?

I'm not too sure what you are proposing/asking, @rystsov.

Contributor Author

IIUC, he's saying the value of having them separate is that it allows stm to control compaction and eviction separately, and asking if we think it is worthwhile.

Member

I would vote for using max_collectible_offset in compaction; there is no point in compacting a log if it is about to be deleted either way, and having a special case for a compaction stm makes the whole thing more complicated.

Contributor Author

@ajfabbri ajfabbri Aug 18, 2022

As much as I'd love to get this merged, I tend to agree with @mmaslankaprv. At least this is what I wanted to try first, because we can always add separate compaction control in the future if needed. I didn't see a need for this currently, but maybe in the future we introduce a new compaction algorithm or something that needs it?

Unless there are any objections, I'll whip up a new version which eliminates the separate compactable offset plumbing, so we can see the difference in approaches.

src/v/storage/types.h
template<
typename T,
typename = std::enable_if_t<std::is_base_of_v<snapshotable_stm, T>>>
void add_stm(ss::shared_ptr<T> stm) {
Member

It might be a clearer API if you added an add_compactable_stm interface, since we already have a run-time requirement that two compactable stms cannot be added

Contributor Author

Agreed.

Contributor Author

I ended up not doing this (a separately-named method) because of the awkwardness of the class hierarchy; a compactable_stm is not a snapshotable_stm. I ended up using a dynamic cast. LMK what you think.

src/v/storage/types.h
@ajfabbri
Contributor Author

ajfabbri commented Jul 6, 2022

Force push:

  • Remove template for add_stm(). Could also add separate method add_compactable_stm(), but that either requires a dynamic cast (compactable_stm is not a snapshotable_stm), or calling both add_stm() and add_compactable_stm() in the case of an instance of rm_stm.
  • Use offset::max() instead of optional/nullopt
  • Add comment to clarify >= for max collectible offset

@ajfabbri ajfabbri marked this pull request as draft July 6, 2022 05:42
@ajfabbri
Contributor Author

ajfabbri commented Jul 6, 2022

Looks good but I suggest to postpone merging before we start using max_compactable_offset to actually limit compaction to get the whole picture of the change.

Marking draft--will also include integration with compaction and a test in this PR. Stay tuned.

@ajfabbri
Contributor Author

ajfabbri commented Jul 7, 2022

Added: integration test and hooking into actual compaction logic. Still a draft: rewriting the test code.

@ajfabbri ajfabbri force-pushed the transpaction1 branch 5 times, most recently from 293f985 to ab3a858 Compare July 8, 2022 19:17
@ajfabbri
Contributor Author

Force-push:

  • Remove redundant inline
  • Remove two reference default lambda captures, make explicit.
  • Rebase on dev.
  • Fix formatting.

(Sorry for the multiple pushes--I will debounce better in the future 😅)

Aaron Fabbri added 12 commits August 25, 2022 17:51
- Use model::offset::max() as a default value; this means you may
  delete or compact up to any offset.
- Compaction now takes max_collectible_offset into account.
- No effect yet, since it is still set to max() in the
  compaction config.
Code is moved as-is, except where I cheated and moved a constant up to
file scope.
This is a `generator` in the computer science sense; it generates a
sequence, like an iterator. The set of values this generator will output
is limited to a maximum size (cardinality). The distribution of values
is uniform. They are a fixed length, and the simple algorithm values
memory efficiency over actual entropy.

Useful for randomly generating keys that can be compacted.
For testing compaction, we want a variant of append_random_batches()
which limits the total number of distinct keys used. Unlike fully-random
keys, this ensures that compaction can actually do some work.
Introduce a utility for compaction testing which helps detect which
offset ranges were affected by compaction. This utility makes a single
pass over all of the message offsets in a given log instance,
and returning the total number of offset gaps, and the min and max
offsets that were missing.
Last Stable Offset (LSO) is the largest offset such that no messages, up
to and including LSO, are part of an unresolved transaction.

Use LSO to determine the maximum offset that may be affected by
compaction. This ensures that messages for open transactions--which may
be aborted or committed in the future--cannot affect compaction. For
example, an aborted message cannot replace a previous message with the
same key, during compaction.
Three integration tests that verify behavior of max_compactable_offset:

- Randomized batch / message generation with a limited number of keys.
  Verify that compaction only deletes messages less than this max.

- Single key #1: fixed message test that checks that the boundaries of the
  compacted area are precisely as expected.

- Single key #2: same as above, but with max_compactible_offset unset,
  we validate that compaction works as before.
@ajfabbri
Contributor Author

ajfabbri commented Aug 26, 2022

nice stuff. this generally looks good to me.

  1. i think one of the earlier commits might not compile
  2. added a few minor nits i saw along the way

I was surprised to see the dynamic_cast / interface mixin disappear. I looked back through some of the feedback but didn't see that discussion. Do you have a pointer or could explain?

Thanks @dotnwat. My next force push should address all of these, except the question about dirty offset, and maybe the one on changing random_generators::make_iobuf() to be random_generators::make_random_iobuf()? I'm in the habit of de-duping redundant words in the full name, but let me know if you want that.

For the compactable_stm interface stuff, thread is here, and the bottom of the PR description covers the change. That commit was deleted--for now at least.

@ajfabbri
Contributor Author

ajfabbri commented Aug 26, 2022

Force-push: (compare link should show as empty since end result is the same):

  • Split compaction config patch into multiple commits: 1) add max offset field to config; 2) change compaction to honor the config; 3) remove unused include; 4) comments.
  • Fix bisect-ability for 1 above (chunk was misplaced in later commit when rebasing).

@ajfabbri
Contributor Author

ajfabbri commented Aug 26, 2022

Current CI failures are unrelated: #6237.

@dotnwat
Member

dotnwat commented Aug 29, 2022

random_generators::make_random_iobuf()

good point, i didn't notice it was in this namespace!

Member

@dotnwat dotnwat left a comment

lgtm.

we should have @rystsov (the author of the feature request), and/or @mmaslankaprv approve too.

@@ -26,7 +26,6 @@
#include <seastar/core/sstring.hh>
#include <seastar/util/bool_class.hh>

#include <optional>
Member

@dotnwat dotnwat Aug 31, 2022

nit: later in the file:

      std::optional<size_t> max_partition_retention_size);

@@ -36,6 +36,9 @@ struct record_batch_spec {
std::optional<model::timestamp> timestamp;
};

// Makes a random alphanumeric string, encoded in an iobuf.
iobuf make_iobuf(size_t);
Member

@dotnwat dotnwat Aug 31, 2022

nit: it looks to me like the fully qualified name is model::test::make_iobuf. i'd move it to random generators or bytes test helpers. can be moved later.


// If adjacent segment compaction worked in order from oldest to newest, we
// could use this assert.
#if 0
Member

this seems reasonable, but using a multi-line comment instead of an #if 0 block would be more idiomatic in the code base. This particular case seems very clear, and the #if 0/#endif kind of highlights that what the comment refers to is code.

Contributor

@LenaAn LenaAn left a comment

great change! just small nits

@@ -251,6 +251,7 @@ rm_stm::rm_stm(
}

bool rm_stm::check_tx_permitted() {
// TODO support compaction
Contributor

can you please open a GitHub issue with context on what exactly needs to be done, and put the link here?

Contributor

It already exists: #4824

Member

can you please open a github issue with a context what exactly need to be done? and put the link here

thanks @LenaAn. It is nice to have the link to the GitHub issue in the code comment itself; it's usually a big lift to reverse-index from a code comment back to a GitHub review comment.

return s;
}

} // namespace random_generators
Contributor

nit: no newline at the end of the file

* minimum and maximum offsets that were missing.
* This is useful for observing which offsets were affected by compaction.
*/
struct log_gap_analysis {
Contributor

nit: another useful thing would be an analysis of records read and unique keys read. For example, we produce 10000 records with 200 unique keys and consume 200 records with 200 unique keys, so we can see how much data we compacted.

Contributor

@rystsov rystsov left a comment

Looks good! The cover letter, the commit messages and the comments make it so easy to understand.

ss::sstring gen_alphanum_string(size_t n);

static constexpr size_t alphanum_max_distinct_strlen = 32;
Contributor

nit: as I understand it, gen_alphanum_max_distinct returns a random string with a fixed length equal to alphanum_max_distinct_strlen; because the string length is fixed, it's better to rename the constant to alphanum_distinct_strlen


// If adjacent segment compaction worked in order from oldest to newest, we
// could use this assert.
#if 0
Contributor

#if false feels like a merge-broken version check or something; I agree with Noah, it's better to use a multi-line comment

@ajfabbri
Contributor Author

ajfabbri commented Sep 6, 2022

Thanks for the reviews! I will address remaining nits (change preprocessor conditional to comment, etc) in followup PR coming shortly.


Successfully merging this pull request may close these issues.

Permit compaction do its work only before a provided offset
8 participants