storage: checkpoint offset_translator more frequently when using many tiny partitions #4600

jcsp · 2022-05-06T10:34:38Z

Currently offset_translator::maybe_checkpoint uses a fixed 64MiB threshold to decide whether to write a checkpoint.

If I have e.g. 40k partitions on a node with 2TB of storage, then with data spread evenly no partition ever stores more than 50MB, so the 64MiB threshold is never reached. This results in a large amount of read IO on restart: currently ManyPartitionsTest has to wait as long for node startup as it originally spent writing the data (many minutes).
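To make the arithmetic explicit, here is a minimal compile-time check of the figures above. It assumes decimal units for the disk (2TB = 2 * 10^12 bytes) and data spread evenly across partitions; the constant names are illustrative only and are not taken from the redpanda source.

```cpp
#include <cstdint>

// Figures from the description; decimal TB and an even spread are assumptions.
constexpr std::uint64_t disk_bytes = 2'000'000'000'000ULL;          // 2 TB
constexpr std::uint64_t partition_count = 40'000;                   // 40k partitions
constexpr std::uint64_t checkpoint_threshold = 64ULL * 1024 * 1024; // 64 MiB

// Bytes available to each partition when the disk is shared evenly.
constexpr std::uint64_t bytes_per_partition = disk_bytes / partition_count;

static_assert(bytes_per_partition == 50'000'000ULL, "2 TB / 40k = 50 MB");
static_assert(
  bytes_per_partition < checkpoint_threshold,
  "a full partition still sits below the 64 MiB checkpoint threshold");
```

Since 50MB is below 64MiB (about 67.1MB), a per-partition threshold of that size cannot fire in this configuration, and in the worst case all of the written data is left to replay on startup.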

- We could make this configurable, similar to the falloc step size.
- We could use our knowledge of the disk size and partition count to dynamically select a size that makes sure we are never reading more than a certain fraction of the disk size on startup.
- We could keep a global count of the number of un-checkpointed bytes across all partitions, and trigger checkpoints based on that -- this would be the most direct way of bounding the amount of data that redpanda has to replay on startup, at the cost of more coordination.
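The global-count option could look roughly like the sketch below. This is a hypothetical, single-threaded illustration: `global_checkpoint_budget` and its methods do not exist in redpanda, partitions are identified by a plain `int`, and cross-shard coordination (the "more coordination" cost mentioned above) is ignored entirely.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sketch, not redpanda code: tracks un-checkpointed bytes
// per partition plus a node-wide total.
class global_checkpoint_budget {
public:
    explicit global_checkpoint_budget(std::uint64_t max_dirty_bytes)
      : _max_dirty_bytes(max_dirty_bytes) {}

    // Record bytes appended to a partition since its last checkpoint.
    void on_append(int partition_id, std::uint64_t bytes) {
        _dirty[partition_id] += bytes;
        _total_dirty += bytes;
    }

    // True once the node-wide un-checkpointed total exceeds the budget.
    bool over_budget() const { return _total_dirty > _max_dirty_bytes; }

    // Return every partition with un-checkpointed bytes and reset the
    // counters. The caller is expected to checkpoint each returned id.
    std::vector<int> drain() {
        std::vector<int> ids;
        for (const auto& entry : _dirty) {
            if (entry.second > 0) {
                ids.push_back(entry.first);
            }
        }
        _dirty.clear();
        _total_dirty = 0;
        return ids;
    }

private:
    std::uint64_t _max_dirty_bytes;
    std::uint64_t _total_dirty = 0;
    std::map<int, std::uint64_t> _dirty; // ordered, so drain() is deterministic
};
```

Draining everything at once is the simplest policy, chosen here only for brevity. Checkpointing the partitions with the most un-checkpointed bytes first, and stopping once the total is back under budget, would probably spread the write IO better.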

It may be that the solution to this can also be used to drive a dynamic falloc step size (this has a similar issue where the default 32MiB threshold doesn't make much sense for systems with huge partition counts).
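As a rough illustration of how one calculation might serve both the checkpoint threshold and the falloc step, the sketch below splits a node-wide byte budget evenly across partitions and clamps the result. The function name, the percentage parameter, and the clamp bounds are all assumptions made for this example, not existing redpanda configuration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not redpanda code: derive a per-partition size from
// the disk size and partition count so that the sum across all partitions
// stays within max_percent of the disk.
inline std::uint64_t per_partition_step(
  std::uint64_t disk_bytes,
  std::uint64_t partition_count,
  std::uint64_t max_percent, // e.g. 1 = at most 1% of the disk in total
  std::uint64_t min_step,
  std::uint64_t max_step) {
    if (partition_count == 0) {
        return max_step;
    }
    const std::uint64_t budget = disk_bytes / 100 * max_percent;
    const std::uint64_t share = budget / partition_count;
    // Clamping up to min_step can exceed the budget when there are very
    // many partitions; that trade-off is deliberate in this sketch.
    return std::min(max_step, std::max(min_step, share));
}
```

With the figures from this issue (2TB disk, 40k partitions) and a 1% budget, each partition's share is 500,000 bytes, far below the fixed 64MiB threshold. With only a handful of partitions the result is capped at `max_step`, so passing 64MiB as the cap would keep today's behaviour there.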

jcsp added the kind/enhance and area/storage labels May 6, 2022

jcsp self-assigned this May 6, 2022

jcsp mentioned this issue May 10, 2022

storage: Compaction index uses hardcoded 512kB memory limit #4645

Closed

jcsp closed this as completed Jul 26, 2022
