Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make idempotency compatible with sarama #3824

Merged
merged 3 commits into from
Mar 4, 2022

Conversation

rystsov
Copy link
Contributor

@rystsov rystsov commented Feb 17, 2022

Cover letter

By default Redpanda is configured to use message.timestamp.type=CreateTime but Sarama doesn't set timestamp correctly and it defaults to -1. Redpanda immediately performs garbage collection (-1 is always past transactional_id_expiration) and removes sequential numbers before the next requests reach the broker.

As a result Redpanda returns an "out of order sequence" error.

Adding seq_table_min_size configuration to prevent gc before the size of the sequential numbers (sessions) reaches the threshold.

Release notes

  • make idempotency compatible with Sarama

@rystsov rystsov linked an issue Feb 17, 2022 that may be closed by this pull request
@twmb
Copy link
Contributor

twmb commented Feb 18, 2022

In what field are you seeing sarama not set the timestamp correctly? Sarama internally sets the timestamp if it is currently unset.

Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks fine but I think there is some information missing about the solution. the fix is identified as being a fix for sarama compatibility, but the change affects all users. do sarama users need to change the configuration?

the size of 1000 doesn't seem problematic, but it does seem specific. would any value work? it's not clear what part of the solution is specific to sarama.

@c4milo
Copy link
Member

c4milo commented Feb 23, 2022

If it is a Sarama issue, why don’t we just send a patch to Sarama? It doesn’t feel it would be sustainable to work around clients’ limitations by patching RP. I would go the other way around if possible.

@emaxerrno
Copy link
Contributor

If it is a Sarama issue, why don’t we just send a patch to Sarama? It doesn’t feel it would be sustainable to work around clients’ limitations by patching RP. I would go the other way around if possible.

code looks like is a generic fix that denis found by running sarama, but generally aapplicable to those missing timestamps in txns. (i think)

@twmb
Copy link
Contributor

twmb commented Feb 23, 2022

Sarama doesn't set the MaxTimestamp field in record batches, it only sets FirstTimestamp and per-record TimestampDelta. The MaxTimestamp is -1 always.

Looking at Kafka source, I'm not sure what max timestamp is used for, but it doesn't look important for the most part (but, it's pretty difficult to skim the source sometimes [edit: per the protocol, "This is used by the broker to ensure the correct behavior even when Records within the batch are compacted out.]"). Also that nobody using sarama has encountered problems anywhere else leads me to think it's not the most important field.

src/v/model/adl_serde.h Outdated Show resolved Hide resolved
src/v/cluster/rm_stm.cc Show resolved Hide resolved
@dotnwat
Copy link
Member

dotnwat commented Mar 1, 2022

Looking at Kafka source, I'm not sure what max timestamp is used for, but it doesn't look important for the most part

@twmb yeh it's a little hard to tell. i do see that it seems to be available through a maxTimestamp method on the record batch, and is accessed 9 times in Log.scala, LogSegment.scala, ProducerStateManager.scala will probably take some time to look at all them.

Sarama doesn't set max timestamp and it defaults to -1. So when Redpanda uses
it to control the lifetime of the sessions it immidiatly garbage collects them.
Changing Redpanda to use first timestamp property of the batch instead of the
max timestamp.
Redpanda depends on server time to decide if it's safe to remove
idempotency sessions from the seq table cache. When the server
time is misconfigured it may lead to the availability issues: a
session is being removed while it's still active.

Adding a seq_table_min_size config option to set a threshold of
minimum active sessions which prevents them from being removed
even if they're expired.
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks ok to me. should we add a ducktape test. we have sarama available in ducktape context.

@rystsov
Copy link
Contributor Author

rystsov commented Mar 3, 2022

code looks ok to me. should we add a ducktape test. we have sarama available in ducktape context.

@dotnwat done it

@rystsov rystsov requested a review from dotnwat March 3, 2022 05:50
@rystsov rystsov merged commit 3487bac into redpanda-data:dev Mar 4, 2022
graphcareful pushed a commit that referenced this pull request Apr 20, 2022
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make Idempotency work with Sarama
5 participants