
[v22.1.x] Fix duplicates consistency error by caching already translated offsets #5453

Merged 12 commits into redpanda-data:v22.1.x on Jul 14, 2022

Conversation

@rystsov (Contributor) commented Jul 13, 2022

Backport of pull requests #5356 and #5220

fixes #5452

@rystsov (Contributor, Author) commented Jul 13, 2022

There were multiple conflicts during cherry-picking.

@rystsov (Contributor, Author) commented Jul 13, 2022

Had to omit cherry-picking the upgrade tests because the upgrade-testing dependencies are not backported.

@mmedenjak added the kind/bug (Something isn't working), area/kafka, consistency, and kind/backport (PRs targeting a stable branch) labels on Jul 13, 2022
VadimPlh and others added 4 commits July 13, 2022 11:54
(cherry picked from commit 7862d4b)

Conflicts:
	src/v/cluster/controller.cc
	src/v/cluster/controller.h
	src/v/redpanda/application.cc
The Kafka client doesn't process unknown_server_error correctly, and this may lead to duplicates violating idempotency. See the following issue for more info: https://issues.apache.org/jira/browse/KAFKA-14034

request_timed_out, just like unknown_server_error, means that the true outcome of the operation is unknown, but unlike unknown_server_error it doesn't trigger the client problem, so we switch to it to avoid duplicates.

(cherry picked from commit 1a72446)
(cherry picked from commit 1dfd8d9)
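The intent of the error-code substitution can be sketched as follows. The error names mirror the Kafka protocol's error codes, but the mapping function and its signature are illustrative, not Redpanda's actual code:

```cpp
#include <cassert>

// Error codes as defined by the Kafka protocol.
enum class kafka_error {
    unknown_server_error = -1,
    none = 0,
    not_leader_for_partition = 6,
    request_timed_out = 7,
};

// Hypothetical mapping of an internal replication failure to the
// client-visible error. When the true outcome of the write is unknown,
// report request_timed_out rather than unknown_server_error: some
// clients mishandle the latter in a way that can produce duplicate
// records (see KAFKA-14034).
kafka_error map_replication_error(bool outcome_known) {
    return outcome_known ? kafka_error::not_leader_for_partition
                         : kafka_error::request_timed_out;
}
```

Both codes are retriable and signal an indeterminate outcome, so the substitution changes which error the client sees without changing its semantics.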
Update all partition::replicate dependents that don't perform offset translation to bypass it via a direct raft reference

(cherry picked from commit 67a3112)

Conflicts:
	src/v/cluster/partition.h
	src/v/cluster/partition_probe.cc
	src/v/kafka/server/group.cc
@rystsov (Contributor, Author) commented Jul 13, 2022

The sketchiest change is in feature_manager: it uses logical versions, which should be consistent across releases, so if a feature (in this case rm_stm_kafka_cache) has one version in 22.1.5 and a different version in 22.2.1, it will cause a problem. But rm_stm_kafka_cache was added after serde_raft_0, which isn't part of this backport, so it can't have the same version. After this PR is merged I'll update dev to have consistent logical versions.

Updating consumer groups to use conditional replication to prevent a situation where, after a check, leadership jumps away, invalidates the check, and jumps back just in time for the post-check replication:

check condition
  leadership goes to a new node
  the node replicates something which invalidates the condition
  the leadership jumps back
the node successfully replicates, assuming the condition still holds

Switched to a conditional replicate to fix the problem. When a group manager detects a leadership change, it replays the group's records to reconstruct the group's state. We cache the current term in the state and use it as a condition on replicate. This way we know that if the leadership bounces, the replication won't pass.

(cherry picked from commit e693bea)

Conflicts:
	src/v/kafka/server/group.cc
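A minimal sketch of the term-conditioned replicate described above. All names here are illustrative, not the actual Redpanda API:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the raft layer. The real interface lives in Redpanda's
// raft subsystem; this only models the term check.
struct raft_stub {
    int64_t current_term = 1;

    // Conditional replicate: refuse the write if leadership (and thus
    // the term) changed since the caller captured expected_term.
    bool replicate_if_term(int64_t expected_term) {
        return expected_term == current_term;
    }
};

// The group caches the term observed while replaying its records; a
// later write passes that cached term as the condition. If leadership
// bounced away and back in between, the term advanced and the write
// is rejected instead of silently landing on stale state.
bool conditional_write(raft_stub& r, int64_t cached_term) {
    return r.replicate_if_term(cached_term);
}
```

The key property: a leadership round-trip always increments the term, so the check-then-replicate race in the scenario above is detected even when the same node ends up leader again.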
We're going to mix raft and kafka offsets in the same class. Since both offsets use the same underlying type, it's easy to make an error and treat one as if it were the other. Introducing a kafka offset type to rely on the type system to prevent such errors.

(cherry picked from commit 9335075)

Conflicts:
	src/v/cluster/types.h
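The type-safety idea can be sketched like this. The wrapper types and the translation helper are assumptions for illustration; the real definitions live in src/v/cluster/types.h:

```cpp
#include <cassert>
#include <cstdint>

// Distinct named types for the two offset spaces. Passing a
// kafka_offset where a raft_offset is expected (or vice versa) now
// fails to compile instead of silently misbehaving.
struct raft_offset  { int64_t value; };
struct kafka_offset { int64_t value; };

// Hypothetical translation: kafka offsets exclude raft-internal
// (non-data) batches, so translating subtracts the count of such
// batches below the given raft offset.
kafka_offset to_kafka(raft_offset o, int64_t non_data_batches) {
    return kafka_offset{o.value - non_data_batches};
}
```

With raw int64_t offsets, a call site could mix the two spaces unnoticed; with the wrappers, `to_kafka(kafka_offset{10}, 3)` is a compile error.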
Shifting offset translation down the abstraction stack to eventually reach rm_stm

(cherry picked from commit e3d24d9)
Preparing rm_stm to use a kafka::offset based seq-offset cache. Right now it uses raft offsets, but there is a problem: once cache items become older than the head of the log (eviction), Redpanda becomes unable to use offset translation, so we need to store already translated offsets.

Since the cache is persisted as part of the snapshot, we need to change the disk format and provide backward compatibility. The change is split into two commits. The current commit introduces types to represent the old format (seq_cache_entry_v1 and tx_snapshot_v1) and adds compatibility machinery to convert an old snapshot (tx_snapshot_v1) to a new snapshot (tx_snapshot).

The follow-up commit updates the default types to use the new format and updates the mapping between the old and default types.

(cherry picked from commit 63c5883)

Conflicts:
	src/v/cluster/rm_stm.cc
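A rough sketch of the snapshot upgrade path, under assumed field names and layouts (the real code is in src/v/cluster/rm_stm.cc):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Old on-disk format: entries store raft offsets.
struct seq_cache_entry_v1 { int64_t seq; int64_t raft_offset; };
struct tx_snapshot_v1 { std::vector<seq_cache_entry_v1> seqs; };

// New format: entries store already translated kafka offsets, so they
// stay usable even after the log prefix is evicted.
struct seq_cache_entry { int64_t seq; int64_t kafka_offset; };
struct tx_snapshot { std::vector<seq_cache_entry> seqs; };

// Hypothetical converter: translates each raft offset to a kafka
// offset (here modeled as subtracting the non-data batch count, which
// is still available while the log head exists at upgrade time).
tx_snapshot upgrade(const tx_snapshot_v1& old, int64_t non_data_batches) {
    tx_snapshot fresh;
    for (const auto& e : old.seqs) {
        fresh.seqs.push_back({e.seq, e.raft_offset - non_data_batches});
    }
    return fresh;
}
```

The conversion runs once when reading a v1 snapshot; new snapshots are written only in the new format.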
switching to caching seq-to-kafka offsets to avoid out-of-range errors when translating offsets beyond the eviction point

(cherry picked from commit 4b42c7e)
(cherry picked from commit 065fb54)
(cherry picked from commit 8e7346d)

Conflicts:
	src/v/cluster/feature_table.cc
	src/v/cluster/feature_table.h
	tests/rptest/tests/cluster_features_test.py
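The core idea of the fix, caching already translated kafka offsets keyed by producer sequence number, might look roughly like this (a sketch with invented names, not the actual rm_stm code):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Cache mapping producer sequence numbers to already translated kafka
// offsets. Because the translated offset is stored directly, a
// duplicate produce request can be answered even after the raft log
// prefix (and the offset-translation state with it) has been evicted.
struct seq_offset_cache {
    std::map<int64_t, int64_t> seq_to_kafka;

    void remember(int64_t seq, int64_t kafka_offset) {
        seq_to_kafka[seq] = kafka_offset;
    }

    // On a duplicate, return the cached kafka offset instead of
    // re-translating a raft offset that may no longer be translatable.
    std::optional<int64_t> lookup(int64_t seq) const {
        auto it = seq_to_kafka.find(seq);
        if (it == seq_to_kafka.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};
```

Before this change, the equivalent cache held raft offsets and had to translate on every duplicate hit, which failed with an out-of-range error once eviction moved the log head past the cached offset.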
@bharathv (Contributor) commented Jul 13, 2022

> The sketchiest change is in feature_manager: it uses logical versions, which should be consistent across releases, so if a feature (in this case rm_stm_kafka_cache) has one version in 22.1.5 and a different version in 22.2.1, it will cause a problem. But rm_stm_kafka_cache was added after serde_raft_0, which isn't part of this backport, so it can't have the same version. After this PR is merged I'll update dev to have consistent logical versions.

Looks risky but I guess we don't have an option here. What you are suggesting makes sense to me. I skimmed through the feature tables on the branches (21.1.x, 22.1.x, dev) and the only conflict seems to be on dev where we need to bump serde_raft_0 and replace the existing serde_raft_0 entry with rm_stm_kafka_cache. Would be amazing to verify the upgrade combinations with an upgrade test though (I hope it is backported).

@bharathv (Contributor) left a comment

lgtm.

I'm not sure what the policy is around backporting new features, but in this case it is a bugfix that needs a serialization change, and it is blocking the 22.1.5 release.

@jcsp you might be interested in this given its implications for the feature table in the dev branch. Just FYI.

@rystsov rystsov merged commit 042089c into redpanda-data:v22.1.x Jul 14, 2022
Labels
area/kafka, area/redpanda, kind/backport (PRs targeting a stable branch), kind/bug (Something isn't working)

4 participants