[v22.1.x] Fix duplicates consistency error by caching already translated offsets #5453

Merged 12 commits on Jul 14, 2022

Commits on Jul 13, 2022

  1. cluster: move feature_table outside controller

    (cherry picked from commit 7862d4b)
    
    Conflicts:
    	src/v/cluster/controller.cc
    	src/v/cluster/controller.h
    	src/v/redpanda/application.cc
    VadimPlh authored and rystsov committed Jul 13, 2022
    911fe49
  2. k/produce: do not use unknown_server_error

    The Kafka client doesn't process unknown_server_error correctly, which
    may lead to duplicates that violate idempotency. See the following issue
    for more info: https://issues.apache.org/jira/browse/KAFKA-14034
    
    Like unknown_server_error, request_timed_out means that the true
    outcome of the operation is unknown, but it doesn't trigger the
    problem, so switch to using it instead.
    
    (cherry picked from commit 1a72446)
    rystsov committed Jul 13, 2022
    b78ba3f
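A minimal sketch of the idea in commit 2, not the actual Redpanda code: when the outcome of a produce request is uncertain, report request_timed_out rather than unknown_server_error. The numeric values follow the Kafka protocol error table; `replicate_outcome` and `to_produce_error` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Numeric values follow the Kafka protocol error code table.
enum class kafka_error : int16_t {
    unknown_server_error = -1,
    none = 0,
    not_leader_for_partition = 6,
    request_timed_out = 7,
};

// Hypothetical internal result of a replicate call (an assumption for
// this sketch, not a Redpanda type).
enum class replicate_outcome { ok, not_leader, unknown };

// Map the internal result to the error returned to the Kafka client.
// An uncertain outcome is reported as request_timed_out, never as
// unknown_server_error, because of the client issue in KAFKA-14034.
inline kafka_error to_produce_error(replicate_outcome o) {
    switch (o) {
    case replicate_outcome::ok:
        return kafka_error::none;
    case replicate_outcome::not_leader:
        return kafka_error::not_leader_for_partition;
    case replicate_outcome::unknown:
        return kafka_error::request_timed_out;
    }
    // Unreachable for valid enum values; stay on the safe side.
    return kafka_error::request_timed_out;
}
```

Both error codes tell the client that the write may or may not have happened; the sketch only changes which of the two is sent.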
  3. cluster: remove dead code

    (cherry picked from commit 1dfd8d9)
    rystsov committed Jul 13, 2022
    c803886
  4. cluster: prepare partition for translating offset

    Update all partition::replicate dependents that don't perform offset
    translation to bypass it via a direct raft reference
    
    (cherry picked from commit 67a3112)
    
    Conflicts:
    	src/v/cluster/partition.h
    	src/v/cluster/partition_probe.cc
    	src/v/kafka/server/group.cc
    rystsov committed Jul 13, 2022
    fc25ff5
  5. k/group: avoid ABA problem

    Update consumer groups to use conditional replication. This prevents
    the situation where, after a check, leadership jumps away, the check
    is invalidated, and leadership jumps back just in time for the
    post-check replication:
    
    the node checks a condition
      leadership goes to a new node
      the new node replicates something which invalidates the condition
      leadership jumps back
    the node successfully replicates, assuming that the condition still holds
    
    Switched to a conditional replicate to fix the problem. When a group
    manager detects a leadership change, it replays the group's records to
    reconstruct the group's state. We cache the current term in the state
    and use it as a condition on replicate. This way we know that if
    leadership bounces, the replication won't pass.
    
    (cherry picked from commit e693bea)
    
    Conflicts:
    	src/v/kafka/server/group.cc
    rystsov committed Jul 13, 2022
    394c144
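The term-conditioned replicate described in commit 5 can be sketched as follows. This is a toy model under stated assumptions, not the group manager's code: `fake_log`, `replicate_in_term`, and `leadership_bounce` are hypothetical names, and a leadership round trip is modeled simply as the term growing by two (one election away, one election back).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a replicated log (hypothetical, for illustration only).
struct fake_log {
    int64_t term = 1;
    std::vector<std::string> records;

    // Leadership moves to another node and comes back: two elections,
    // so the term is two higher than before.
    void leadership_bounce() { term += 2; }

    // Append only if the caller's cached term still matches the current
    // term; otherwise reject, because the caller's earlier check may
    // have been invalidated while leadership was elsewhere.
    bool replicate_in_term(int64_t expected_term, std::string record) {
        if (expected_term != term) {
            return false;
        }
        records.push_back(std::move(record));
        return true;
    }
};
```

A caller would cache `log.term` when it rebuilds its state, run its checks, and pass the cached value to `replicate_in_term`. After a bounce the cached term is stale, so the write is refused and the caller has to replay the records and re-check before retrying.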
  6. c/types: introduce kafka offset types

    We're going to mix raft and kafka offsets in the same class. Since
    both offsets use the same type, it's easy to make an error and
    treat one as if it were the other. Introduce a kafka offset type to
    rely on the type system to prevent such errors.
    
    (cherry picked from commit 9335075)
    
    Conflicts:
    	src/v/cluster/types.h
    rystsov committed Jul 13, 2022
    8138d07
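One common way to get the type-system protection commit 6 describes is a tagged wrapper around the integer. The sketch below is an assumption about the general technique, not the definition the commit introduces: `typed_offset`, the tag structs, and `to_kafka` are hypothetical, and the "raft offset minus a delta" translation is a simplification used only to have something to convert.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// A thin wrapper around int64_t; distinct Tag types give distinct,
// mutually non-convertible offset types.
template <typename Tag>
struct typed_offset {
    int64_t value;
    explicit constexpr typed_offset(int64_t v) : value(v) {}
    friend constexpr bool operator==(typed_offset a, typed_offset b) {
        return a.value == b.value;
    }
    friend constexpr bool operator<(typed_offset a, typed_offset b) {
        return a.value < b.value;
    }
};

struct raft_offset_tag {};
struct kafka_offset_tag {};
using raft_offset = typed_offset<raft_offset_tag>;
using kafka_offset = typed_offset<kafka_offset_tag>;

// Mixing the two offsets, or passing a bare integer, no longer compiles.
static_assert(!std::is_convertible_v<raft_offset, kafka_offset>);
static_assert(!std::is_convertible_v<int64_t, kafka_offset>);

// The only way across is an explicit translation. The delta parameter
// (entries in the raft log that are invisible to Kafka clients) is an
// assumption of this sketch.
constexpr kafka_offset to_kafka(raft_offset o, int64_t delta) {
    return kafka_offset{o.value - delta};
}
```

With this in place, a function declared to take a `kafka_offset` rejects a `raft_offset` argument at compile time, so a mix-up shows up as a build error instead of a wrong offset at run time.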
  7. cluster: shift offset translation to partition

    Shifting offset translation down the abstraction well to eventually
    reach rm_stm
    
    (cherry picked from commit e3d24d9)
    rystsov committed Jul 13, 2022
    3594b59
  8. rm_stm: prepare to use kafka::offset based cache

    Preparing rm_stm to use a kafka::offset based seq-offset cache. Right
    now it uses raft offsets, but there is a problem with that: once the
    cache items become older than the head of the log (eviction), Redpanda
    becomes unable to use offset translation, so we need to store already
    translated offsets.
    
    Since the cache is persisted as part of the snapshot, we need to
    change the disk format and provide backward compatibility. The change
    is split into two commits. The current commit introduces types to
    represent the old format, seq_cache_entry_v1 and tx_snapshot_v1, and adds
    compatibility machinery to convert an old snapshot (tx_snapshot_v1) to
    a new snapshot (tx_snapshot).
    
    The follow-up commit updates the default types to use the new format and
    updates the mapping between the old and default types.
    
    (cherry picked from commit 63c5883)
    
    Conflicts:
    	src/v/cluster/rm_stm.cc
    rystsov committed Jul 13, 2022
    d0ef36e
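The old-to-new snapshot conversion from commit 8 might look roughly like the sketch below. Only the type names seq_cache_entry_v1, tx_snapshot_v1, and tx_snapshot come from the commit message; the field layouts, the `seq_cache_entry` name, the `upgrade` function, and the translator callback are assumptions made for illustration and do not reflect the real on-disk format.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Old format: the cache stores raft offsets (hypothetical layout).
struct seq_cache_entry_v1 {
    int32_t seq;
    int64_t raft_offset;
};
struct tx_snapshot_v1 {
    std::vector<seq_cache_entry_v1> seq_cache;
};

// New format: the cache stores already translated kafka offsets
// (hypothetical layout).
struct seq_cache_entry {
    int32_t seq;
    int64_t kafka_offset;
};
struct tx_snapshot {
    std::vector<seq_cache_entry> seq_cache;
};

// Raft-to-kafka translation supplied by the caller (an assumption).
using translator = std::function<int64_t(int64_t)>;

// Convert an old snapshot by translating every cached offset once, at
// load time, while translation is still possible.
inline tx_snapshot upgrade(const tx_snapshot_v1& old, const translator& to_kafka) {
    tx_snapshot out;
    out.seq_cache.reserve(old.seq_cache.size());
    for (const auto& e : old.seq_cache) {
        out.seq_cache.push_back({e.seq, to_kafka(e.raft_offset)});
    }
    return out;
}
```

Keeping the v1 types around means a node that reads an old snapshot can still decode it and hand the rest of the code the new representation only.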
  9. rm_stm: shift offset translation to rm_stm

    Switch the seq-offset cache to store kafka offsets to avoid
    out-of-range errors when translating offsets beyond the eviction point
    
    (cherry picked from commit 4b42c7e)
    rystsov committed Jul 13, 2022
    b7d1f6e
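To illustrate why caching translated offsets sidesteps the out-of-range error in commit 9, here is a small sketch of a sequence-number cache that stores the kafka offset directly. `seq_offset_cache` and its methods are hypothetical and much simpler than rm_stm's real cache; the point is only that a retried request is answered from the cache without consulting the offset translator.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Maps a producer sequence number to the kafka offset that was returned
// when the batch was first written (hypothetical, for illustration).
class seq_offset_cache {
public:
    // Record the already translated offset at write time.
    void remember(int32_t seq, int64_t kafka_offset) {
        _by_seq[seq] = kafka_offset;
    }

    // On a retry with a known sequence number, return the stored kafka
    // offset. No raft-to-kafka translation happens here, so it does not
    // matter whether the underlying log segment has been evicted.
    std::optional<int64_t> lookup(int32_t seq) const {
        auto it = _by_seq.find(seq);
        if (it == _by_seq.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<int32_t, int64_t> _by_seq;
};
```

Had the cache stored raft offsets, every duplicate lookup would need a translation step, which is exactly the step that fails once the entry is older than the start of the log.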
  10. rm_stm: remove dead code

    (cherry picked from commit 065fb54)
    rystsov committed Jul 13, 2022
    2161b46
  11. rm_stm: add feature_table as a dependency

    (cherry picked from commit 9000762)
    rystsov committed Jul 13, 2022
    5a95cb2
  12. rm_stm: put kafka offset cache behind feature manager

    (cherry picked from commit 8e7346d)
    
    Conflicts:
    	src/v/cluster/feature_table.cc
    	src/v/cluster/feature_table.h
    	tests/rptest/tests/cluster_features_test.py
    rystsov committed Jul 13, 2022
    f7a13ff