
Fix data loss consistency violation #6019

Merged: 6 commits into redpanda-data:dev on Aug 20, 2022
Conversation

@rystsov (Contributor) commented on Aug 13, 2022

Cover letter

The Kafka API doesn't have an explicit begin-txn call: when a transaction coordinator receives the first add_partition or add_group request, it starts a transaction. Redpanda also defers disk flushes until the commit moment. The combination of these two things causes a problem:

  1. a client issues add_partition to the txn coordinator
  2. the txn coordinator starts a transaction (in-memory state only)
  3. the client writes a message to a data partition
  4. Redpanda treats the write as acks=1 and acks the request before replication finishes (this is safe because on commit Redpanda checks that all pending replication is done)
  5. the txn coordinator and the data partition experience re-election, and the in-memory state is lost
  6. the client issues add_group to the txn coordinator
  7. the txn coordinator, unaware of the ongoing txn, starts a new transaction
  8. the client commits the txn, and only the consumer-group change is written

Since the data record was written with acks=1 just before the re-election, it is lost, and the client never learns that anything went wrong with the transaction.
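The failure sequence above can be modeled in a few lines of Python. This is an illustrative sketch only: the class and method names are hypothetical, not Redpanda code, and the durable log is reduced to a plain list.

```python
# Illustrative model of the split-brain scenario: the txn coordinator
# tracks membership only in memory, so re-election wipes it.

class TxnCoordinator:
    def __init__(self):
        self.ongoing = set()  # in-memory only: members added to the txn
        self.log = []         # durable log, written only at commit

    def add_partition(self, txn_id, partition):
        # Implicit begin: the first add_partition starts the transaction,
        # but nothing is persisted yet.
        self.ongoing.add((txn_id, partition))

    def add_group(self, txn_id, group):
        # After re-election the coordinator forgot the partition above,
        # so this silently starts a "new" txn containing only the group.
        self.ongoing.add((txn_id, group))

    def re_election(self):
        self.ongoing = set()  # the in-memory state is lost

    def commit(self, txn_id):
        # Only what the coordinator currently knows gets committed.
        committed = [m for m in self.ongoing if m[0] == txn_id]
        self.log.extend(committed)
        return committed

coord = TxnCoordinator()
coord.add_partition("tx1", "data-partition-0")  # steps 1-2
# steps 3-4: the acks=1 data write happens here, still unreplicated
coord.re_election()                             # step 5
coord.add_group("tx1", "consumer-group-A")      # steps 6-7
result = coord.commit("tx1")                    # step 8
print(result)  # [('tx1', 'consumer-group-A')] — the data write is gone
```

The commit succeeds from the client's point of view, yet only the consumer-group change reaches the log, which is exactly the violation described above.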

Fixes #6018

Backport Required

  • not a bug fix
  • papercut/not impactful enough to backport
  • v22.2.x
  • v22.1.x
  • v21.11.x

UX changes

Release notes

Bug Fixes

  • Fix consistency violation caused by split-brain of the txn coordinator

@rystsov rystsov requested a review from a team as a code owner August 13, 2022 19:38
@rystsov rystsov requested review from andrewhsu and removed request for a team August 13, 2022 19:38
@rystsov rystsov requested review from mmaslankaprv and removed request for andrewhsu August 13, 2022 19:38
Review threads (resolved):
  • tools/offline_log_viewer/kafka.py
  • src/v/cluster/tx_gateway_frontend.cc
  • src/v/cluster/tm_stm.cc
@rystsov (Contributor, Author) commented on Aug 19, 2022

PartitionBalancerTest.test_full_nodes - #5884

@dotnwat previously approved these changes on Aug 19, 2022
Review threads (resolved):
  • src/v/cluster/tx_gateway_frontend.cc
  • tools/offline_log_viewer/storage.py
parser.add_argument(
    '--path',
    type=str,
    help='Path to the log desired to be analyzed')
A reviewer (Member) commented:
If you'd like to integrate more tightly with argument parsing, you can change type=str to type=validate_path and define validate_path as:

import os
from os.path import join
from argparse import ArgumentTypeError

def validate_path(path):
    if not os.path.exists(path):
        raise ArgumentTypeError(f"Path doesn't exist: {path}")
    controller = join(path, "redpanda", "controller")
    if not os.path.exists(controller):
        raise ArgumentTypeError(
            f"Each redpanda data dir should have a controller piece, "
            f"but {controller} wasn't found")
    return path
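To see the reviewer's suggestion end to end, here is a self-contained sketch that wires the suggested validate_path helper into argparse and exercises it against a throwaway directory. validate_path is the reviewer's proposed helper, not existing repo code, and the directory layout is built just for the demonstration.

```python
import argparse
import os
import tempfile
from argparse import ArgumentTypeError
from os.path import join

def validate_path(path):
    # Reject paths that don't exist or lack the expected controller dir.
    if not os.path.exists(path):
        raise ArgumentTypeError(f"Path doesn't exist: {path}")
    controller = join(path, "redpanda", "controller")
    if not os.path.exists(controller):
        raise ArgumentTypeError(f"controller piece not found: {controller}")
    return path

parser = argparse.ArgumentParser()
# argparse runs the `type` callable on the raw string and reports
# ArgumentTypeError as a normal usage error.
parser.add_argument('--path',
                    type=validate_path,
                    help='Path to the log desired to be analyzed')

with tempfile.TemporaryDirectory() as d:
    os.makedirs(join(d, "redpanda", "controller"))
    args = parser.parse_args(['--path', d])
    print(args.path == d)  # True: validation passed
```

An invalid path would make parse_args print the ArgumentTypeError message and exit with a usage error, which is the tighter integration the comment describes.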

Review threads (resolved):
  • tools/offline_log_viewer/storage.py
  • tools/offline_log_viewer/tx_coordinator.py
The fix is to write to the txn coordinator's log when a transaction starts; then, when a crash-induced re-election happens, the new txn coordinator has an opportunity to detect the ongoing txn and fail it.
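The idea of the fix can be sketched as follows. This is a heavily simplified model, not Redpanda's actual implementation: the durable log is a plain list, recovery is a full replay, and a recovering leader simply aborts every in-flight txn from the previous term.

```python
# Sketch of the fix: persist a begin record at txn start so a freshly
# elected coordinator can detect in-flight txns instead of forgetting them.

def recover_ongoing(log):
    """Replay the durable log to find txns begun but never finished."""
    ongoing = set()
    for op, txn in log:
        if op == "begin":
            ongoing.add(txn)
        elif op in ("commit", "abort"):
            ongoing.discard(txn)
    return ongoing

class Coordinator:
    def __init__(self, log):
        self.log = log  # durable, replicated log shared across terms
        # With begin records on disk, the new leader sees txns from the
        # previous term and fails them rather than silently starting fresh.
        self.aborted = recover_ongoing(log)
        for txn in self.aborted:
            self.log.append(("abort", txn))

    def begin(self, txn):
        self.log.append(("begin", txn))  # the fix: write at txn start

log = []
old_leader = Coordinator(log)
old_leader.begin("tx1")        # the client's first add_partition begins tx1
new_leader = Coordinator(log)  # re-election: state is rebuilt from the log
print(sorted(new_leader.aborted))  # ['tx1'] — the stale txn is failed
```

Because the stale txn is now explicitly aborted, the client's eventual commit fails and it learns the transaction was lost, instead of the silent data loss in the scenario above.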
The checks didn't handle empty dirs well.
@bharathv (Contributor) left a comment: new changes lgtm.

@dotnwat (Member) left a comment: 🎾 🎾 🎾 🎾 🎾 🎾 🎾 🎾 🎾

@rystsov (Contributor, Author) commented on Aug 20, 2022

SIPartitionMovementTest.test_shadow_indexing - #4702

@rystsov rystsov merged commit b9834fb into redpanda-data:dev Aug 20, 2022
This pull request was closed.

Successfully merging this pull request may close these issues.

Consistency issue: a confirmed written record is lost
3 participants