Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools/metadata_viewer: Refactor metadata_viewer and add transaction metadata support #5294

Merged
merged 6 commits into from
Jul 1, 2022

Conversation

bharathv
Copy link
Contributor

Cover letter

Renames metadata_viewer -> offline_log_viewer as it is not just related to metadata anymore.
Prints additional batch header metadata and adds support for transactional control markers.

@bharathv bharathv requested a review from a team as a code owner June 30, 2022 17:31
@bharathv bharathv requested review from andrewhsu and removed request for a team June 30, 2022 17:31
@bharathv
Copy link
Contributor Author

I wanted to create this PR at a later time but since this renames stuff to new paths, I'd like to get this before others actively change it. We already have a couple of PRs in flight that are looking to do it. Also I need to rebase on top of John's latest changes.

@bharathv bharathv requested review from dotnwat and rystsov June 30, 2022 17:34
* renames to offline_log_viewer since it is not limited to metadata
anymore and we can parse actual record data.
* tools/storage.py is obselete.
Includes compression type and whether the batch is
transactional/control type.
Some topics are in kafka_internal ns like
tx (transaction coordinator) and id_allocator.
Example:

INFO:viewer:{
  "header_crc": 1126531443,
  "batch_size": 90,
  "base_offset": 3,
  "type": 9,
  "crc": 1766141844,
  "attrs": 32,
  "delta": 0,
  "first_ts": 1656554802864,
  "max_ts": 1656554802864,
  "producer_id": 1002,
  "producer_epoch": 1,
  "base_seq": -1,
  "record_count": 1,
  "type_name": "tx_prepare", <====
  "expanded_attrs": {
    "compression": "none",
    "transactional": false,
    "control_batch": true,
    "timestamp_type": false
  }
}
Copy link
Contributor

@rystsov rystsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of rough spots bu otherwise looks good

tools/offline_log_viewer/viewer.py Outdated Show resolved Hide resolved
tools/offline_log_viewer/viewer.py Outdated Show resolved Hide resolved
if is_tx_ctrl:
record_dict["type"] = self.get_control_record_type(
record.key)
self.results.append(records)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log might be big so we should process it in a streaming fashion without loading the entire log in the memory so yield records

Moves the logic in KafkaLog like other methods,
cleans up duplicate code.

Introduces KafkaControlRecordType

For transactional && control records we extract
the exact control record type(commit/abort).
Copy link
Contributor Author

@bharathv bharathv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest force push addressed the comments.

tools/offline_log_viewer/viewer.py Outdated Show resolved Hide resolved
@bharathv bharathv requested a review from rystsov June 30, 2022 19:34
@@ -104,6 +105,36 @@ def __next__(self):
headers)


class BatchType(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good -- controller.py already has some of these magic numbers in, so maybe those could be replaced with references to this class at the same time as adding it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I did some sanity testing on my local controller log and kvstore data and things seem ok.

Switch to the enum in various other places for
readability.
@bharathv bharathv requested a review from jcsp July 1, 2022 15:53
@jcsp
Copy link
Contributor

jcsp commented Jul 1, 2022

This is definitely getting to the point that we need tests for it, but that can come later (#5315, #5293)

@bharathv bharathv merged commit 0dec535 into redpanda-data:dev Jul 1, 2022
@bharathv bharathv deleted the 3142 branch July 1, 2022 18:36
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants