tools/metadata_viewer: Refactor metadata_viewer and add transaction metadata support #5294
I wanted to open this PR later, but since it renames files to new paths, I'd like to get it in before others actively change them. We already have a couple of PRs in flight that are looking to change them. I also need to rebase on top of John's latest changes.
* Renames to offline_log_viewer, since it is no longer limited to metadata and can parse actual record data.
* tools/storage.py is obsolete.
Includes compression type and whether the batch is transactional/control type.
Some topics are in the kafka_internal namespace, such as tx (transaction coordinator) and id_allocator.
Example:
INFO:viewer:{
    "header_crc": 1126531443,
    "batch_size": 90,
    "base_offset": 3,
    "type": 9,
    "crc": 1766141844,
    "attrs": 32,
    "delta": 0,
    "first_ts": 1656554802864,
    "max_ts": 1656554802864,
    "producer_id": 1002,
    "producer_epoch": 1,
    "base_seq": -1,
    "record_count": 1,
    "type_name": "tx_prepare",   <====
    "expanded_attrs": {
        "compression": "none",
        "transactional": false,
        "control_batch": true,
        "timestamp_type": false
    }
}
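For readers unfamiliar with the attribute bits, here is a minimal sketch of how an `attrs` value such as 32 could be expanded into the fields shown in the example. It assumes the standard Kafka record batch attribute layout (bits 0-2 compression codec, bit 3 timestamp type, bit 4 transactional, bit 5 control); `expand_attrs` and `COMPRESSION_NAMES` are hypothetical names, not necessarily what this PR uses.

```python
# Assumed layout: Kafka record batch attributes (not taken from this PR's code).
COMPRESSION_NAMES = {0: "none", 1: "gzip", 2: "snappy", 3: "lz4", 4: "zstd"}


def expand_attrs(attrs: int) -> dict:
    """Decode a batch attribute bitfield into named flags (hypothetical helper)."""
    return {
        # Lowest three bits select the compression codec.
        "compression": COMPRESSION_NAMES.get(attrs & 0x07, "unknown"),
        # Bit 4: batch is part of a transaction.
        "transactional": bool(attrs & 0x10),
        # Bit 5: batch carries control records (e.g. commit/abort markers).
        "control_batch": bool(attrs & 0x20),
        # Bit 3: timestamp type (False = create time, True = log append time).
        "timestamp_type": bool(attrs & 0x08),
    }


if __name__ == "__main__":
    print(expand_attrs(32))
```

With `attrs` set to 32 only bit 5 is set, which matches the `control_batch: true` entry in the example above.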
A couple of rough spots but otherwise looks good
tools/offline_log_viewer/kafka.py
    if is_tx_ctrl:
        record_dict["type"] = self.get_control_record_type(
            record.key)
    self.results.append(records)
The log might be big, so we should process it in a streaming fashion: yield records instead of appending them to a list, so the entire log is never loaded into memory.
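A rough sketch of the streaming idea, not the viewer's actual parser: the generator below yields one frame at a time from a binary stream instead of collecting everything in a list. The on-disk layout here (a 4-byte little-endian length prefix followed by the payload) and the name `iter_frames` are assumptions made purely for illustration.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_frames(stream: BinaryIO) -> Iterator[bytes]:
    """Yield payloads one at a time (hypothetical length-prefixed format)."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return  # clean end of stream, or a truncated length prefix
        (size,) = struct.unpack("<I", prefix)
        payload = stream.read(size)
        if len(payload) < size:
            return  # truncated payload; stop rather than yield partial data
        yield payload


if __name__ == "__main__":
    data = struct.pack("<I", 3) + b"abc" + struct.pack("<I", 2) + b"de"
    for frame in iter_frames(io.BytesIO(data)):
        print(frame)
```

Because the caller pulls frames lazily, memory use is bounded by the largest single frame rather than by the size of the log.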
Moves the logic into KafkaLog like the other methods and cleans up duplicate code. Introduces KafkaControlRecordType: for batches that are both transactional and control, we extract the exact control record type (commit/abort).
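As an illustration of what extracting the control record type can look like, the sketch below decodes a control record key. It assumes the Kafka control record key layout (a big-endian int16 version followed by a big-endian int16 type, with 0 meaning abort and 1 meaning commit). `ControlRecordType` and `control_record_type` are hypothetical stand-ins; the definition of KafkaControlRecordType in this PR is not shown here and may differ.

```python
import struct
from enum import Enum


class ControlRecordType(Enum):
    """Hypothetical stand-in for the PR's KafkaControlRecordType."""
    ABORT = 0
    COMMIT = 1
    UNKNOWN = -1


def control_record_type(key: bytes) -> ControlRecordType:
    """Decode the type from a control record key (assumed layout: int16 version, int16 type)."""
    if len(key) < 4:
        return ControlRecordType.UNKNOWN
    _version, type_code = struct.unpack(">hh", key[:4])
    try:
        return ControlRecordType(type_code)
    except ValueError:
        # Any type code we do not recognise is reported as UNKNOWN.
        return ControlRecordType.UNKNOWN


if __name__ == "__main__":
    print(control_record_type(struct.pack(">hh", 0, 1)).name)
```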
Latest force push addressed the comments.
@@ -104,6 +105,36 @@ def __next__(self):
                headers)


class BatchType(Enum):
This is good -- controller.py already has some of these magic numbers in, so maybe those could be replaced with references to this class at the same time as adding it?
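To illustrate the suggestion, here is a minimal sketch of replacing magic numbers with an enum lookup. Only `tx_prepare = 9` is taken from the example output in this PR; the other members and their values are assumptions for illustration, and `BatchTypeSketch` and `batch_type_name` are hypothetical names rather than the PR's BatchType.

```python
from enum import Enum


class BatchTypeSketch(Enum):
    """Illustrative subset; not the PR's actual BatchType definition."""
    raft_data = 1    # assumed value, for illustration only
    controller = 3   # assumed value, for illustration only
    tx_prepare = 9   # matches "type": 9 / "type_name": "tx_prepare" in the example


def batch_type_name(code: int) -> str:
    """Map a numeric batch type to a readable name, falling back to 'unknown'."""
    try:
        return BatchTypeSketch(code).name
    except ValueError:
        return "unknown"


if __name__ == "__main__":
    # Instead of `if header.type == 9:` scattered through the code,
    # callers can compare against BatchTypeSketch.tx_prepare.value.
    print(batch_type_name(9))
```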
Done. I did some sanity testing on my local controller log and kvstore data and things seem ok.
Switch to the enum in various other places for readability.
lgtm
Cover letter
Renames metadata_viewer -> offline_log_viewer as it is not just related to metadata anymore.
Prints additional batch header metadata and adds support for transactional control markers.