-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: update offline_log_viewer and use in ducktape tests #5315
Merged
Merged
Changes from all commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
e168164
tools: write offline_log_viewer kvstore/controller output to stdout
jcsp 9f00c02
tools: make offline_log_viewer aware of clean_segment kvstore key
jcsp 963fc3d
tools: offline_log_viewer don't replay pre-snapshot kvstore messages
jcsp 7f666f7
tools/metadata_viewer: handle kvstore deletions
jcsp b492385
tools: handle short header read in offline_log_viewer
jcsp 37b76a1
tools/offline_log_viewer: added decode_adl_or_serde to dispatch decod…
andijcr 18328b1
tools/offline_log_viewer: fixed variable used to read BatchType (was …
andijcr 4bbd0c5
tools/offline_log_viewer: added peek_int8, read_serde_enum
andijcr 021317a
tools/offline_log_viewer: added serde decoding for topic_commands
andijcr ce43d16
tools/offline_log_viewer: added decode_user_command for serde
andijcr 8b924de
tools/offline_log_viewer: added decode_acl_command for serde
andijcr 29f9e7e
tools/offline_log_viewer: added decode_config_command for serde for c…
andijcr 98e4df6
tools/offline_log_viewer: decode_feature_command, decode_node_managem…
andijcr 8fe0654
tools/offline_log_viewer: added method to get number of unprocessed b…
andijcr ba26cd7
tools/offline_log_viewer: added check that all the bytes of a record …
andijcr d8d85c4
tools/offline_log_viewer: support v=4 group_configuration
jcsp a9ab9fa
tools: fix kvstore reads in offline_log_viewer
jcsp 8557ac4
tests: include offline_log_viewer in Dockerfile
jcsp a7e6983
tests: add clients.offline_log_viewer
jcsp fac7d51
tests: validate kvstore cleanup in topic_delete_test
jcsp 1813135
tests: check controller log decode after multi-version upgrade
jcsp 189e475
tools: disable broken controller command decode
jcsp File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,4 +3,5 @@ | |
!tests/docker/ssh | ||
!tests/setup.py | ||
!tests/python | ||
!tests/go | ||
!tests/go | ||
!tools/offline_log_viewer |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# Copyright 2022 Redpanda Data, Inc. | ||
# | ||
# Use of this software is governed by the Business Source License | ||
# included in the file licenses/BSL.md | ||
# | ||
# As of the Change Date specified in that file, in accordance with | ||
# the Business Source License, use of this software will be governed | ||
# by the Apache License, Version 2.0 | ||
|
||
import json | ||
|
||
|
||
class OfflineLogViewer: | ||
""" | ||
Wrap tools/offline_log_viewer for use in tests: this is for tests that | ||
want to peek at the structures, but also for validating the tool itself. | ||
""" | ||
def __init__(self, redpanda): | ||
self._redpanda = redpanda | ||
|
||
def _cmd(self, suffix): | ||
viewer_path = "python3 /opt/scripts/offline_log_viewer/viewer.py" | ||
return f"{viewer_path} --path {self._redpanda.DATA_DIR} {suffix}" | ||
|
||
def read_kvstore(self, node): | ||
cmd = self._cmd("--type kvstore") | ||
kvstore_json = node.account.ssh_output(cmd, combine_stderr=False) | ||
return json.loads(kvstore_json) | ||
|
||
def read_controller(self, node): | ||
cmd = self._cmd("--type controller") | ||
controller_json = node.account.ssh_output(cmd, combine_stderr=False) | ||
try: | ||
return json.loads(controller_json) | ||
except json.decoder.JSONDecodeError: | ||
# Log the bad output before re-raising | ||
self._redpanda.logger.error( | ||
f"Invalid JSON output: {controller_json}") | ||
import time | ||
time.sleep(3600) | ||
raise |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool. What do you think the guidance should be on when to validate things on disk vs validating at a higher level? Applied too liberally it could eat up a lot of time and be hard to maintain.
For instance here what do we gain from reading the log compared to restarting redpanda and checking the kvstore? The reason being that we know kvstore serves data from memory and its only source of state is from the on disk file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When something is available via external APIs, I would generally expect tests to get it that way.
The first use case for this (in this PR) is for something we can't validate externally -- cleanup of data that is no longer needed. It's similar to when we would use the existing storage inspection stuff: when we want to validate something beyond client-visible correctness, like storage being reclaimed, flushed up to a certain point, or migrated to/from tiered storage.
There are a bunch of storage tests that do things like "write 5 segments worth of data, wait til we see 5 segments, wait for 3 segments to upload to tiered storage...", which I anticipate being more robust if we convert them to use a decode of the log to instead assert on "wait til we see N batches, don't care how they're segmented". Only for the tests that write very small amounts of data, of course.
I would also like to use this as a sanity check that log-structured things like the controller log are at a length we expect: right now it would be pretty easy to slip in a bug that caused like 1000 extra controller log messages (like some rogue fiber writing messages repeatedly), and nothing would notice/fail. We could use some debug admin APIs for querying that kind of thing: the tradeoff is how much code we bake into redpanda proper vs. how much we can maintain as test-specific stuff. Then again, stuff we bake into redpanda proper is more usable in field debug tools...
Long story short, I mostly anticipate doing direct on-disk reads in the context of storage tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation
i like it. i recall at some point @mmaslankaprv suggesting that we write a test that verified that all of the controller logs were identical (up to some common prefix) after a test run. i don't know if that was ever done but this would make that a lot easier!