Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: update offline_log_viewer and use in ducktape tests #5315

Merged
merged 22 commits into from
Dec 16, 2022

Conversation

jcsp
Copy link
Contributor

@jcsp jcsp commented Jul 1, 2022

Cover letter

  • Pull in @andijcr changes from Feature: offline_log_viewer updates #6233 to enable serde decode of most controller log messages
  • Include the tool in test dockerfile and wrap it in a python client class for ducktape
  • Various corner case fixes for general log loading (short reads, reads on systems with no topics
  • Update group_configuration decode to support version 4
  • Use the tool in deletion test to confirm that kvstore parts of topic state are getting cleaned up. This follows on from storage: explicitly mark clean segments in kvstore on clean shutdown #5127, to get better confidence that new kvstore contents are being properly cleaned up.
  • Decode controller logs at the end of a multi-version upgrade test, to increase confidence that the tool works on various generations of data. When we enhance the upgrade tests in tests: fix upgrade tests not to skip versions #7310, this check should be done after the "big upgrade test" that steps through all the historic versions.

The tool is already in clustered ducktape AMI via https://github.com/redpanda-data/vtools/pull/772

Release notes

  • none

@jcsp
Copy link
Contributor Author

jcsp commented Jul 4, 2022

CI failures is: #5241

@jcsp jcsp marked this pull request as ready for review July 6, 2022 20:53
@jcsp jcsp requested review from a team, dotnwat and NyaliaLui as code owners July 6, 2022 20:53
@jcsp jcsp requested review from andrewhsu and removed request for a team July 6, 2022 20:53
@jcsp
Copy link
Contributor Author

jcsp commented Jul 6, 2022

Note: merge https://github.com/redpanda-data/vtools/pull/772 before merging this

dotnwat
dotnwat previously approved these changes Jul 7, 2022
Comment on lines 288 to 298
try:
[part, ntp_id] = part_ntp_id.split("_")
except ValueError:
# Not a partition dir: this may happen if we have a namespace dir
# with no partitions.
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a more specific test is necessary to handle cases where a topic has an underscore?

print(head, part_ntp_id, part, ntp_id)

data/namespace topic1_asdf topic1 asdf

like here I end up with topic1/asdf as the partition and ntp_id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit went away in favor of 906a9e7 fix for empty dirs upstream

@@ -349,6 +351,10 @@ def decode(self):
s = Segment(path)
for batch in s:
for r in batch:
offset = batch.header.base_offset + r.offset_delta
if snapshot_offset is not None and offset < snapshot_offset:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, would offset be inclusive here: "all messages from before up to including the snapshot offset"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are correct, I was thinking of next_offset, but the save_snapshot path in kvstore already subtracts 1. Fixed.

Comment on lines 219 to 226
if len(data) < HEADER_SIZE:
# Short read, probably log being actively written or unclean shutdown
return None

assert len(data) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like assert len(data) == 0 is no longer possible. indeed, the if statement also doesn't seem necessary now since data will definitely be < header size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


result = {}
for n in redpanda.nodes:
kvstore_json = n.account.ssh_output(viewer_cmd, combine_stderr=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really pretty neat

@NyaliaLui
Copy link
Contributor

Triggered rebuild on CI since the PR is a little old.

@jcsp
Copy link
Contributor Author

jcsp commented Nov 24, 2022

This should be pulled into #6233 because this PR can't decode modern logs, and that PR doesn't have anything to test it.

@jcsp jcsp force-pushed the test-kvstore-deletion branch 2 times, most recently from f107f42 to 3f18cd5 Compare December 2, 2022 14:13
@jcsp jcsp changed the title tests: use offline log decoder to verify kvstore key deletion on topic deletion tests: update offline_log_viewer and use in ducktape tests Dec 2, 2022
@jcsp jcsp removed the request for review from NyaliaLui December 2, 2022 14:18
@mmaslankaprv
Copy link
Member

/ci-repeat 1

jcsp and others added 22 commits December 15, 2022 09:18
On stderr it gets mixed with logs.  On stdout it is
straightforward to capture and decode json for use in tests.
This was wasteful: all messages from before the snapshot should
already be accounted for in the state we loaded from the snapshot.
A None value is a deletion.  Previously, we ignored
these and thereby the kvstore dump would include
all keys ever created, even if they had since
been deleted.
... and augmented Reader.read_envelope to accept a type_read Callable[rdr: Reader, version: Int]: Dict and a max_version parameter.

 If a type_read is used, it is invoked with reader and envelope if envelope.version <= max_version parameter.
 the result is joined with a dict {'envelope': envelope} and returned, otherwise an error dict is returned.

 If no type_read is used, envelope is returned immediately
the new code tries to mimic the structure of the adl decoding functions.
the new code is aware of the version number embedded in the binary stream and will skip the field if the version is not supported, while reporting the error.
…are processed

in case of unread byte, an 'unread': {'k': ..., 'v': ...} field is added and an error is logged

this check is not exact but is a good proxy to know if the decoding code is correct
Partitions have various kvstore items, stored by
the raft + storage layers.  When the topic is
deleted, they should be deleted too.

Use our offline log decoder tool to inspect
the kvstore out-of-band and validate the keys
go away when the topic is deleted.
This needs fixing, but better to comment it out and have
the rest of the command work.
@jcsp jcsp merged commit f12e562 into redpanda-data:dev Dec 16, 2022
@jcsp jcsp deleted the test-kvstore-deletion branch December 16, 2022 12:03
dotnwat added a commit that referenced this pull request Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants