ducky: test read_replica even if not data is uploaded yet #5620
ae78720 to 6eb292e
  backoff_sec=5,
  err_msg=
- f"Not all data is uploaded to S3 bucket, is S3 bucket: {list(self.redpanda._s3client.list_objects(self.s3_bucket_name))}"
+ f"Not all data is uploaded to S3 bucket, is S3 bucket: {list(self.redpanda._s3client.list_objects(self.si_settings.cloud_storage_bucket))}"
This string is getting formatted when we enter wait_until, right? I think you'd rather see the bucket listing at the point of the timeout, rather than before we started
that's an error message that is printed in case of timeout
Yes, but in the time sequence we're doing:
- Process the format string, including listing objects in the bucket, and assign the result to err_msg
- Attempt A...
- Attempt B...
- Attempt C...
- On timeout, print err_msg. But this is the list of objects from time 1, not from now.
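The sequence above can be reproduced with a small self-contained sketch. The `wait_until` below is a hypothetical stand-in written for this illustration, not the helper the test actually uses, and the `bucket` list merely stands in for the S3 listing. It assumes a helper that accepts either a string or a callable for `err_msg`; whether a given real helper supports a callable should be checked against its documentation.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Hypothetical polling helper; err_msg may be a str or a zero-arg callable."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    # A callable is evaluated only here, at the point of the timeout.
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg)


bucket = []  # stands in for the objects listed in the S3 bucket


def upload_one_and_check():
    # Each attempt "uploads" one more object but never reports completion,
    # so the wait always ends in a timeout.
    bucket.append(f"segment-{len(bucket)}")
    return False


def run(lazy):
    bucket.clear()
    if lazy:
        msg = lambda: f"bucket: {list(bucket)}"  # formatted on timeout
    else:
        msg = f"bucket: {list(bucket)}"  # formatted now, before any attempt
    try:
        wait_until(upload_one_and_check, timeout_sec=0.05,
                   backoff_sec=0.01, err_msg=msg)
    except TimeoutError as e:
        return str(e)


eager_message = run(lazy=False)  # listing from before the first attempt
lazy_message = run(lazy=True)    # listing as of the timeout
```

In the eager case the message still shows the empty listing captured before the first attempt, while the lazy case reports whatever was in the bucket when the wait gave up.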
good catch! deleted s3 content from the error message. The s3 content is printed on deletion of s3 bucket, so it's still possible to get this information
try:
    self.create_read_replica_topic()
except RpkException as e:
    if "The server experienced an unexpected error when processing the request" in str(
I know this function is just getting moved, but it would benefit from a comment explaining why "unexpected error" is the expected error in this context -- I'm guessing this is because read replica topic creation doesn't have a suitable kafka error code and so it uses UNKNOWN_SERVER_ERROR?
(Maybe this was already discussed, but perhaps something like UNKNOWN_TOPIC_OR_PARTITION would help to distinguish this case from true internal errors)
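A rough sketch of what such a comment could look like in place. The diff does not show what follows the `if`, so returning `False` to signal "retry" is an assumption, as is the explanation in the comment (it restates the reviewer's guess). `RpkException`, `try_create`, and `fake_create` here are hypothetical stand-ins so the snippet runs on its own.

```python
class RpkException(Exception):
    """Hypothetical stand-in for the exception raised by the rpk wrapper."""


UNEXPECTED_ERROR = (
    "The server experienced an unexpected error when processing the request"
)


def try_create(create_topic):
    """Return True once the topic exists, False if creation should be retried."""
    try:
        create_topic()
    except RpkException as e:
        # Assumption (the reviewer's guess): when the source topic has no data
        # in S3 yet, the broker has no dedicated Kafka error code for the
        # failure and reports a generic server error. In this context the
        # "unexpected error" message is therefore the expected failure.
        if UNEXPECTED_ERROR in str(e):
            return False
        raise  # any other rpk failure is a genuine error
    return True


attempts = []


def fake_create():
    # Fails twice with the generic error, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise RpkException(f"rpk failed: {UNEXPECTED_ERROR}")


results = [try_create(fake_create) for _ in range(3)]
```

Matching on the message text is brittle, which is the reviewer's point: a dedicated error code would let the test distinguish this case without a substring check.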
cc @mattschumpert do you have an opinion on error code when redpanda can't create read replica topic because there's no corresponding data in S3?
@LenaAn, something like
'Error creating Remote Read Replica topic: No topic data found for topic {foo} at s3://{bucket_url_with_path}'
@mattschumpert: I think the request here is for a suggestion of a good/relevant Kafka error code.
In this case, I suggest one of these (columns: error, code, retriable, description):
LOG_DIR_NOT_FOUND | 57 | False | The user-specified log directory is not found in the broker config.
or
KAFKA_STORAGE_ERROR | 56 | True | Disk error when trying to access log file on the disk.
LOG_DIR_NOT_FOUND is actually quite accurate.
@LenaAn please check out the CI failures + this is good to merge once they're associated with existing issues. ARM CI is stuck waiting for agents, we can merge past it for a python change like this one.
Client may create a read replica topic when not all data is in S3 yet. We should test this scenario.
rebased to dev to get rid of merge conflict
Cover letter
Client may create a read replica topic when not all data is in S3 yet.
We should test this scenario. Also increase timeout
Fixes #5540