Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ducky: test read_replica even if not data is uploaded yet #5620

Merged
merged 1 commit into from
Jul 28, 2022

Conversation

LenaAn
Copy link
Contributor

@LenaAn LenaAn commented Jul 25, 2022

Cover letter

Client may create a read replica topic when not all data is in S3 yet.
We should test this scenario. Also increase timeout

Fixes #5540

UX changes

Describe in plain language how this PR affects an end-user. What topic flags, configuration flags, command line flags, deprecation policies etc are added/changed.

Release notes

@mmedenjak mmedenjak added kind/bug Something isn't working area/tests ci-failure area/cloud-storage Shadow indexing subsystem labels Jul 26, 2022
@LenaAn LenaAn force-pushed the fix_test branch 2 times, most recently from ae78720 to 6eb292e Compare July 26, 2022 13:14
@LenaAn LenaAn requested a review from jcsp July 26, 2022 13:15
backoff_sec=5,
err_msg=
f"Not all data is uploaded to S3 bucket, is S3 bucket: {list(self.redpanda._s3client.list_objects(self.s3_bucket_name))}"
f"Not all data is uploaded to S3 bucket, is S3 bucket: {list(self.redpanda._s3client.list_objects(self.si_settings.cloud_storage_bucket))}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string is getting formatted when we enter wait_until, right? I think you'd rather see the bucket listing at the point of the timeout, rather than before we started

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's an error message that is printed in case of timeout

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in the time sequence we're doing:

  1. Process the format string, including listing objects in the bucket, and assign the result to err_msg
  2. Attempt A...
  3. Attempt B...
  4. Attempt C...
  5. On timeout, print err_msg. But this is the list of objects from time 1, not from now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! deleted s3 content from the error message. The s3 content is printed on deletion of s3 bucket, so it's still possible to get this information

try:
self.create_read_replica_topic()
except RpkException as e:
if "The server experienced an unexpected error when processing the request" in str(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this function is just getting moved, but it would benefit from a comment explaining why "unexpected error" is the expected error in this context -- I'm guessing this is because read replica topic creation doesn't have a suitable kafka error code and so it uses UNKNOWN_SERVER_ERROR?

(Maybe this was already discussed, but perhaps something like UNKNOWN_TOPIC_OR_PARTITION would help to distinguish this case from true internal errors)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @mattschumpert do you have an opinion on error code when redpanda can't create read replica topic because there's no corresponding data in S3?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LenaAn ,

something like

'Error creating Remote Read Replica topic: No topic data found for topic {foo} at s3://{bucket_url_with_pathl}'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattschumpert : I think the request here is for suggestion of a good/relevant kafka error code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piyushredpanda , @LenaAn

In this case, I suggest:

LOG_DIR_NOT_FOUND | 57 | False | The user-specified log directory is not found in the broker config.

or

KAFKA_STORAGE_ERROR | 56 | True | Disk error when trying to access log file on the disk.

LOG_DIR_NOT_FOUND is actually quite accurate.

jcsp
jcsp previously approved these changes Jul 27, 2022
@jcsp
Copy link
Contributor

jcsp commented Jul 27, 2022

@LenaAn please check out the CI failulres + this is good to merge once they're associated with existing issues.

ARM CI is stuck waiting for agents, we can merge past it for a python change like this one.

Client may create a read replica topic when not all data is in S3 yet.
We should test this scenario.
@LenaAn
Copy link
Contributor Author

LenaAn commented Jul 28, 2022

rebased to dev to get rid of merge conflict

@jcsp jcsp merged commit e83dd5b into redpanda-data:dev Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloud-storage Shadow indexing subsystem area/tests ci-failure kind/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestReadReplicaService.test_simple_end_to_end ci failure
5 participants