Failure in kafka_streams_test.KafkaStreamsWikipedia
#2889
Looks like this failed again post-fixup here: https://buildkite.com/vectorized/redpanda/builds/4434#7613b197-2d5a-4916-b96f-4c472fefd21e
In prep for ci-party
Problem description
In CI, the driver (load generator) infrequently fails to generate data. This is a problem because the functional test is validated by parsing the driver's output, and in this case there is none.
Reproducing the issue
Difficulty: hard
Steps to reproduce:
I fixed this CI failure in the linked PR but the PR is stale now. Once I get my fork back up I can re-create the PR and get it merged. |
PR with fix: #2962
The KStreams example classes use the same names as the KStreams tests. Rename them and distinguish between an example and a driver. This will improve code readability. Within this commit are changes for redpanda-data#3032. The main problem in redpanda-data#3032 is that the internal Java application sometimes fails to count all input messages. This may be related to the 1-minute window used within the program. Re-enable with ok_to_fail for now until a fix is proposed. This commit also re-enables the KStreams wikipedia test, which was disabled due to redpanda-data#2889. Running the test multiple times (50+) did not reproduce the issue in redpanda-data#2889. Redpanda has changed a lot since redpanda-data#2889 was last seen, so re-enable with ok_to_fail to see if the problem still exists.
The PR with fix is now #4461 since I lost access to the previous PR. |
This test was disabled due to redpanda-data#2889 but a lot has changed since then. Re-enable the test with ok_to_fail to see if it continues to fail.
The KStreams example classes use the same names as the KStreams tests. Rename them and distinguish between an example and a driver. This will improve code readability. This commit also re-enables KStreamsWikipedia with ok_to_fail. The test was removed from the test suite, but Redpanda has changed a lot since the issue was reported. Let's re-enable it to see if the issue continues. See redpanda-data#2889 for details.
The PR to re-enable this test with ok_to_fail is here: #5615
I'm able to repro the issue locally. It fails 10/1000 runs so now I can make observations to discover the issue. This will likely take a few days to identify the root cause so I'll update the ticket as I find new information |
I discovered that on all the failed tests, the I need to inspect the KStreams code base more for when/what is supposed to submit requests to the registry. |
Weekend test runs revealed that the producers within KafkaStreams sometimes fail to generate data. So I'm testing some adjustments to the KafkaStreams code now to see if that resolves the issue. |
The wikipedia driver was originally coded to produce messages with a random number generator. The generator covered the range [0,99], which meant there was a chance that no records were produced. This caused a problem in Redpanda integration testing, where success is determined by parsing output. This commit ensures that at least 1 record is always sent to the example topics. Fixes: redpanda-data/redpanda#2889
The new commit incorporates a small change to the KStreams Wikipedia example. Previously it was possible for 0 messages to be sent to topics. Now a minimum of 1 message is sent. This problem manifested as a ducktape timeout error since the test relies on parsing output. Fixes: redpanda-data#2889
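For illustration, the root cause and the fix described above can be sketched roughly as follows. This is not the actual driver code: the class name `RecordCountSketch`, both method names, and the use of `Math.max` as the clamp are assumptions made for this example. The real change may enforce the minimum differently.

```java
import java.util.Random;

// Hypothetical sketch; names are illustrative and not taken from the driver.
public class RecordCountSketch {

    // Behaviour described in the issue: the count is drawn from [0, 99],
    // so roughly 1 run in 100 produces no records at all.
    static int countMayBeZero(Random rng) {
        return rng.nextInt(100);
    }

    // Behaviour after the fix (one possible way to do it): clamp the draw
    // so that at least one record is always produced.
    static int countAtLeastOne(Random rng) {
        return Math.max(1, rng.nextInt(100));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int n = countAtLeastOne(rng);
        for (int i = 0; i < n; i++) {
            // Stand-in for sending a record to the example topic.
            System.out.println("record-" + i);
        }
    }
}
```

A 1-in-100 chance of an empty run is consistent with the roughly 10/1000 local failure rate reported earlier in the thread. With the clamp, the output that the ducktape test parses is never empty, so the test no longer waits until the timeout.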
https://buildkite.com/vectorized/redpanda/builds/4042#a185db01-f1ab-494c-b801-7083a5547589
TimeoutError: Timed out waiting 600 seconds for service nodes to finish. These nodes are still alive: ['ExampleRunner-1-140671614982704 node 1 on docker_n_19']
From the debug log, the program used to generate load for this test did not generate output which is necessary to check for successful execution.