print only failed tests with RerunFormatter #2290

furiel · 2023-06-05T15:33:50Z

I have a distributed system, where (especially after release), I need a couple of query attempts to get the system going and produce stable responses in a timely manner. RerunFormatter significantly reduced the noise of this kind of flakiness.

However, I have some concerns with the output of the formatter, that I would like to discuss with you.

For sake of reasoning, I created a deliberately flaky scenario. This is an example output of logFailedTestCaseAttempts(), where the flaky test is retried a couple of times, but was successful in the end.

[...]
1 scenario (1 passed)
3 steps (3 passed)
0m01.992s (executing steps: 0m00.014s)
features/playground.feature:3:3:3:3:3:3:3:3:3:3

Problems with the output

The output contains errors, including failing steps that were eventually fixed by re-execution.
I am leaning towards this is a bug. My primary goal was to use RerunFormatter to reduce noise. I want to see the real faliures, and not the eventially fixed flaky results. For example, in case I have a changeset I am testing, containing both actual failures and flaky ones, I cannot differentiate from the output and figure out which testcases should I investigate first.

One might argue that they want to know about flaky tests too. That's not my use case, but in case you disagree, I set the issue type to feature request.

The same line is repeated multiple times. This clutters the output, and make it hard to get a good grasp on how many faliures are actually. What I do is I have a separate script that post processes the output, and removes the duplicates. I don't think repeating the same lines is contains too much value.

Difficulties of extending `RerunFormatter`

The root of the unintended behavior is here: the guard does not consider the willBeRetried field of the steps.

I went ahead and figured maybe I can create a custom formatter extending RerunFormatter, modifying the condition to achieve what I want. But I ran into a couple of obstacles that might worth addressing, even if you want to keep the original behavior.

The plan was to copy-paste override the entire method, just with the condition changed, by adding a check for willBeRetried.

Aggregation and display is coupled in logFailedTestCaseAttempts()

The logic that aggregates the failures tightly coupled with printing to console. So if I want to change any behavior, I need to override the entire method. I do not have any option to get the failures, and do some post process (let's say remove duplicates from the output).

separator as private.

This was something I needed to work around by basically duplicating the field and it's initialization in the subclass.

What is the reason behind making it private? Could it be relaxed to protected?

doesNotHaveValue is not exported outside the project.

Defined here. Although these functions are fundamental to the project: most of the modules import these, none of the publicly available ones export it.

I ended up importing like

import {doesNotHaveValue} from '@cucumber/cucumber/lib/value_checker';

but I think it would worth making it publicly available.

This was the skeleton of the code I ended up with:

The extended class:

import {doesNotHaveValue} from '@cucumber/cucumber/lib/value_checker';

var getGherkinScenarioLocationMap = formatterHelpers.GherkinDocumentParser.getGherkinScenarioLocationMap;

function unstable(worstTestStepResult) {
  return worstTestStepResult.status !== messages.TestStepResultStatus.PASSED
}

class CustomRerunFormatter extends RerunFormatter {
  public readonly sep: string
  constructor(options: IFormatterOptions) {
    super(options);
    this.sep = options.parsedArgvOptions.rerun.separator;
  }

  logFailedTestCases(): void {
    const mapping = {}
    this.eventDataCollector
      .getTestCaseAttempts()
      .forEach(({ gherkinDocument, pickle, worstTestStepResult, willBeRetried }) => {
        if (unstable(worstTestStepResult) && !willBeRetried) {
          const relativeUri = pickle.uri
          const line =
            getGherkinScenarioLocationMap(gherkinDocument)[
              pickle.astNodeIds[pickle.astNodeIds.length - 1]
            ].line
          if (doesNotHaveValue(mapping[relativeUri])) {
            mapping[relativeUri] = []
          }
          mapping[relativeUri].push(line)
        }
      })
    const text = Object.keys(mapping)
      .map((uri) => {
        const lines = mapping[uri]
        return `${uri}:${lines.join(':')}`
      })
      .join(this.sep)
    this.log(text + "\n")
  }
}

The text was updated successfully, but these errors were encountered:

furiel · 2023-06-09T07:58:45Z

I was wondering if you had the opportunity to look into this. Do these sound reasonable?

I tried to join to the slack community, but when I entered my email in the registration link I found here, I received an error about

davidjgoss · 2023-06-10T11:47:55Z

Hey @furiel, thanks for the detailed report and sorry for the late reply.

I do think this is a bug - for the purposes of rerun formatter we should only report a scenario as failed if all retries fail.

Since you’ve already pinpointed the source, would you be up for making a PR with the fix?

(I’ve passed in the report re the Slack invite.)

furiel · 2023-06-14T07:41:46Z

@davidjgoss thanks for the confirmation. I am creating a pull request.

Before doing my modifications, I wanted to add a test about the incorrect behavior discussed in cucumber#2290 I will update this test in a later patch that fixes the issue. Signed-off-by: Antal Nemes <antal.nemes.hu@gmail.com>

Prior this patch, RerunFormatter should log unstable tests: tests that failed at any retry attempt. Checking the `willBeRetried` value makes sure we will not log failures for rerun attempts that will be retried. Referring: cucumber#2290 for more information. Signed-off-by: Antal Nemes <antal.nemes.hu@gmail.com>

Introducing `getFailureMap()` and `formatFailedTestCases()` to decouple console logging with the calculation. This would help users who want to customize RerunFormatter For more information: cucumber#2290 Signed-off-by: Antal Nemes <antal.nemes.hu@gmail.com>

separator is not part of a complex logic, neither a complex structure. Users may want to reuse it for customizing RerunFormatter. For more details: cucumber#2290 Signed-off-by: Antal Nemes <antal.nemes.hu@gmail.com>

Many of the internal libraries use these utility functions. In case users want to customize code, they may want to reuse some parts of the original implementation, and as such, they would need these functions. More details: cucumber#2290 Signed-off-by: Antal Nemes <antal.nemes.hu@gmail.com>