Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] Use the outcome field to calculate the transaction error rate chart #75528

Merged
merged 16 commits into from
Sep 7, 2020

Conversation

cauemarcondes
Copy link
Contributor

@cauemarcondes cauemarcondes commented Aug 20, 2020

closes #74478

  • Pending: Fix API test

@cauemarcondes cauemarcondes requested a review from a team as a code owner August 20, 2020 07:56
@botelastic botelastic bot added the Team:APM All issues that need APM UI Team support label Aug 20, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:apm)

@cauemarcondes cauemarcondes changed the title apm-error-rate-outcome [APM] Use the outcome field to calculate the transaction error rate chart Aug 20, 2020
@felixbarny
Copy link
Member

felixbarny commented Aug 20, 2020

This defines the error rate as failure/(failure + success), right? Have you considered the alternatives failure/(failure + success + unknown) or even failure/all_transactions (failure/(failure + success + unknown + null))?

@cauemarcondes
Copy link
Contributor Author

cauemarcondes commented Aug 20, 2020

This defines the error rate as failure/(failure + success), right?

Correct.

Have you considered the alternatives failure/(failure + success + unknown) or even failure/all_transactions (failure/(failure + success + unknown + null))?

No, I have I understood, based on this comment https://github.com/elastic/apm/pull/299/files#r471580617, that we should not count the unknown documents, that's also why I am not considering null values too. Do you think I should count it too failure/all_transactions?

@felixbarny
Copy link
Member

I've clarified the definition of error rate: elastic/apm@362e2a1

It's now defined as failure/(failure + success), just as you have implemented 👍

@cauemarcondes
Copy link
Contributor Author

@elasticmachine merge upstream

@cauemarcondes
Copy link
Contributor Author

@elasticmachine merge upstream

@cauemarcondes
Copy link
Contributor Author

@sqren do you know what's the correct way to change the esArchiver to add the new field (event.outcome) to fix the API test?

Copy link
Member

@sorenlouv sorenlouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sorenlouv
Copy link
Member

@cauemarcondes You can generate new esarchive snapshots with the commands described here. You'll need to ingest data into ES first (eg. ./scripts/compose.py start master --no-kibana --with-opbeans-node --with-opbeans-rum)

@cauemarcondes
Copy link
Contributor Author

retest

@cauemarcondes
Copy link
Contributor Author

@dgieselaar I've just updated the snapshot here f459501, also created a new test checking for the anomaly.

Copy link
Member

@dgieselaar dgieselaar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a suggestion to avoid a snapshot-esque test, other than that, LGTM.


expect(response.body).to.eql({
noHits: false,
erroneousTransactionsRate: [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this is not really helpful. Can we avoid a snapshot test? e.g. we can count the buckets, or compare the avg only, etc.

Copy link
Member

@sorenlouv sorenlouv Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we should spent a little more time finding the right things to test instead of blindly testing the full output (I've been very guilty of this myself). Testing the entire output like this makes the test prone to false positives, where it fails when new properties are added, which leads us to blindly updating it when the implementation changes, thus bugs can creep in.

The challenge by NOT doing snapshot testing is that we don't have as good coverage and bugs can creep in...

I tried to think of the important things to verify, while still making it flexible enough to allow a developer to change the implementation (eg. add new properties).

const { erroneousTransactionsRate } = response.body;

// we want to assert that the range is correct
it("has the correct start date", async () => {
  expect(_.first(erroneousTransactionsRate).x).to.be("...");
});

it("has the correct end date", async () => {
  expect(_.last(erroneousTransactionsRate).x).to.be("...");
});

it("has the correct number of buckets", async () => {
  expect(erroneousTransactionsRate.length).to.be(1337);
});

// we want to assert that the timeseries values align with the average calculated in elasticsearch (not sure they are identical so might not be possible but worth a shot)
it("has the correct calculation for average", async () => {
  const average = _.meanBy(erroneousTransactionsRate, (p) => p.y);
  expect(average).to.be(response.body.average);
});

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what you suggested @sqren, but I feel that in the test scenarios you described above we are not testing what really matters for the API, that is the error rate. The average is not calculated in ES, it also uses lodash mean, so even if we change the way we calculate the error rate these tests will never fail.

I agree that we should avoid snapshot tests, but in the end, we are testing the return of an API and we must find a balance between these two approaches.


it('has the correct number of buckets', async () => {
expect(errorRateResponse.erroneousTransactionsRate.length).to.be(61);
});
Copy link
Member

@sorenlouv sorenlouv Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It think the way we calculated the average was somewhat useful in verifying the error rate values. Without that we should at least verify the first error rate value:

        it('has the correct error rate', async () => {
          expect(first(errorRateResponse.erroneousTransactionsRate)?.y).to.be(...);
        });

@cauemarcondes
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

async chunks size

id value diff baseline
apm 4.9MB +34.0B 4.9MB

distributable file count

id value diff baseline
default 45410 +1 45409

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@cauemarcondes cauemarcondes merged commit 7f323a1 into elastic:master Sep 7, 2020
@cauemarcondes cauemarcondes deleted the apm-error-rate-outcome branch September 7, 2020 09:23
cauemarcondes added a commit to cauemarcondes/kibana that referenced this pull request Sep 7, 2020
…hart (elastic#75528)

* replacing error rate to use event.outcome field

* addressing PR comment

* fixing api test

* fixing API test

* fixing api tests

* rmeoving snapshot from api test

* testing error rate

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Sep 8, 2020
…' of github.com:jloleysens/kibana into ilm/fix/pre-existing-policy-with-no-existing-repository

* 'ilm/fix/pre-existing-policy-with-no-existing-repository' of github.com:jloleysens/kibana:
  fix empty string in selected indices (elastic#76855)
  [Security Solution] Refactor OverviewHost and OverviewNetwork to use Search Strategy (elastic#76409)
  Use Search API in Timelion (sync) (elastic#75115)
  [telemetry] expose getIsOptedIn function in plugin start contract (elastic#75143)
  [ILM] Clean up remaining js files and any typings (elastic#76803)
  [Logs UI] Shared `<LogStream />` component (elastic#76262)
  [Security Solution] Add unit test for all hosts (elastic#76752)
  [Security Solution] Add unit test for authentications search strategy (elastic#76665)
  Do not apply search source data for tsvb (elastic#75137)
  [Security Solution] Refactor NetworkDns to use Search Strategy (elastic#76250)
  [SECURITY SOLUTION] Adds 'cypress:open-as-ci' command (elastic#76125)
  [Logs UI] Update alert executor tests (elastic#75764)
  [Functional] Unskip vega tests and fix flakiness (elastic#76600)
  [Data] Query String Input accepts classname prop (elastic#76848)
  [ML] Swim lane pagination for viewing by job id (elastic#76847)
  [Security Solution] Refactor MatrixHistogram to use Search Strategy (elastic#76603)
  [APM] Use the outcome field to calculate the transaction error rate chart (elastic#75528)
  [APM] Use observer.hostname instead of observer.name (elastic#76074)
  Legacy logging: fix remoteAddress being duplicated in userAgent field (elastic#76751)
jloleysens added a commit to jloleysens/kibana that referenced this pull request Sep 8, 2020
…s-for-710

* 'master' of github.com:elastic/kibana:
  [Search] Use async es client endpoints (elastic#76872)
  fix empty string in selected indices (elastic#76855)
  [Security Solution] Refactor OverviewHost and OverviewNetwork to use Search Strategy (elastic#76409)
  Use Search API in Timelion (sync) (elastic#75115)
  [telemetry] expose getIsOptedIn function in plugin start contract (elastic#75143)
  [ILM] Clean up remaining js files and any typings (elastic#76803)
  [Logs UI] Shared `<LogStream />` component (elastic#76262)
  [Security Solution] Add unit test for all hosts (elastic#76752)
  [Security Solution] Add unit test for authentications search strategy (elastic#76665)
  Do not apply search source data for tsvb (elastic#75137)
  [Security Solution] Refactor NetworkDns to use Search Strategy (elastic#76250)
  [SECURITY SOLUTION] Adds 'cypress:open-as-ci' command (elastic#76125)
  [Logs UI] Update alert executor tests (elastic#75764)
  [Functional] Unskip vega tests and fix flakiness (elastic#76600)
  [Data] Query String Input accepts classname prop (elastic#76848)
  [ML] Swim lane pagination for viewing by job id (elastic#76847)
  [Security Solution] Refactor MatrixHistogram to use Search Strategy (elastic#76603)
  [APM] Use the outcome field to calculate the transaction error rate chart (elastic#75528)
  [APM] Use observer.hostname instead of observer.name (elastic#76074)
  Legacy logging: fix remoteAddress being duplicated in userAgent field (elastic#76751)

# Conflicts:
#	x-pack/plugins/index_lifecycle_management/__jest__/components/edit_policy.test.tsx
#	x-pack/plugins/index_lifecycle_management/common/types/index.ts
#	x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/components/node_allocation.tsx
#	x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/edit_policy.tsx
#	x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/phases/warm_phase.tsx
#	x-pack/plugins/index_lifecycle_management/public/application/services/api.ts
@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Sep 9, 2020
@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

cauemarcondes added a commit to cauemarcondes/kibana that referenced this pull request Sep 11, 2020
…hart (elastic#75528)

* replacing error rate to use event.outcome field

* addressing PR comment

* fixing api test

* fixing API test

* fixing api tests

* rmeoving snapshot from api test

* testing error rate

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
# Conflicts:
#	x-pack/plugins/apm/common/__snapshots__/elasticsearch_fieldnames.test.ts.snap
@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

cauemarcondes added a commit to cauemarcondes/kibana that referenced this pull request Sep 13, 2020
…hart (elastic#75528)

* replacing error rate to use event.outcome field

* addressing PR comment

* fixing api test

* fixing API test

* fixing api tests

* rmeoving snapshot from api test

* testing error rate

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
# Conflicts:
#	x-pack/plugins/apm/common/__snapshots__/elasticsearch_fieldnames.test.ts.snap
@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Sep 13, 2020
cauemarcondes added a commit that referenced this pull request Sep 13, 2020
…rate chart (#75528) (#77288)

* [APM] Use the outcome field to calculate the transaction error rate chart (#75528)

* replacing error rate to use event.outcome field

* addressing PR comment

* fixing api test

* fixing API test

* fixing api tests

* rmeoving snapshot from api test

* testing error rate

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
# Conflicts:
#	x-pack/plugins/apm/common/__snapshots__/elasticsearch_fieldnames.test.ts.snap

* removing service map test
@dgieselaar dgieselaar self-assigned this Oct 12, 2020
@dgieselaar
Copy link
Member

Looks good in all browsers 👍

@dgieselaar dgieselaar added the apm:test-plan-done Pull request that was successfully tested during the test plan label Oct 12, 2020
@dgieselaar dgieselaar removed their assignment Oct 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apm:test-plan-done Pull request that was successfully tested during the test plan release_note:enhancement Team:APM All issues that need APM UI Team support v7.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[APM] Use the outcome field to calculate the transaction error rate chart
6 participants