NOPS-1034 added threshold handling for asPercent and divideSeries #165

AniaMakes · 2021-10-27T12:37:15Z

Companion to #164

Graphite datapoints are values stored with their timestamp. But, if there are no events during that timestamp, the value will be null and not 0, ie denoting explicit lack of a value. (https://graphite.readthedocs.io/en/latest/terminology.html#graphite-terminology). So in some cases null is an expected - and even desirable value. For example if we're checking the error.count then 0 is ideal as that means no errors are occurring.

Things get a little bit interesting when we look at asPercent and divideSeries. Those two graphite functions are used in the same way - we take a sum of errors in _time_ and a sum of successes in _time_ and we either express the result as a percentage or division result. I think the desired metrics is the ratio of errors to successes; I'm not quite sure either asPercent or divideSeries reflects that, but it's good enough to be used as a valid metric. Anyhow, what this means is practice is most of the time sum of successes = _positive integer_ and sum of errors = null (ie 0). But zero divided by positive integer is still zero (for divideSeries), and zero times 100, then divided by positive integer is also zero (for asPercent).

What this PR does

it adds custom handling for asPercent and divideSeries
simplifies test headers for sumSeries because I got very confused

What this PR does not do
My initial though would be - once we checked that the result of asPercent or divideSeries was null, we'd run a fetch against the second metric of the check and fail if that's null too. I totally forgot you can't do await inside a map. As such we have two options: 1️⃣ merge this PR and call it done, given that's an improvement on what was there before; and 2️⃣ if we are really concerned about the possibility of Graphite being the weak point, we should go through the repos that are using those checks and add a n-health graphiteWorking check.

When reviewing this PR please state your preference for 1️⃣ vs 2️⃣ vs any other thought you might have on this matter.

serena97 · 2021-10-28T10:31:14Z

src/checks/graphiteThreshold.check.js

+				if(result.target && asPercentRegex.test(result.target) || result.target && divideSeriesRegex.test(result.target)){
+					const fetchCountPerTimeUnit = result.datapoints.map(item => Number(item[0]));
+					const isFailing = this.direction === 'above' ?
+					Number(fetchCountPerTimeUnit[0]) > this.threshold :


question: I think I need some more context here in understanding why we're only using the first element of fetchCountPerTimeUnit to determine whether it is failing. Is it because we can be sure that there is only 1 datapoint in the array for divideSeries or asPercent, and that datapoint is the result of the query? From what I understand, if there is more than 1 datapoint in the array, the api does not return the result of the query so the datapoints look like

"datapoints": [[1.0, 1635125460], [null, 1635129060], ...]

so I would have thought that we would need to do the calculations ourselves here, similar to how summarize(sumSeries) is done

oops, I just saw your tests and it answered my question - it is indeed 1 item in the datapoints array

datapoints: [[ 8, 1635125640]]

suggestion: as there's only 1 item in the datapoints array, perhaps we can just change line 54 to
const ratioOfErrorsToSuccesses = Number(result.datapoints[0][0])
to make it more readable

I took a slightly different approach. All the health checks I've seen have only one datapoint, but you are correct that this may not always be the case - I added a log to let us know if that's not the case (so we can iterate as needed).

serena97 · 2021-10-28T11:01:01Z

test/graphiteThreshold.check.spec.js

+			});
+		});
+
+		it('Should be unhealthy if below threshold', function (done) {


question: would it make sense for it to be healthy (instead of unhealthy) below the threshold? In my mind, if we get the % of errors against the successes, a low value suggests that there's a low failure rate

This tests checks that the direction: 'below' works. I doubt the actual functionality would be used because this functionality is generally used for rater of errors to successes. But sometimes you may want to track ratios of other events.

…an one

serena97 · 2021-10-29T17:05:33Z

Hey Ania, this looks good to me! I think we should merge this PR in because it helps identify the ratio of errors to successes and therefore is an improvement on what was there before. For option 2, I'd like to understand why Graphite is considered a weak point

NOPS-1034 added threshold handling for asPercent and divideSeries

added threshold handling for asPercent and divideSeries

d743aa8

AniaMakes requested a review from a team as a code owner October 27, 2021 12:37

serena97 reviewed Oct 28, 2021

View reviewed changes

added log to see if any of percent or divide have different length th…

2d2770d

…an one

serena97 approved these changes Oct 29, 2021

View reviewed changes

AniaMakes merged commit e8f9c1d into main Nov 3, 2021

AniaMakes deleted the divide-and-asPercent branch November 3, 2021 10:18

JSRedondo pushed a commit that referenced this pull request Jun 20, 2022

Merge pull request #165 from Financial-Times/divide-and-asPercent

2a1ce13

NOPS-1034 added threshold handling for asPercent and divideSeries

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NOPS-1034 added threshold handling for asPercent and divideSeries #165

NOPS-1034 added threshold handling for asPercent and divideSeries #165

AniaMakes commented Oct 27, 2021

serena97 Oct 28, 2021 •

edited

Loading

serena97 Oct 28, 2021 •

edited

Loading

AniaMakes Oct 29, 2021

serena97 Oct 28, 2021

AniaMakes Oct 29, 2021

serena97 commented Oct 29, 2021

NOPS-1034 added threshold handling for asPercent and divideSeries #165

NOPS-1034 added threshold handling for asPercent and divideSeries #165

Conversation

AniaMakes commented Oct 27, 2021

serena97 Oct 28, 2021 • edited Loading

Choose a reason for hiding this comment

serena97 Oct 28, 2021 • edited Loading

Choose a reason for hiding this comment

AniaMakes Oct 29, 2021

Choose a reason for hiding this comment

serena97 Oct 28, 2021

Choose a reason for hiding this comment

AniaMakes Oct 29, 2021

Choose a reason for hiding this comment

serena97 commented Oct 29, 2021

serena97 Oct 28, 2021 •

edited

Loading

serena97 Oct 28, 2021 •

edited

Loading