Fix summarize function #928

shanson7 · 2018-05-30T21:02:29Z

Fixes #903

Dieterbe · 2018-05-31T08:17:00Z

expr/func_summarize_test.go

+ // This test is specifically testing the timestamp behavior, so we don't need real values.
+ // However, we do need a lot of datapoints to trigger the bug we are regression testing against.
+ interval1Day := uint32(24 * 60 * 60)
+ numTimestamps := uint32(2000)


where does the 2000 come from? is the goal to test 2000 days?

The bug in this case was two-fold.

start-end caused an underflow for uint32

Division was by the interval, rather than the step (as it is in the python code) as well as using the adjusted start, end (rather than the original).

So, the bug comes about when (start-end) / interval is less than the number of input datapoints.

I needed a large enough number of input datapoints, and a large enough interval such that that len(input) > (start-end) / interval (taking into account underflow). These were arbitrary numbers that I selected that caused the above to be true.

Dieterbe · 2018-05-31T08:18:33Z

expr/func_summarize_test.go

+ {Val: 0, Ts: 1482192000},
+ {Val: 0, Ts: 1513728000},
+ {Val: 0, Ts: 1545264000},
+ {Val: 0, Ts: 1576800000},


what's the relation between these ts? can they be set from code so it's more obvious?

These are the expected outcome. Setting from code would be partially duplicating the logic in the function under test, which I try to avoid in test code.

(These are the results as graphite reported them)

i agree we shouldn't use advanced code that is similar to or shared with the code under test,
but now this is pretty unreadable: i have to pull out a calculator to find out they are apart by 2592000 , aka 30*24*3600, and to confirm that the first ts is the last interval boundary before the startTime.
I think trivial, obviously-correct solves this problem. Thus I would recommend something like:

start := startTime - (startTime % outputInterval) var unalignedExpected = []schema.Point{ {Val: 0, Ts: start}, {Val: 0, Ts: start + outputInterval}, {Val: 0, Ts: start + 2*outputInterval}, {Val: 0, Ts: start + 3*outputInterval}, }

Dieterbe · 2018-05-31T08:18:42Z

expr/func_summarize_test.go

+ {Val: 0, Ts: 1527709518},
+ {Val: 0, Ts: 1559245518},
+ {Val: 0, Ts: 1590781518},
+ {Val: 0, Ts: 1622317518},


what's the relation between these ts? can they be set from code so it's more obvious?

same here. we can clarify this a lot and make it obviously confirm to the docs ("Passing alignToFrom=true will instead create buckets starting at the from time" per http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.summarize)
by doing startTime, startTime + outputInterval, etc.

Dieterbe · 2018-05-31T08:22:07Z

expr/func_summarize_test.go

+ {
+ {
+ Target: "summarize(align, \"365d\", \"sum\")",
+ QueryPatt: "summarize(align, \"365d\", \"sum\")",


any particular reason we're testing with 365d buckets?
i would suggest to target the bugreport specifically and use 30d buckets , and a from-until range as used in the ticket of 89.000683 days, as well as a few different values like 89 days, 90 days, etc

The bug is also dependent on the input datapoints. That wasn't specified in the bug report, but I don't believe it's necessary to exactly reproduce as in the bug report. I expect that I could though, if you wish.

FWIW the data from the bug report has interval=10s (i just ran mt-index-cat )

shanson7 · 2018-06-01T14:05:18Z

I modified the time range to better match the issue report (assuming a 1 hour input interval). It still exhibits the bug with the old code, and is fixed with the new code.

Dieterbe · 2018-06-01T14:52:56Z

great, thanks Sean. will check it out and get back to you.

Dieterbe · 2018-06-04T07:58:39Z

expr/func_summarize_test.go

+ QueryPatt: "align",
+ QueryFrom: startTime,
+ QueryTo: endTime,
+ Interval: outputInterval,


shouldn't this be inputInterval?

Dieterbe · 2018-06-04T08:04:22Z

generally for bugfixes i recommend first commit = test case, 2nd = bugfix.
that makes it easy to verify. now the burden is on the reviewer (or any time a future reader wants to verify the solution) to manually undo the fix to confirm the old code doesn't pass the test case.
(git add -p and git rebase -i make it easy to structure the commit history this way, if needed)

Dieterbe · 2018-06-04T08:56:10Z

I modified the time range to better match the issue report (assuming a 1 hour input interval). It still exhibits the bug with the old code, and is fixed with the new code.

note that in the bug, there's only 1 returned output point instead of 3, whereas running the test on the broken code results in 3 instead of expected 4.

go test -run=TestSummarizeLargeIntervalTimestamps
--- FAIL: TestSummarizeLargeIntervalTimestamps (0.00s)
	func_summarize_test.go:759: case "LongIntervals" ("30d", "sum", false): len output expected [{0 1464480000} {0 1467072000} {0 1469664000} {0 1472256000}], got [{0 1464480000} {0 1467072000} {0 1469664000}]
FAIL
exit status 1
FAIL	github.com/grafana/metrictank/expr	0.004s

but, good news!
when i change the test to use input resolution of 10s I also get exactly 1 point back.

go test -run=TestSummarizeLargeIntervalTimestamps
--- FAIL: TestSummarizeLargeIntervalTimestamps (0.01s)
	func_summarize_test.go:759: case "LongIntervals" ("30d", "sum", false): len output expected [{0 1464480000} {0 1467072000} {0 1469664000} {0 1472256000}], got [{0 1464480000}]
FAIL
exit status 1
FAIL	github.com/grafana/metrictank/expr	0.015s

shanson7 · 2018-06-04T17:05:43Z

I restructured the commits to split out the test, fix and orthogonal changes.

In the test, I now compute the output timestamps using the expected interval. I also changed the input interval to 10s.

Dieterbe · 2018-06-04T17:22:33Z

we have a winner! now go enjoy monitorama!

shanson7 changed the title ~~Feature fix summary~~ Fix summarize function May 30, 2018

Dieterbe suggested changes May 31, 2018

View reviewed changes

Dieterbe reviewed Jun 4, 2018

View reviewed changes

shanson7 added 3 commits June 4, 2018 09:57

Add test to reproduce summarize bug

b7171d4

Fix bug in summarize returning too few datapoints

b17a754

Make summarize stable again

3af0860

shanson7 force-pushed the feature_fixSummary branch from 2fcb744 to 3af0860 Compare June 4, 2018 17:04

Dieterbe approved these changes Jun 4, 2018

View reviewed changes

Dieterbe merged commit f2fc7f5 into grafana:master Jun 4, 2018

Aergonus deleted the feature_fixSummary branch June 4, 2018 22:09

Dieterbe added this to the 0.10.0 milestone Dec 12, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix summarize function #928

Fix summarize function #928

shanson7 commented May 30, 2018

Dieterbe May 31, 2018

shanson7 May 31, 2018

Dieterbe May 31, 2018

shanson7 May 31, 2018

shanson7 May 31, 2018

Dieterbe Jun 4, 2018 •

edited

Dieterbe May 31, 2018

Dieterbe Jun 4, 2018

Dieterbe May 31, 2018

shanson7 May 31, 2018

Dieterbe Jun 4, 2018 •

edited

shanson7 commented Jun 1, 2018

Dieterbe commented Jun 1, 2018

Dieterbe Jun 4, 2018

Dieterbe commented Jun 4, 2018

Dieterbe commented Jun 4, 2018 •

edited

shanson7 commented Jun 4, 2018

Dieterbe commented Jun 4, 2018

Fix summarize function #928

Fix summarize function #928

Conversation

shanson7 commented May 30, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dieterbe Jun 4, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dieterbe Jun 4, 2018 • edited

Choose a reason for hiding this comment

shanson7 commented Jun 1, 2018

Dieterbe commented Jun 1, 2018

Choose a reason for hiding this comment

Dieterbe commented Jun 4, 2018

Dieterbe commented Jun 4, 2018 • edited

shanson7 commented Jun 4, 2018

Dieterbe commented Jun 4, 2018

Dieterbe Jun 4, 2018 •

edited

Dieterbe Jun 4, 2018 •

edited

Dieterbe commented Jun 4, 2018 •

edited