Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Fix summarize function #928

Merged
merged 3 commits into from
Jun 4, 2018
Merged

Conversation

shanson7
Copy link
Collaborator

Fixes #903

@shanson7 shanson7 changed the title Feature fix summary Fix summarize function May 30, 2018
// This test is specifically testing the timestamp behavior, so we don't need real values.
// However, we do need a lot of datapoints to trigger the bug we are regression testing against.
interval1Day := uint32(24 * 60 * 60)
numTimestamps := uint32(2000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where does the 2000 come from? is the goal to test 2000 days?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug in this case was two-fold.

  1. start-end caused an underflow for uint32
  2. Division was by the interval, rather than the step (as it is in the python code) as well as using the adjusted start, end (rather than the original).

So, the bug comes about when (start-end) / interval is less than the number of input datapoints.

I needed a large enough number of input datapoints, and a large enough interval such that that len(input) > (start-end) / interval (taking into account underflow). These were arbitrary numbers that I selected that caused the above to be true.

{Val: 0, Ts: 1482192000},
{Val: 0, Ts: 1513728000},
{Val: 0, Ts: 1545264000},
{Val: 0, Ts: 1576800000},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the relation between these ts? can they be set from code so it's more obvious?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the expected outcome. Setting from code would be partially duplicating the logic in the function under test, which I try to avoid in test code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(These are the results as graphite reported them)

Copy link
Contributor

@Dieterbe Dieterbe Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree we shouldn't use advanced code that is similar to or shared with the code under test,
but now this is pretty unreadable: i have to pull out a calculator to find out they are apart by 2592000 , aka 30*24*3600, and to confirm that the first ts is the last interval boundary before the startTime.
I think trivial, obviously-correct solves this problem. Thus I would recommend something like:

        start := startTime - (startTime % outputInterval)
        var unalignedExpected = []schema.Point{
                {Val: 0, Ts: start},
                {Val: 0, Ts: start + outputInterval},
                {Val: 0, Ts: start + 2*outputInterval},
                {Val: 0, Ts: start + 3*outputInterval},
        }

{Val: 0, Ts: 1527709518},
{Val: 0, Ts: 1559245518},
{Val: 0, Ts: 1590781518},
{Val: 0, Ts: 1622317518},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the relation between these ts? can they be set from code so it's more obvious?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. we can clarify this a lot and make it obviously confirm to the docs ("Passing alignToFrom=true will instead create buckets starting at the from time" per http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.summarize)
by doing startTime, startTime + outputInterval, etc.

{
{
Target: "summarize(align, \"365d\", \"sum\")",
QueryPatt: "summarize(align, \"365d\", \"sum\")",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason we're testing with 365d buckets?
i would suggest to target the bugreport specifically and use 30d buckets , and a from-until range as used in the ticket of 89.000683 days, as well as a few different values like 89 days, 90 days, etc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug is also dependent on the input datapoints. That wasn't specified in the bug report, but I don't believe it's necessary to exactly reproduce as in the bug report. I expect that I could though, if you wish.

Copy link
Contributor

@Dieterbe Dieterbe Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the data from the bug report has interval=10s (i just ran mt-index-cat )

@shanson7
Copy link
Collaborator Author

shanson7 commented Jun 1, 2018

I modified the time range to better match the issue report (assuming a 1 hour input interval). It still exhibits the bug with the old code, and is fixed with the new code.

@Dieterbe
Copy link
Contributor

Dieterbe commented Jun 1, 2018

great, thanks Sean. will check it out and get back to you.

QueryPatt: "align",
QueryFrom: startTime,
QueryTo: endTime,
Interval: outputInterval,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be inputInterval?

@Dieterbe
Copy link
Contributor

Dieterbe commented Jun 4, 2018

generally for bugfixes i recommend first commit = test case, 2nd = bugfix.
that makes it easy to verify. now the burden is on the reviewer (or any time a future reader wants to verify the solution) to manually undo the fix to confirm the old code doesn't pass the test case.
(git add -p and git rebase -i make it easy to structure the commit history this way, if needed)

@Dieterbe
Copy link
Contributor

Dieterbe commented Jun 4, 2018

I modified the time range to better match the issue report (assuming a 1 hour input interval). It still exhibits the bug with the old code, and is fixed with the new code.

note that in the bug, there's only 1 returned output point instead of 3, whereas running the test on the broken code results in 3 instead of expected 4.

go test -run=TestSummarizeLargeIntervalTimestamps
--- FAIL: TestSummarizeLargeIntervalTimestamps (0.00s)
	func_summarize_test.go:759: case "LongIntervals" ("30d", "sum", false): len output expected [{0 1464480000} {0 1467072000} {0 1469664000} {0 1472256000}], got [{0 1464480000} {0 1467072000} {0 1469664000}]
FAIL
exit status 1
FAIL	github.com/grafana/metrictank/expr	0.004s

but, good news!
when i change the test to use input resolution of 10s I also get exactly 1 point back.

go test -run=TestSummarizeLargeIntervalTimestamps
--- FAIL: TestSummarizeLargeIntervalTimestamps (0.01s)
	func_summarize_test.go:759: case "LongIntervals" ("30d", "sum", false): len output expected [{0 1464480000} {0 1467072000} {0 1469664000} {0 1472256000}], got [{0 1464480000}]
FAIL
exit status 1
FAIL	github.com/grafana/metrictank/expr	0.015s

@shanson7
Copy link
Collaborator Author

shanson7 commented Jun 4, 2018

I restructured the commits to split out the test, fix and orthogonal changes.

In the test, I now compute the output timestamps using the expected interval. I also changed the input interval to 10s.

@Dieterbe
Copy link
Contributor

Dieterbe commented Jun 4, 2018

we have a winner! now go enjoy monitorama!

@Dieterbe Dieterbe merged commit f2fc7f5 into grafana:master Jun 4, 2018
@Aergonus Aergonus deleted the feature_fixSummary branch June 4, 2018 22:09
@Dieterbe Dieterbe added this to the 0.10.0 milestone Dec 12, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants