Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Fix multi-interval series issue #897

Merged
merged 4 commits into from
May 16, 2018
Merged

Conversation

shanson7
Copy link
Collaborator

Because we were not combining multiple intervals for the same series into a single idx.Node we were hitting issues with the code deduping the series on the remote end. This change should fix both cases:

multiple defs on a single node should be collapsed into a single idx.Node element
multiple defs across nodes should not overwrite one another and should be merged later (by mergeSeries)

The symptom was data sort of "flashing" on refresh, switching between the old and new intervals data (since only one was winning).


if existing.LastUpdate < int64(point.Time) {
existing.LastUpdate = int64(point.Time)
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change (and the similar one below) are somewhat orthogonal, but as part of the interval change a rogue datapoint flying in "reset" the LastUpdate time to an older time and made the data unfindable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very interesting. note that the tank (AggMetric) wouldn't add this point (unless it is accepted by the reorder buffer), which makes this case even more interesting.
Also this only seems a problem if this "buggy, old" point is not followed by a correct, recent point.
you're the first one I know with a stream this broken, but sure, we may as well protect against it :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This didn't actually happen to us, but it was my leading guess of what happened. Turns out I was wrong. The issue we had was that during a backfill, the LastUpdate in cassandra doesn't get set correctly (because the data stopped and didn't cause a flush). It's a mega corner case though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, comparing Find with FindByTag and checking for how they differ is an interesting exercise. I found another bug which we can fix later: #899
probably no one has noticed cause no one uses the feature (except worldping which is currently untagged)

@Dieterbe
Copy link
Contributor

Because we were not combining multiple intervals for the same series into a single idx.Node

I'm missing a lot of info here. we = your deployment? or we = the metrictank codebase?
FindByTag returns a idx.Node for each MKey,
Find returns an idx.Node for each path (nameWithTags)
so did you modify the code? were you creating new paths when you change the interval?
can you describe the problem in more detail?

@shanson7
Copy link
Collaborator Author

can you describe the problem in more detail?

Sure. FindByTag will always return exactly one def per idx.Node. If there are multiple idx.Nodes with the same path, one of them is picked arbitrarily by this logic, which will cause a series with multiple intervals to be incomplete.

As an example, if you had foo.bar;baz=v and it was published at a 30s interval and then changed to a 1min interval, both would be returned as separate idx.Node from FindByTag and clusterFindyByTag would only keep 1 of them. In practice, this means that graphing this series over a time range that contains both series would cause "flapping" where one would show, then the other.

HasChildren: false,
Defs: []idx.Archive{*def},
})
if _, ok := seen[def.NameWithTags()]; !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ids returned by m.idsByTagQuery have already been filtered to only include the ids of defs that have a lastUpdate > from

So instead of searching for all defs with this NameWithTags, wouldnt it make more sense to just create a map[string]*idx.Node to group all results by the def.NameWithTags().

Series names that have not been seen before just create a new entry in the map, with idx.Node's Defs field set to []idx.Archive{*def}. If the series name has already been seen then the def just gets appended to idx.Node's Defs field

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I had actually implemented it like this originally, but found the defByTagSet and went that path. I didn't think about this though. Will fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree. irrespective of the lastUpdate check (which indeed can be removed) that seems simpler, cleaner and also more symmetrical to what MemoryIdx.Find() does

@Dieterbe
Copy link
Contributor

Dieterbe commented Apr 30, 2018

Now your explanation makes a lot more sense.
And for comparison, MemoryIdx.Find (or more specifically MemoryIdx.find) returns idx.Node which has multiple Defs (e.g. for multiple intervals) under the same path (that's how we store in memory because Tree.Items holds the memory.Node and Node has mutiple Defs). I just tested and confirms that switching intervals and getting the full (merged) results back works (with untagged series)

I agree that we should fix this, and generally we should make FindByTag more symmetrical to Find.

@shanson7
Copy link
Collaborator Author

shanson7 commented May 7, 2018

I'm not sure why a test failed here. It doesn't look like it's due to my change, but I cannot retry it.

@Dieterbe Dieterbe added this to the 0.9.1 milestone May 16, 2018
@Dieterbe
Copy link
Contributor

Dieterbe commented May 16, 2018

@shanson7 this PR looks 👍 i just added a few minor cleanups and will merge shortly.

@Dieterbe Dieterbe merged commit 6e3d03b into grafana:master May 16, 2018
@shanson7 shanson7 deleted the feature_multiDef branch July 30, 2018 19:36
@Dieterbe Dieterbe modified the milestones: 1.1, 0.10.0 Dec 12, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants