Enhancement - indication of returned data is rollup data #11700

shalstea · 2018-04-23T16:33:31Z

Grafana 5.0.3
MetricTank

We have users with many Charts on a single dashboard. Depending on the cardinality of the data and the timerange MetricTank may return rolled up data (in our case, configured for hourly). This can be subtle as potentially only 1 or 2 graphs out of nine are rolled up.

Also when the overall dashboard date range changes, sometimes from as little as 12 hours to 24 hours, rollup data is used.

Updated:

Docs for Metrictank metadata: https://github.com/grafana/metrictank/blob/master/docs/http-api.md#metadata

It would be extremely nice if there were an indicator that rolled up data was used. A mouse-over would show the rollup configuration, e.g. hourly or daily.

Dieterbe · 2018-10-08T15:57:05Z

an interesting question here is how do we represent this in case many series were queried (some people query anywhere between tens, hundreds and hundreds of thousands of series), and there may be several series returned as well, each corresponding to one or more queried series.

my proposal is to provide an indication of:

- which storage schemas rules were used
- add counts to mark how many series were queried corresponding to each archive specification of each schema.
- when runtime consolidation is used, specify to what resolution and using what function.

an example when you hit a few different series with different settings:

   foo.bar.baz:
5  raw: 1min
   rollup1: 5min:60d
   rollup2: 30min:100d

   baz:
10 raw: 10s -> runtime consolidation to 1min using avg
11 raw: 10s -> runtime consolidation to 1min using min

   default:
0  raw: 10s
15 rollup: 1min:10d
   rollup: 30min:100d

additionally, i propose we track the used from/end time for each series read (remember, may be different due to movingAverage, timeShift, etc) and then in grafana display like this:

this makes it a bit easier to reason about what happened.

as far as providing this information back from graphite/metrictank to grafana, i propose we extend the response json object with a meta category, and put this info under data or something.
this allows us to later add meta information for other things (e.g. warnings when a shard was unavailable and returned data is incomplete)

torkelo · 2018-10-09T12:52:14Z

> as far as providing this information back from graphite/metrictank to grafana, i propose we extend the response json object with a meta category, a

That sounds like a good idea. Beyond visualising it like you propose I think the most important thing is to show that the data you are looking at has been rolled up using a specific time window & aggregation function.

Dieterbe · 2019-06-02T08:23:53Z

I plan to start working on this shortly.
the main thing to take into account is that a graphite render response is a json array of series. it is not a dict. this has been discussed lightly in grafana/metrictank#1130 as well.
so, to stay compatible with old/existing graphite clients, we must provide the meta section as an "opt-in" request flag to get a different response type (a dictionary with the extra section), and the best way to do this seems to be via a datasource configuration option. either:

1. a "supports meta" toggle on the graphite datasource
2. a separate metrictank datasource type which is identical to graphite, except also supports the meta flag
3. as part of a version selector (which would require upstream graphite also receiving a PR to expose some of its own meta stats as well using the same mechanism)

looping in @DanCech to see if 3 is doable, if not 1 is probably simplest

regardless, we'll go ahead and start working on the implementation for MT.
I think this is something we can do without much/any up-front design, and just figure it out as we go along with the coding changes in MT. cc @davkal @bergquist
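
To illustrate the compatibility concern above, here is a minimal client-side sketch that accepts both shapes: the classic bare array of series and an opt-in dictionary that wraps the series and adds a meta section. The key names `series` and `meta`, the function name, and the sample payloads are assumptions for illustration, not a confirmed API.

```python
import json


def normalize_render_response(body):
    """Return (series, meta) for either response shape.

    Assumption: the opt-in dictionary keeps the series under "series"
    and the extra information under "meta".
    """
    if isinstance(body, list):
        # Classic graphite render response: a bare JSON array of series.
        return body, {}
    if isinstance(body, dict):
        # Opt-in response: a dictionary with an extra meta section.
        return body.get("series", []), body.get("meta", {})
    raise ValueError("unexpected render response type: %s" % type(body).__name__)


# Hypothetical payloads, only to exercise both branches.
legacy = json.loads('[{"target": "foo", "datapoints": [[1.0, 1559463833]]}]')
extended = json.loads('{"meta": {"stats": {}}, "series": [{"target": "foo", "datapoints": []}]}')

for payload in (legacy, extended):
    series, meta = normalize_render_response(payload)
    print(len(series), sorted(meta))
```

An old client that never sends the opt-in flag keeps receiving the array, so only clients that ask for the dictionary need the second branch.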

bergquist · 2019-06-03T13:56:16Z

For this use case, I prefer 1.

Dieterbe · 2019-06-04T21:42:56Z

Originally I was thinking of returning this information tied to the response as a whole. Now I'm thinking this could be even better by associating these stats to each returned series. So each returned series would have its own stats about how many and which roll ups and runtime consolidation were used to generate that particular series. We could then also tie this information in the UI directly to each series, rather than having the total numbers for the chart as a whole which are harder to tie back to the individual series.
It does mean the UI should accommodate this metadata-per-series otherwise this is pointless.

torkelo · 2019-06-05T04:49:12Z

If you're going to add series metadata then please also add a query index so Grafana can know the originating query.

Dieterbe · 2019-06-05T06:32:22Z

How do you plan to use/visualize this information? Note that there may be N series for 1 originating query.

As far as whether i'll do the metadata per-response or per-series, i'll see how feasible each is. But there seem to be benefits to both. do you have a preference?

torkelo · 2019-06-06T13:05:15Z

Panel Menu > Inspect > Opens side drawer with raw data, request response.

And maybe some special handling for showing series rollup meta data.

Dieterbe · 2019-06-06T15:56:57Z

notes from meeting:

- scott/sean would be happy enough if this takes the form of text data available in the panel/query inspector, and maybe a small indicator in the panel denoting that rollup data was used (if applicable)
- torkel likes the idea of having a multi-purpose (more abstract) way for datasources to provide any kind of warnings back to the panel (I like this too. in fact i opened "marker in panels to make user aware of data issues" #6448 for this a while back)
- whether the info is grouped per target or per panel doesn't matter much to scott/sean as long as it's there (note: the risk of per series may be that it gets overwhelming? maybe..). we will see how we can do it in MT...
- it's mainly me, Dieter, who wants this to be a nice, polished feature. because i know how painfully time consuming it can be to troubleshoot consolidation/rollup issues. the nicer we make this experience, the more it will pay off. plus I think it would be a great feature to have in general for graphite. I will probably bug @torkelo and @daniellee to get some extra effort on this beyond what BB is asking for.

Dieterbe · 2019-07-29T11:57:32Z

note to self: would be nice also if we can flag why a rollup or runtime consolidation was triggered (i'm thinking specifically when it's due to max-points-per-req-soft)

Dieterbe · 2019-09-25T13:04:23Z

metrictank-side work for this has started in grafana/metrictank#1481

Dieterbe · 2019-10-07T19:14:33Z

this is now merged in MT. see above PR for details and output format. We will, however, tweak the output format a bit to be more user friendly.

torkelo · 2019-11-28T15:55:23Z

So we plan to make this data available in the panel inspector drawer (that you access via panel menu).

Will this be ok?

torkelo · 2019-11-28T16:04:47Z

related to #20710

shalstea · 2019-12-02T12:33:18Z

I am not 100% sure what exactly that means. It sounds as if I need to use the panel menu to navigate to the rollup information for the query. I think that would be OK, as long as there is a visual indicator, e.g. an R or something, to indicate that one is looking at rollup data.

It would be helpful to see a screen snapshot or two.

torkelo · 2019-12-02T12:48:32Z

There would be nothing visible in the panel, you would have to open the panel inspect drawer.

shalstea · 2019-12-02T12:56:50Z

Why not? Is it too difficult to parse the results to determine if you are hitting rollup data?

Can you provide a screen snapshot of what you will provide / show for MetricTank once one opens the panel inspect drawer?

torkelo · 2019-12-02T13:18:07Z

Currently the PanelChrome (panel header) can only show data errors. We have an upcoming redesign of the panel header where we will left-align the title and show info and error state icons to the right of the title. In that redesign we can maybe add something. But it would be an icon shown on all panels, so it would be very intrusive (for Metrictank users).

As for how to show this in the inspect drawer, it would be something like this:

But we hope to have that designed and shown a bit nicer.

shalstea · 2019-12-02T15:03:31Z

That's a pretty tough thing to interpret. I have no idea what exactly that means, although I can guess. At a minimum I would like MetricTank to document each of the lines. But it would be preferred if this were shown with more user-friendly grammar. Maybe add a link to their documentation?

torkelo · 2019-12-02T15:41:46Z

think @Dieterbe has some ideas for how to visualize this data.

Next step is to make the inspect drawer (that allows you to view any panel raw data, request & response, & meta info):

#20710

Then we will add a plugin hook so a data source plugin can visualize its metadata in a specific way.

Dieterbe · 2019-12-02T16:23:04Z

> think @Dieterbe has some ideas for how to visualize this data.

Yes, I will refine my thinking on the UI mockup posted above. See ticket grafana/metrictank#1551 I will work with Ryan and our internal UX people on this.

Dieterbe · 2019-12-03T20:11:06Z

@shalstea until we have the new UI for this, here are the docs which should hopefully explain everything better: https://github.com/grafana/metrictank/pull/1559/files

Dieterbe · 2019-12-18T20:30:46Z

UI proposal:
first of all, please have a look at grafana/metrictank#1551 (comment) wherein I describe the steps that series go through, series lineage metadata and how they relate to the processing steps.

Furthermore, note that:

- each returned series in the response body has a metadata section
- each metadata section (corresponding to a single returned series) may comprise multiple lineage sections. Why? Because to return any given output series, we may have had to fetch/process many input series (e.g. sumSeries() ) and those queried input series may have different schemas, or normalization or runtime consolidation parameters. Specifically, for any output series, we look at all the series that "went into it", and for each distinct combination of lineage properties we create a distinct lineage section, along with the count of series corresponding to that combination of parameters.

Typically there will be 1 lineage section in each series metadata section, but it's possible for there to be several.
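
The grouping rule described above (one lineage section per distinct combination of properties, plus a count) can be sketched roughly as follows. This is only an illustration: the property names are a subset of those in the Metrictank output format, while the function name and the input records are hypothetical and not Metrictank's actual implementation.

```python
from collections import Counter

# Subset of lineage properties, chosen to keep the example short.
LINEAGE_KEYS = ("schema-name", "archive-read", "archive-interval",
                "aggnum-rc", "consolidate-rc")


def build_lineage_sections(input_series):
    """Collapse input series into one section per distinct lineage combination."""
    # Count how many input series share each combination of properties.
    counts = Counter(tuple(s[key] for key in LINEAGE_KEYS) for s in input_series)
    sections = []
    for combo, count in counts.items():
        section = dict(zip(LINEAGE_KEYS, combo))
        section["count"] = count
        sections.append(section)
    return sections


# Hypothetical input series that all feed a single output series.
inputs = [
    {"name": "foo", "schema-name": "default", "archive-read": 2,
     "archive-interval": 300, "aggnum-rc": 10, "consolidate-rc": "AverageConsolidator"},
    {"name": "bar", "schema-name": "default", "archive-read": 2,
     "archive-interval": 300, "aggnum-rc": 10, "consolidate-rc": "AverageConsolidator"},
    {"name": "baz", "schema-name": "stats_global", "archive-read": 1,
     "archive-interval": 600, "aggnum-rc": 10, "consolidate-rc": "LastConsolidator"},
]

for section in build_lineage_sections(inputs):
    print(section)
```

Here `foo` and `bar` share every property and collapse into one section with a count of 2, while `baz` gets its own section.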

Let's take this hypothetical example:
sumSeries(foo,bar)

it could result in a response like so:

{
    "version": "v0.1",
    "meta": {
        "stats": {
            "executeplan.resolve-series.ms": 11,
            "executeplan.get-targets.ms": 3,
            "executeplan.prepare-series.ms": 0,
            "executeplan.plan-run.ms": 0,
            "executeplan.series-fetch.count": 17,
            "executeplan.points-fetch.count": 85,
            "executeplan.points-return.count": 85,
            "executeplan.cache-miss.count": 0,
            "executeplan.cache-hit-partial.count": 0,
            "executeplan.cache-hit.count": 0,
            "executeplan.chunks-from-tank.count": 17,
            "executeplan.chunks-from-cache.count": 0,
            "executeplan.chunks-from-store.count": 0
        }
    },
    "series": [
        {
            "target": "sumSeries(foo,bar)",
            "datapoints": [
......
            ],
            "meta": [
                {
                    "schema-name": "stats_global",
                    "schema-retentions": "1m:35d:2h:2,10min:120d,2h:2y:6h:2",
                    "archive-read": 1,
                    "archive-interval": 600,
                    "aggnum-norm": 1,
                    "consolidate-normfetch": "AverageConsolidator",
                    "aggnum-rc": 10,
                    "consolidate-rc": "LastConsolidator",
                    "count": 1
                },
                {
                    "schema-name": "default",
                    "schema-retentions": "1s:5d:20min:5:1542274085,1min:30d:2h:1:true,5min:120d:6h:1:true,2h:2y:6h:2",
                    "archive-read": 2,
                    "archive-interval": 300,
                    "aggnum-norm": 2,
                    "consolidate-normfetch": "LastConsolidator",
                    "aggnum-rc": 10,
                    "consolidate-rc": "AverageConsolidator",
                    "count": 158
                }
            ]
        }
    ]
}

the lineage could thus be visualized as shown below.
note that in both cases, the "Fetch step" is the same visualization as my earlier mockup.
except now we need to visualize the normalization and runtime consolidation steps as well, somehow

suggestion 1

![46619761-c82d6e00-cad7-11e8-8605-8d31ebf4ea2a](https://user-images.githubusercontent.com/20774/71120851-83be5780-21dd-11ea-8ded-3b488d40cb62.png)

i didn't know any better but just describe the normalization and runtime consolidation steps as text.

suggestion 2

![46619761-c82d6e00-cad7-11e8-8605-8d31ebf4ea2a-two](https://user-images.githubusercontent.com/20774/71120853-8456ee00-21dd-11ea-9743-a8f505699af4.png)

this one tries to make the steps more visual.
after all, at fetch time, the data has an interval, and the normalization and runtime consolidation steps are all about how that resolution is further reduced. Not sure how to cleanly visualize that, but this feels a bit more "linear"

notes:

"schema-retentions" consists of comma separated archive specifications. for each archive we only care about the first and second fields. (the resolution and retention). some fields may not be provided
some archives may be set to non-ready or only ready for reads as of a certain timestamp. (see how in the 2nd lineage section, there is a 5th field that is set to a timestamp and "true" for some of the archives). if an archive is not ready, or not ready before a timestamp, MT would never read data from it. maybe visually we could shade those areas that are non-ready.
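
As a rough sketch of the notes above, the "schema-retentions" string can be split into archives, keeping the resolution, the retention, and (when present) the raw value of the 5th field. The function and key names are made up for illustration. Treating "archive-read" as a zero-based index into this list is an inference from the example response (index 1 of the first schema is the 10min archive, matching "archive-interval": 600), not something confirmed here.

```python
def parse_retentions(schema_retentions):
    """Split a schema-retentions string into a list of archive descriptions."""
    archives = []
    for spec in schema_retentions.split(","):
        fields = spec.split(":")
        archives.append({
            "resolution": fields[0],
            "retention": fields[1],
            # 5th field: "true" or a timestamp as of which the archive is
            # ready for reads; None when the field is not provided.
            "ready": fields[4] if len(fields) > 4 else None,
        })
    return archives


stats_global = parse_retentions("1m:35d:2h:2,10min:120d,2h:2y:6h:2")
default = parse_retentions(
    "1s:5d:20min:5:1542274085,1min:30d:2h:1:true,5min:120d:6h:1:true,2h:2y:6h:2"
)

# Assumption: "archive-read" indexes into the parsed list (0 = raw archive).
print(stats_global[1]["resolution"])
print(default[2]["resolution"])
print([archive["ready"] for archive in default])
```

A UI could use the "ready" value to shade archives that Metrictank would not read from, as suggested in the note.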

dprokop · 2019-12-18T20:32:00Z

@sarlinska the comment above might be interesting for you in the context of Inspect Drawer design

Dieterbe · 2019-12-18T20:41:53Z

actually, the diagram's time direction should probably match that of a timeseries panel. thus going back in time should be going back to the left.
bonus points because this shows the interval information much closer to the further steps that alter the intervals. this one is my favorite one

torkelo · 2020-01-14T11:46:22Z

Some progress on showing Metrictank query meta in the inspect feature was completed in 6.6, but the panel inspector is still behind a feature toggle and not ready. The changes to the panel header to show an indication cannot be started yet: first we need to unify our panel headers, then redesign the panel header. This work is scheduled for 7.0.

torkelo · 2020-02-24T16:52:56Z

Not sure what the rollup indicator should say. Maybe an orange ball icon, and then a tooltip: "Rollups were used in calculating the result, check the query inspector for details".

And then in the query inspector, how do we make sense of this:

            "meta": [
                {
                    "schema-name": "stats_global",
                    "schema-retentions": "1m:35d:2h:2,10min:120d,2h:2y:6h:2",
                    "archive-read": 1,
                    "archive-interval": 600,
                    "aggnum-norm": 1,
                    "consolidate-normfetch": "AverageConsolidator",
                    "aggnum-rc": 10,
                    "consolidate-rc": "LastConsolidator",
                    "count": 1
                },
                {
                    "schema-name": "default",
                    "schema-retentions": "1s:5d:20min:5:1542274085,1min:30d:2h:1:true,5min:120d:6h:1:true,2h:2y:6h:2",
                    "archive-read": 2,
                    "archive-interval": 300,
                    "aggnum-norm": 2,
                    "consolidate-normfetch": "LastConsolidator",
                    "aggnum-rc": 10,
                    "consolidate-rc": "AverageConsolidator",
                    "count": 158
                }
            ]
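
One possible way to boil these sections down to a yes/no indicator plus a short tooltip is sketched below. It rests on two assumptions that are not confirmed in this thread: that "archive-read" greater than 0 means a rollup archive was read (0 being raw data), and that "aggnum-rc" greater than 1 means runtime consolidation was applied. The function names and the tooltip wording are hypothetical.

```python
def rollup_summary(lineage_sections):
    """Count input series overall, from rollups, and runtime consolidated."""
    total = sum(s["count"] for s in lineage_sections)
    # Assumption: archive-read 0 is the raw archive, anything higher a rollup.
    from_rollup = sum(s["count"] for s in lineage_sections if s["archive-read"] > 0)
    # Assumption: aggnum-rc above 1 means runtime consolidation happened.
    consolidated = sum(s["count"] for s in lineage_sections if s.get("aggnum-rc", 1) > 1)
    return {"total": total, "from_rollup": from_rollup,
            "runtime_consolidated": consolidated}


def indicator_tooltip(summary):
    """Return tooltip text, or None when no indicator should be shown."""
    if summary["from_rollup"] == 0:
        return None
    return ("Rollups were used for {from_rollup} of {total} input series; "
            "check the query inspector for details").format(**summary)


# The two lineage sections from the example, reduced to the fields used here.
sections = [
    {"schema-name": "stats_global", "archive-read": 1, "archive-interval": 600,
     "aggnum-rc": 10, "count": 1},
    {"schema-name": "default", "archive-read": 2, "archive-interval": 300,
     "aggnum-rc": 10, "count": 158},
]

print(indicator_tooltip(rollup_summary(sections)))
```

The per-section details (schema, archive interval, consolidation functions) would still belong in the inspector, with the indicator only answering whether rolled up data is on screen.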

torkelo · 2020-03-18T12:01:56Z

First iteration of this can be tested in master build soon. The panel header icon placement and style is not final and the panel header will be getting an overhaul later in 7.0

Dieterbe · 2020-03-18T16:09:08Z

You imply there is more work coming specific to this feature. Is there a ticket to track this?

torkelo · 2020-04-05T14:58:13Z

Not yet, relates to panel header / icon / state design.

Will create issue coming week

Dieterbe · 2020-04-27T10:41:25Z

What's the link to the issue please?

bergquist added prio/support-subscription type/feature-request datasource/Graphite labels Apr 24, 2018

Dieterbe mentioned this issue Oct 10, 2018

indication of returned data is rollup data grafana/metrictank#1091

Closed

marefr added area/datasource and removed area/datasource labels Mar 30, 2019

robert-milan mentioned this issue May 21, 2019

Roadmap grafana/metrictank#1319

Open

27 tasks

Dieterbe mentioned this issue Jun 6, 2019

meta=true toggle for graphite datasource #17472

Closed

Dieterbe mentioned this issue Nov 26, 2019

design rollup indicator (Lineage information) UI grafana/metrictank#1551

Closed

daniellee added this to the 6.6 milestone Nov 28, 2019

daniellee assigned torkelo and ryantxu Nov 28, 2019

ryantxu mentioned this issue Dec 4, 2019

Inspector: support custom metadata display #20854

Merged

Dieterbe mentioned this issue Dec 13, 2019

customizable panel title bar #21078

Closed

torkelo modified the milestones: 6.6.0-beta1, 7.0 Jan 14, 2020

torkelo mentioned this issue Mar 11, 2020

Graphite: Rollup indicator and custom meta data inspector #22738

Merged

torkelo closed this as completed Mar 18, 2020

dprokop mentioned this issue Apr 1, 2020

Platform Roadmap (7.0) #23241

Closed

14 tasks
