
Enhancement - indication of returned data is rollup data #11700

Closed
shalstea opened this issue Apr 23, 2018 · 31 comments

@shalstea

shalstea commented Apr 23, 2018

Grafana 5.0.3
MetricTank

We have users with many charts on a single dashboard. Depending on the cardinality of the data and the time range, MetricTank may return rolled-up data (in our case, configured for hourly). This can be subtle, as potentially only 1 or 2 graphs out of nine are rolled up.

Also, when the overall dashboard time range changes, sometimes from as little as 12 hours to 24 hours, rollup data is used.

Updated:

Docs for Metrictank metadata: https://github.com/grafana/metrictank/blob/master/docs/http-api.md#metadata

It would be extremely nice if there were an indicator that rolled-up data was used. A mouseover would show the configuration, e.g. hourly or daily.

@Dieterbe
Contributor

Dieterbe commented Oct 8, 2018

an interesting question here is how to represent this when many series were queried (some people query anywhere from tens or hundreds up to hundreds of thousands of series), and several series may be returned as well, each corresponding to one or more queried series.

my proposal is to provide an indication of:

  1. which storage schema rules were used
  2. counts of how many queried series correspond to each archive specification of each schema
  3. when runtime consolidation is used, to what resolution and with which function

an example when you hit a few different series with different settings:

   foo.bar.baz:
5  raw: 1min
   rollup1: 5min:60d
   rollup2: 30min:100d

   baz:
10 raw: 10s -> runtime consolidation to 1min using avg
11 raw: 10s -> runtime consolidation to 1min using min

   default:
0  raw: 10s
15 rollup: 1min:10d
   rollup: 30min:100d

additionally, I propose we track the start/end time actually used for each series read (remember, these may differ due to movingAverage, timeShift, etc.) and then display it in Grafana like this:
[image: grafana-rollup-indicator mockup]
this makes it a bit easier to reason about what happened.
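To illustrate why the range read can differ per series, here is a simplified Python model. It assumes a timeShift()-style function moves the whole window back by a fixed amount and a movingAverage()-style function needs extra history before `from`; `fetch_window` and `FetchWindow` are hypothetical names, not Metrictank code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchWindow:
    start: int  # unix seconds
    end: int    # unix seconds


def fetch_window(frm: int, to: int, shift_s: int = 0, bootstrap_s: int = 0) -> FetchWindow:
    """Hypothetical model of the time range actually read for one series.

    shift_s:     how far a timeShift()-like function moves the window back
    bootstrap_s: extra history a movingAverage()-like function needs before frm
    """
    return FetchWindow(start=frm - shift_s - bootstrap_s, end=to - shift_s)


# A 12h panel window, with a 1d time shift and a 1h moving-average window.
w = fetch_window(1_000_000, 1_043_200, shift_s=86_400, bootstrap_s=3_600)
print(w)  # FetchWindow(start=910000, end=956800)
```

Two series in the same panel can therefore report different start/end times, which is why tracking them per series helps.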

as far as providing this information back from graphite/metrictank to grafana, i propose we extend the response json object with a meta category, and put this info under data or something.
this allows us to later add meta information for other things (e.g. warnings when a shard was unavailable and returned data is incomplete)

@torkelo
Member

torkelo commented Oct 9, 2018

> as far as providing this information back from graphite/metrictank to grafana, i propose we extend the response json object with a meta category, a

That sounds like a good idea. Beyond visualising it like you propose I think the most important thing is to show that the data you are looking at has been rolled up using a specific time window & aggregation function.

@Dieterbe
Contributor

Dieterbe commented Jun 2, 2019

I plan to start working on this shortly.
the main thing to take into account is that a graphite render response is a json array of series. it is not a dict. this has been discussed lightly in grafana/metrictank#1130 as well.
so, to stay compatible with old/existing graphite clients, the meta section must be opt-in: a request flag that selects a different response type (a dictionary with the extra section). The best way to enable this seems to be via a datasource configuration option, either:

  1. a "supports meta" toggle on the graphite datasource
  2. a separate metrictank datasource type which is identical to graphite, except also supports the meta flag
  3. as part of a version selector (which would require upstream graphite also receiving a PR to expose some of its own meta stats as well using the same mechanism)

looping in @DanCech to see if 3 is doable; if not, 1 is probably simplest

regardless, we'll go ahead and start working on the implementation for MT.
I think this is something we can do without much/any up-front design, and just figure it out as we go along with the coding changes in MT. cc @davkal @bergquist
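Whichever opt-in mechanism is chosen, a client has to cope with both response shapes. A minimal Python sketch, assuming the meta-enabled body is a dictionary with `version`, `meta` and `series` keys as in the example response later in this thread; `split_render_response` is a hypothetical helper, and the request flag itself is not shown because its name is not settled here.

```python
import json
from typing import Any


def split_render_response(body: str) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Return (series, meta) for either render response shape.

    Assumed shapes (illustrative, based on this thread):
      * classic Graphite: a JSON array of series objects
      * opt-in meta format: {"version": ..., "meta": {...}, "series": [...]}
    """
    doc = json.loads(body)
    if isinstance(doc, list):
        # Existing clients never see a meta section, so report none.
        return doc, {}
    if isinstance(doc, dict) and isinstance(doc.get("series"), list):
        return doc["series"], doc.get("meta", {})
    raise ValueError("unrecognized render response shape")


series, meta = split_render_response('[{"target": "a", "datapoints": []}]')
print(len(series), meta)  # 1 {}
```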

@bergquist
Contributor

For this use case, I prefer 1.

@Dieterbe
Contributor

Dieterbe commented Jun 4, 2019

Originally I was thinking of returning this information tied to the response as a whole. Now I'm thinking this could be even better by associating these stats with each returned series, so that each returned series would have its own stats about how many and which rollups and runtime consolidations were used to generate that particular series. We could then also tie this information in the UI directly to each series, rather than showing totals for the chart as a whole, which are harder to tie back to the individual series.
It does mean the UI should accommodate this per-series metadata; otherwise this is pointless.

@torkelo
Member

torkelo commented Jun 5, 2019

If you're going to add series metadata, then please also add the query index so Grafana can know the originating query.

@Dieterbe
Contributor

Dieterbe commented Jun 5, 2019

How do you plan to use/visualize this information? Note that there may be N series for 1 originating query.

As for whether I'll do the metadata per-response or per-series, I'll see how feasible each is. But there seem to be benefits to both. Do you have a preference?

@torkelo
Member

torkelo commented Jun 6, 2019

Panel Menu > Inspect > opens a side drawer with raw data, request and response.

And maybe some special handling for showing series rollup meta data.

@Dieterbe
Contributor

Dieterbe commented Jun 6, 2019

notes from meeting:

  • scott/sean would be happy enough if this takes the form of text data available in the panel/query inspector, and maybe a small indicator in the panel denoting that rollup data was used (if applicable)
  • torkel likes the idea of having a multi-purpose (more abstract) way for datasources to provide any kind of warnings back to the panel (I like this too. In fact, I opened #6448, "marker in panels to make user aware of data issues", for this a while back)
  • whether the info is grouped per target or per panel doesn't matter much to scott/sean as long as it's there (note: the risk of per-series may be that it gets overwhelming? maybe..). we will see how we can do it in MT...
  • it's mainly me, Dieter, who wants this to be a nice, polished feature, because I know how painfully time-consuming it can be to troubleshoot consolidation/rollup issues. The nicer we make this experience, the more it will pay off. Plus, I think it would be a great feature to have in general for graphite. I will probably bug @torkelo and @daniellee to get some extra effort on this beyond what BB is asking for.

@Dieterbe
Contributor

note to self: it would also be nice if we could flag why a rollup or runtime consolidation was triggered (I'm thinking specifically of when it's due to max-points-per-req-soft)

@Dieterbe
Contributor

metrictank-side work for this has started in grafana/metrictank#1481

@Dieterbe
Contributor

Dieterbe commented Oct 7, 2019

this is now merged in MT. See the above PR for details and the output format. We will, however, tweak the output format a bit to be more user-friendly.

@torkelo
Member

torkelo commented Nov 28, 2019

So we plan to make this data available in the panel inspector drawer (that you access via panel menu).

Will this be ok?

@torkelo
Member

torkelo commented Nov 28, 2019

related to #20710

@shalstea
Author

shalstea commented Dec 2, 2019

I am not 100% sure what exactly that means. It sounds as if I need to use the panel menu to navigate to the rollup information for the query. I think that would be OK, as long as there is a visual indicator, e.g. an "R" or something, to indicate that one is looking at rollup data.

It would be helpful to see a screen snapshot or two.

@torkelo
Member

torkelo commented Dec 2, 2019

There would be nothing visible in the panel, you would have to open the panel inspect drawer.

@shalstea
Author

shalstea commented Dec 2, 2019

Why not? Is it too difficult to parse the results to determine if you are hitting rollup data?

Can you provide a screen snapshot of what you will provide / show for MetricTank once one opens the panel inspect drawer?

@torkelo
Member

torkelo commented Dec 2, 2019

Currently the PanelChrome (panel header) can only show data errors. We have an upcoming redesign of the panel header where we will left-align the title and show info and error state icons to the right of it. In that redesign we can maybe add something. But it would be an icon shown on all panels, so it would be very intrusive for Metrictank users.

As for how to show this in the inspect drawer, it would be something like this:
[image: inspect drawer mockup]

But we hope to have that designed and shown a bit nicer.

@shalstea
Author

shalstea commented Dec 2, 2019

That's a pretty tough thing to interpret. I have no idea what exactly that means, although I can guess. At a minimum I would like MetricTank to document each of the lines. But it would be preferable if this were shown in more user-friendly language. Maybe add a link to their documentation?

@torkelo
Member

torkelo commented Dec 2, 2019

I think @Dieterbe has some ideas for how to visualize this data.

The next step is to build the inspect drawer (which lets you view any panel's raw data, request & response, and meta info):

#20710

Then we will add a plugin hook so a data source plugin can visualize its metadata in a specific way.

@Dieterbe
Contributor

Dieterbe commented Dec 2, 2019

> I think @Dieterbe has some ideas for how to visualize this data.

Yes, I will refine my thinking on the UI mockup posted above. See ticket grafana/metrictank#1551. I will work with Ryan and our internal UX people on this.

@Dieterbe
Contributor

Dieterbe commented Dec 3, 2019

@shalstea until we have the new UI for this, here are the docs which should hopefully explain everything better: https://github.com/grafana/metrictank/pull/1559/files

@Dieterbe
Contributor

Dieterbe commented Dec 18, 2019

UI proposal:
first of all, please have a look at grafana/metrictank#1551 (comment) wherein I describe the steps that series go through, series lineage metadata and how they relate to the processing steps.

Furthermore, note that:

  1. each returned series in the response body has a metadata section
  2. each metadata section (corresponding to a single returned series) may consist of multiple lineage sections. Why? Because to return any given output series, we may have had to fetch/process many input series (e.g. sumSeries()), and those queried input series may have different schemas, or normalization or runtime consolidation parameters. Specifically, for any output series, we look at all the series that "went into it", and for each distinct combination of lineage properties we create a distinct lineage section, along with the count of series corresponding to that combination of parameters.
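The grouping step in point 2 can be sketched in Python as a count over distinct property combinations. This is an illustration only: the key list is copied from the example response in this comment, every input series is assumed to carry all of those keys, and `build_lineage_sections` is a hypothetical name rather than Metrictank's implementation.

```python
from collections import Counter

# Lineage properties, taken from the example response in this comment.
# Assumption: every input series carries all of these keys.
LINEAGE_KEYS = (
    "schema-name", "schema-retentions", "archive-read", "archive-interval",
    "aggnum-norm", "consolidate-normfetch", "aggnum-rc", "consolidate-rc",
)


def build_lineage_sections(inputs: list[dict]) -> list[dict]:
    """Collapse the input series behind one output series into lineage sections.

    Each distinct combination of lineage properties becomes one section,
    annotated with the number of input series that shared it.
    """
    counts = Counter(tuple(s[k] for k in LINEAGE_KEYS) for s in inputs)
    sections = []
    for combo, n in counts.items():  # Counter keeps first-seen order
        section = dict(zip(LINEAGE_KEYS, combo))
        section["count"] = n
        sections.append(section)
    return sections
```

If 158 input series share one combination and a single series has another, this returns two sections with counts 158 and 1, the same shape as the example response.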

Typically there will be 1 lineage section in each series metadata section, but it's possible for there to be several.

Let's take this hypothetical example:
sumSeries(foo,bar)

it could result in a response like so:

{
    "version": "v0.1",
    "meta": {
        "stats": {
            "executeplan.resolve-series.ms": 11,
            "executeplan.get-targets.ms": 3,
            "executeplan.prepare-series.ms": 0,
            "executeplan.plan-run.ms": 0,
            "executeplan.series-fetch.count": 17,
            "executeplan.points-fetch.count": 85,
            "executeplan.points-return.count": 85,
            "executeplan.cache-miss.count": 0,
            "executeplan.cache-hit-partial.count": 0,
            "executeplan.cache-hit.count": 0,
            "executeplan.chunks-from-tank.count": 17,
            "executeplan.chunks-from-cache.count": 0,
            "executeplan.chunks-from-store.count": 0
        }
    },
    "series": [
        {
            "target": "sumSeries(foo,bar)",
            "datapoints": [
......
            ],
            "meta": [
                {
                    "schema-name": "stats_global",
                    "schema-retentions": "1m:35d:2h:2,10min:120d,2h:2y:6h:2",
                    "archive-read": 1,
                    "archive-interval": 600,
                    "aggnum-norm": 1,
                    "consolidate-normfetch": "AverageConsolidator",
                    "aggnum-rc": 10,
                    "consolidate-rc": "LastConsolidator",
                    "count": 1
                },
                {
                    "schema-name": "default",
                    "schema-retentions": "1s:5d:20min:5:1542274085,1min:30d:2h:1:true,5min:120d:6h:1:true,2h:2y:6h:2",
                    "archive-read": 2,
                    "archive-interval": 300,
                    "aggnum-norm": 2,
                    "consolidate-normfetch": "LastConsolidator",
                    "aggnum-rc": 10,
                    "consolidate-rc": "AverageConsolidator",
                    "count": 158
                }
            ]
        }
    ]
}
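One rough reading of the two lineage sections above (an interpretation, not documented semantics): `archive-read` indexes into `schema-retentions` with 0 being raw, which matches `archive-interval` (index 1 of stats_global is 10min = 600s, index 2 of default is 5min = 300s), and normalization and runtime consolidation each aggregate `aggnum` points, so each multiplies the interval. Under that reading both sections end at the same interval, 600 × 1 × 10 = 6000s and 300 × 2 × 10 = 6000s, which sumSeries needs. A small Python sketch with hypothetical helper names:

```python
def effective_interval(section: dict) -> int:
    """Interval in seconds of the points a lineage section contributes.

    Assumption: normalization and runtime consolidation each aggregate
    `aggnum` consecutive points, so each step multiplies the interval.
    """
    return section["archive-interval"] * section["aggnum-norm"] * section["aggnum-rc"]


def describe_section(section: dict) -> str:
    """One human-readable line per lineage section (wording is illustrative)."""
    # Assumption: archive index 0 is raw data, higher indexes are rollups.
    idx = section["archive-read"]
    source = "raw data" if idx == 0 else f"rollup archive {idx}"
    return (
        f"{section['count']} series ({section['schema-name']}): {source} at "
        f"{section['archive-interval']}s, normalized x{section['aggnum-norm']} "
        f"({section['consolidate-normfetch']}), runtime-consolidated "
        f"x{section['aggnum-rc']} ({section['consolidate-rc']}) "
        f"-> {effective_interval(section)}s"
    )


stats_global = {
    "schema-name": "stats_global", "archive-read": 1, "archive-interval": 600,
    "aggnum-norm": 1, "consolidate-normfetch": "AverageConsolidator",
    "aggnum-rc": 10, "consolidate-rc": "LastConsolidator", "count": 1,
}
print(describe_section(stats_global))
```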

the lineage could thus be visualized as shown below.
note that in both cases, the "Fetch step" is the same visualization as my earlier mockup, except that now we also need to visualize the normalization and runtime consolidation steps somehow.

suggestion 1

I didn't know any better than to just describe the normalization and runtime consolidation steps as text.

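![46619761-c82d6e00-cad7-11e8-8605-8d31ebf4ea2a](https://user-images.githubusercontent.com/20774/71120851-83be5780-21dd-11ea-8ded-3b488d40cb62.png)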

suggestion 2

this one tries to make the steps more visual.
after all, at fetch time, the data has an interval, and the normalization and runtime consolidation steps are all about how that resolution is further reduced. Not sure how to cleanly visualize that, but this feels a bit more "linear"
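![46619761-c82d6e00-cad7-11e8-8605-8d31ebf4ea2a-two](https://user-images.githubusercontent.com/20774/71120853-8456ee00-21dd-11ea-9743-a8f505699af4.png)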

notes:

  • "schema-retentions" consists of comma-separated archive specifications. For each archive we only care about the first and second fields (the resolution and retention). Some fields may not be provided.
  • some archives may be set to non-ready or only ready for reads as of a certain timestamp. (see how in the 2nd lineage section, there is a 5th field that is set to a timestamp and "true" for some of the archives). if an archive is not ready, or not ready before a timestamp, MT would never read data from it. maybe visually we could shade those areas that are non-ready.
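A Python sketch of reading "schema-retentions" as described in these notes: split on commas, keep the first two colon-separated fields as resolution and retention, and keep the optional 5th field as the readiness marker. The unit table only covers units that appear in the examples in this thread and treats a year as 365 days; both are assumptions, and Metrictank's own parser may accept more units or behave differently. All names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# Seconds per unit. Assumption: covers only the units used in this thread's
# examples (s, m/min, h, d, y); Metrictank's own parser may differ.
UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


@dataclass
class Archive:
    resolution_s: int
    retention_s: int
    # 5th field if present: a bool, or a unix timestamp the archive is ready from.
    ready: Optional[Union[bool, int]] = None


def parse_duration(text: str) -> int:
    m = re.fullmatch(r"(\d+)([a-z]+)", text)
    if m is None or m.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]


def parse_retentions(spec: str) -> list[Archive]:
    """Parse a "schema-retentions" string; list index == archive index."""
    archives = []
    for part in spec.split(","):
        fields = part.split(":")
        ready: Optional[Union[bool, int]] = None
        if len(fields) >= 5:
            flag = fields[4]
            if flag in ("true", "false"):
                ready = flag == "true"
            else:
                ready = int(flag)  # assumed: unix timestamp
        archives.append(Archive(parse_duration(fields[0]), parse_duration(fields[1]), ready))
    return archives
```

For example, `parse_retentions("1m:35d:2h:2,10min:120d,2h:2y:6h:2")[1].resolution_s` is 600, which lines up with `"archive-read": 1` and `"archive-interval": 600` in the stats_global lineage section.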

@dprokop
Member

dprokop commented Dec 18, 2019

@sarlinska the comment above might be interesting for you in the context of Inspect Drawer design

@Dieterbe
Contributor

Dieterbe commented Dec 18, 2019

actually, the diagram's time direction should probably match that of a timeseries panel. thus going back in time should be going back to the left.
bonus points because this shows the interval information much closer to the further steps that alter the intervals. this one is my favorite one
[image: suggestion 2 diagram]

@torkelo torkelo modified the milestones: 6.6.0-beta1, 7.0 Jan 14, 2020
@torkelo
Member

torkelo commented Jan 14, 2020

Some progress on showing Metrictank query metadata in the inspect feature was completed in 6.6, but the panel inspector is still behind a feature toggle and not ready. We cannot start on the panel header changes needed to show an indicator yet: first we need to unify our panel headers, then redesign the panel header. This work is scheduled for 7.0.

@torkelo
Member

torkelo commented Feb 24, 2020

Not sure what the rollup indicator should say. Maybe an orange ball icon, and then a tooltip: "Rollups were used in calculating the result, check the query inspector for details".

And then in the query inspector, how do we make sense of this:

"meta": [
    {
        "schema-name": "stats_global",
        "schema-retentions": "1m:35d:2h:2,10min:120d,2h:2y:6h:2",
        "archive-read": 1,
        "archive-interval": 600,
        "aggnum-norm": 1,
        "consolidate-normfetch": "AverageConsolidator",
        "aggnum-rc": 10,
        "consolidate-rc": "LastConsolidator",
        "count": 1
    },
    {
        "schema-name": "default",
        "schema-retentions": "1s:5d:20min:5:1542274085,1min:30d:2h:1:true,5min:120d:6h:1:true,2h:2y:6h:2",
        "archive-read": 2,
        "archive-interval": 300,
        "aggnum-norm": 2,
        "consolidate-normfetch": "LastConsolidator",
        "aggnum-rc": 10,
        "consolidate-rc": "AverageConsolidator",
        "count": 158
    }
]
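One possible way to decide when to show the panel indicator from this data: a Python sketch that scans every returned series' `meta` list and produces tooltip text only when a rollup archive was read or points were consolidated. The thresholds (`archive-read` > 0, any `aggnum` > 1), the tooltip wording and the name `rollup_tooltip` are all assumptions for illustration.

```python
from typing import Optional


def rollup_tooltip(series_list: list[dict]) -> Optional[str]:
    """Tooltip text for a panel-level indicator, or None when nothing applies."""
    rolled_up = consolidated = 0
    for series in series_list:
        for sec in series.get("meta", []):
            n = sec.get("count", 1)
            # Assumption: archive index 0 is raw data.
            if sec.get("archive-read", 0) > 0:
                rolled_up += n
            # Assumption: aggnum > 1 means points were merged.
            if sec.get("aggnum-norm", 1) > 1 or sec.get("aggnum-rc", 1) > 1:
                consolidated += n
    if rolled_up == 0 and consolidated == 0:
        return None
    return (
        f"Rollups were used for {rolled_up} and consolidation for {consolidated} "
        "of the queried series; check the query inspector for details."
    )
```

Because the meta is per series, the same loop could also flag individual series (e.g. in the legend) rather than the whole panel.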

@torkelo
Member

torkelo commented Mar 18, 2020

The first iteration of this can be tested in a master build soon. The panel header icon placement and style are not final, and the panel header will be getting an overhaul later in 7.0.

@torkelo torkelo closed this as completed Mar 18, 2020
@Dieterbe
Contributor

You imply there is more work coming specific to this feature. Is there a ticket to track this?

@dprokop dprokop mentioned this issue Apr 1, 2020
@torkelo
Member

torkelo commented Apr 5, 2020

Not yet; it relates to the panel header / icon / state design.

Will create an issue in the coming week.

@Dieterbe
Contributor

What's the link to the issue please?
