This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Rollup indicator #1481

Merged
merged 12 commits into master from rollup-indicator-1091
Oct 7, 2019

Conversation


@Dieterbe Dieterbe commented Sep 25, 2019

fix #1091

Here's my vision:
For every series there is a SeriesSource. The SeriesSource describes how the data was generated, via SeriesSourceProperties (which storage-schemas rule applied to our data, which archive we read from, and what kind of normalization and runtime consolidation was applied), along with a count of series matching those properties.

In the simplest case (fetching one timeseries, and not running a function like sumSeries() or merging series because of an interval change), the source is a single set of properties with a count of 1.

Things get interesting (mergeSeries, sumSeries(), etc.) when we need to combine multiple series into one new output series. In all these cases we merge the SeriesSources together, adding up the counts per set of SeriesSourceProperties.

Ultimately then, for any series returned in the HTTP response body, we know whether it was composed of just one source timeseries or of several, and how they were generated.

Conceptually I think it's fairly simple. The main challenge is updating all processing functions accordingly (for now I've only done a few to showcase the idea), and making sure not to make any permanent changes to SeriesSource maps when we may reuse series in the processing pipeline.
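The merging idea can be sketched roughly like this (type and field names here are illustrative stand-ins, not this PR's exact API): combining sources means adding up counts per distinct set of properties.

```go
package main

import "fmt"

// SeriesSourceProperties is an illustrative stand-in for the properties
// described above (storage-schemas rule, archive, normalization, runtime
// consolidation); the names are assumptions for this sketch.
type SeriesSourceProperties struct {
	SchemaID   uint16 // storage-schemas rule the series corresponds to
	Archive    uint8  // archive that was read from
	AggNumNorm uint32 // aggNum used for normalization
	AggNumRC   uint32 // aggNum used for runtime consolidation
	Count      uint32 // number of series matching these properties
}

// propsKey identifies a set of properties irrespective of Count.
type propsKey struct {
	SchemaID   uint16
	Archive    uint8
	AggNumNorm uint32
	AggNumRC   uint32
}

// Merge combines two sources into one, adding up the counts per distinct
// set of properties, as when sumSeries() folds several inputs into one output.
func Merge(a, b []SeriesSourceProperties) []SeriesSourceProperties {
	counts := map[propsKey]uint32{}
	var order []propsKey
	for _, p := range append(append([]SeriesSourceProperties{}, a...), b...) {
		k := propsKey{p.SchemaID, p.Archive, p.AggNumNorm, p.AggNumRC}
		if _, seen := counts[k]; !seen {
			order = append(order, k)
		}
		counts[k] += p.Count
	}
	out := make([]SeriesSourceProperties, 0, len(order))
	for _, k := range order {
		out = append(out, SeriesSourceProperties{k.SchemaID, k.Archive, k.AggNumNorm, k.AggNumRC, counts[k]})
	}
	return out
}

func main() {
	a := []SeriesSourceProperties{{SchemaID: 0, Archive: 0, AggNumNorm: 1, Count: 1}}
	b := []SeriesSourceProperties{
		{SchemaID: 0, Archive: 0, AggNumNorm: 1, Count: 2}, // same properties: counts add up
		{SchemaID: 1, Archive: 1, AggNumNorm: 6, Count: 1}, // distinct properties: kept separately
	}
	merged := Merge(a, b)
	fmt.Println(len(merged), merged[0].Count, merged[1].Count) // prints: 2 3 1
}
```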

The output format for now looks like this (notice how we already had response-global metadata; now we're adding per-series metadata under "source").
Note that instead of (or in addition to) schema-id, I plan to return the actual storage-schemas rule being used.

~ ❯❯❯ http  'http://localhost:6060/render?target=metrictank.stats.docker-env.default.input.carbon.metricdata.received.counter32&from=-5s&meta=true'
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 399
Content-Type: application/json
Date: Wed, 25 Sep 2019 12:57:19 GMT
Vary: Accept-Encoding

{
    "meta": {
        "stats": {
            "executeplan.cache-hit-partial.count": 0,
            "executeplan.cache-hit.count": 0,
            "executeplan.cache-miss.count": 0,
            "executeplan.chunks-from-cache.count": 0,
            "executeplan.chunks-from-store.count": 0,
            "executeplan.chunks-from-tank.count": 1,
            "executeplan.get-targets.ms": 0,
            "executeplan.plan-run.ms": 0,
            "executeplan.points-fetch.count": 5,
            "executeplan.points-return.count": 5,
            "executeplan.prepare-series.ms": 0,
            "executeplan.resolve-series.ms": 0,
            "executeplan.series-fetch.count": 1
        }
    },
    "series": [
        {
            "datapoints": [
                [
                    8152,
                    1569416235
                ],
                [
                    8559,
                    1569416236
                ],
                [
                    8864,
                    1569416237
                ],
                [
                    9162,
                    1569416238
                ],
                [
                    9450,
                    1569416239
                ]
            ],
            "source": [
                {
                    "aggnum-norm": 1,
                    "aggnum-rc": 0,
                    "archive": 0,
                    "consolidate-normfetch": "AverageConsolidator",
                    "consolidate-rc": "NoneConsolidator",
                    "count": 1,
                    "schema-id": 0
                }
            ],
            "tags": {
                "name": "metrictank.stats.docker-env.default.input.carbon.metricdata.received.counter32"
            },
            "target": "metrictank.stats.docker-env.default.input.carbon.metricdata.received.counter32"
        }
    ],
    "version": "v0.1"
}

cc @torkelo


Dieterbe commented Sep 30, 2019

I have:

  • refactored it to bypass the msgp encoding problem: instead of a non-string-keyed map, we now use slices
  • cleaned up the code: instead of "source", it is now a meta section for the series
  • updated all expression functions up until per-second

left todo:

  • make sure it works across the cluster, and in a cluster while upgrading from an older version
  • update remaining expression functions and do some interesting tests with processing functions that adjust the meta sections
  • test for cases where series from different archives get used

feedback anyone? @torkelo @shanson7

The per-series metadata looks like this now.
Notes:

  • the format will be changed later to show the actual storage-schemas rule used
  • aggNum means the number of points consolidated into one point (if >1, it means runtime consolidation was applied)
	SchemaID              uint16                     // id of storage-schemas rule this series corresponds to
	Archive               uint8                      // which archive was being read from
	AggNumNorm            uint32                     // aggNum for normalization
	AggNumRC              uint32                     // aggNum for runtime consolidation
	ConsolidatorNormFetch consolidation.Consolidator // consolidator used for normalization and reading from store (if applicable)
	ConsolidatorRC        consolidation.Consolidator // consolidator used for runtime consolidation (if applicable)
	Count                 uint32                     // number of series corresponding to these properties


$ http  'http://localhost:6060/render?target=sum(metrictank.stats.docker-env.default.input.carbon.*.received.counter32)&from=-5s&meta=true'
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 400
Content-Type: application/json
Date: Mon, 30 Sep 2019 19:48:33 GMT
Vary: Accept-Encoding

{
    "meta": {
        "stats": {
            "executeplan.cache-hit-partial.count": 0,
            "executeplan.cache-hit.count": 0,
            "executeplan.cache-miss.count": 0,
            "executeplan.chunks-from-cache.count": 0,
            "executeplan.chunks-from-store.count": 0,
            "executeplan.chunks-from-tank.count": 3,
            "executeplan.get-targets.ms": 0,
            "executeplan.plan-run.ms": 0,
            "executeplan.points-fetch.count": 15,
            "executeplan.points-return.count": 15,
            "executeplan.prepare-series.ms": 0,
            "executeplan.resolve-series.ms": 1,
            "executeplan.series-fetch.count": 3
        }
    },
    "series": [
        {
            "datapoints": [
                [
                    200503,
                    1569872909
                ],
                [
                    204823,
                    1569872910
                ],
                [
                    209144,
                    1569872911
                ],
                [
                    213463,
                    1569872912
                ],
                [
                    217778,
                    1569872913
                ]
            ],
            "meta": [
                {
                    "aggnum-norm": 1,
                    "aggnum-rc": 0,
                    "archive": 0,
                    "consolidate-normfetch": "AverageConsolidator",
                    "consolidate-rc": "NoneConsolidator",
                    "count": 3,
                    "schema-id": 0
                }
            ],
            "target": "sumSeries(metrictank.stats.docker-env.default.input.carbon.*.received.counter32)"
        }
    ],
    "version": "v0.1"
}

@shanson7

What happens when the request gets proxied to graphite-web?

@Dieterbe

What happens when the request gets proxied to graphite-web?

graphite-web will not ask metrictank for metadata, so it won't get it, it won't add any by itself, and will return the traditional response format.

to make this work with graphite, we need to:

  • make graphite request meta-enabled data from metrictank
  • transport the metadata through its internal request path, updating it where necessary (e.g. normalize(), runtime consolidation, etc)
  • processing functions should update it like MT does (all functions, including the ones MT already supports, because to get a feature like this merged in graphite it should be implemented completely; also, MT proxies entire requests, so if unsupportedInMT(sumSeries(foo.*)) gets called, graphite will still execute the sumSeries)

This is a very significant amount of work. I would much rather spend our resources in porting over more graphite functions to metrictank.
Alternatively, we could probably simplify this feature to reduce the amount of work with Graphite (e.g. make it a simple global "we had to do normalization or runtime consolidation" flag), but that would still mean touching many functions, and would make this feature more of a one-off hack than a clean solution that integrates well with the rest of the stack.

@shanson7

I would much rather spend our resources in porting over more graphite functions to metrictank.

I agree with this sentiment (for a number of reasons) but I think it's worth thinking on.

Having &meta=true sometimes return the new format will be annoying for clients, since they will have to conditionally parse two different formats. Perhaps the grafana side needs to handle this anyway (since real graphite setups wouldn't support the "meta" concept).

Also, the rollup indicator will not be something that can be relied on, since it won't indicate whether the request was proxied (which itself is nice to know). Again, perhaps grafana could assume that the request was handled by graphite proper if the old format was returned.

I don't think that initial versions of this feature need to handle this, but perhaps there could be something that metrictank can do on the way back from graphite (since it is proxying the response back) to smooth some of this?

@Dieterbe

Having &meta=true sometimes return the new format will be annoying for clients, since they will have to conditionally parse two different formats. Perhaps the grafana side needs to handle this anyway (since real graphite setups wouldn't support the "meta" concept).

Yes. Note that this is already the case for the stats.

perhaps there could be something that metrictank can do on the way back from graphite (since it is proxying the response back) to smooth some of this?

Right. Do you mean it could modify the response and insert a meta section saying "sorry, we don't have any metadata"? In my view this is no better than grafana having an additional indicator state for "no metadata available".

@shanson7

do you mean it could modify the response

Yeah, something like that. It could indicate in a canonical format that the request was, in fact, proxied. It wouldn't really help with the rollup indication, but it would provide some context. I'm not saying it should be done this way, just trying to discuss options.

@Dieterbe Dieterbe force-pushed the rollup-indicator-1091 branch 2 times, most recently from e278eb4 to 409a490 Compare October 1, 2019 04:25
this allows us to represent it smaller in memory
properties regarding:
fetch, normalization, function processing, and runtime consolidation.
@Dieterbe Dieterbe force-pushed the rollup-indicator-1091 branch 2 times, most recently from 3af4ea8 to 08c858c Compare October 1, 2019 13:02

Dieterbe commented Oct 2, 2019

feedback from awoods:

  • most users don't need all this information (dieter counterpoint: the information seems cheap to collect, and is there for the few power users who can use it. grafana can provide a simple UI that just shows, for example, whether there was any runtime consolidation or not)
  • make the information easier to understand (e.g. mention interval changes rather than aggNum numbers, and structure the fields in a way that clarifies the processing that was applied)


Dieterbe commented Oct 3, 2019

  • make sure it works across cluster, and in cluster while upgrading from older version -> checked. counts get added up across cluster. older nodes obviously don't return metadata in render responses, and when their responses are aggregated by a newer node, their lack of metadata is treated as count=0
  • test for cases where series with different intervals get merged -> looks good. 2 sets of properties, differentiated by their aggnum-norm, and the consolidator respects consolidateBy (subject to availability of requested aggregate)
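The cross-cluster behaviour above can be sketched like this (names and types are illustrative, not the real cluster code): an older peer's response simply carries no metadata, which the merging node treats as an empty contribution (count 0).

```go
package main

import "fmt"

// entry is a pared-down per-series meta entry, keeping only the fields
// needed for this sketch; the names are assumptions for illustration.
type entry struct {
	AggNumNorm uint32
	Count      uint32
}

// mergeMeta folds per-series metadata from several cluster peers into one
// section, adding counts for matching properties. A pre-upgrade peer returns
// no metadata at all, which shows up here as a nil slice and contributes
// nothing (count 0).
func mergeMeta(peers ...[]entry) []entry {
	totals := map[uint32]uint32{}
	var order []uint32
	for _, meta := range peers { // a nil slice ranges zero times
		for _, e := range meta {
			if _, seen := totals[e.AggNumNorm]; !seen {
				order = append(order, e.AggNumNorm)
			}
			totals[e.AggNumNorm] += e.Count
		}
	}
	out := make([]entry, 0, len(order))
	for _, k := range order {
		out = append(out, entry{AggNumNorm: k, Count: totals[k]})
	}
	return out
}

func main() {
	newNode := []entry{{AggNumNorm: 1, Count: 2}, {AggNumNorm: 6, Count: 1}}
	oldNode := []entry(nil) // pre-upgrade node: no metadata in its response
	fmt.Println(mergeMeta(newNode, oldNode)) // prints: [{1 2} {6 1}]
}
```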

because our messagepack library doesn't like (de)serializing maps with
non-string keys. This is equivalent and avoids serialization and
interop issues with JSON.
* update docs to include tags and meta section
* fix a bunch of bugs where previously data wasn't being deeply copied
* cleanly copy the meta section where needed
* clean up code with some helper functions
* fix cases where QueryFrom/QueryTo was not being set
* simplify code
* don't like constructing new series listing all properties explicitly, because when we add a new series property, it can be forgotten or we must update all render functions

Dieterbe commented Oct 7, 2019

force push to take out this commit:

Refs: v0.13.0-44-g3a147df1
Author:     Dieter Plaetinck <[email protected]>
AuthorDate: Wed Sep 25 20:04:24 2019 +0200
Commit:     Dieter Plaetinck <[email protected]>
CommitDate: Tue Oct 1 08:36:22 2019 -0400

    DO NOT MERGE - to test half-upgraded cluster
---
 docker/docker-cluster/docker-compose.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docker/docker-cluster/docker-compose.yml b/docker/docker-cluster/docker-compose.yml
index dfa5ebee..4143c640 100644
--- a/docker/docker-cluster/docker-compose.yml
+++ b/docker/docker-cluster/docker-compose.yml
@@ -53,7 +53,7 @@ services:

   metrictank2:
     hostname: metrictank2
-    image: grafana/metrictank
+    image: grafana/metrictank:v0.13.0
     expose:
      - 6060
     volumes:
@@ -78,7 +78,7 @@ services:

   metrictank3:
     hostname: metrictank3
-    image: grafana/metrictank
+    image: grafana/metrictank:v0.13.0
     expose:
      - 6060
     volumes:

@Dieterbe Dieterbe changed the title from "WIP Rollup indicator" to "Rollup indicator" Oct 7, 2019
api/models/series.go (review comment, outdated, resolved)
Co-Authored-By: Anthony Woods <[email protected]>
@Dieterbe Dieterbe merged commit 61b5e89 into master Oct 7, 2019
@Dieterbe Dieterbe deleted the rollup-indicator-1091 branch October 7, 2019 19:13
@Dieterbe Dieterbe modified the milestones: sprint-2, sprint-1 Oct 7, 2019

Successfully merging this pull request may close these issues.

indication of returned data is rollup data
3 participants