This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Document schema conversion on data import #1594

Merged

11 commits merged into master on Jan 28, 2020

Conversation

replay (Contributor) commented Jan 9, 2020:

Documents how the whisper importer and the schema conversion logic work.
It's not simple, so I tried to illustrate it with examples.

fixes #1583

@replay replay changed the title [wip] document schema conversion Document schema conversion Jan 9, 2020

### Schema conversion

The whisper importer reader requires the user to provide the path to the storage schemas used at the import destination; it then reads and parses those schemas. During the import it iterates over all the whisper files that need to be imported and processes each of them as follows:
Contributor:

Let's make it very clear this is NOT the schemas file used locally on the graphite setup.


This is how the schema conversion logic generates a destination archive:

* In the set of source archives it has received as input, it finds the archive with the shortest retention that still contains enough data to satisfy the retention of the destination archive
Contributor:

Does that mean, in this case, that it will try to use the first archive? That seems suboptimal:

dest: 15s:1d
src: 1s:1d,15s:5d

Worse, what if the src is 10s:1d,15s:5d? It can't convert 10s to 15s.

Contributor Author:

I think you just found a bug. I was going through that scenario where 1s:1d,15s:5d gets converted to 15s:1d, and it doesn't produce the expected result. Let me write a unit test for that and then fix it.

Contributor Author:

I'll consider that high prio because we're planning to use the schema conversions in a prod instance soon. I don't think we'll have to do a conversion which hits this bug there, afaict, but I'd still rather get it fixed asap.


Member:

Not to distract from the bug here, but 10s:1d,15s:5d isn't a valid retention policy, at least not when using Whisper: https://graphite.readthedocs.io/en/latest/whisper.html#archives-retention-and-precision

> the precision of a longer retention archive must be divisible by precision of next lower retention archive.

Should we also enforce this in Metrictank?
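The rule quoted above can be sketched as a small validation check (illustrative Python, not Metrictank's or Whisper's actual code; the function name `validate_retentions` and the `(seconds_per_point, total_seconds)` tuple format are assumptions):

```python
def validate_retentions(retentions):
    """retentions: list of (seconds_per_point, total_seconds), finest archive first.

    Enforces the Whisper rule that each coarser archive's precision must be
    divisible by the precision of the next finer archive, and that retention
    must strictly grow from one archive to the next.
    """
    for (prec_a, ret_a), (prec_b, ret_b) in zip(retentions, retentions[1:]):
        if prec_b % prec_a != 0:
            raise ValueError(f"precision {prec_b}s is not divisible by {prec_a}s")
        if ret_b <= ret_a:
            raise ValueError("each archive must cover a longer retention than the previous one")
    return True

validate_retentions([(1, 86400), (15, 5 * 86400)])   # 1s:1d,15s:5d is valid
# validate_retentions([(10, 86400), (15, 5 * 86400)])  # 10s:1d,15s:5d raises ValueError
```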

Contributor:

good catch -> #1599

This is how the schema conversion logic generates a destination archive:

* In the set of source archives it has received as input, it finds the archive with the shortest retention that still contains enough data to satisfy the retention of the destination archive
* In the same set of source archives, it finds the archive with the longest retention whose resolution is still higher than or equal to that of the destination archive
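The two selection rules above can be sketched roughly as follows (illustrative Python, not Metrictank's actual code; the function name `pick_sources` and the `(interval_seconds, retention_seconds)` tuple representation are assumptions). Note that a higher resolution means a smaller interval:

```python
def pick_sources(src_archives, dest):
    """src_archives: list of (interval_seconds, retention_seconds); dest: same shape."""
    dest_interval, dest_retention = dest

    # Rule 1: the archive with the shortest retention that still satisfies
    # the retention of the destination archive.
    covering = [a for a in src_archives if a[1] >= dest_retention]
    by_retention = min(covering, key=lambda a: a[1]) if covering else None

    # Rule 2: the archive with the longest retention whose resolution is
    # still >= the destination's (i.e. whose interval is <= the destination's).
    fine_enough = [a for a in src_archives if a[0] <= dest_interval]
    by_resolution = max(fine_enough, key=lambda a: a[1]) if fine_enough else None

    return by_retention, by_resolution

# The example discussed below: dest 4h:1y from src 2h:2y, 3h:3y.
src = [(2 * 3600, 2 * 365 * 86400), (3 * 3600, 3 * 365 * 86400)]
print(pick_sources(src, (4 * 3600, 365 * 86400)))
```

Under these rules, rule 2 selects the 3h:3y archive (the longest retention with an interval <= 4h), which matches the behavior questioned in the review comment below.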
Contributor:

Similarly here, will it pick the 2nd one? The first one seems better suited:

dest: 4h:1y
src: 2h:2y,3h:3y

replay (Contributor Author) commented Jan 14, 2020:

I agree that if the 1st (high-resolution) one has enough data, then the result is going to be better if we use it to re-generate the rollup with the adjusted aggregation span. But at the moment it would use the 2nd one and generate an approximation based on it. So this correctly documents the behavior, which we can change in the future with another PR.

Contributor:

OK, does your existing PR fix it? If so, link to it here. If not, can you create a new ticket?


#### Increasing the interval

In the process described above, it can happen that some of the input archives used have a shorter interval than the destination archive. In this case the conversion process uses the same aggregation logic that Metrictank uses internally to generate rollups, decreasing the resolution (increasing the interval) of the input archive(s) to make it match the destination archive.
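When the destination interval is a clean multiple of the source interval, the rollup-style aggregation described above amounts to bucketing points by the new interval and aggregating each bucket. A minimal sketch, assuming average aggregation and an illustrative `(timestamp, value)` point format (function name is an assumption, not Metrictank's API):

```python
def downsample_avg(points, new_interval):
    """points: list of (timestamp, value) at some finer interval.

    Groups points into buckets of new_interval seconds and averages each
    bucket, mirroring how an avg rollup decreases resolution.
    """
    buckets = {}
    for ts, val in points:
        # align each point to the start of its destination bucket
        bucket_ts = ts - (ts % new_interval)
        buckets.setdefault(bucket_ts, []).append(val)
    return [(ts, sum(vals) / len(vals)) for ts, vals in sorted(buckets.items())]

pts = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0), (40, 5.0), (50, 6.0)]
print(downsample_avg(pts, 30))  # 10s points into 30s buckets: [(0, 2.0), (30, 5.0)]
```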

Contributor:

What happens when going from interval 10s to 15s?
Or 4m to 5m?

Contributor Author:

It still does an approximation, which is certainly not perfect but as good as possible. When #1609 is done, this should be a bit smarter and regenerate those archives from a higher-resolution input if possible. But in some situations there may not be any higher-resolution input available; in those cases it will still do such an "unclean" conversion.
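To illustrate why a 10s-to-15s conversion can only approximate: aligning each source point to the start of its 15s bucket spreads the points unevenly across buckets, so some buckets aggregate two points and others only one (illustrative sketch; names are assumptions, not Metrictank's code):

```python
def bucket_counts(points, new_interval):
    """Count how many source points land in each destination bucket."""
    buckets = {}
    for ts, val in points:
        bucket_ts = ts - (ts % new_interval)  # align to destination bucket start
        buckets.setdefault(bucket_ts, []).append(val)
    return {ts: len(vals) for ts, vals in sorted(buckets.items())}

pts = [(t, 1.0) for t in range(0, 60, 10)]  # one point every 10s for a minute
print(bucket_counts(pts, 15))  # {0: 2, 15: 1, 30: 2, 45: 1} -- uneven coverage
```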

Dieterbe (Contributor) commented Jan 9, 2020:

> it's not simple, so i tried to illustrate it with examples.

It is not simple :D This looks pretty good; the examples help a lot.

docs/whisper-data-import.md (outdated; thread resolved)
fitzoh (Contributor) commented Jan 14, 2020:

> It is not simple :D this looks pretty good. the examples help a lot

Agreed, good call on the examples

replay (Contributor Author) commented Jan 17, 2020:

Is there something else that needs to be changed before merging this? I know that it currently documents a behavior which is not optimal, as concluded in the comments, but it does accurately document the current behavior. Once we improve the behavior, we can update the doc to reflect the improved behavior accordingly. There is already an issue to do so: #1609

fitzoh (Contributor) left a comment:

> Is there something else that needs to be changed before merging this? I know that it currently documents a behavior which is not optimal, as concluded in the comments, but it does accurately document the current behavior. Once we improve the behavior, then we can update the doc to reflect the improved behavior accordingly. There is already an issue to do so #1609

I'll defer to @Dieterbe , but aside from my minor/probably skippable comment, I don't see any issues

README.md Outdated
@@ -78,6 +78,8 @@ Otherwise data loss of current chunks will be incurred. See [operations guide](
* [Graphite](https://github.com/grafana/metrictank/blob/master/docs/graphite.md)
* [Metadata](https://github.com/grafana/metrictank/blob/master/docs/metadata.md)
* [Tags](https://github.com/grafana/metrictank/blob/master/docs/tags.md)
* [Whisper data import](https://github.com/grafana/metrictank/blob/master/docs/whisper-data-import.md)
* [Schema conversion logic](https://github.com/grafana/metrictank/blob/master/docs/schema-conversion-logic.md)
Contributor:

These 2 are way too related to be split into separate features.
Can't we just have 1 page about data importing, with a section about whisper imports and a section about schema conversion?

Contributor Author:

Sure, I just split it because I figured it is likely that sooner or later the schema conversion logic would also be used for something other than the Whisper importer. But if that happens then we can just split it again.

Contributor Author:

I have consolidated these two pages

Co-Authored-By: Dieter Plaetinck <[email protected]>
* Generates chunks from the points which have been generated by the schema conversion logic
* Chunks are then sent to the importer-writer, which writes them into the store used by the destination cluster

Note that this conversion makes it possible to convert from one storage-schema to another, but it does not support changes to the storage-aggregations.
Contributor:

this could use more clarification.
If src aggregation says "avg", then we can't create "max". Does that mean it will error? What will happen?
What happens if we want, say, "avg" and "sum", or "avg", "sum" and "cnt"?

Contributor Author:

None of the tools or functions in the code allow the user to specify which aggregation functions should be used. All the whisper-importer-reader currently does is read the whisper header (which includes the aggregation function) and then use that aggregation function.

If somebody imported data into a Metrictank whose storage-aggregations.conf specifies different aggregation functions than were used in the whisper files, the resulting situation would be the same as if another Metrictank with a different storage-aggregations.conf had persisted its rollups in the store: the data just wouldn't be found.

Contributor:

OK, this is super useful. Can you please add this to the document?

Contributor Author:

I have added it: bff5e58


# Schema conversion logic

Metrictank comes with schema conversion logic (`mdata/importer/converter.go`) which can be used to convert data from one storage schema into another. This is currently only used by the Whisper importer, but in the future it may also be used to import from other types of data sources.
Contributor:

Our end-user docs should not talk about specific source code.
If we want to describe our code in detail to developers, it should go in devdocs.

Contributor Author:

I have removed the reference to that file.

@Dieterbe Dieterbe changed the title Document schema conversion Document schema conversion on data import Jan 28, 2020
@Dieterbe Dieterbe merged commit d37ddf3 into master Jan 28, 2020
@Dieterbe Dieterbe deleted the document_whisper_importer branch January 28, 2020 18:19
Successfully merging this pull request may close these issues:

* document schema conversions on import

4 participants