Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

mt-parrot: continuous validation by sending dummy stats and querying them back #1680

Merged
merged 60 commits into from
Apr 6, 2020

Conversation

fitzoh
Copy link
Contributor

@fitzoh fitzoh commented Feb 18, 2020

…for each metrictank partition.

Builds on #1665, has initial support for querying metrics back out of metrictank.

Doesn't actually emit useful stats on the results yet, but I wanted to get some feedback on the type of stats to emit.

monitor.go is probably the most interesting file at the moment

@fitzoh fitzoh requested a review from Dieterbe February 18, 2020 16:30
//total number of entries where drift occurred
fmt.Printf("parrot.monitoring.nonMatching;partition=%d; %d\n", stats.partition, stats.numNonMatching)
fmt.Println()
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the proposed/sample metrics I'm thinking about emitting... do these make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we register the metrics up front?

I think this set is pretty good.
When I think about "what could problems in the data be?", I think it could be any combination of:

  1. incorrect values (val != ts): number could be off or be null.
  2. too many points / not enough points / unevenly spaced points / incorrect timestamps (all points should divide by the period without remainder and be 1 period apart) / points are not in sorted order

1 is well covered by the nancount and deltaSum, but 2 isn't really covered yet.

The delay (delta between last point sent and last non-nullpoint seen) is also interesting. "lag" seems to want to monitor something a bit differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we register the metrics up front?

So that would probably be a map of partition -> struct of partition specific metrics?

2 isn't really covered yet

I'll work something up on those

The delay (delta between last point sent and last non-nullpoint seen) is also interesting. "lag" seems to want to monitor something a bit differently.

Not sure what the latter portion of that means

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latter portion:
parrot.monitoring.lag measures something differently then what i would think it should measure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that would probably be a map of partition -> struct of partition specific metrics?

or a slice, is just a bit more efficient to work with than a map (we can exploit the fact that partition id's are a seqence that starts at 0, works well as slice indices)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we register the metrics up front?

57151f2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 is well covered by the nancount and deltaSum, but 2 isn't really covered yet.

36498c0

cmd/mt-parrot/monitor.go Outdated Show resolved Hide resolved
@fitzoh fitzoh force-pushed the parrot-init branch 2 times, most recently from 2ca8897 to 51f6291 Compare February 18, 2020 20:13
@Dieterbe Dieterbe changed the title Add mt-parrot command to generate deterministic artificial metrics… mt-parrot: continuous validation by sending dummy stats and querying them back Feb 25, 2020
docs/tools.md Outdated Show resolved Hide resolved
publish/kafka/publish.go Outdated Show resolved Hide resolved
docs/tools.md Outdated Show resolved Hide resolved
cmd/mt-parrot/monitor.go Outdated Show resolved Hide resolved
cmd/mt-parrot/monitor.go Outdated Show resolved Hide resolved
cmd/mt-parrot/monitor.go Outdated Show resolved Hide resolved
@fitzoh
Copy link
Contributor Author

fitzoh commented Feb 26, 2020

This shouldn't be merged until #1685 and #1687 are merged and rebased

docs/tools.md Outdated Show resolved Hide resolved
cmd/mt-parrot/main.go Outdated Show resolved Hide resolved
cmd/mt-parrot/main.go Outdated Show resolved Hide resolved
cmd/mt-parrot/generate.go Outdated Show resolved Hide resolved
cmd/mt-parrot/generate.go Outdated Show resolved Hide resolved
cmd/mt-parrot/main.go Outdated Show resolved Hide resolved
cmd/mt-parrot/monitor.go Outdated Show resolved Hide resolved
cmd/mt-parrot/monitor.go Outdated Show resolved Hide resolved
cmd/mt-parrot/generate.go Outdated Show resolved Hide resolved
@fitzoh fitzoh force-pushed the parrot-init branch 3 times, most recently from e25199a to 4141888 Compare February 28, 2020 14:39
@fitzoh
Copy link
Contributor Author

fitzoh commented Mar 4, 2020

I think this is about ready for another pass @Dieterbe.

Stats aren't actually being published yet as they're blocked by #1706, but I think all current comments have been addressed.

@fitzoh
Copy link
Contributor Author

fitzoh commented Mar 17, 2020

CI is currently failing due to parrot metrics not being defined in metrics.md...
Does it sound reasonable to update the check to exclude them?

I feel like that's potentially confusing to document them with the main metrictank metrics.

@Dieterbe
Copy link
Contributor

The best solution for that would be if we can generate separate metrics.md files based on the binary.
Can you see if you can tweak https://github.com/Dieterbe/metrics2docs such that it does this?

@fitzoh
Copy link
Contributor Author

fitzoh commented Mar 27, 2020

The best solution for that would be if we can generate separate metrics.md files based on the binary.
Can you see if you can tweak https://github.com/Dieterbe/metrics2docs such that it does this?

This came up in a 1-1 with @woodsaj , his advice was to not worry about documenting parrot metrics for now, so I've disabled metrics2docs for the parrot command by removing the metric prefix from the docs in b2612af

@woodsaj
Copy link
Member

woodsaj commented Mar 27, 2020

his advice was to not worry about documenting parrot metrics for now

Yep, fixing metrics2docs is well outside of scope for this PR, so lets not let if block us from getting this work merged.

I have opened an issue for improving metrics2docs
#1731

@Dieterbe
Copy link
Contributor

Dieterbe commented Mar 30, 2020

@fitzoh I have pushed a bunch of commits that should address all my comments, as well as some feedback that was easier to just do rather than type it out.
I have not tested this yet. Once you've had a chance to go through it, maybe we can get together to clarify if something's unclear and test it and work out any final kinks

Copy link
Contributor

@Dieterbe Dieterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be good enough.
we can do further tweaks in subsequent PR's.

@Dieterbe Dieterbe merged commit a8b1091 into master Apr 6, 2020
@Dieterbe Dieterbe deleted the parrot-init branch April 6, 2020 12:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants