Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

mt-index-cat: support custom patterns and improve bench-realistic-workload-single-tenant.sh #1042

Merged
merged 14 commits into from
Oct 18, 2018

Conversation

Dieterbe
Copy link
Contributor

No description provided.

fmt.Println(" the chances need to add up to 100")
fmt.Println(" operation is one of:")
fmt.Println(" pass (passthrough)")
fmt.Println(" <number>rcnw (replace number random consecutive nodes with wildcards")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a confusing sentence replace number random consecutive nodes with wildcards. I think i understand how it's meant, but hard to read it a few times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about replace <number> random consecutive nodes with wildcards ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that would be much clearer because the number doesn't look like a word in the sentence


// random choice between replacing a node with a wildcard, a char with a wildcard, and passthrough
func pattern(in string) string {
mode := rand.Intn(3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't rand need a seed? otherwise it will keep repeating the same pattern on every run i think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mst@mst-nb1:~/abc$ cat main.go
package main

import "math/rand"
import "fmt"

func main() {
	fmt.Println(rand.Intn(10))
	fmt.Println(rand.Intn(10))
	fmt.Println(rand.Intn(10))
	fmt.Println(rand.Intn(10))
	fmt.Println(rand.Intn(10))
}
mst@mst-nb1:~/abc$ go build main.go
mst@mst-nb1:~/abc$ ./main
1
7
7
9
1
mst@mst-nb1:~/abc$ ./main
1
7
7
9
1
mst@mst-nb1:~/abc$ ./main
1
7
7
9
1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this tool is used for repeated benchmark runs, it seems useful to have a consistent generated workload. no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you could say that, yeah...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're not doing cryptography here so I think this is fine.

// and otherwise, sometimes valid patterns are produced,
// but it's also possible to produce patterns that won't match anything (if '.' was taken out)
chars := rand.Intn(5)
pos := rand.Intn(len(in) - chars)
Copy link
Contributor

@replay replay Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if len(in) is <5 then it is possible that this will result be rand.Intn(<=0) (depending on chars). That would panic according to: https://golang.org/pkg/math/rand/#Intn


// round rounds number d to the nearest r-boundary
func round(d, r int64) int64 {
neg := d < 0
Copy link
Contributor

@replay replay Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

round appears to be prepared to deal with negative numbers as if they were positive numbers, but the list of else ifs in roundDuration appears to assume all numbers are positive. I think for consistency it would be better to either deal with negative numbers correctly or not everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roundDuration rounds durations. it seems to be implied to me that durations can't be negative.
i see your point though but i would rather not cripple a utility function because it's currently only being called by roundDuration.
we can argue both ways on this but it's not worth the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course a duration can be negative. The sign tells you if the duration is the amount of time before or after the reference time.

Via a comment, at

// set this after doing the query, to assure age can't possibly be negative

You state that age cant be negative, but that is not true. If the local time where this code is running is wrong, then age can easily be negative. Likewise, if the localtime on the server that is writing to the index is wrong the age can also end up negative.

Both of these are problems that we should not be masking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But looking at the code, negatives already look to be handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true re clock wrong.
also callers of mt-index-cat could technically call the age() function on any integer, which is a 2nd reason how a duration could be negative.
while @woodsaj is correct that we don't seem to break upon negative durations, @replay also brings up a good point that roundDuration doesn't behave as expected when given negative numbers (it should round them also). i will fix

os.Exit(-1)
}

// we one or more of "<chance> <operation>" followed by an input string at the end.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the we wasn't supposed to be there

@Dieterbe
Copy link
Contributor Author

@replay thank you. PTAL

@Dieterbe
Copy link
Contributor Author

PR has been stuck for 2 weeks...
can someone please approve this

@woodsaj
Copy link
Member

woodsaj commented Oct 1, 2018

@Dieterbe can you rebase first.

func patternCustom(in ...interface{}) string {
usage := func() {
fmt.Println("usage of patternCustom:")
fmt.Println("input | patternCustom <chance> <operation>")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs to be made clear that <chance> <operation> can be repeated. convention for that is

input | patternCustom <chance> <operation>[ <chance> <operation>...]


// percentage chance, and function
func patternCustom(in ...interface{}) string {
usage := func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicated in flag.Usage(). It should be split out to its own function to use for both?

patternCustomUsage(withExample bool) string {
  out := bytes.NewBuffer()
  out.WriteString(fmt.Sprintln("patternCustom: transforms a graphite.style.metric.name into a pattern with wildcards inserted according to rules provided:"))
  ...
  return out.String()
}

fmt.Println("the chances need to add up to 100")
fmt.Println("operation is one of:")
fmt.Println(" pass (passthrough)")
fmt.Println(" <number>rcnw (replace <number> random consecutive nodes with wildcards")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consecutive nodes starting from where?

Copy link
Member

@woodsaj woodsaj Oct 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually mean

replace <number> consecutive nodes, from a random position with, a wildcard

what happens if the number of replacements is > than the (numberOfNodes - startPos)?

Copy link
Member

@woodsaj woodsaj Oct 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also needs to be made clear that <number>is a single digit

Copy link
Contributor Author

@Dieterbe Dieterbe Oct 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i clarified this.
if there's not enough nodes/characters, mt-index-cat will error out. that behavior doesn't need to be explicitly mentioned in the usage though.

fmt.Println("operation is one of:")
fmt.Println(" pass (passthrough)")
fmt.Println(" <number>rcnw (replace <number> random consecutive nodes with wildcards")
fmt.Println(" <number>rccw (replace <number> random consecutive characters with wildcards")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the consecutive characters contained within a single node or will the potentially span multiple nodes.
what if the number of replacements is > (totalChars - startChar)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not node aware. i suppose someone could add a node aware version if the need arises.
if the number is too high, see previous comment.

use 1 simple pattern instead of a mix of various patterns. we simply
don't do enough requests to have that become a consistent pattern.

Also, we have to filter for which metrics to use.
Without filtering we would get all metrics back, e.g.:

1 some.id.of.a.metric.1
1 some.id.of.a.metric.10
1 some.id.of.a.metric.100
1 some.id.of.a.metric.1000
1 some.id.of.a.metric.10000
1 some.id.of.a.metric.100000
...
1 some.id.of.a.metric.99999

This meant replacing a single char with a '*' could match many more
series then we meant to. e.g. the top one, replacing 1 with * would
match ALL series, resulting in very unfair benchmark runs.
Now we know that we only ever query for 1 or 10 series.
@Dieterbe
Copy link
Contributor Author

should be good to merge now

# this selects exactly 10k series that will match the regex, out of which we randomly replace 1 char with a wildcard, resulting in queries usually for 1 series, and sometimes for 10 series (depending on whether the replaced char falls within the dynamic part an the end of the name or not)
# so 20/25 chances for 1 series, 5/25 chances for 10
# as we execute 100Hz*300s=30k requests, this should give us a plenty high cache hit rate (while still testing the uncached code path)
# in practice, the cache rate sometimes looks fairly low and i'm not sure why. but anyway (seeing about 15% hit partial and the rest are misses), and only in the last minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sentence is strange but anyway (seeing about 15% hit partial and the rest are misses), and only in the last minute. what is in the last minute, those hits/misses?

Suggested change
# in practice, the cache rate sometimes looks fairly low and i'm not sure why. but anyway (seeing about 15% hit partial and the rest are misses), and only in the last minute
# in practice, the cache rate sometimes looks fairly low and i'm not sure why (seeing about 15% hit partial and the rest are misses).

Copy link
Contributor

@replay replay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Dieterbe Dieterbe merged commit e4db0a4 into master Oct 18, 2018
@Dieterbe Dieterbe deleted the better-mt-index-cat branch October 29, 2018 09:06
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants