
switch to logrus logging library #1056

Merged 4 commits from switch-to-logrus-no-fields into master on Oct 3, 2018

Conversation

robert-milan (Contributor):

remove old version of logrus (github.com/Sirupsen/logrus)
add new version of logrus (github.com/sirupsen/logrus)
lock logrus version to 1.0.6
add logger/logger.go for custom TextFormatter used with logrus

first part of solving #624
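
For context, here is a minimal sketch of what a custom logrus TextFormatter along these lines could look like; the actual logger/logger.go in this PR may differ, and the timestamp layout is simply inferred from the log output quoted further down in this conversation:

package logger

import (
	"bytes"
	"fmt"
	"strings"

	log "github.com/sirupsen/logrus"
)

// TextFormatter renders entries as "2018-09-20 09:46:14.386 [INFO] message".
type TextFormatter struct {
	// TimestampFormat overrides the default timestamp layout when set.
	TimestampFormat string
}

// Format implements the logrus.Formatter interface.
func (f *TextFormatter) Format(entry *log.Entry) ([]byte, error) {
	layout := f.TimestampFormat
	if layout == "" {
		layout = "2006-01-02 15:04:05.000"
	}
	var b bytes.Buffer
	fmt.Fprintf(&b, "%s [%s] %s\n",
		entry.Time.Format(layout),
		strings.ToUpper(entry.Level.String()),
		entry.Message)
	return b.Bytes(), nil
}

A formatter like this would be installed once at startup with log.SetFormatter(&logger.TextFormatter{}).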

api/api.go (review thread resolved)
api/api.go (outdated; review thread resolved)
@@ -626,7 +626,7 @@ func (q *TagQuery) testByTagMatch(def *idx.Archive) bool {
 	equal := strings.Index(tag, "=")
 	if equal < 0 {
 		corruptIndex.Inc()
-		log.Error(3, "memory-idx: ID %q has tag %q in index without '=' sign", def.Id, tag)
+		log.Errorf("ID %q has tag %q in index without '=' sign", def.Id, tag)
Contributor:

why are we no longer logging the component?

robert-milan (Contributor, Author):

We should be, until we switch over to individual instances of logrus per module. It looks like I just forgot to put the component back into that line; I'll fix it.
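
For illustration, that fix would amount to keeping the component as a message prefix in the format string, as the pre-logrus call did; a hypothetical corrected version of the line from the diff above:

log.Errorf("memory-idx: ID %q has tag %q in index without '=' sign", def.Id, tag)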

mdata/aggmetric.go (outdated; review thread resolved)
Dieterbe (Contributor) commented Sep 20, 2018:

some notes on the output formats:
metrictank:

2018-09-20 09:46:11.769 [WARNING] [metrictank] stats dialing localhost:2003 failed: dial tcp 127.0.0.1:2003: connect: connection refused. will retry
2018-09-20 09:46:14.386 [WARNING] [metrictank] kafka-cluster: Processing metricPersist backlog has taken too long, giving up lock after 1m0s.
2018-09-20 09:46:14.386 [INFO] [metrictank] carbon-in: listening on :2003/tcp
[Sarama] 2018/09/20 09:46:14 Connected to broker at kafka:9092 (registered as #0)
2018-09-20 09:46:14.394 [INFO] [metrictank] kafkamdm: consuming from mdm:3 from offset 0
2018-09-20 09:46:14.395 [INFO] [metrictank] kafkamdm: consuming from mdm:4 from offset 0

tools:

./build/mt-index-cat -addr http://localhost:6063 -from 60min cass -hosts localhost:9042 -schema-file scripts/config/schema-idx-cassandra.toml dump | less
2018-09-20 12:45:51.667 [INFO] [mt-index-cat] cassandra-idx: ensuring that keyspace metrictank exist.
2018-09-20 12:45:51.668 [INFO] [mt-index-cat] cassandra-idx: ensuring that table metric_idx exist.
  1. It's unfortunate that libraries such as sarama still do their own logging, but that's outside the scope of this PR.
  2. I don't think we need to print which binary is running, as that is obvious from context (e.g. the pod it's running in, or the command that was run). In fact, in our logging platform we'll already know which program is running because it'll be a label we can query on; it doesn't need to be part of the output lines.
  3. In case it wasn't obvious: I'm a fan of logging the "component/module" (cassandra-idx, stats, etc.) because it allows for easy filtering. Perhaps we can refactor how we do this, since it's currently easy to forget to add the module to a line, but that can also be done in a future PR (a sketch of one possible shape follows this list).
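
One possible shape for that future refactor, sketched here purely as an illustration (it is not what this PR implements): give each module its own pre-tagged logrus entry, so the component cannot be forgotten at individual call sites. A formatter would then also need to render entry.Data for the field to show up in the output.

package memory

import log "github.com/sirupsen/logrus"

// logger is a package-level entry pre-tagged with the component name,
// so every call site in this module carries the "memory-idx" label.
var logger = log.WithField("component", "memory-idx")

func reportCorruptTag(id, tag string) {
	// No need to repeat the component in the message itself.
	logger.Errorf("ID %q has tag %q in index without '=' sign", id, tag)
}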

Dieterbe (Contributor) commented Sep 20, 2018:

Can you compare benchmarks before and after this change?
You can use benchstat to compare. Make sure to run all benchmarks several times, both before and after your changeset, to get accurate results. If the results are still noisy, you may have to disable CPU frequency scaling etc. on your machine.

https://godoc.org/golang.org/x/perf/cmd/benchstat

robert-milan (Contributor, Author):
I've been running benchmarks on master and this branch. I have disabled cpu frequency scaling and set my governor to performance, but I am still getting consistently inconsistent results specifically with the BenchmarkAggregate10k_* and BenchmarkGroupByTags10k_* series of benchmarks. Most of the other benchmarks are within tolerance of each other, but I am not very confident using the current numbers I have as a base for comparison. I am also receiving the same benchmarking failures on both branches. I have attached a zip of the results here.
benchmarks.zip

cmd/metrictank/metrictank.go (outdated; review thread resolved)
Dieterbe (Contributor) commented Oct 2, 2018:

We haven't run the benchmarks in a while, apparently; I didn't realize some of them are broken. Addressing via #1077.

To compare performance, it will be most interesting to look at some methods that do a lot of Debug calls, especially those that had their LogLevel shields removed, e.g.:
AggMetric.SyncChunkSaveState
AggMetric.Get
AggMetric.addAggregators
AggMetric.persist
AggMetric.add

add() is a good one, and is well represented in these benchmarks:

BenchmarkAggMetrics1000Metrics1Day
BenchmarkAggMetrics1kSeries2Chunks1kQueueSize
BenchmarkAggMetrics10kSeries2Chunks10kQueueSize
BenchmarkAggMetrics100kSeries2Chunks100kQueueSize

> I've been running benchmarks on master and this branch. I have disabled cpu frequency scaling and set my governor to performance, but I am still getting consistently inconsistent results specifically with the BenchmarkAggregate10k_* and BenchmarkGroupByTags10k_* series of benchmarks. Most of the other benchmarks are within tolerance of each other, but I am not very confident using the current numbers I have as a base for comparison.

I don't have this issue; all benchmarks are within tolerance for me. In particular:

benchstat before.txt after.txt | 2>&1 | r AggMetrics
AggMetrics1000Metrics1Day                                 7.56s ± 0%      7.76s ± 0%   ~     (p=1.000 n=1+1)
AggMetrics1kSeries2Chunks1kQueueSize                     0.06ns ± 0%     0.07ns ± 0%   ~     (p=1.000 n=1+1)

Edit: looks like these tests need to be fixed.
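
For readers following along: a "LogLevel shield" here means guarding a Debug call behind an explicit level check so the message is never even formatted when debug logging is off. With logrus the level check moves inside Debugf, but the call and its arguments are still evaluated at the call site, which is a plausible source of the small slowdown measured in the next comment. A rough, illustrative sketch (names and values are made up, not the actual metrictank code):

package mdata

import log "github.com/sirupsen/logrus"

// LogLevel mimics the old numeric log-level variable; debug output is
// only produced when it is low enough (the values here are illustrative).
var LogLevel = 2

func addWithShield(key string) {
	// Old style: the shield skips formatting entirely when debug is off.
	if LogLevel < 2 {
		log.Debugf("AM: %s add(): created first chunk", key)
	}
}

func addWithoutShield(key string) {
	// New style: logrus checks the level inside Debugf, but the arguments
	// are still evaluated on every call.
	log.Debugf("AM: %s add(): created first chunk", key)
}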

Dieterbe (Contributor) commented Oct 2, 2018:

with #1077 integrated in both master and this branch, I get:

benchstat before-new.txt after-new.txt
name          old time/op  new time/op  delta
AggMetricAdd   503ns ± 1%   520ns ± 2%  +3.44%  (p=0.000 n=10+10)

so a slight slowdown. no biggie

remove old version of logrus (github.com/Sirupsen/logrus)
add new version of logrus (github.com/sirupsen/logrus)
lock logrus version to 1.0.6
add logger/logger.go for custom TextFormatter used with logrus
update documentation to match log-level change
remove logging binary names
Dieterbe merged commit b281e57 into master on Oct 3, 2018.
Dieterbe deleted the switch-to-logrus-no-fields branch on October 29, 2018 at 09:06.