[Metricbeat] CoreDNS module: Add Kibana Dashboard #11619

ioandr · 2019-04-02T22:53:34Z

This PR adds an overview dashboard for the CoreDNS Metricbeat module as proposed on #10432.

This is the first iteration on this dashboard, so please share any thoughts on the proposed visualizations.

Signed-off-by: Ioannis Androulidakis ioannis@arrikto.com

Fixes elastic#10432. Signed-off-by: Ioannis Androulidakis <ioannis@arrikto.com>

elasticmachine · 2019-04-02T22:53:36Z

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

ChrsMark · 2019-04-08T09:38:26Z

Hello @ioandr @ruflin ,

hopping into this to share a thought.

The Prometheus metrics we collect and try to visualize here that are of cumulative-histograms type should be ideally visualized with a heatmap. Prometheus histograms are nicely visualized like this in Grafana:

A nice blog describing this: Histograms to the rescue

So, is there a similar way to create such visualizations in Kibana and make the most out of these CoreDNS metrics (size_bytes, response_ns etc)?

odacremolbap · 2019-04-17T11:31:26Z

Hi @ioandr @ChrsMark

So much thanks for your contributions, and for your patience on this PR.

#11738 was merged which brought some renaming + new metrics for the CoreDNS cache, and renaming fields to conform to this

Can you please rebase, and adapt to the fields renaming?

Thanks!

ioandr · 2019-04-17T12:13:49Z

Hi @ioandr @ChrsMark

So much thanks for your contributions, and for your patience on this PR.

#11738 was merged which brought some renaming + new metrics for the CoreDNS cache, and renaming fields to conform to this

Can you please rebase, and adapt to the fields renaming?

Thanks!

Hey @odacremolbap, thanks for the heads-up on this!

I will take a look at the new CoreDNS metrics and field names and update this PR accordingly.

exekias · 2019-04-25T13:25:13Z

ey @ioandr, did you have a chance to have look to this? it would be nice to get this in

Fixes elastic#10432. Signed-off-by: Ioannis Androulidakis <ioannis@arrikto.com>

* Refactor visualization field names to meet the naming conventions * Add cache metrics (hits, misses) in existing dashboard * Set index to "metricbeat-*" Signed-off-by: Ioannis Androulidakis <ioannis@arrikto.com>

ioandr · 2019-04-25T23:07:43Z

ey @ioandr, did you have a chance to have look to this? it would be nice to get this in

Hi @exekias, I finally found the time to make some progress on this, thanks for your reminder.

@odacremolbap I renamed certain metrics as well as names of corresponding visualizations. Also, I added another visualization with cache-related metrics in the bottom of the dashboard. Let me know about your opinion on these changes, so we can have another round of review.

Please note that I am taking some days off, roughly until May 4th, so I won't be available until then. Feel free to commit on this PR so I do not block anything!

odacremolbap · 2019-04-26T12:19:50Z

thanks a lot @ioandr
taking a look

ioandr · 2019-05-07T12:57:03Z

thanks a lot @ioandr
taking a look

Hi @odacremolbap, any update on this? Please let me know If there is anything I can do to facilitate your review.

odacremolbap · 2019-05-07T14:34:37Z

@ioandr
I'm having a couple of issues

usage of non keywords as terms for aggregation
usage of data that is not being informed (in my case coredns.stats.dns.request.do.count) present as aggregation. Although a filter is present it looks like Kibana is evaluating and popping up an error.

It would help to know which version of ES/Kibana did you use so that I configure mine the same.

Thanks!

odacremolbap

Requesting some changes, or items to think about:

most of the data received are incrementing counts, we shouldn't be Avg on them
some metrics at the legend have the default and very long label, would be awesome to replace in most cases by a short string
buckets at prometheus are incremental, no need to stack them since bigger buckets already contain smaller ones, they are probably ok if shown overlapping.

I think the best way to go on is retrieving some sample metrics from the prometheus endpoint, like:

coredns_dns_request_size_bytes_bucket{proto="tcp",server="dns://:1053",zone="domain.elastic.",le="16000"} 44069

coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://:1053",zone=".",le="200"} 206986

and check the labels for each metric. Graphs shouldn't Avg on different labeled metrics, and potentially each set of labels should split the visualization, either in new series or charts.

I think the set of charts is ok, so the only stuff to go on is polishing the data in them.

metricbeat/module/coredns/_meta/kibana/7/dashboard/Metricbeat-coredns-overview.json

odacremolbap · 2019-05-09T12:50:49Z

@ioandr
I think my requested changes might need a decent amount of work and testing. I might contribute to this branch and sync with you at this thread, if you don't mind.

ioandr · 2019-05-09T14:32:53Z

@ioandr
I think my requested changes might need a decent amount of work and testing. I might contribute to this branch and sync with you at this thread, if you don't mind.

Sure, feel free to contribute on this PR. I will take a look at the requested changes and come back

odacremolbap · 2019-05-10T07:56:33Z

I've added per server/zone/family/proto metrics where needed.

Since prometheus endpoints return a cumulative number (total counts), graphs looks boring.
I'm not using bucketed metrics at all, will dive in here some more, probably a data table panel would be nice to add.

odacremolbap · 2019-05-10T09:57:18Z

there are some issues with my visualizations, working on it

kaiyan-sheng · 2019-05-10T20:09:01Z

jenkins, test this please

kaiyan-sheng · 2019-05-10T20:12:29Z

I was able to load the dashboard, but I'm having issue running coreDNS so I didn't get any data(my problem).

odacremolbap · 2019-05-20T09:32:53Z

@ioandr your PR got merged.
Thanks a lot for the contribution!

* [Metricbeat] CoreDNS module: Add Kibana Dashboard Fixes elastic#10432.

* Use time.Duration directly in GetStartTimeEndTime function (#12033) * Remove convertPeriodToDuration and use duration directly in GetStartTimeEndTime * Pass period in time.Duration type * Fix memory leak in Filebeat pipeline acker (#12063) * Fix memory leak in Filebeat pipeline acker Before this change acker goroutine was kept forever as processed events count was not correctly updated. Filebeat sends an empty event to update file states, this event is not published, but treated as dropped, without updating counters. This change makes sures that `a.events` count gets updated for dropped events also, so the acker gets closed after all ACKs happen. * Add convert processor (#11686) The `convert` processor converts a field in the event to a different type, such as converting a string to an integer. For a full description of the processor's capabilities see the included documentation. Closes #8124 * Update docs.asciidoc (#11852) (#12045) * Update docs.asciidoc Added in a short note on a few of the other configurable variables. * Make update * Remove systemd v233 requirement because it's no longer true (#12076) * Don't generate autodiscover config when no port matches host hints (#12086) On metricbeat, when the host autodiscover hint is used, and it includes the port, one of the exposed ports has to match with the one in the hint. If not, no configuration should be generated. If it is generated, it will have empty hosts, what would lead to unexpected errors as the seen in #8264. * Fix transptest testing (#12091) Rewrite test for more clarity ensuring that the server and listeners are stopped before the test function returns (cause for the case condition). Each test has it's very own socks5 server now. Tests not requiring a proxy don't spin up one. * Refactor and add tests for template and ilm handling. (#12065) Refactor and add more tests related to setting up template and ilm policy. Ensure template and ilm-policy is exported despite enabled=false setting. * [Auditbeat] Login: Fix re-read of utmp files (#12028) The `login` dataset is not using the previous file offset when reading new entries in a utmp file. As a result, whenever a new login event occurs, all records are re-read. Also expands the documentation, moves test files to testdata/, and adds a test case that adds a utmp record to the test file and re-reads it to make sure this bug does not happen again. * [Metricbeat][postgresql] Update lib/pq to fix #11393 (#12094) This updates github.com/lib/pq to the latest version to fix a socket leak when SSL is not enabled on the PostgreSQL server. Fixes #11393 * Make breaking changes separate files (#12002) * Change image references to use block syntax not inline (#11911) * Change image references to use block syntax not inline * Fix format for inline links * Revert "Make breaking changes separate files (#12002)" (#12116) This reverts commit c17586a. * fix queue.spool.write.flush.events config type (#12080) * Revert printing template and policy name on export. (#12067) Ensures behavior on running the `export` cmd does not change compared to last released version. * [Filebeat] Add -expected files by default (#12041) So far expected files in Filebeat tests were only generated and compared when a file exists. This changes to create a generated for all example logs. This will add a few more files to the repository but I think there the benefits outweight the costs as it means the modules are tested in more detail. Also minor changes will be detected easier. * Update vendored gosigar to 0.10.2 (#12101) #11924 duplicated some code from go-sysinfo that is affected by a memory leak (fixed in #12100) In this case, only master is affected as the PR that introduced the leaky gosigar wasn't backported. * Fix various memory leaks under Windows (#12100) A function in go-sysinfo used under Windows to split a command-line into arguments was leaking memory. This code was used in various places: add_process_metadata processor. (enabled by default in all Beats) Packetbeat's process monitor. (disabled by default). Auditbeat's system/process metricset. This PR updates vendored go-sysinfo to the most recent version, which fixes the leak. * New processor extract_array (#11761) This adds a new processor, extract_array, that allows accessing values inside arrays and copying them to target fields. * [Heartbeat] Remove not needed flags from setup command (#11856) The setup command until now contained all the possible options from the other Beats. As Heartbeat does not ship anymore with dashboards, the --dashboards command is not needed anymore and is only confusing. I also removed all the other commands except `--ilm-policy` and `--template`. I'm not aware that `--pipelines` or `--machine-learning` would be used. Here the comparison between `./heartbeat setup -h` from before and after. Before: ``` This command does initial setup of the environment: * Index mapping template in Elasticsearch to ensure fields are mapped. * Kibana dashboards (where available). * ML jobs (where available). * Ingest pipelines (where available). * ILM policy (for Elasticsearch 6.5 and newer). Usage: heartbeat setup [flags] Flags: --dashboards Setup dashboards -h, --help help for setup --ilm-policy Setup ILM policy --machine-learning Setup machine learning job configurations --pipelines Setup Ingest pipelines --template Setup index template ``` After: ``` This command does initial setup of the environment: * Index mapping template in Elasticsearch to ensure fields are mapped. * ILM Policy Usage: heartbeat setup [flags] Flags: -h, --help help for setup --ilm-policy Setup ILM policy --template Setup index template ``` In this PR I did not include a check for the config option `setup.dashboards` to make sure they are not there like apm-server does (https://github.com/elastic/apm-server/blob/2baefab778fdfe70c47bc2fb488677b2e43e4635/beater/beater.go#L60) as I don't think it's necessary. * Skip Windows testing if magefile.go does not exist (#12099) Changes the jenkins_ci.ps1 script to skip testing when magefile.go does not exist. This will allow us to add projects like x-pack/winlogbeat to the test matrix because not all branches have an x-pack/winlogbeat/magefile.go file. * Fix goroutine leak on initialization failures of log input (#12125) Outlets are created during log input initialization, and if it fails they were never freed. Handle this case. * Document and improve permission checks when running socket metricset from Docker (#12039) Update instructions for system/socket metricset on Docker. And base permission checks on capabilities rather than on the effective uid. Running a process as root doesn't mean that it has all privileges, specially when run as container. * [metricbeat] added CPU usage check to docker memory stats (#12062) * Change type from scaled_float to long and add format (#11982) * Change type from scaled_float to long and add format * [libbeat] Add unit tests for libbeat's client proxy settings (#12044) These tests set up server listeners and create libbeat clients with varying proxy settings, and verify that the clients ping the correct target URL. This is a preparation for #11713, since most of the logic (and work) is in testing the proxy settings; the much simpler PR adding the proxy-disable flag will be a followup to this one, to keep the functional changes isolated in case of rollbacks etc. * [Metricbeat](Etcd-Leader)Followers wont report leader metrics (#12004) * manage leader metricset so that followers don't report errors nor events * add debug message when skipping leader events from non leader members * Add package libbeat/common/cleanup (#12134) * Add package libbeat/common/cleanup The cleanup package adds helpers for deferred optional cleanup on errors. For example: ``` ok := False defer cleanup.IfNot(&ok, func() { ... }) // continue initialization ok = True return // some value ``` * Add changelog entry * [docs] add make fmt to contributing guide (#12118) * Move one changelog entry from breaking change to bug fix (#12146) * Sysmon and Security "modules" for Winlogbeat (#11651) Add pipelines for the Security and Sysmon event logs The Security processor handles just three events to start with - 4624, 4625, and 4648. These are event.category=authentication events. The Sysmon processor handles all event IDs found in the sysmon manifest (sysmon -s). It moves the event_data fields that are in ECS and does some type conversions. * [Auditbeat] Process: Add hash of executable (#11722) Adds the hash(es) of the process executable to `process.hash.*`. The default is to add SHA-1 only as `process.hash.sha1`. * [Docs] Comment out section that contains bad link (#12152) * [Filebeat] Introduce UTC as default timezone for modules tests (#12120) Currently all our modules have convert_timezone disable by default. The reason in 6.x for this was probably that 6.0 did not support convert_timezone and we did not want to introduce a breaking changes. New modules should have convert_timezone enabled by default. If a module has convert_timezone enabled by default the tests will fail as it takes the timezone of the local computer. To circumvent this, this PR sets the timezone of the tests to UTC so the same time zone is always used. No generated files were changed in this PR as all modules have convert_timezone off by default. But it will affect #12079 and #12032 * Add number of goroutines to reported metrics (#12135) * Add minimal ES template functionality. (#12103) When loading a template without fields, create a minimal template only applying given configuration, without any default values for mappings and settings. This allows to create additional templates only defining specific values. * Refactor logging in pgsql module (#12151) Guard debug logging statements with "isDebug" checks. And switch the module over to using named loggers. Fixes #12150 * Ignore doc type in ES search API for ES 8 (#12171) * [Docs] Make breaking changes separate files for each version (#12173) * [Filebeat] module for palo_alto (pan-os) logs (#11999) This is a module for Palo Alto Networks PAN-OS logs received via Syslog. It has been tested with logs for PAN-OS version 7.1 to 9.0. However, it is expected to work with earlier versions as the log format is compatible. * [cmd setup] Add and deprecate setup cmds for index handling (#12132) Deprecate `setup --template` and `setup --ilm` in favour of newly introduced `setup --index-management` command. Fix bug in template and write alias creation order to ensure creating properly managed indices. implements #12095 Co-authored-by: steffen.siering@elastic.co * Zdd zfs beat (#12136) Querying ZFS Storage and Pool Status * [metricbeat] Expand metricbeat dev guide for testing (#12105) * update dev guide to add examples and expand testing * Bugfix set template.order to 1 by default. (#12160) To ensure default order is not changed to 0, set it to 1 in default config. * [Auditbeat] Fix issues with multiple calls to rpmReadConfigFiles (#12168) This patch fixes two issues in Auditbeat's system/package on RPM distros: - Multiple calls to rpmReadConfigFiles lead to a crash (segmentation fault). It is necessary to call rpmFreeRpmrc after each call to rpmReadConfigFiles. See [1] for a similar issue. - In addition, it is also necessary to call rpmFreeMacros (when available) to avoid leaking memory after each rpmReadConfigFiles call. 1: https://lists.fedorahosted.org/pipermail/anaconda-patches/2015-February/015826.html Fixes #12147 * [Filebeat] Palo_alto module improvements (#12182) This PR adds some missing features to the recently merged palo_alto module: Dashboards (One for traffic logs, one for threats). Sets network.type to either ipv4 or ipv6. Renames palo_alto.pan_os.threat_file_or_url to palo_alto.pan_os.threat.resource. Splits palo_alto.pan_os.threat_id into palo_alto.pan_os.threat.id and palo_alto.pan_os.threat.name. * Add mesosbeat to the community beats (#12185) * Missing module.yml.disabled file for palo_alto (#12191) * [Filebeat] Add RabbitMQ module (#12032) * Parses single and multiline messages from [RabbitMQ's unified log file](https://www.rabbitmq.com/logging.html#log-file-location) * `var.paths` uses the `RABBITMQ_LOGS` env variable if present * `var.convert_timezone` (enabled by default since the [timestamps are localtime by default](https://github.com/erlang-lager/lager#universal-time)) * No dashboard * Tested with ``` ~/go/src/github.com/elastic/beats/x-pack/filebeat [master @ 092a3f5] ✓ ‣ MODULES_PATH=./module GENERATE=1 INTEGRATION_TESTS=1 TESTING_FILEBEAT_MODULES=rabbitmq nosetests tests/system/test_xpack_modules.py -v test_fileset_file_0_rabbitmq (test_xpack_modules.XPackTest) ... ok ---------------------------------------------------------------------- Ran 1 test in 3.320s OK ``` * [metricbeat] Add linux sockstat data to socket_summary metricset (#12050) * add sockstat data to socket_summary * Fix goroutine leak on non-explicit finalization of log inputs (#12164) If log inputs were finished because their context, or one of their ouleters have been finished, then it wasn't stopping its harvesters, leaking resources. * [Metricbeat] CoreDNS module: Add Kibana Dashboard (#11619) * [Metricbeat] CoreDNS module: Add Kibana Dashboard Fixes #10432. * [Libbeat][Metricbeat]Add IgnoreAllErrors to schema.Conv object (#12089) * schema conv option for ignore all errors Co-Authored-By: Jaime Soriano Pastor <jaime.soriano@elastic.co> * Add `container` input, deprecate `docker` in favor of it (#12162) Add `container` input, deprecate `docker` in favor of it This change adds a new container input for better support of CRI based scenarios. `docker` input was acting as a catch all input for all container related cases, but its config options were very opinionated towards Docker, with some issues: * `containers.ids` setting was good to abstract logs path, but we have seen many cases were logs are not under default location, or follow a different path pattern (ie CRI logs). * `containers.*` settings have shown counter intuitive for many users, in many cases we have seen people writing `container.*` instead, ending up in a config error. * Some existing settings (`combine_partials`, `cri.parse_flags`) were introduced as a way to offer a backwards compatible upgrades, but it doesn't really make sense to disable them, as they handle actual format behaviors. This new `container` input offers the same wrapper to read log files from containers with the following changes: * It exposes `paths` as the `log` input, instead of `containers.ids` and `containers.path`. * `parse_flags` and `combine_partials` are hardcoded, as there is no good reason to disable them. * `stream` selector is still available, under root settings. * It allows to select the log format (also atodetect it), giving room for future format changes. `format` can be `auto` (default), `docker` and `CRI`. Example configurations: Get Docker logs: ``` filebeat.inputs: - type: container paths: - /var/lib/docker/containers/*/*.log ``` Get Kubernetes logs: ``` filebeat.inputs: - type: container paths: - /var/log/pods/*/*/*.log # this could also be used: #- /var/log/containers/*.log ``` Previous `docker` input is deprecated in favor of this, to be removed in 8.0 * [Auditbeat] Fix formatting of config files on macOS and Windows (#12148) Fixes formatting of auditbeat.yml and auditbeat.reference.yml across platforms. * [libbeat] Escape BOM on JsonReader before trying to decode line (#11661) * fix json bom + testing * Set beat ID in registries after loading meta file (#12180) * Reset beat ID in registries in case loaded from meta file * Set, not reset * Adding CHANGELOG entry * [Filebeat] Move dashboards from 8 to 7 directory (#12217) The CoreDNS and Envoyproxy dashboard were in the 8 instead of 7 directory. This PR fixes this. * adjust doc

[Metricbeat] CoreDNS module: Add Kibana Dashboard

4188b67

Fixes elastic#10432. Signed-off-by: Ioannis Androulidakis <ioannis@arrikto.com>

ioandr requested review from a team as code owners April 2, 2019 22:53

ioandr mentioned this pull request Apr 2, 2019

Add support to monitor CoreDNS #10432

Closed

9 tasks

ruflin added Team:Integrations Label for the Integrations team Metricbeat Metricbeat review :Dashboards module labels Apr 3, 2019

ruflin requested a review from odacremolbap April 8, 2019 09:56

ruflin added the [zube]: In Review label Apr 8, 2019

odacremolbap mentioned this pull request Apr 11, 2019

Add CoreDNS metrics for cache plugin and runtime #11738

Merged

exekias added the v7.2.0 label Apr 24, 2019

ioandr added 2 commits April 25, 2019 22:45

[Metricbeat] CoreDNS module: Add Kibana Dashboard

bbe2b5f

Fixes elastic#10432. Signed-off-by: Ioannis Androulidakis <ioannis@arrikto.com>

[Metricbeat] CoreDNS module: Update Kibana Dashboard

25e5db5

* Refactor visualization field names to meet the naming conventions * Add cache metrics (hits, misses) in existing dashboard * Set index to "metricbeat-*" Signed-off-by: Ioannis Androulidakis <ioannis@arrikto.com>

ioandr force-pushed the feature-coredns-dashboard branch from 4188b67 to 25e5db5 Compare April 25, 2019 23:00

odacremolbap suggested changes May 9, 2019

View reviewed changes

use per server/zone/proto values

630df2e

fix duration and size visualizations

75e3ebf

odacremolbap approved these changes May 10, 2019

View reviewed changes

Merge branch 'master' into feature-coredns-dashboard

01949c4

odacremolbap and others added 4 commits May 12, 2019 14:15

Merge branch 'master' into feature-coredns-dashboard

c52fc1b

merge branches

27b35f5

Merge branch 'master' into feature-coredns-dashboard

412d496

add README to CoreDNS stats metricset

6b499f3

odacremolbap merged commit 0495f6c into elastic:master May 20, 2019

zube bot added [zube]: Done and removed [zube]: In Review labels May 20, 2019

andresrc removed the [zube]: Done label May 20, 2019

ph pushed a commit to ph/beats that referenced this pull request May 21, 2019

[Metricbeat] CoreDNS module: Add Kibana Dashboard (elastic#11619)

4e1997c

* [Metricbeat] CoreDNS module: Add Kibana Dashboard Fixes elastic#10432.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Metricbeat] CoreDNS module: Add Kibana Dashboard #11619

[Metricbeat] CoreDNS module: Add Kibana Dashboard #11619

ioandr commented Apr 2, 2019 •

edited

Loading

elasticmachine commented Apr 2, 2019

ChrsMark commented Apr 8, 2019 •

edited

Loading

odacremolbap commented Apr 17, 2019

ioandr commented Apr 17, 2019

exekias commented Apr 25, 2019

ioandr commented Apr 25, 2019

odacremolbap commented Apr 26, 2019

ioandr commented May 7, 2019

odacremolbap commented May 7, 2019

odacremolbap left a comment

odacremolbap commented May 9, 2019

ioandr commented May 9, 2019

odacremolbap commented May 10, 2019

odacremolbap commented May 10, 2019

kaiyan-sheng commented May 10, 2019

kaiyan-sheng commented May 10, 2019

odacremolbap commented May 20, 2019

[Metricbeat] CoreDNS module: Add Kibana Dashboard #11619

[Metricbeat] CoreDNS module: Add Kibana Dashboard #11619

Conversation

ioandr commented Apr 2, 2019 • edited Loading

elasticmachine commented Apr 2, 2019

ChrsMark commented Apr 8, 2019 • edited Loading

odacremolbap commented Apr 17, 2019

ioandr commented Apr 17, 2019

exekias commented Apr 25, 2019

ioandr commented Apr 25, 2019

odacremolbap commented Apr 26, 2019

ioandr commented May 7, 2019

odacremolbap commented May 7, 2019

odacremolbap left a comment

Choose a reason for hiding this comment

odacremolbap commented May 9, 2019

ioandr commented May 9, 2019

odacremolbap commented May 10, 2019

odacremolbap commented May 10, 2019

kaiyan-sheng commented May 10, 2019

kaiyan-sheng commented May 10, 2019

odacremolbap commented May 20, 2019

ioandr commented Apr 2, 2019 •

edited

Loading

ChrsMark commented Apr 8, 2019 •

edited

Loading