Change .container.restarts to be actionable in a monitor #3417

Closed
wants to merge 1 commit into from

Contributor

@xvello xvello commented Mar 28, 2019

What does this PR do?

The kubernetes.containers.restarts metric is currently collected as an ever-increasing gauge tagged by container_id. Because each restart creates a new container_id, the value stays at 1 and the tags change every time a container restarts inside a given pod.

This makes the metric very hard to use for alerting, whereas we would expect it to give an actual count of restarts per pod_name/kube_container_name. This PR does the following:

  • change the collection type to a monotonic counter: the agent computes the delta between samples and sends a non-zero value only when a restart occurred
  • move this metric (and the other container state gauges) to orchestrator cardinality, meaning the timeseries will not churn when the container restarts
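
The two changes above can be illustrated with a minimal sketch. It is not the agent's code: `MonotonicCounter`, its `sample` method, and the tag values are hypothetical names chosen for this example, and treating a decreasing raw value as a reset that contributes zero is an assumption of the sketch. It shows how a delta is derived per context (metric name plus tags), and why a tag that changes on every restart prevents any delta from being computed.

```python
from typing import Dict, List, Optional, Tuple

# A context is the metric name plus its sorted tags (hypothetical model).
Context = Tuple[str, Tuple[str, ...]]


class MonotonicCounter:
    """Turns ever-increasing raw values into per-sample deltas."""

    def __init__(self) -> None:
        self._last: Dict[Context, int] = {}

    def sample(self, name: str, tags: List[str], raw: int) -> Optional[int]:
        key = (name, tuple(sorted(tags)))
        prev = self._last.get(key)
        self._last[key] = raw
        if prev is None:
            # First sample for this context: only a baseline, nothing to send.
            return None
        # Assumption: a decrease is treated as a reset and contributes 0.
        return max(raw - prev, 0)


METRIC = "kubernetes.containers.restarts"
pod_tags = ["pod_name:redis-0", "kube_container_name:redis"]  # example values

# Stable pod-level tags: the context survives restarts, so deltas appear.
counter = MonotonicCounter()
deltas = [counter.sample(METRIC, pod_tags, raw) for raw in (0, 0, 2, 2, 8)]

# Tagging by container_id: every restart opens a new context, so each
# sample is a first sample and no delta is ever produced.
churning = MonotonicCounter()
churn = [
    churning.sample(METRIC, pod_tags + [f"container_id:{cid}"], raw)
    for cid, raw in (("aaa", 1), ("bbb", 2), ("ccc", 3))
]

print(deltas)  # [None, 0, 2, 0, 6]
print(churn)   # [None, None, None]
```

In this sketch the non-zero deltas (2, then 6) only show up once the tags stay stable across restarts, which is why the counter change and the cardinality change go together.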

Depends on #3413; the target branch will be changed back to master after that PR is merged.

Monitor example

Testing with 2 then 6 restarts of redis containers on a given host:

Before (gauge)

After

Review checklist (to be filled by reviewers)

  • PR title must be written as a CHANGELOG entry (see why)
  • File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached
  • Feature or bugfix must have tests
  • Git history must be clean
  • If PR adds a configuration option, it must be added to the configuration file.


stale bot commented Apr 27, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.

@stale stale bot added the stale label Apr 27, 2019
@xvello xvello closed this Jun 26, 2019
@dd-devflow dd-devflow bot deleted the xvello/kubelet-restarts-type branch February 7, 2024 00:09