
Add restart and container state metrics to kubelet #2605

Conversation

schleyfox (Contributor)

What does this PR do?

This PR adds some metrics to the kubelet check that would normally come from kube-state-metrics. We (Airbnb) were having problems getting the kube-state-metrics integration to work on our very large clusters. The specific metrics we want are available from the kubelet pod list and are similar to other metrics already being collected here.

  • Reports kubernetes.containers.restarts from the restartCount of the container status
  • Reports kubernetes.containers.{state,last_state}.{running,waiting,terminated} with reasons from the container status fields of the same name. The running metric is fairly redundant and is included only for completeness. (A sketch of the reporting logic follows this list.)
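
A rough sketch of how these state gauges could be derived from the kubelet pod list; the function name and the `gauge` callable below are illustrative stand-ins, not the check's actual identifiers:

```python
def report_container_states(pod_list, instance_tags, gauge):
    """Sketch: walk the kubelet /pods payload and emit one gauge per
    container state entry. `gauge` stands in for the check's submit method."""
    for pod in pod_list.get('items', []):
        for status in pod.get('status', {}).get('containerStatuses', []):
            # state / lastState are dicts keyed by running|waiting|terminated
            for field, metric in (('state', 'kubernetes.containers.state'),
                                  ('lastState', 'kubernetes.containers.last_state')):
                for state_name, detail in (status.get(field) or {}).items():
                    tags = list(instance_tags)
                    reason = (detail or {}).get('reason')
                    if reason:
                        tags.append('reason:%s' % reason.lower())
                    gauge('%s.%s' % (metric, state_name.lower()), 1, tags)
```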

Motivation

We were running into issues similar to #1346 (comment) . The response size of the metrics endpoint on kube-state-metrics was too large to submit through DataDog. Sharding by namespace was not possible for our use case because we use a large number of dynamic and variably sized namespaces. We are most interested in being able to detect containers in erroneous or crashloop states.

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo

Additional Notes

Anything else we should know when reviewing?

xvello (Contributor) left a comment:


Thanks for your contribution; I'll give it a round of testing tomorrow.
For your information, we're freezing 6.8.0 on Thursday the 29th. If that works for you, I'd be happy to work on getting this merged before then.

Other than my suggestion to re-scope the restarts metric, you'll need to add metadata for these new gauges. Here is the documentation for it.

```diff
@@ -291,10 +300,13 @@ def _report_pods_running(self, pods, instance_tags):
                 tags += instance_tags
                 hash_tags = tuple(sorted(tags))
                 containers_tag_counter[hash_tags] += 1
+                containers_restarts_tag_counter[hash_tags] += container.get('restartCount', 0)
```
xvello (Contributor):

I think the .containers.restarts gauge has legit use cases on a per-pod basis, instead of aggregated per deployment/image. What do you think about moving it to _report_container_state_metrics?

I would tag the metric by high-cardinality pod tags (calling tags_for_pod(pod_id, True)) + the kube_container_name tag (and instance tags of course)

schleyfox (Contributor, Author):

That was definitely my goal (per-pod/container high cardinality), but I don't really understand how to figure out which tags are available in which contexts from get_tags. I was basing this on observing that the container spec metrics, which get tags via

```python
tags = get_tags('%s' % cid, True) + instance_tags
```

seem to have tags for kube_pod_name, kube_container_name, image_id, etc.

Separate from whether the tags are correct, moving this to _report_container_state_metrics could be better from a code organization standpoint.

xvello (Contributor):

In this loop, container tags are retrieved in low cardinality (tags = get_tags(container_id, False)). Low cardinality excludes tags like container_name, container_id, and pod_name, which churn when a pod is re-created with the same spec. You would basically get the same tags as in AutoDiscovery (kube_deployment, kube_container_name, kube_namespace, docker_image...).

What we want here is to be able to identify which exact pod might be restarting, instead of just the deployment; this is why I suggested using high cardinality for these. In high cardinality there is no need to go through a counter, as we will have one timeseries per container; the logic would be closer to what we have in _report_container_spec_metrics (iterate over pods, then containers, and directly submit gauges for each).
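
A minimal sketch of that per-container submission, with tags_for_pod and the gauge submitter passed in as callables since the exact check scaffolding is assumed here for illustration:

```python
def report_restarts_per_container(pod_list, instance_tags, tags_for_pod, gauge):
    """Sketch: one restarts gauge per container, tagged at high cardinality."""
    for pod in pod_list.get('items', []):
        pod_id = pod.get('metadata', {}).get('uid')
        for status in pod.get('status', {}).get('containerStatuses', []):
            # True = high cardinality: keeps per-pod tags such as pod_name
            tags = tags_for_pod(pod_id, True)
            tags.append('kube_container_name:%s' % status.get('name'))
            tags += instance_tags
            # one timeseries per container, so no counter aggregation is needed
            gauge('kubernetes.containers.restarts',
                  status.get('restartCount', 0), tags)
```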

If you want to dig into which tag is which, you can have a look at our tag extractors: you'll see tags added with either AddLow or AddHigh.

schleyfox (Contributor, Author):

Thank you for the detailed explanation; I completely missed the high/low distinction. I've moved the restarts metric in with the container state metrics.

schleyfox force-pushed the kubelet-add-restart-and-container-state-metrics branch from a6413d4 to d9d5afb on November 22, 2018.
schleyfox (Contributor, Author):

Thanks for the review! Getting this in before the code freeze would be fantastic.

I've updated the metadata and can make any other changes that are needed to get this ready.

Commit messages from the update:

  • This is similar to the kube-state-metrics container.status_report fields, but collected via kubelet.
  • Tracks the restartCount of containers as reported by kubelet. This is similar to the metric collected by kube-state-metrics, but since collection occurs per-node, it scales better for large clusters.
  • I believe that only last_state.terminated is possible, though the API spec does not disallow other options.
schleyfox force-pushed the kubelet-add-restart-and-container-state-metrics branch from d9d5afb to 1293541 on November 24, 2018.
codecov-io commented Nov 24, 2018:

Codecov Report

Merging #2605 into master will increase coverage by 5.21%.
The diff coverage is 80%.

```
@@            Coverage Diff            @@
##           master   #2605      +/-   ##
=========================================
+ Coverage   83.89%   89.1%   +5.21%
=========================================
  Files         638      10     -628
  Lines       36323    1138   -35185
  Branches     4425     152    -4273
=========================================
- Hits        30472    1014   -29458
+ Misses       4545      84    -4461
+ Partials     1306      40    -1266
```

xvello (Contributor) left a comment:

Thanks for your contribution 👍

xvello (Contributor) left a comment:

Sorry for the back-pedaling: I think there's an issue with the whitelist filtering. If the reason is not in the whitelist, the gauge will still be emitted. Could you please have a look at it?
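
For illustration, the kind of guard being asked for might look like the following; WHITELISTED_REASONS and the helper name are hypothetical, not the check's actual code:

```python
# Hypothetical whitelist; the real check defines its own set of reasons.
WHITELISTED_REASONS = {'crashloopbackoff', 'errimagepull', 'imagepullbackoff'}

def submit_state_gauge(metric, reason, tags, gauge):
    # Skip entirely when the reason is missing or not whitelisted, instead of
    # emitting the gauge anyway.
    if reason is None or reason.lower() not in WHITELISTED_REASONS:
        return
    gauge(metric, 1, tags + ['reason:%s' % reason.lower()])
```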

Commit message from the update: Also exclude redundant state.running metric
xvello (Contributor) left a comment:

Thanks for the quick reply; the tests look great!

Indeed, container_ids on the running gauge are probably not very useful. We do have a known issue where k.pods.running and k.containers.running list all pods, not just the running ones, but DataDog/datadog-agent#2701 should address that.

schleyfox (Contributor, Author):

Excellent! Thanks for all of your help on this and sorry for all of the back and forth.

Do I need to take any further action?

xvello merged commit 6d815df into DataDog:master on Nov 29, 2018.
xvello commented Nov 29, 2018:

@schleyfox Code is merged and will be built into 6.8.0-rc.1 on Monday. Thanks 👍

Our current release target for 6.8.0 is the week of Dec. 10th; in the meantime, the RCs will be available on Docker Hub if you want to give it a last round of testing.

xvello commented Mar 28, 2019:

Hi @schleyfox

Feedback from other users is leading us to investigate the following changes:

  • make the metric a monotonic_count instead of a gauge: the agent will compute the diff directly and expose it as a count (see the sketch after this list)
  • remove the container_id and container_name tags from the metric, to avoid churning the timeseries at every restart (see the example graph below). The metric would still be tagged with pod_name and kube_container_name for uniqueness.

[example graph: the restarts gauge for a single pod, where each container restart starts a new timeseries under a new container_id]

This example graph would now translate to a single timeseries for this pod, showing 1 restart/minute during this 6-minute period, then back to zero once things settle.
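
A hedged sketch of what the proposed monotonic_count submission could look like, assuming the datadog_checks AgentCheck base class; the class name and tag plumbing below are illustrative, not the actual implementation:

```python
from datadog_checks.base import AgentCheck

class KubeletRestartsSketch(AgentCheck):
    # Illustrative only: shows the proposed tagging and metric type, not the
    # real kubelet check structure.
    def report_restarts(self, pod_list, instance_tags):
        for pod in pod_list.get('items', []):
            pod_name = pod.get('metadata', {}).get('name')
            for status in pod.get('status', {}).get('containerStatuses', []):
                tags = instance_tags + [
                    'pod_name:%s' % pod_name,
                    'kube_container_name:%s' % status.get('name'),
                    # deliberately no container_id/container_name: they churn
                    # each time the container restarts
                ]
                # the agent diffs successive raw values and reports a count
                self.monotonic_count('kubernetes.containers.restarts',
                                     status.get('restartCount', 0), tags=tags)
```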

Would that make sense for your use case of this metric?

schleyfox (Contributor, Author):

Yeah, that strikes me as very reasonable.

xvello commented Apr 5, 2019:

Thanks @schleyfox for your answer.
