
Containers in a kind cluster are not being reported in Datadog #101

Closed
arapulido opened this issue May 26, 2020 · 3 comments · Fixed by #107
Labels
bug Something isn't working
Milestone

Comments


arapulido commented May 26, 2020

Describe what happened:

I have created a 3-node kind cluster running Kubernetes 1.18, and I am running datadog-operator v0.2.1 on it.

I deploy the node agent using the following configuration:

apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog-agent
  namespace: datadog
spec:
  credentials:
    apiKeyExistingSecret: secretapi
    appKeyExistingSecret: secretapp
  agent:
    image:
      name: "datadog/agent:latest"
    config:
      logLevel: "DEBUG"
      leaderElection: true
      tolerations:
      - operator: Exists
      criSocket:
        criSocketPath: /var/run/containerd/containerd.sock
        useCriSocketVolume: true
      env:
      - name: DD_KUBELET_TLS_VERIFY
        value: "false"
      - name: DD_KUBERNETES_KUBELET_HOST
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
    process:
      enabled: true

Describe what you expected:

I expect to get container information in Datadog. I get this instead:

[Screenshot 2020-05-26 at 11:31:48: no container information shown in Datadog]

I get process information correctly:

[Screenshot 2020-05-26 at 11:32:22: process information reported correctly]

Steps to reproduce the issue:

  1. Create a 3-node kind cluster:
kind create cluster --name datadog-operator  --config cluster-config.yaml

cluster-config.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.18.0
- role: worker
  image: kindest/node:v1.18.0
- role: worker
  image: kindest/node:v1.18.0
  2. Deploy the Datadog operator v0.2.1 using Helm 3
  3. Apply the agent configuration described above
@arapulido
Contributor Author

`agent status` and `process-agent -check container` outputs:

process-container-status.txt
status.txt

@clamoriniere clamoriniere added the bug Something isn't working label May 26, 2020
@clamoriniere clamoriniere modified the milestones: v1.0, v0.3 May 26, 2020
L3n41c added a commit that referenced this issue Jun 3, 2020
`$.spec.agent.config` contains the configuration for the core agent, so `$.spec.agent.config.env` applies only to the core agent.

This patch adds `$.spec.agent.env` for environment variables that must be defined for all the agents, not only the core one.

Fixes #101
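Assuming the post-fix CRD schema, the distinction the commit describes can be sketched as follows (field placement taken from the commit message; the core-only variable name is hypothetical, for illustration):

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog-agent
spec:
  agent:
    config:
      env:                              # applies only to the core agent container
      - name: DD_SOME_CORE_ONLY_SETTING # hypothetical variable, for illustration
        value: "true"
    env:                                # new field from this patch: applies to all agent containers
    - name: DD_KUBELET_TLS_VERIFY
      value: "false"
```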
clamoriniere added a commit that referenced this issue Jun 3, 2020
* Fixes #97, Fixes #95, Fixes #101, Fixes #102
* Define `DD_CRI_SOCKET_PATH` and `DOCKER_HOST` for all containers.
* Fix DatadogMetric generation with operator-sdk 0.17
* Add a way to specify environment variables for all containers
* Allow a custom secret key name when providing the API or APP key via an existing secret
* This patch adds `$.spec.agent.env` for environment variables that must be defined for all the agents, not only the core one.


Co-authored-by: Cedric Lamoriniere <cedric.lamoriniere@datadoghq.com>
@L3n41c
Member

L3n41c commented Jun 4, 2020

Indeed, good catch, @arapulido!

In fact, what happened was that some environment variables were defined only for the core agent container whereas they were needed by the process agent as well.

I opened a PR for the environment variables that are automatically managed by the operator (DD_CRI_SOCKET_PATH and DOCKER_HOST, to be precise), but the same issue applies to the user-defined DD_KUBELET_TLS_VERIFY environment variable.

Environment variables defined in spec.agent.config.env are applied only to the core agent.
So, in my PR, I added a new parameter spec.agent.env for environment variables that have to be applied to all agents.

So, once you’ve upgraded to a version of the operator that includes my change, you’ll need to move the definition of DD_KUBELET_TLS_VERIFY from spec.agent.config.env to spec.agent.env.
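A minimal sketch of the updated DatadogAgent manifest from this issue, assuming the new spec.agent.env field (only the moved variable changes; other fields stay as originally posted):

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog-agent
  namespace: datadog
spec:
  credentials:
    apiKeyExistingSecret: secretapi
    appKeyExistingSecret: secretapp
  agent:
    image:
      name: "datadog/agent:latest"
    env:                          # moved here so it reaches all agent containers,
    - name: DD_KUBELET_TLS_VERIFY # including the process agent
      value: "false"
    config:
      logLevel: "DEBUG"
      leaderElection: true
      # remaining config fields (tolerations, criSocket, DD_KUBERNETES_KUBELET_HOST)
      # unchanged from the original manifest
    process:
      enabled: true
```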

@arapulido
Contributor Author

@L3n41c Thanks! And thanks so much for taking the time to explain the root cause and the fix. Really appreciated! ❤️
