Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix scheduling of system-probe related checks in the core agent and missing permissions for network policies #168

Merged
merged 2 commits into from
Nov 5, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ endif
BUNDLE_METADATA_OPTS ?= $(BUNDLE_CHANNELS) $(BUNDLE_DEFAULT_CHANNEL)

# Image URL to use all building/pushing image targets
IMG ?= datadog/datadog-operator:latest
IMG ?= datadog/operator:latest

# Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
ifeq (,$(shell go env GOBIN))
Expand Down
2 changes: 1 addition & 1 deletion config/manager/kustomization.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ resources:
- manager.yaml
images:
- name: controller
newName: datadog/datadog-operator
newName: datadog/operator
newTag: latest
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
6 changes: 6 additions & 0 deletions config/rbac/role.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,12 @@ rules:
- get
- list
- watch
- apiGroups:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you reference also adding networkpolicies RBAC in the PR description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- networking.k8s.io
resources:
- networkpolicies
verbs:
- '*'
- apiGroups:
- policy
resources:
Expand Down
19 changes: 17 additions & 2 deletions controllers/datadogagent/agent_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -825,6 +825,20 @@ func defaultPodSpec() corev1.PodSpec {
}

func defaultSystemProbePodSpec() corev1.PodSpec {
agentWithSystemProbeVolumeMounts := []corev1.VolumeMount{}
agentWithSystemProbeVolumeMounts = append(agentWithSystemProbeVolumeMounts, defaultMountVolume()...)
agentWithSystemProbeVolumeMounts = append(agentWithSystemProbeVolumeMounts, []corev1.VolumeMount{
{
Name: "sysprobe-socket-dir",
ReadOnly: true,
MountPath: "/opt/datadog-agent/run",
},
{
Name: "system-probe-config",
MountPath: "/etc/datadog-agent/system-probe.yaml",
SubPath: "system-probe.yaml",
},
Comment on lines +836 to +840
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean that before we didn't mount the config map that contains the system-probe.yaml configuration?

Copy link
Contributor Author

@vboulineau vboulineau Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly (in core agent; not in system probe itself ofc)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that, before the oom_kill and tcp_queue_length checks, system-probe talked only to the process-agent.
So, it was not needed to access system-probe.yaml from the core agent.

When oom_kill and tcp_queue_length checks were introduced, it initially worked without system-probe.yaml because the core agent was looking for the system-probe socket at its default path. (And as we are inside a container, there was no value to make this path configurable.)
But following some code change, I think that the core agent now needs to have access to system-probe.yaml to have the socket path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW running agent check tcp_queue_length still does not work, wonder if related?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the metrics are showing up in the app ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes status is fine too

Copy link
Contributor

@celenechang celenechang Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you have to do agent check tcp_queue_length -c /etc/datadog-agent/system-probe.yaml

Comment on lines +836 to +840
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the system-probe-config volume be mounted as read-only ?

Suggested change
{
Name: "system-probe-config",
MountPath: "/etc/datadog-agent/system-probe.yaml",
SubPath: "system-probe.yaml",
},
{
Name: "system-probe-config",
MountPath: "/etc/datadog-agent/system-probe.yaml",
SubPath: "system-probe.yaml",
ReadOnly: true,
},

}...)
return corev1.PodSpec{
ServiceAccountName: "foo-agent",
InitContainers: []corev1.Container{
Expand All @@ -850,7 +864,7 @@ func defaultSystemProbePodSpec() corev1.PodSpec {
Command: []string{"bash", "-c"},
Args: []string{"for script in $(find /etc/cont-init.d/ -type f -name '*.sh' | sort) ; do bash $script ; done"},
Env: defaultEnvVars(),
VolumeMounts: defaultMountVolume(),
VolumeMounts: agentWithSystemProbeVolumeMounts,
},
{
Name: "seccomp-setup",
Expand Down Expand Up @@ -888,7 +902,7 @@ func defaultSystemProbePodSpec() corev1.PodSpec {
},
},
Env: defaultEnvVars(),
VolumeMounts: defaultMountVolume(),
VolumeMounts: agentWithSystemProbeVolumeMounts,
LivenessProbe: defaultLivenessProbe(),
ReadinessProbe: defaultReadinessProbe(),
},
Expand Down Expand Up @@ -1976,6 +1990,7 @@ func Test_newExtendedDaemonSetFromInstance_endpointsChecksConfig(t *testing.T) {

test.Run(t)
}

func extendedDaemonSetWithSystemProbe(ddaHash string, podSpec corev1.PodSpec) *edsdatadoghqv1alpha1.ExtendedDaemonSet {
return &edsdatadoghqv1alpha1.ExtendedDaemonSet{
ObjectMeta: metav1.ObjectMeta{
Expand Down
17 changes: 17 additions & 0 deletions controllers/datadogagent/utils.go
Original file line number Diff line number Diff line change
Expand Up @@ -1153,6 +1153,23 @@ func getVolumeMountsForAgent(spec *datadoghqv1alpha1.DatadogAgentSpec) []corev1.
},
}...)
}

// SystemProbe volumes
if datadoghqv1alpha1.BoolValue(spec.Agent.SystemProbe.Enabled) {
volumeMounts = append(volumeMounts, []corev1.VolumeMount{
{
Name: datadoghqv1alpha1.SystemProbeSocketVolumeName,
MountPath: datadoghqv1alpha1.SystemProbeSocketVolumePath,
ReadOnly: true,
},
{
Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
},
Comment on lines +1165 to +1169
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the config file be mounted read-only ?

Suggested change
{
Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
},
{
Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
ReadOnly: true,
},

}...)
}

return append(volumeMounts, spec.Agent.Config.VolumeMounts...)
}

Expand Down
1 change: 1 addition & 0 deletions controllers/datadogagent_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ type DatadogAgentReconciler struct {
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=*
// +kubebuilder:rbac:groups=apps,resources=daemonsets,verbs=*
// +kubebuilder:rbac:groups=policy,resources=poddisruptionbudgets,verbs=*
// +kubebuilder:rbac:groups=networking.k8s.io,resources=networkpolicies,verbs=*

// Reconcile loop for DatadogAgent
func (r *DatadogAgentReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
Expand Down