Fix scheduling of system-probe related checks in the core agent and missing permissions for network policies #168
…issing permissions for network policies
@@ -210,6 +210,12 @@ rules:
- get
- list
- watch
- apiGroups:
Could you also mention the added networkpolicies RBAC in the PR description?
Done
{
    Name: "system-probe-config",
    MountPath: "/etc/datadog-agent/system-probe.yaml",
    SubPath: "system-probe.yaml",
},
Does this mean that, until now, we didn't mount the config map containing the system-probe.yaml configuration?
Exactly (in the core agent; not in system-probe itself, of course).
Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
The thing is that, before the oom_kill and tcp_queue_length checks, system-probe talked only to the process-agent, so the core agent did not need access to system-probe.yaml.
When the oom_kill and tcp_queue_length checks were introduced, they initially worked without system-probe.yaml because the core agent looked for the system-probe socket at its default path. (And since we are inside a container, there was no value in making this path configurable.)
But following some code change, I think the core agent now needs access to system-probe.yaml to get the socket path.
BTW, running agent check tcp_queue_length still does not work. I wonder if that is related?
But are the metrics showing up in the app?
Yes, and the status is fine too.
I think you have to run agent check tcp_queue_length -c /etc/datadog-agent/system-probe.yaml
LGTM
I’m just wondering if the system-probe.yaml file could be mounted read-only in the core agent container?
{
    Name: "system-probe-config",
    MountPath: "/etc/datadog-agent/system-probe.yaml",
    SubPath: "system-probe.yaml",
},
Can the system-probe-config volume be mounted as read-only?
- {
-     Name: "system-probe-config",
-     MountPath: "/etc/datadog-agent/system-probe.yaml",
-     SubPath: "system-probe.yaml",
- },
+ {
+     Name: "system-probe-config",
+     MountPath: "/etc/datadog-agent/system-probe.yaml",
+     SubPath: "system-probe.yaml",
+     ReadOnly: true,
+ },
{
    Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
    MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
    SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
},
Can the config file be mounted read-only?
- {
-     Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
-     MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
-     SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
- },
+ {
+     Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
+     MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
+     SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
+     ReadOnly: true,
+ },
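As a self-contained illustration of the read-only suggestion, here is a minimal sketch of the mount entry. The `VolumeMount` type below is a local stand-in that mirrors only the four fields shown in the diff (the real code uses the Kubernetes `corev1.VolumeMount` type), and `systemProbeConfigMount` is a hypothetical helper name, not a function from this repository.

```go
package main

import "fmt"

// VolumeMount is a local stand-in for the Kubernetes corev1.VolumeMount
// type, limited to the fields that appear in the reviewed diff.
type VolumeMount struct {
	Name      string
	MountPath string
	SubPath   string
	ReadOnly  bool
}

// systemProbeConfigMount (hypothetical helper) returns the mount that
// exposes system-probe.yaml to the core agent container. The core agent
// only reads the file, so the mount is marked read-only.
func systemProbeConfigMount() VolumeMount {
	return VolumeMount{
		Name:      "system-probe-config",
		MountPath: "/etc/datadog-agent/system-probe.yaml",
		SubPath:   "system-probe.yaml", // mount the single file, not the whole volume
		ReadOnly:  true,
	}
}

func main() {
	m := systemProbeConfigMount()
	fmt.Printf("%s -> %s (readOnly=%t)\n", m.Name, m.MountPath, m.ReadOnly)
}
```

Using `SubPath` places just the one file at the target path, so other files already present in /etc/datadog-agent are not hidden by the mount.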
…t.Config.CollectEvents`
Codecov Report
@@ Coverage Diff @@
## master #168 +/- ##
==========================================
+ Coverage 60.66% 60.78% +0.11%
==========================================
Files 34 34
Lines 4744 4753 +9
==========================================
+ Hits 2878 2889 +11
+ Misses 1649 1648 -1
+ Partials 217 216 -1
What does this PR do?
To auto-activate the OOM and TCP checks, the core agent needs access to the system-probe socket and configuration file.
The RBAC declaration for NetworkPolicies was missing (Kustomize).
Fixed the DCA RBAC to collect events following the introduction of Spec.ClusterAgent.Config.CollectEvents.
Motivation
What inspired you to submit this pull request?
Additional Notes
Anything else we should know when reviewing?
Describe your test plan
Activate the system-probe OOM or TCP checks and verify that they run in the core agent.