Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix scheduling of system-probe related checks in the core agent and missing permissions for network policies #168

Merged
merged 2 commits into from
Nov 5, 2020

Conversation

vboulineau
Copy link
Contributor

@vboulineau vboulineau commented Nov 4, 2020

What does this PR do?

To auto-activate OOM and TCP checks, the core agent needs to have access to system-probe socket and configuration file.
RBAC declaration for NetworkPolicies was missing (Kustomize)
Fixed DCA RBAC to collect events following introduction of Spec.ClusterAgent.Config.CollectEvents

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Describe your test plan

Activate system-probe OOM or TCP checks and verify they run in core agent.

@vboulineau vboulineau added the bug Something isn't working label Nov 4, 2020
@vboulineau vboulineau added this to the v0.4 milestone Nov 4, 2020
@vboulineau vboulineau requested a review from a team as a code owner November 4, 2020 16:52
@@ -210,6 +210,12 @@ rules:
- get
- list
- watch
- apiGroups:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you reference also adding networkpolicies RBAC in the PR description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +836 to +840
{
Name: "system-probe-config",
MountPath: "/etc/datadog-agent/system-probe.yaml",
SubPath: "system-probe.yaml",
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean that before we didn't mount the config map that contains the system-probe.yaml configuration?

Copy link
Contributor Author

@vboulineau vboulineau Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly (in core agent; not in system probe itself ofc)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that, before the oom_kill and tcp_queue_length checks, system-probe talked only to the process-agent.
So, it was not needed to access system-probe.yaml from the core agent.

When oom_kill and tcp_queue_length checks were introduced, it initially worked without system-probe.yaml because the core agent was looking for the system-probe socket at its default path. (And as we are inside a container, there was no value to make this path configurable.)
But following some code change, I think that the core agent now needs to have access to system-probe.yaml to have the socket path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW running agent check tcp_queue_length still does not work, wonder if related?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the metrics are showing up in the app ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes status is fine too

Copy link
Contributor

@celenechang celenechang Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you have to do agent check tcp_queue_length -c /etc/datadog-agent/system-probe.yaml

Copy link
Member

@L3n41c L3n41c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@L3n41c L3n41c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m just wondering if the system-probe.yaml file could be mounted read-only in the core agent container ?

Comment on lines +836 to +840
{
Name: "system-probe-config",
MountPath: "/etc/datadog-agent/system-probe.yaml",
SubPath: "system-probe.yaml",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the system-probe-config volume be mounted as read-only ?

Suggested change
{
Name: "system-probe-config",
MountPath: "/etc/datadog-agent/system-probe.yaml",
SubPath: "system-probe.yaml",
},
{
Name: "system-probe-config",
MountPath: "/etc/datadog-agent/system-probe.yaml",
SubPath: "system-probe.yaml",
ReadOnly: true,
},

Comment on lines +1165 to +1169
{
Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the config file be mounted read-only ?

Suggested change
{
Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
},
{
Name: datadoghqv1alpha1.SystemProbeConfigVolumeName,
MountPath: datadoghqv1alpha1.SystemProbeConfigVolumePath,
SubPath: datadoghqv1alpha1.SystemProbeConfigVolumeSubPath,
ReadOnly: true,
},

@codecov-io
Copy link

codecov-io commented Nov 4, 2020

Codecov Report

Merging #168 into master will increase coverage by 0.11%.
The diff coverage is 86.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #168      +/-   ##
==========================================
+ Coverage   60.66%   60.78%   +0.11%     
==========================================
  Files          34       34              
  Lines        4744     4753       +9     
==========================================
+ Hits         2878     2889      +11     
+ Misses       1649     1648       -1     
+ Partials      217      216       -1     
Flag Coverage Δ
unittests 60.78% <86.66%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
controllers/datadogagent_controller.go 66.66% <ø> (ø)
controllers/datadogagent/clusteragent.go 70.97% <50.00%> (+0.16%) ⬆️
controllers/datadogagent/utils.go 82.38% <100.00%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e87c37...848935a. Read the comment docs.

@vboulineau vboulineau merged commit 28fb8b2 into master Nov 5, 2020
@vboulineau vboulineau deleted the vboulineau/fix_system_probe branch November 5, 2020 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants