Memory leak with add_process_metadata and k8s manifest for Auditbeat #24890

Closed
jsoriano opened this issue Apr 1, 2021 · 7 comments · Fixed by #29717
Labels
bug Team:Integrations Label for the Integrations team

Comments

@jsoriano (Member) commented Apr 1, 2021

There seems to be a memory leak in add_process_metadata that is reproduced with the reference configuration provided to run Auditbeat in Kubernetes.
In this scenario the processor is used to obtain the container.id from the process.pid, so that add_kubernetes_metadata can enrich events (see the sketch below). But the issue is also reproduced when add_kubernetes_metadata is not used.
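
For context, the processor chain in question looks roughly like this. This is a paraphrased sketch, not the exact reference manifest; match_pids and the container indexer / fields matcher are the documented options for these processors:

```yaml
processors:
  # add_process_metadata resolves container.id from the event's pid...
  - add_process_metadata:
      match_pids: ['process.pid']
  # ...so add_kubernetes_metadata can match the event to a pod by container.id.
  - add_kubernetes_metadata:
      indexers:
        - container:
      matchers:
        - fields:
            lookup_fields: ['container.id']
```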

I tried to reproduce this in a simpler scenario, with only Docker, but the memory usage of this processor didn't seem to grow beyond ~13MB. In the linked discuss issue there seem to be problems even with 1GB memory limits. The difference could be in the maximum number of pids allowed (sysctl kernel.pid_max).

add_process_metadata has a process cache whose entries are never cleaned. The key is the pid, so its size is effectively bounded by the maximum number of pids on the machine. The problem may be that kernel.pid_max can be quite big.

Some strategy should be applied to remove unneeded or expired entries from this cache, as in the sketch below.
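
One possible strategy, purely as an illustration and not necessarily what the eventual fix in #29717 implements, is to give each cache entry a TTL and evict expired entries both on access and in a periodic sweep. A minimal, self-contained Go sketch (all names here, pidCache, entry, sweep, are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry holds cached process metadata plus the time it expires.
type entry struct {
	metadata map[string]string
	expires  time.Time
}

// pidCache is a pid-keyed cache with TTL-based expiry. Expired entries
// are dropped on access and by a periodic sweep, so the cache no longer
// grows toward kernel.pid_max entries.
type pidCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[int]entry
}

func newPidCache(ttl time.Duration) *pidCache {
	return &pidCache{ttl: ttl, entries: make(map[int]entry)}
}

// Get returns the cached metadata for pid, evicting it if it has expired.
func (c *pidCache) Get(pid int) (map[string]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[pid]
	if !ok || time.Now().After(e.expires) {
		delete(c.entries, pid) // drop stale entry on access
		return nil, false
	}
	return e.metadata, true
}

// Put stores metadata for pid with a fresh expiry time.
func (c *pidCache) Put(pid int, metadata map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[pid] = entry{metadata: metadata, expires: time.Now().Add(c.ttl)}
}

// sweep removes all expired entries; meant to run periodically.
func (c *pidCache) sweep() {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	for pid, e := range c.entries {
		if now.After(e.expires) {
			delete(c.entries, pid)
		}
	}
}

func main() {
	c := newPidCache(100 * time.Millisecond)
	c.Put(1234, map[string]string{"container.id": "abc123"})
	if m, ok := c.Get(1234); ok {
		fmt.Println("hit:", m["container.id"])
	}
	time.Sleep(150 * time.Millisecond)
	c.sweep() // a periodic sweep would normally run on a time.Ticker
	if _, ok := c.Get(1234); !ok {
		fmt.Println("expired and evicted")
	}
}
```

A TTL alone does not bound the cache if pids churn faster than entries expire, so a real fix might also want a hard cap on the number of entries (e.g. LRU eviction).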


@elasticmachine (Collaborator) commented:

Pinging @elastic/integrations (Team:Integrations)

@elasticmachine (Collaborator) commented:

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@mareckii commented Apr 1, 2021

Maybe it doesn't really matter, but I'm using GKE with Google COS as the operating system on the nodes.

heap2.prof.zip - both processors active
heap3.prof.zip - only add_process_metadata active

@jsoriano (Member, Author) commented Apr 1, 2021

@mareckii thanks for the memory profiles, it indeed looks like the problem is around the process cache in add_process_metadata. You mention in the discuss issue that after some days it ends up taking hundreds of MB. It'd be great if you could share a profile after a couple of days, to double-check whether it is the cache in add_process_metadata that keeps growing.

@jsoriano (Member, Author) commented Apr 1, 2021

It seems that Google COS is configured with a pid max of 2**22 (4194304), which is 128 times what I have on the machine where I tried to reproduce (32768). If the same memory usage ratio holds, a cache for that many pids would take more than 1.5GB.
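
The extrapolation is straightforward: ~13MB observed with pid_max 32768, scaled by 4194304 / 32768 = 128, gives roughly 13MB × 128 ≈ 1.6GB, assuming memory grows linearly with the number of cached pids.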

@mareckii could you confirm by checking the pid max on one of your affected machines? This can be checked with `cat /proc/sys/kernel/pid_max` or `sysctl kernel.pid_max`.

@mareckii commented Apr 7, 2021

Hi,

`cat /proc/sys/kernel/pid_max` returns 4194304.

If it helps, here is a memory dump after a few days:
heap4.prof.zip

@jsoriano (Member, Author) commented Apr 7, 2021

Thanks @mareckii.

Yes, in this profile most of the memory in use is allocated by the process cache in add_process_metadata; this is consistent with the high value of pid_max.

This cache should have a different strategy for these cases.
