
node-problem-detector cannot run in non-privileged mode #698

Open
ialidzhikov opened this issue Sep 1, 2022 · 15 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-kind Indicates a PR lacks a `kind/foo` label and requires one.

Comments

@ialidzhikov

/kind bug

What happened?

Running containers in privileged mode is not recommended, as privileged containers run with all Linux capabilities enabled and can access the host's resources. Running containers in privileged mode opens up a number of security threats, such as breakout to the underlying host OS.

Currently the node-problem-detector DaemonSet runs in privileged mode.

```yaml
securityContext:
  privileged: true
```

When trying to run node-problem-detector in non-privileged mode (even with all capabilities added), one of its monitors fails with:

```
E0808 06:25:33.740326       1 problem_detector.go:55] Failed to start problem daemon &{/config/kernel-monitor.json 0xc00035b7a0 0xc000443100 {{kmsg map[] /dev/kmsg 5m } 10 kernel-monitor [{KernelDeadlock  {0 0 <nil>} KernelHasNoDeadlock kernel has no deadlock} {ReadonlyFilesystem  {0 0 <nil>} FilesystemIsNotReadOnly Filesystem is not read-only}] [{temporary  OOMKilling Killed process \d+ (.+) total-vm:\d+kB, anon-rss:\d+kB, file-rss:\d+kB.*} {temporary  TaskHung task [\S ]+:\w+ blocked for more than \w+ seconds\.} {temporary  UnregisterNetDevice unregister_netdevice: waiting for \w+ to become free. Usage count = \d+} {temporary  KernelOops BUG: unable to handle kernel NULL pointer dereference at .*} {temporary  KernelOops divide error: 0000 \[#\d+\] SMP} {temporary  Ext4Error EXT4-fs error .*} {temporary  Ext4Warning EXT4-fs warning .*} {temporary  IOError Buffer I/O error .*} {temporary  MemoryReadError CE memory read error .*} {permanent KernelDeadlock DockerHung task docker:\w+ blocked for more than \w+ seconds\.} {permanent ReadonlyFilesystem FilesystemIsReadOnly Remounting filesystem read-only}] 0xc00043d21e} [] <nil> 0xc00045aea0 0xc00044bb80}: failed to create kmsg parser: open /dev/kmsg: operation not permitted
```

I don't fully understand which permissions are required to read kernel logs from /dev/kmsg.
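For what it's worth, reading /dev/kmsg typically requires CAP_SYSLOG when kernel.dmesg_restrict=1, and inside a container the runtime's device cgroup can deny access even when the capability is granted. A minimal Go sketch (a hypothetical `probeKmsg` helper, not part of NPD) that distinguishes the failure modes:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// probeKmsg attempts to open the kernel log ring buffer read-only and
// classifies the result. A permission error usually means a missing
// CAP_SYSLOG or a device-cgroup denial; a not-exist error means the
// device node was not mounted into the container.
func probeKmsg(path string) string {
	f, err := os.Open(path)
	switch {
	case err == nil:
		f.Close()
		return "readable"
	case errors.Is(err, fs.ErrPermission):
		return "permission denied (CAP_SYSLOG / device cgroup?)"
	case errors.Is(err, fs.ErrNotExist):
		return "not present"
	default:
		return err.Error()
	}
}

func main() {
	fmt.Println(probeKmsg("/dev/kmsg"))
}
```

Running this inside the container should show whether the failure is a permission problem (capabilities or device cgroup) or simply a missing device node (no hostPath mount).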

What did you expect to happen?

I would expect to be able to run node-problem-detector in non-privileged mode.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 30, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 30, 2022
@ialidzhikov
Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 30, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 30, 2023
@ialidzhikov
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 31, 2023
@balu-ce

balu-ce commented May 4, 2023

Any update on this?

@btiernay
Contributor

Duplicate of #625

@AlexzSouz

Duplicate of #625

Both issues DO NOT have a solution for the problem @ialidzhikov mentioned and that I'm currently experiencing. The "duplicate" issue you (@btiernay) shared only contains comments from @k8s-triage-robot. No solution is provided 🤷

Any solution so far?

@alazyer

alazyer commented Dec 29, 2023

How about trying the journald plugin instead? It works fine for me to detect "NodeOOM" and "PodOOM" with the patterns ".*Out of memory.*" and ".*Memory cgroup out of memory.*".
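For reference, a sketch of what such a system-log-monitor config might look like with the journald plugin. The field names mirror the kernel-monitor.json shown in the error above; the reasons (NodeOOM, PodOOM) and patterns come from this comment, and the remaining values are assumptions to adapt:

```json
{
  "plugin": "journald",
  "pluginConfig": {
    "source": "kernel"
  },
  "logPath": "/var/log/journal",
  "lookback": "5m",
  "bufferSize": 10,
  "source": "kernel-monitor",
  "conditions": [],
  "rules": [
    {"type": "temporary", "reason": "NodeOOM", "pattern": ".*Out of memory.*"},
    {"type": "temporary", "reason": "PodOOM", "pattern": ".*Memory cgroup out of memory.*"}
  ]
}
```

Note the container still needs read access to the journal directory, e.g. a hostPath mount of /var/log/journal, but not /dev/kmsg itself.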

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 28, 2024
@wangzhen127
Member

NPD's goal is to detect infra-layer issues, so it needs to read logs in places where non-privileged containers do not have permission. Additionally, we use the health checker in production to repair kubelet and containerd by killing them. Those actions need privileges.

Depending on how you would like to use NPD, there may be a chance that you can tune your DaemonSet YAML to work without privileged access. @hakman for kops, does it run NPD in non-privileged mode?
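As a starting point for such tuning, one unverified sketch of a non-privileged securityContext (the capability names here are assumptions; note the original report says adding capabilities alone did not help for /dev/kmsg, likely because the runtime's device cgroup still blocks the device, so this may only suffice for journald- or file-based monitors):

```yaml
securityContext:
  privileged: false
  capabilities:
    add: ["SYSLOG", "DAC_READ_SEARCH"]
```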

@wangzhen127
Member

/remove-kind bug

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Apr 5, 2024
@wangzhen127
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 5, 2024
@haardm

haardm commented Jun 8, 2024

Hello, I am also facing a similar issue reading from /dev/kmsg with NPD when my container is not given privileged mode. Is there any workaround? We only need to read; there are no mutating actions on our side.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 6, 2024