title | weight |
---|---|
The "capabilities" gadget |
10 |
The capabilities gadget allows us to see what capability security checks are triggered by applications running in Kubernetes Pods.
Linux capabilities allow for a finer privilege control because they can give root-like capabilities to processes without giving them full root access. They can also be taken away from root processes. If a pod is directly executing programs as root, we can further lock it down by taking capabilities away. Sometimes we need to add capabilities which are not there by default. You can see the list of default and available capabilities in Docker. Specially if our pod is directly run as user instead of root (runAsUser: ID), we can give some more capabilities (think as partly root) and still take all unused capabilities to really lock it down.
Here we have a small demo app which logs failures due to lacking capabilities. Since none of the default capabilities is dropped, we have to find out what non-default capability we have to add.
$ cat docs/examples/app-set-priority.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: set-priority
labels:
k8s-app: set-priority
spec:
selector:
matchLabels:
name: set-priority
template:
metadata:
labels:
name: set-priority
spec:
containers:
- name: set-priority
image: busybox
command: [ "sh", "-c", "while /bin/true ; do nice -n -20 echo ; sleep 5; done" ]
$ kubectl apply -f docs/examples/app-set-priority.yaml
deployment.apps/set-priority created
$ kubectl logs -lname=set-priority
nice: setpriority(-20): Permission denied
nice: setpriority(-20): Permission denied
We could see the error messages in the pod's log. Let's use Inspektor Gadget to watch the capability checks:
$ kubectl gadget capabilities --selector name=set-priority
TIME UID PID TID COMM CAP NAME AUDIT INSETID
13:01:54 1 4779 4779 true 6 CAP_SETGID 0 0
13:01:54 1 4779 4779 true 7 CAP_SETUID 0 0
13:01:54 1 4780 4780 nice 6 CAP_SETGID 0 0
13:01:54 1 4780 4780 nice 7 CAP_SETUID 0 0
13:01:54 1 4780 4780 nice 23 CAP_SYS_NICE 0 0
13:01:54 1 4781 4781 sleep 6 CAP_SETGID 0 0
13:01:54 1 4781 4781 sleep 7 CAP_SETUID 0 0
^CInterrupted!
We can leave the gadget with Ctrl-C.
In the output we see that the SYS_NICE
capability got checked when nice
was run.
We should probably add it to our pod template for nice
to work. We can also drop
all other capabilites from the default list (see link above) since nice
did not use them:
$ cat docs/examples/app-set-priority-locked-down.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: set-priority
labels:
k8s-app: set-priority
spec:
selector:
matchLabels:
name: set-priority
template:
metadata:
labels:
name: set-priority
spec:
containers:
- name: set-priority
image: busybox
command: [ "sh", "-c", "while /bin/true ; do nice -n -20 echo ; sleep 5; done" ]
securityContext:
capabilities:
add: ["SYS_NICE"]
drop: [all]
At this moment we have to make sure that we are allowed to grant SYS_NICE
for new pods in the
restricted pod security policy.
$ kubectl get psp
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
nginx-ingress-controller false NET_BIND_SERVICE RunAsAny MustRunAs MustRunAs MustRunAs false configMap,secret
privileged true * RunAsAny RunAsAny RunAsAny RunAsAny false *
restricted false RunAsAny MustRunAs MustRunAs MustRunAs false configMap, …
For privileged pods adding SYS_NICE
would work, but not for the default pods.
We can change that by editing the policy.
$ kubectl edit psp restricted # opens the editor to add the below two lines
spec:
allowPrivilegeEscalation: false
allowedCapabilities: # <- add these two
- SYS_NICE # lines here
…
After saving we can verify that we are allowed to add new pods which grant SYS_NICE
.
$ kubectl get psp
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
nginx-ingress-controller false NET_BIND_SERVICE RunAsAny MustRunAs MustRunAs MustRunAs false configMap,secret
privileged true * RunAsAny RunAsAny RunAsAny RunAsAny false *
restricted false SYS_NICE RunAsAny MustRunAs MustRunAs MustRunAs false configMap, …
Let's verify that our locked-down version works.
$ kubectl delete -f docs/examples/app-set-priority.yaml
deployment.apps "set-priority" deleted
$ kubectl apply -f docs/examples/app-set-priority-locked-down.yaml
deployment.apps/set-priority created
$ kubectl logs -lname=set-priority
$ kubectl delete -f docs/examples/app-set-priority-locked-down.yaml
The logs are clean, so everything works!
By the way, in our Inspekor Gadget terminal we still see the same checks done as expected.
We do not see if they succeed or not (use traceloop to see the syscalls). You may
include a kernel call stack for more context with --print-stack
.
(If we see additional SYS_ADMIN
checks we can ignore them since only priviledged pods
have this capability and it's not a default capability.)