feat: Username for process_exec events #2369

anfedotoff · 2024-04-23T17:12:05Z

Username is useful when tetragon works on host. On different hosts the same username can have different UIDs.
Approach based on resolving username with pure Go os/user LGTM. But there is some restrictions:

/etc/passwd is available
tetragon agent and process for which UID is resolved are in the same mount/user namespace

TODO:

Check if tetragon agent and process for which UID is resolved are in the same mount/user namespace
Username resolving by a flag (maybe?) // I think flag is not needed.
Test // Add username check in TestEventExecve
Docs // Field description exists in reference. Maybe we need to find some place in docs to describe restrictions of username resolving

netlify · 2024-04-23T17:12:50Z

✅ Deploy Preview for tetragon ready!

Name	Link
🔨 Latest commit	`465f368`
🔍 Latest deploy log	https://app.netlify.com/sites/tetragon/deploys/664346fd1f20980008b831da
😎 Deploy Preview	https://deploy-preview-2369--tetragon.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

anfedotoff · 2024-04-26T13:12:59Z

@jrfastab, please, have a look. I think the PR is ready.

jrfastab · 2024-05-02T06:04:14Z

hi 👋 sorry for the delay I'm going to look at it in more detail tomorrow quick scan looks good to me.

tixxdz

I agree on the feature since users want it ;-)

But some changes since it is not that easy and we have to predict things.

tixxdz · 2024-05-02T11:28:19Z

api/v1/tetragon/tetragon.proto

@@ -259,6 +259,8 @@ message Process {
    ProcessCredentials process_credentials = 17;
    // Executed binary properties. This field is only available on ProcessExec events.
    BinaryProperties binary_properties = 18;
+    // User name associated with process.


After some thoughts and let's see what @jrfastab thinks too, but since username is something abstracted from a user space (name) services and not kernel related that could do network lookups in future, let's separate the things.

I suggest something like this:

// User records message UserRecord { // The UNIX username for this record. Corresponds to `pw_name` field of [struct passwd](https://man7.org/linux/man-pages/man3/getpwnam.3.html) // and the `sp_namp` field of [struct spwd](https://man7.org/linux/man-pages/man3/getspnam.3.html). name = 1; } // Unix user account information message UserAccount { // UserRecord user = 1; }

Then inside message Process:

// Unix user account information. Available only on host deployments when Tetragon is running as // a system service or directly on host. Accounts information is retrieved from traditional user database // `/etc/passwd` and no name services lookups are performed. UserAccount account = 19;

@jrfastab I don't like the 'account' name field, feel free to suggest.

All this is to separate things as username is something that is available depending on the workflow but also to be extensible in future with where things are going in the userspace stack:

https://systemd.io/USER_GROUP_API/

https://systemd.io/USER_RECORD/

Also eventually there are some fields there that we could add to records so we offer cool features to our users ;-)

Thank you for such detailed review! Your proposed solution with UserAccont struct looks great. It allows us to extend event with some other useful information in future. I could't find a better naming for 'Account'. So I keep this name for now.

What do we get by having this couple layers of structures here?

UserRecord {
UserName {
name string
}
}

I get having one struct any reason to not just embed name in UserRecord?

UserRecord {
name string
}

If user record grows fine.

Yeh having such layering may seem too much, but in the other hand the abstraction at userspace has always been separate users and groups, you have two files for /etc/passwd and /etc/groups, at systemd layer two separate records, glibc sgrp and spwd, NSS etc... and I'm trying to position tetragon be autocompatible as possible... ;-)

Since you point this, I did another review of those user space fields, systemd is the most advanced one. We won't do NSS lookups in tetragon. So taking current distros support:

Arch Linux seems to have already adopted systemd-userdb service JSON User/Group Record Query Multiplexer/NSS Compatibility as part of the systemd core.

Ubuntu has https://manpages.ubuntu.com/manpages/lunar/man8/systemd-userdbd.8.html but it is provided by another package. Anyway the end result is a socket on /run/systemd/userdb/ that provides json records.

Debian issue closed by packaging systemd-userdb in its own package https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=976960

So I see two choices here, and obviously I go with json systemd compatible one as it will allow us to support NSS and all new stuff https://systemd.io/USER_RECORD/ and https://systemd.io/GROUP_RECORD/ by querying the socket automatically, then mapping fields to the tetragon fields automatically.

JSON systemd-userdb compatible two structs: per user and one per group:

// User records UserRecord { name string ... } // Group records GroupRecord { ... } message Process { // Unix user account information. Available only on host deployments when Tetragon is running as // a system service or directly on host. Accounts information is retrieved from traditional user database // `/etc/passwd` and no name services lookups are performed. UserRecord user = 19; // Unix group account information.... GroupRecord group = xx; }

Or we do simple one struct and embed everything, and prefix fields with either user/group for ones that have same name and also same meaning we let it grow

// Unix user account information AccountRecord { username string // The UNIX user name for this record. groupname string // The UNIX group name for this record. } message Process { // Unix user account information. Available only on host deployments when Tetragon is running as // a system service or directly on host. Accounts information is retrieved from traditional user database // `/etc/passwd` and no name services lookups are performed. AccountRecord user/account = 19 }

Again for obvious reasons, I prefer 1. WDYT @jrfastab @anfedotoff ?

I vote for the first one:). Btw, if we add group record, I think I can fill it the same way as user record.

I vote for the first one:). Btw, if we add group record, I think I can fill it the same way as user record.

Yes same way, just mention it in the code doc.

Let's wait for @jrfastab ack and I hope we have sorted out all stuff ;-)

1 works for me. For simplicity we could start with just the User record in t his PR and follow up with Group?

1 works for me. For simplicity we could start with just the User record in t his PR and follow up with Group?

It's no problem to get primary group name for the user using LookupGroupId. But primary group name is often the same as user name. So I'm not sure if it is useful to have only this info.

So, I vote to keep only User record for now, but for me it is OK to add Group record as I described. @tixxdz, what do you think?

@anfedotoff Up to you if you can make it shorter, document it and covered with tests then go with it, otherwise we can merge this PR first, then you follow up.

pkg/sensors/exec/exec.go

tixxdz

So LGTM in general, minor fixes again and I think we can merge it.

Thank you for doing it ;-)

tixxdz · 2024-05-04T11:09:22Z

pkg/sensors/exec/exec.go

+	mode := cgroups.GetDeploymentMode()
+	if (mode == cgroups.DEPLOY_SD_SERVICE || mode == cgroups.DEPLOY_SD_USER) &&
+		namespace.GetCurrentNamespace().Mnt.Inum == m.Unix.Msg.Namespaces.MntInum &&
+		namespace.GetCurrentNamespace().User.Inum == m.Unix.Msg.Namespaces.UserInum {


Sorry here I missed that you should replace the namespace check with:
https://github.com/cilium/tetragon/blob/main/pkg/reader/namespace/namespace.go#L270 get host namespaces there and if no error (to make code look correct for static analysis) and that mnt and user ns of exec match host namespaces then you are good to go.

No need to log errors, since normally tetragon will fail at startup if InitHostNamespaces() fails, but let's check the code so go tools won't error

Hmm got confused here! Why do we check if the exec event is running in same host namespace as tetragon? do you have a particular reason for that?

Or do you think due to usernamespace mapping users will get confused?

From execve event in bpf we return the kernel global uid view anyway, so we will always match the host view in /etc/passwd if the uid exists there. Can you please add that in the proto API documentation?

Then from code we can just call here: namespace.GetCurrentNamespace() that is tetragon and then check if User.IsHost==true and Mnt.IsHost==true (also cache result in static global var there). This way we are running in host and resolving uids according to host original mapping of /etc/passwd

Hmm got confused here! Why do we check if the exec event is running in same host namespace as tetragon? do you have a particular reason for that?

I thought if there is a case, when tetragon is running not in host ns. For this case I check if exec and tetragon are in the same namespace. Yes, let's consider only host ns.

pkg/sensors/exec/exec.go

pkg/sensors/exec/exec_test.go

anfedotoff · 2024-05-08T09:21:18Z

@tixxdz , please, have a look. I think PR is ready:).

tixxdz

So LGTM still two things pointed out that we need to fix before ;-)

But really looks great!

tixxdz · 2024-05-08T13:54:58Z

pkg/sensors/exec/exec.go

@@ -68,6 +71,19 @@ func msgToExecveKubeUnix(m *processapi.MsgExecveEvent, exec_id string, filename
 	return kube
 }

+func msgToExecveAccountUnix(m *exec.MsgExecveEventUnix) {
+	mode := cgroups.GetDeploymentMode()
+	if ns, err := namespace.GetMsgNamespaces(m.Unix.Msg.Namespaces); err == nil {


So one last fix please here:

I mean to cache results of Tetragon and check if tetragon is running in host, not the target process that is received in unix message.

So can you please cherry pick this patch here: 69092c5 , the repo is here: https://github.com/cilium/tetragon/compare/pr/tixxdz/2024-05-check-current-namespaces , just the last patch, make it first patch then you add yours on top.

func msgToExecveAccountUnix(m *exec.MsgExecveEventUnix) { mode := cgroups.GetDeploymentMode() ns := namespace.GetCurrentNamespace() if (mode == cgroups.DEPLOY_SD_SERVICE || mode == cgroups.DEPLOY_SD_USER) && (ns && ns.Mnt.IsHost && ns.User.IsHost) { ... } }

@tixxdz, thanks for comments. Maybe I misunderstand something. But look at the case:
Tetragon is running on host. I ran a docker container. I this container I created a user with uid 1001. Than just called whoami.

"process_exec": { "process": { "exec_id": "YW5mZWRvdG9mZi1uaXg6MTEwNzA5MDEyMjAyMjc2MjoyOTU2ODM=", "pid": 295683, "uid": 1001, "cwd": "/", "binary": "/usr/bin/whoami", "flags": "execve rootcwd clone", "start_time": "2024-05-08T14:13:12.359738563Z", "auid": 4294967295, "docker": "ead4b91477a0d8affbde3db2317d89b", "parent_exec_id": "YW5mZWRvdG9mZi1uaXg6NDQ3NDYxOTUzMDAwMDAwMDoyOTQ1OTA=", "tid": 295683 },

I think we don't want to resolve usernames for processes that are not from host NS.

This is /etc/passwd file from container:

nginx:x:101:101:nginx user:/nonexistent:/bin/false user:x:1000:1000::/home/user:/bin/sh user2:x:1001:1001::/home/user2:/bin/sh

On host, for example uid 1000 is me, anfedotoff:). So, If I start a process from user (1000) I will recieve my host user (anfedotoff) in event... So that's why I check if process belongs to host NS. What do you think about such case?

Hm so:

You create a docker container that's running with uid 1001 right? since tetragon is reporting that. The uid is global in Tetragon view, even if you map it with user namespaces, we always report uids in the host user namespace.

Then with the username lookup logic we will resolve it to host /etc/passwd, so inside host:/etc/passwd if uid 1001 is userA and inside container:/etc/passwd uid 1001 is userB , in tetragon we will report userA , I guess that's the part you find confusing right?

I agree that users could be confused, let's then do it one step, we go with resolve only if process is in host NS, while we think about it, and we may just document it.

So to merge this, please keep your current change, just fix the test skipped, and we are good ;-)

Hm so:

You create a docker container that's running with uid 1001 right? since tetragon is reporting that. The uid is global in Tetragon view, even if you map it with user namespaces, we always report uids in the host user namespace.

Then with the username lookup logic we will resolve it to host /etc/passwd, so inside host:/etc/passwd if uid 1001 is userA and inside container:/etc/passwd uid 1001 is userB , in tetragon we will report userA , I guess that's the part you find confusing right?

Yes, exactly. Let's resolve username only if process is in host NS for now!

tixxdz · 2024-05-08T14:03:43Z

pkg/sensors/exec/exec_test.go

+	ns := namespace.GetCurrentNamespace()
+	if (mode != cgroups.DEPLOY_SD_SERVICE && mode != cgroups.DEPLOY_SD_USER) ||
+		!ns.Mnt.IsHost || !ns.User.IsHost {
+		t.Skip()


The test is being skipped here: https://github.com/cilium/tetragon/actions/runs/8984334822/job/24676304855?pr=2369#step:6:1345 so you probably need to initialize the deployment as pointed out.

Try to run the test on your local machine with the following command:

make go test -exec "sudo" -p 1 -parallel 1 -gcflags= -timeout 20m -failfast -cover -run TestEventExecveWithUsername ./pkg/sensors/exec/... -v

It should not be skipped.

kkourt

Thanks!

I have one change request.

kkourt · 2024-05-10T06:19:49Z

pkg/sensors/exec/exec.go

+func msgToExecveAccountUnix(m *exec.MsgExecveEventUnix) {
+	mode := cgroups.GetDeploymentMode()
+	if ns, err := namespace.GetMsgNamespaces(m.Unix.Msg.Namespaces); err == nil {
+		if (mode == cgroups.DEPLOY_SD_SERVICE || mode == cgroups.DEPLOY_SD_USER) &&


Instead of checking (mode == cgroups.DEPLOY_SD_SERVICE || mode == cgroups.DEPLOY_SD_USER) can we introduce a user flag for this?

The flag will allow us to document the limitations of just reading /etc/passwd. I would call the flag something like: --resolve-username=unix-basic and document the limitations.

I would personally have the default be --resolve-username=disabled, but I'm fine to set it to "unix-basic" if (mode == cgroups.DEPLOY_SD_SERVICE || mode == cgroups.DEPLOY_SD_USER).

But why introduce the flag? what's purpose it is serving? we are already document that in host deployments we are reading /etc/passwd?

But why introduce the flag? what's purpose it is serving? we are already document that in host deployments we are reading /etc/passwd?

I agree with @tixxdz . I don't know is it reasonable to add a flag only for documentation purpose. Maybe we can find a place in docs to describe the restrictions of username resolving? It might help users to be not surprised when for some process event will not have a username field. @kkourt what do you think?
P.S. For me it's ok to add on option if you think so.

Ok after an offline discussion with @kkourt , indeed it could be problematic on setups that go with specific NSS, even if we ignore non existent entries in /etc/passwd, this falls short on duplicate or even if users want to override entries.

Instead of checking (mode == cgroups.DEPLOY_SD_SERVICE || mode == cgroups.DEPLOY_SD_USER) can we introduce a user flag for this?

The deployment is a guard, cause if in container deployment or something else, it does not make sense to check that container passwd file.

The flag will allow us to document the limitations of just reading /etc/passwd. I would call the flag something like: --resolve-username=unix-basic and document the limitations.

Makes sense, what if we make it: --metadata-username=unix ? to reflect on https://linux.die.net/man/8/pam_unix and metadata enrichment? calling it --resolve-username is fine too.

I would personally have the default be --resolve-username=disabled, but I'm fine to set it to "unix-basic" if (mode == cgroups.DEPLOY_SD_SERVICE || mode == cgroups.DEPLOY_SD_USER).

Yes that works, let's go with disabled meaning just that, we check deployment mode and we set it, and if in future we want to disable it even for host deployments we could do that.

BTW some notes as follow up:

Cache the entries. Taking the sssd architecture https://sssd.io/docs/architecture.html it has a cache by default, I think entries could expire, but for entries below 1000 , and referencing systemd uids https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md#summary we can cache those in small map.

If uid 0 resolves to root which is the convention we should not query the file again, but for later.

The primary motivation for having a flag is:

allow users to turn off this feature (this is aimed at environments that rely on other methods. e.g., NSS)

allow us to support other methods in the future

I prefer --username-metadata=unix to --metdata-username.

One of the extensions we might want to do is --username-metdata=unix:<file> and then mount /etc/passwd inside the tetragon's pod to location <file>. Doing so will allow us to get host username events even if we deploy via a K8s daemonset.

Yes --username-metadata=unix is good too.

One of the extensions we might want to do is --username-metdata=unix: and then mount /etc/passwd inside the tetragon's pod to location . Doing so will allow us to get host username events even if we deploy via a K8s daemonset.

Hmm I dont think we should do that, security scanners may flag tetragon. Would be better to ask users that want this, install systemd-userdb then mount the socket from /run/systemd/userdb/ then do json queries on the records. Also container dynamic userids IIRC are not registered somewhere or propagated so we can look them up IIRC.

Guys, I added --username-metadata option. I also added a cache for username lookups by UID below 1000.
I looked at this table and I'm not sure that systemd UIDs greater than 1000 are not changing. Maybe cache only UIDs from [0; 1000) ?

Guys, I added --username-metadata option

Thanks @anfedotoff.
I'll let @tixxdz answer about the cache, but in my opinion we can merge the PR without it.

Guys, I added --username-metadata option

Thanks @anfedotoff. I'll let @tixxdz answer about the cache, but in my opinion we can merge the PR without it.

I agree, let's first merge what we have now, and you can add the optimization to not open every time /etc/passwd in follow up PR

api/v1/tetragon/tetragon.proto

tixxdz

LGTM! Can you please add @kkourt suggestion about UserRecord doc

Also remove the cache let's please make it in separate PR.

Thanks for fixing the test, much appreciated ;-)

tixxdz · 2024-05-14T10:36:54Z

pkg/sensors/config/confmap/confmap.go

+			"deployment.mode": mode.String(),
+		}).Warn("Username resolution is not available for given deployment mode")
+	}
+


Ok checking this here is good, thanks

Signed-off-by: Andrey Fedotov <anfedotoff@yandex-team.ru>

tixxdz

Thank you ;-)

tixxdz · 2024-05-14T19:07:52Z

The CI failures seem unrelated

anfedotoff requested a review from a team as a code owner April 23, 2024 17:12

anfedotoff requested a review from olsajiri April 23, 2024 17:12

anfedotoff marked this pull request as draft April 23, 2024 17:14

anfedotoff force-pushed the username-proccess-events branch 2 times, most recently from 5d48ea0 to f997d95 Compare April 24, 2024 16:19

anfedotoff marked this pull request as ready for review April 26, 2024 13:09

anfedotoff changed the title ~~[WIP] feat: Username for process_exec events~~ feat: Username for process_exec events Apr 26, 2024

tixxdz self-requested a review May 1, 2024 22:31

tixxdz requested changes May 2, 2024

View reviewed changes

tixxdz added the release-note/major This PR introduces major new functionality label May 2, 2024

anfedotoff force-pushed the username-proccess-events branch from ee6be8f to 0906a0c Compare May 3, 2024 15:51

tixxdz reviewed May 4, 2024

View reviewed changes

anfedotoff force-pushed the username-proccess-events branch from 0906a0c to 94042c3 Compare May 7, 2024 11:12

tixxdz reviewed May 8, 2024

View reviewed changes

anfedotoff force-pushed the username-proccess-events branch from 94042c3 to a11c9fc Compare May 9, 2024 20:52

kkourt requested changes May 10, 2024

View reviewed changes

anfedotoff force-pushed the username-proccess-events branch from a11c9fc to cf44dc4 Compare May 14, 2024 08:43

anfedotoff requested a review from mtardy as a code owner May 14, 2024 08:43

anfedotoff force-pushed the username-proccess-events branch 2 times, most recently from 86c385b to c478473 Compare May 14, 2024 09:18

kkourt reviewed May 14, 2024

View reviewed changes

api/v1/tetragon/tetragon.proto Outdated Show resolved Hide resolved

tixxdz reviewed May 14, 2024

View reviewed changes

anfedotoff added 2 commits May 14, 2024 14:09

Username for process_exec events

0a30da6

Signed-off-by: Andrey Fedotov <anfedotoff@yandex-team.ru>

Check username in TestEventExecveWithUsername

465f368

Signed-off-by: Andrey Fedotov <anfedotoff@yandex-team.ru>

anfedotoff force-pushed the username-proccess-events branch from c478473 to 465f368 Compare May 14, 2024 11:11

tixxdz approved these changes May 14, 2024

View reviewed changes

kkourt approved these changes May 15, 2024

View reviewed changes

kkourt merged commit da825b5 into cilium:main May 15, 2024
44 of 45 checks passed

feat: Username for process_exec events #2369

feat: Username for process_exec events #2369

Conversation

anfedotoff commented Apr 23, 2024 • edited Loading

netlify bot commented Apr 23, 2024 • edited Loading

✅ Deploy Preview for tetragon ready!

anfedotoff commented Apr 26, 2024

jrfastab commented May 2, 2024

tixxdz left a comment

Choose a reason for hiding this comment

tixxdz May 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anfedotoff May 6, 2024 • edited Loading

Choose a reason for hiding this comment

anfedotoff May 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tixxdz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anfedotoff commented May 8, 2024

tixxdz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anfedotoff May 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tixxdz May 8, 2024 • edited Loading

Choose a reason for hiding this comment

kkourt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kkourt May 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tixxdz left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tixxdz left a comment

Choose a reason for hiding this comment

tixxdz commented May 14, 2024

anfedotoff commented Apr 23, 2024 •

edited

Loading

netlify bot commented Apr 23, 2024 •

edited

Loading

tixxdz May 2, 2024 •

edited

Loading

anfedotoff May 6, 2024 •

edited

Loading

anfedotoff May 6, 2024 •

edited

Loading

anfedotoff May 8, 2024 •

edited

Loading

tixxdz May 8, 2024 •

edited

Loading

kkourt May 13, 2024 •

edited

Loading

tixxdz left a comment •

edited

Loading