Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Auditbeat] Login metricset #9327

Merged
merged 54 commits into from
Jan 30, 2019
Merged

[Auditbeat] Login metricset #9327

merged 54 commits into from
Jan 30, 2019

Conversation

cwurm
Copy link
Contributor

@cwurm cwurm commented Dec 2, 2018

This adds the login metricset to the Auditbeat system module. It's the last of the six initial metricsets. It only works on Linux, and detects not just user logins and logouts, but also system boots and shutdowns.

It works by reading the /var/log/wtmp and /var/log/btmp file (and rotated files) present on Linux systems. In reading a file, it is similar to Filebeat, except that UTMP is a binary format, so reading happens using a binary Go reader. See utmp(5) for the format of that file.

The logic is roughly as follows:

  1. The config parameters login.utmp_file_pattern and login.btmp_file_pattern will contain the pattern matching the wtmp (good logins, as well as system shutdowns and boots) and btmp (bad/failed logins) files and rotated files (if desired). The defaults are /var/log/wtmp* and /var/log/btmp*. These are expanded using filepath.Glob and the files are sorted lexicographically in reverse order (i.e. /var/log/wtmp.1 will come before /var/log/wtmp) so that we read older login records first - reading in order is required for matching login and logout records, see next steps.
  2. Every Fetch it checks for new entries: Any new files are read from the beginning, while known files are read from a saved offset. To that purpose, the last offset per file is saved and persisted to disk in beat.db. A new file is one that has an unknown inode, but files are also read completely if their newSize < oldSize for some reason (that should make it work with any potential inode reuse - very unlikely since this will never read a lot of files but still possible).
  3. New UTMP records are converted to events (LoginRecord in the code). Boot and shutdown events are fairly straightforward, user login and logout events have to be matched using their tty- so there is a loginSessions map that stores logins to enrich the logouts, and is also persisted to disk.

Note: This dataset also introduces event.origin containing the file the event came from, e.g. /var/log/wtmp.1. In other cases, it would be something like procfs or netlink. It's useful to know where information comes from, e.g. to know how reliable it is.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/secops

x-pack/auditbeat/module/system/login/utmp.go Outdated Show resolved Hide resolved
x-pack/auditbeat/module/system/login/utmp.go Outdated Show resolved Hide resolved
x-pack/auditbeat/module/system/login/utmp.go Show resolved Hide resolved
x-pack/auditbeat/module/system/login/login.go Show resolved Hide resolved
x-pack/auditbeat/module/system/login/login.go Outdated Show resolved Hide resolved
@cwurm cwurm mentioned this pull request Dec 2, 2018
21 tasks

// Save new state to disk
if len(loginRecords) > 0 {
err := ms.utmpReader.saveStateToDisk()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioning it mostly for awareness: if I'm not wrong, the Auditbeat report.Event()doesn't guarantee sending at least once. If Elasticsearch is down, for example, it will retry 3 times, and then drop the event. This means that we can potentially lose login events, despite saving the state. I think we could solve this after the Filebeat refactoring, but I suggest documenting this as a limitation for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's a shame. It guess this affects all metricsets, and we might lose process starts, socket connections, password changes, audit events, FIM events, etc.

The internal queue does not keep events until they can be sent (or it overflows)? That would seem desirable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, and one can also use spooling to disk to make the queue larger. But Filebeat offers an extra guarantee by waiting for the ACK and only afterwards update the state on disk. This is not currently possible in Auditbeat. That’s the difference I wanted to highlight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, maybe if at some point report.Event() would return false if it couldn't write to the queue or otherwise cannot guarantee sending the event eventually - so we can retry later.

@cwurm cwurm requested review from tsg and removed request for webmat December 3, 2018 19:22
@tsg
Copy link
Contributor

tsg commented Dec 3, 2018

We should think through what the experience is on non-Linux systems. I'm thinking two things:

  • If you enable the login module on macOS/Windows, you should get an unimplemented error in the logs
  • The default windows/macOS configs should not enable this metricset.

Is the first point already happening? I couldn't test because of the user.go error.

@cwurm
Copy link
Contributor Author

cwurm commented Dec 3, 2018

* If you enable the `login` module on macOS/Windows, you should get an unimplemented error in the logs

That should be what's happening (in login_other.go). I've added some extra cgo build flags in case it's not enabled when building.

* The default windows/macOS configs should not enable this metricset.

It shouldn't - there's an if/else around it in the config.yml.tmpl.

I couldn't test because of the user.go error.

I opened #9368 to address that - it compiles for me after applying that.

I've changed to using message as well.

Copy link
Contributor

@tsg tsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for addressing the comments.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a useful metricset. I left a few comments that need to be addressed now, and some others that are just suggestions. If you have any questions let me know.

x-pack/auditbeat/module/system/login/login.go Outdated Show resolved Hide resolved
x-pack/auditbeat/module/system/login/login.go Outdated Show resolved Hide resolved
x-pack/auditbeat/module/system/login/utmp.go Show resolved Hide resolved
x-pack/auditbeat/module/system/login/utmp.go Outdated Show resolved Hide resolved
x-pack/auditbeat/module/system/login/utmp.go Show resolved Hide resolved
if utAddrV6[1] != 0 || utAddrV6[2] != 0 || utAddrV6[3] != 0 {
// IPv6
b := make([]byte, 16)
binary.LittleEndian.PutUint32(b[:4], utAddrV6[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean for big-endian architectures? Will this code work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good point and I don't know. Do we support any big-endian systems?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised Go doesn't have a binary.MachineEndian.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

var MachineEndian = getByteOrder()

func getByteOrder() binary.ByteOrder {
	var b [2]byte
	*((*uint16)(unsafe.Pointer(&b[0]))) = 1
	if b[0] == 1 {
		return binary.LittleEndian
	}
	return binary.BigEndian
}

func main() {
	var b [4]byte
	MachineEndian.PutUint32(b[:], 0x12345678)
	fmt.Printf("%x %x %x %x\n", b[0], b[1], b[2], b[3])
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just revisiting this. I'm thinking that it might not be likely that this would be used on a big-endian system that uses UTMP login records. In the interest of keeping the code simple I'd leave it as is. If there's a need to change things we always can.

@adriansr getByteOrder() looks like a useful function that Go itself should have, indeed. Or maybe we should have it in a shared location within Beats so code like this here can use it easily.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's likely that someone will use it on big-endian. We've had several contributions to go-libaudit to make auditbeat work on big-endian.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've added support for it. Thanks @adriansr for the code snippet!

x-pack/auditbeat/module/system/login/utmp.go Outdated Show resolved Hide resolved
x-pack/auditbeat/module/system/login/utmp.go Outdated Show resolved Hide resolved
x-pack/auditbeat/module/system/login/utmp.go Outdated Show resolved Hide resolved
// but will return -1 instead.
func lookupUsername(username string) int {
if username != "" {
user, err := user.Lookup(username)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably fine for low volumes. But we have seen from the auditd module that this call is expensive. Some caching could be a useful future enhancement.

IP *net.IP
Timestamp time.Time
Origin string
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, long name and a many exported symbols that we might not need somewhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoginRecord is encoded and decoded by gob so it needs to be exported, unfortunately.


// FileRecord represents a UTMP file at a point in time.
type FileRecord struct {
Inode Inode
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In filebeat we also store the device id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now it's unlikely that UTMP files would be stored on separate file systems. I.e. it would be mean different parts of /var/log are on different devices. I'm not even sure that's possible.

}

// ReadNew returns any new UTMP entries in any files matching the configured pattern.
func (r *UtmpFileReader) ReadNew() ([]LoginRecord, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big do we expect these files to be? Looks like this function will load all available logs into memory on first run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I've assumed they're usually quite small, but that might not be true in all cases. The /etc/logrotate.conf on my Ubuntu VM rotates it monthly. On that VM that's getting a number of daily logins/logouts, they're <100K. On a real-world machine with a lot of logins, I assume it could be more, I wonder how many records it would take to be significant.

Nevertheless, I guess we could use a channel between Fetch and ReadNew and output records as events as they come.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more or less what filebeat does as well. Have one go-routine per file and forward event by event. That is, the module/input is more or less active all the time. Not sure how well this fits the 'Fetch' semantics in the metricbeat framework. But a bounded channel can indeed help keeping memory usage in place. Given the complexity this introduces I wonder if that's really a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed to using channels now, and storing offsets of the files (instead of re-reading until the last read record). So hopefully that will make it scalable even if the files are very large.

r.log.Warnf("Unexpectedly, the file %v did not contain the saved login record %v - reading whole file.",
path, *lastKnownRecord)

return r.readAfter(path, nil)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we are ignoring all messages read so far if we end up here (e.g. we read while copytruncate like logrotation happens). We still want to publish known records? Alternatively we can just error and return here, but rely on fetch to reset the file offset and read new contents the next time Fetch is executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, maybe I was a bit too overzealous to make sure we never miss an event here and just returning is better.

However, I'm curious about copytruncate: As I understand it, the logfile is copied (e.g. cp /var/log/wtmp /var/log/wtmp.1) and then truncated. If that happens, I suspect the metricset as it is at the moment (with the file pattern being /var/log/wtmp* by default) would read all of /var/log/wtmp.1 (because it'll have a different inode) - even though it has read all of those records already. Does Filebeat handle this in some way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But can wtmp be copy-truncated? I thought it's guaranteed to be renamed with moves. I wouldn't want to bring all the complexity of Filebeat's corner cases in here :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename with move should be ok. But them we don't get into this code path.

Instead of having a tail recursive call we could use a for loop and reset some state and use continue in order to jump to beginning of the reader and not drop events. But then it's an edge case that normally should not occur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is now using offsets and if there is some problem in jumping to the stored offset it will jump back to the beginning and re-read the file. This is maybe a bit overzealous, but I think at least for now I'd prefer it if events were sent twice rather than not at all. If there are any problems we can always change.

@cwurm cwurm added in progress Pull request is currently in progress. and removed review labels Dec 11, 2018
@cwurm cwurm mentioned this pull request Jan 16, 2019
10 tasks
Copy link
Contributor

@adriansr adriansr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, suggesting minor refactor, feel free to do it on a separate PR.


var byteOrder = getByteOrder()

func getByteOrder() binary.ByteOrder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out gosigar has a GetEndian() method for this.

func GetEndian() binary.ByteOrder {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, thanks!

@cwurm cwurm merged commit 1566e66 into elastic:master Jan 30, 2019
@cwurm cwurm added v6.7.0 and removed needs_backport PR is waiting to be backported to other branches. labels Feb 3, 2019
cwurm pushed a commit to cwurm/beats that referenced this pull request Feb 3, 2019
Adds the login metricset to the Auditbeat system module as the last of the six initial metricsets. It only works on Linux, and detects not just user logins and logouts, but also system boots and shutdowns.

It works by reading the /var/log/wtmp and /var/log/btmp file (and rotated files) present on Linux systems. In reading a file, it is similar to Filebeat, except that UTMP is a binary format, so reading happens using a binary Go reader.

(cherry picked from commit 1566e66)
cwurm pushed a commit that referenced this pull request Feb 4, 2019
Adds the login metricset to the Auditbeat system module as the last of the six initial metricsets. It only works on Linux, and detects not just user logins and logouts, but also system boots and shutdowns.

It works by reading the /var/log/wtmp and /var/log/btmp file (and rotated files) present on Linux systems. In reading a file, it is similar to Filebeat, except that UTMP is a binary format, so reading happens using a binary Go reader.

(cherry picked from commit 1566e66)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants