Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock all PAM operations to the startup thread #4133

Merged
merged 1 commit into from
Jul 28, 2020
Merged

Conversation

awly
Copy link
Contributor

@awly awly commented Jul 28, 2020

This fixes the pam_loginuid.so flakiness.
Tested with:

  • teleport
    • running as root on an Ubuntu VM
    • with pam_loginuid.so configured in PAM policy
  • 3 parallel tsh clients on a different machine
    • each running tsh ssh user@ubuntu true 1000 times in a loop
  • running for ~30min

Without the fix, this setup reproduced the sporadic problem. With the fix, no errors reported.
Below is the comment I left in the code:

Lock all PAM commands to the startup thread. From LockOSThread docs:

All init functions are run on the startup thread. Calling LockOSThread from
an init function will cause the main function to be invoked on that thread.

This is needed for pam_loginuid.so. It writes "/proc/self/loginuid"
which, on Linux, depends on being called from a specific thread. If
it's not running on the right thread, pam_loginuid.so may fail with
EPERM sporadically.

Why the startup thread specifically?

The kernel does some validation based on the thread context. I could
not find what the kernel uses specifically. Some relevant code:
https://github.com/torvalds/linux/blob/9d99b1647fa56805c1cfef2d81ee7b9855359b62/kernel/audit.c#L2284-L2317
Locking to the startup thread seems to make the kernel happy.

Why not call LockOSThread from pam.Open?

By the time pam.Open gets called, more goroutines could've been
spawned. This means that the main goroutine (running pam.Open) could
get re-scheduled to a different thread.

Why does pam.Open run on the main goroutine?

This is an assumption. As of today, this is true because teleport
re-executes itself and calls pam.Open synchronously. If we change this
later, loginuid can become flaky again.

What does OpenSSH do?

OpenSSH has a separate "authentication thread" which does all the PAM
stuff:
https://github.com/openssh/openssh-portable/blob/598c3a5e3885080ced0d7c40fde00f1d5cdbb32b/auth-pam.c#L470-L474

Updates #2476

Lock all PAM commands to the startup thread. From LockOSThread docs:

    All init functions are run on the startup thread. Calling LockOSThread from
    an init function will cause the main function to be invoked on that thread.

This is needed for pam_loginuid.so. It writes "/proc/self/loginuid"
which, on Linux, depends on being called from a specific thread.  If
it's not running on the right thread, pam_loginuid.so may fail with
EPERM sporadically.

> Why the startup thread specifically?

  The kernel does some validation based on the thread context. I could
  not find what the kernel uses specifically. Some relevant code:
  https://github.com/torvalds/linux/blob/9d99b1647fa56805c1cfef2d81ee7b9855359b62/kernel/audit.c#L2284-L2317
  Locking to the startup thread seems to make the kernel happy.

> Why not call LockOSThread from pam.Open?

  By the time pam.Open gets called, more goroutines could've been
  spawned.  This means that the main goroutine (running pam.Open) could
  get re-scheduled to a different thread.

> Why does pam.Open run on the main goroutine?

  This is an assumption. As of today, this is true because teleport
  re-executes itself and calls pam.Open synchronously. If we change this
  later, loginuid can become flaky again.

> What does OpenSSH do?

  OpenSSH has a separate "authentication thread" which does all the PAM
  stuff:
  https://github.com/openssh/openssh-portable/blob/598c3a5e3885080ced0d7c40fde00f1d5cdbb32b/auth-pam.c#L470-L474
@awly awly added this to the 4.4 "Rome" milestone Jul 28, 2020
Copy link
Contributor

@webvictim webvictim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love that the fix is three lines and the comment is 36 lines. LGTM!

Copy link
Contributor

@fspmarshall fspmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants