PAM integration and loginuid #2476

Closed
vad opened this issue Jan 10, 2019 · 22 comments

@vad

vad commented Jan 10, 2019

What happened:

Cannot open a teleport session with PAM enabled and loginuid.

Tsh output:

error: Cannot make/remove an entry for the specified session
error: ssh: could not start shell

Teleport logs:

Jan 10 10:20:32 myhost teleport[21659]: [NODE]    Service is starting on 0.0.0.0:3022.
Jan 10 10:20:45 myhost teleport[21659]: pam_loginuid(sshd:session): Error writing /proc/self/loginuid: Operation not permitted
Jan 10 10:20:45 myhost teleport[21659]: pam_loginuid(sshd:session): set_loginuid failed
Jan 10 10:20:45 myhost teleport[21659]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)
Jan 10 10:20:45 myhost teleport[21659]: ERRO             Cannot make/remove an entry for the specified session regular/sshserver.go:1117

How to reproduce it (as minimally and precisely as possible):

Relevant teleport.yml:

ssh_service:
  enabled: "yes"
  listen_addr: 0.0.0.0:3022
  pam:
    enabled: "yes"

$ grep loginuid /etc/pam.d/sshd
# Set the loginuid process attribute.
session    required     pam_loginuid.so

A simple workaround is to comment out the loginuid line.
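
As a minimal sketch of that workaround, the snippet below shows the intended text change on the contents of a PAM config file. The helper name `comment_out_loginuid` is hypothetical and not part of Teleport or PAM. It assumes simple single-word control values such as `required`, and it works on a string instead of editing /etc/pam.d/sshd in place.

```python
import re

# Matches an active (uncommented) session rule that loads pam_loginuid.so.
# Assumption: the control field is a single word such as "required".
_LOGINUID_RULE = re.compile(r"^\s*session\s+\S+\s+pam_loginuid\.so\b")


def comment_out_loginuid(pam_config: str) -> str:
    """Return pam_config with every active pam_loginuid.so session rule commented out."""
    out = []
    for line in pam_config.splitlines(keepends=True):
        if _LOGINUID_RULE.match(line):
            line = "# " + line
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "# Set the loginuid process attribute.\n"
        "session    required     pam_loginuid.so\n"
    )
    print(comment_out_loginuid(sample), end="")
```

Lines that are already comments, and rules for other modules, pass through unchanged.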

Environment:

  • Teleport version (use teleport version): Teleport v3.0.1 git:v3.0.1-0-g4ff9a7b0
  • Tsh version (use tsh version): Teleport v3.0.1 git:v3.0.1-0-g4ff9a7b0
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.1 LTS
@russjones
Contributor

@vad Your workaround is accurate.

At the moment Teleport does not support the pam_loginuid.so PAM module due to Teleport's process forking model. Calling pam_loginuid.so would just change the loginuid of the master process, whereas with something like sshd the master process leaves it unset and the child sets it on itself.

This means that if you are using auditd, the auid (and potentially ses) fields may be incorrect.

We're investigating how we can fix this.
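
For readers who want to check which login UID a given process actually carries, here is a rough sketch. On Linux kernels with audit support, the value is exposed at /proc/<pid>/loginuid, and 4294967295 (that is, (uid_t)-1) means it was never set. The helper names `parse_loginuid` and `read_loginuid` are illustrative only.

```python
from pathlib import Path
from typing import Optional

# (uid_t)-1: the value the kernel reports when no login UID has been assigned.
UNSET_LOGINUID = 2**32 - 1


def parse_loginuid(raw: str) -> Optional[int]:
    """Convert the text of a loginuid file to a UID, or None if it is unset."""
    value = int(raw.strip())
    return None if value == UNSET_LOGINUID else value


def read_loginuid(pid: str = "self") -> Optional[int]:
    """Read the login UID of a process (assumes Linux with audit support)."""
    return parse_loginuid(Path(f"/proc/{pid}/loginuid").read_text())
```

Comparing `read_loginuid()` inside a session with the value for the parent daemon's PID shows whether the login UID ended up on the child or on the master process.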

@benarent benarent added the PAM Label related to Pluggable Authentication Module (PAM) Submethod. label Aug 5, 2019
@jkendzorra
Contributor

It would be good to add a hint about pam_loginuid.so to https://gravitational.com/teleport/docs/admin-guide/#pam-integration.

@russjones
Contributor

@vad @jkendzorra This should be fixed in Teleport 4.2. I tested it in 4.2.0-rc.1 and it appears to work. Can you give it a shot and tell me what you find?

@jkendzorra
Contributor

@russjones tested with 4.2.0-rc.2, can confirm it works for me, too.

@benarent
Contributor

We pushed 4.2.0 yesterday, I'm going to close the issue now it's GA.

@zedtux

zedtux commented Jul 9, 2020

I had random failures while deploying with capistrano, and applying this workaround seems to have solved the issue. I'm running Teleport v4.2.11 git:v4.2.11-0-g244ec16b7 go1.13.2.

I will continue deploying many times and let you know.

@sskousen

sskousen commented Jul 9, 2020

@zedtux I've also seen that issue with capistrano (and normal SSH) on all the 4.2 releases, though I was never able to reliably reproduce it. Removing pam_loginuid.so also solved the problem for me.

I think it's more noticeable with capistrano because each command opens a new ssh/teleport exec session, so if there's a 5% chance on every session and you run 20 commands via capistrano, there's a ~100% likelihood of the error occurring (I know probability is more complicated than that, but it illustrates the point :) )
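
To put a number on that intuition: if each session is assumed to fail independently with the same probability, the chance of at least one failure in n sessions is 1 - (1 - p)^n. The 5% and 20 figures come from the comment above. The independence assumption and the function name are illustrative.

```python
def failure_probability(per_session: float, sessions: int) -> float:
    """Chance that at least one of `sessions` independent sessions fails."""
    return 1.0 - (1.0 - per_session) ** sessions


if __name__ == "__main__":
    # 5% per session, 20 sessions per deploy
    print(round(failure_probability(0.05, 20), 2))  # 0.64
```

So under these assumptions roughly two deploys out of three would hit the error at least once, which is still frequent enough to make it far more visible than in single interactive sessions.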

@awly
Contributor

awly commented Jul 9, 2020

@zedtux @sskousen could you file a new issue with any relevant logs and setup details please?

@sskousen

sskousen commented Jul 9, 2020

@awly I created Zendesk ZD#1751 last month about this, with logs and setup config.

@awly
Contributor

awly commented Jul 9, 2020

Found it.
Looks like the top post here contains the same error and logs, so I'll reopen this bug.

@awly awly reopened this Jul 9, 2020
@awly awly added this to the 4.4 "Rome" milestone Jul 9, 2020
@awly awly self-assigned this Jul 9, 2020
@zedtux

zedtux commented Jul 10, 2020

Thank you @sskousen for joining me 😃 and @awly for reopening 👍

@awly awly modified the milestones: 4.4 "Rome", 4.3.1 "Londinium" Jul 14, 2020
@russjones
Contributor

Best: 3
Realistic: 8

@awly
Contributor

awly commented Jul 16, 2020

@zedtux @sskousen I'm having a hard time reproducing this.
Ran 10 workers for 5min in an infinite loop hitting a teleport instance executing cat /proc/self/loginuid. 100% success rate over 3000 commands.

Could you provide a few more details:

  • OS/version of the node
  • which user does the teleport process run as
  • which user are you (or capistrano) logging in as
  • are there any relevant logs in /var/log/auth.log

Also, if you could run tsh bench --duration=5m --threads=10 $username@$nodename cat /proc/self/loginuid (replace $username and $nodename as appropriate) with the pam_loginuid.so line uncommented in your PAM config, that would be a useful datapoint.
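
A repeat-and-count loop of the kind described above can be sketched as follows. This is not how tsh bench works internally. It is a generic harness that runs any command a fixed number of times and counts non-zero exits. The function name `count_failures` is hypothetical, and the command shown in the usage section is a stand-in to replace with the actual ssh or tsh invocation.

```python
import subprocess
import sys
from typing import Sequence


def count_failures(command: Sequence[str], runs: int) -> int:
    """Run `command` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Stand-in command that always succeeds; substitute the real session command.
    runs = 10
    failed = count_failures([sys.executable, "-c", "pass"], runs)
    print(f"{failed}/{runs} runs failed")
```

Because every iteration starts a fresh process, each run opens a new session, which matches how capistrano issues one session per command.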

@sskousen

Client - Various OpenSSH clients (from OSX and Ubuntu 18.04); Capistrano 2
OS/version of the node - Ubuntu 14.04 and 18.04
which user does the teleport process run as - I'm using the teleport-ent package, which is root
which user are you (or capistrano) logging in as - seth.skousen
are there any relevant logs in /var/log/auth.log - Just these two:

Jun 18 15:13:14 sb-sand-longbow1 teleport: pam_loginuid(teleport:session): Error writing /proc/self/loginuid: Operation not permitted
Jun 18 15:13:14 sb-sand-longbow1 teleport: pam_loginuid(teleport:session): set_loginuid failed

@awly
Contributor

awly commented Jul 16, 2020

Thanks!
I'm able to reproduce now. For some reason tsh bench doesn't hit it, while running tsh ssh in a loop does.
Here's a correlation: every time this error happens on the tsh/PAM side, I see a DNS-related error in the server's syslog:

# client
$ for i in (seq 1000); tsh ssh awly2@node.ubuntu true || date; end
Failed to launch: Cannot make/remove an entry for the specified session.
error: Process exited with status 255
Thu 16 Jul 2020 02:56:52 PM PDT
Failed to launch: Cannot make/remove an entry for the specified session.
error: Process exited with status 255
Thu 16 Jul 2020 02:56:54 PM PDT
Failed to launch: Cannot make/remove an entry for the specified session.
error: Process exited with status 255
Thu 16 Jul 2020 02:57:08 PM PDT
# server
$ tail -f  /var/log/syslog | grep repeated
Jul 16 21:56:52 ubuntu systemd-resolved[505]: message repeated 143 times: [ Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.]
Jul 16 21:56:53 ubuntu systemd-resolved[505]: message repeated 107 times: [ Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.]
Jul 16 21:57:08 ubuntu systemd-resolved[505]: message repeated 863 times: [ Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.]

A quick search suggests it has something to do with systemd-resolved.
Following the suggestion in https://askubuntu.com/questions/1058750/new-alert-keeps-showing-up-server-returned-error-nxdomain-mitigating-potential, I re-linked /etc/resolv.conf to /run/systemd/resolve/resolv.conf.
This got rid of the syslog errors, but didn't help with loginuid stuff.

I'll keep digging.

@sskousen

That's definitely a suspicious correlation. Systemd isn't on 14.04, and my company disables systemd-resolved on 18.04, so I'd be shocked if it's systemd-resolved itself, but maybe something else DNS related?

@awly
Contributor

awly commented Jul 16, 2020

It's something memory related even.
I injected some debug logs into pam_loginuid.so and it's failing on a write syscall to /proc/self/loginuid with EFAULT (which means buf is outside your accessible address space, where buf is the buffer passed in to write).
It happens here: https://github.com/linux-pam/linux-pam/blob/master/libpam/pam_modutil_ioloop.c#L40 (called by https://github.com/linux-pam/linux-pam/blob/master/modules/pam_loginuid/pam_loginuid.c#L92).

I'm starting to suspect it's a bug in libpam or the kernel that somehow gets triggered more often by teleport than openssh.
Hopefully this ends up as some silly bug in teleport as usual.

@awly
Contributor

awly commented Jul 17, 2020

Actually, I instrumented it wrong 🤦‍♂️
It does return EPERM.
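
For reference, the two errno values mentioned in this thread can be printed with their symbolic names and system messages using Python's standard `errno` and `os` modules. The helper name `describe_errno` is illustrative. On Linux, the message for EPERM is "Operation not permitted", which is the text pam_loginuid logs in the original report.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Format an errno value as 'NAME (number): message'."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    for code in (errno.EFAULT, errno.EPERM):
        print(describe_errno(code))
```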

@awly
Contributor

awly commented Jul 17, 2020

@webvictim yeah, found that one too.
unfortunately it doesn't really explain why that fix works.
and I wouldn't want to turn off all auditing support in the kernel which SELinux seems to rely on.

I'll try a few more things to localize the problem:

  • try pure openssh with public keys
  • try openssh with teleport certificates
  • try making a local repro without SSH in the mix at all

@russjones russjones modified the milestones: 4.3.1 "Londinium", 4.3.2 Jul 24, 2020
@awly
Contributor

awly commented Jul 29, 2020

This was fixed.
4.3.3 will include the fix.

@awly awly closed this as completed Jul 29, 2020
@zedtux

zedtux commented Jul 29, 2020

Thank you, I'll give it a try as soon as it is released!
