Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not store local service account token and CA to config. #122

Merged
merged 11 commits into from
Jan 7, 2022

Conversation

tsaarni
Copy link
Contributor

@tsaarni tsaarni commented Oct 14, 2021

Overview

Kubernetes 1.21 switched to new ID tokens which are now bound to a specific pod. When pod is deleted and recreated the old token will be invalidated and not accepted by Kubernetes API server anymore. Tokens will also expire and Kubernetes will renew them periodically. The new tokens cause major problem with Vault since currently, even when using local tokens, the token and CA certificate are read when the authentication method is configured and stored persistently to config.

Kubernetes authentication backend will stop working when:

  • the pod where the token happened to be copied from restarts (=pod gets recreated)
  • token expires

Vault will not recover automatically, it requires reconfiguration of the authentication method.

Design of Change

This change avoids storing local token and local CA certificate in config. Instead, they are loaded lazily when needed and kept in memory only. Token will also re-read periodically to facilitate token rotation and to avoid authentication failure due to expired token.

Related Issues/Pull Requests

Fixes #121
Updates #95

Tests

Acceptance tests pass.
New tests added.
Manual test on Kubernetes pass..

Backwards compatibility

I believe that previously GET /auth/kubernetes/config returned a copy of local CA file. This is not done anymore. The justification is that local file is not part of stored configuration. CA file is still returned if user explicitly configured it with POST /auth/kubernetes/config.

Contributor Checklist

When defaulting to local JWT token and CA certificate in a pod, always read
them from local filesystem and do not store them persistenly with the config.

Token will be re-read periodically to avoid using expired token.

The change allows running Vault on Kubernetes 1.21 and newer, which switched
to ID token that is bound to the pod and will expire.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Show resolved Hide resolved
@tsaarni
Copy link
Contributor Author

tsaarni commented Nov 10, 2021

@hsimon-hashicorp Kindly asking if someone could take a look at this PR? Currently Vault's Kubernetes service account authentication method is broken and it would be really great to get it working again!

@heatherezell
Copy link

Chatted with @tvoran about this briefly; this is a note to myself to circle back on this later.

@tomhjp
Copy link
Contributor

tomhjp commented Nov 24, 2021

Hi @tsaarni, thanks for the PR! Sorry for the delay reviewing, but we're definitely interested in this feature. I haven't reviewed in detail yet, but will try to do so this week.

Copy link
Contributor

@tomhjp tomhjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good 👍 I'd like to remove the CA changes from the scope if possible, and I also want to make sure this is as robust as practically possible, so one comment on the caching approach.

path_config.go Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@tsaarni tsaarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review @tomhjp, looking forward for further comments & doing final adjustments to the PR!

BTW Sometimes I find github really confusing 🤔 My response to ^^^ above ^^^ review comment is only visible in Files Changed view...

Copy link
Contributor

@tomhjp tomhjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay - I got part-way through a review and then got to the point where I was going to need to do some more involved testing for backwards compatibility/different deployment scenarios etc. I'm still yet to do that, so some of the comments are only semi-actionable. I'll leave what I've got here for now and revisit once I've had a chance to do the testing.

backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
caching_file_reader.go Outdated Show resolved Hide resolved
path_config_test.go Outdated Show resolved Hide resolved
path_config.go Show resolved Hide resolved
backend.go Show resolved Hide resolved
caching_file_reader_test.go Outdated Show resolved Hide resolved
@tsaarni
Copy link
Contributor Author

tsaarni commented Dec 7, 2021

BTW I think it could be nice if there was a test environment with "fake" Kubernetes API server, allowing more thorough testing without compiling this plugin into Vault and running that against actual Kubernetes, in cluster and outside.

Copy link
Member

@tvoran tvoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this approach! Left a few comments, but so far it has tested well for me for both short and long-lived JWTs, and when using the client token as the reviewer JWT.

I think in the future we may want to expose the local JWT and CA paths as options, but I think we can worry about that in another PR when the need arises.

path_config.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
path_config.go Outdated Show resolved Hide resolved
path_config.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
@tsaarni
Copy link
Contributor Author

tsaarni commented Dec 14, 2021

I've realized that I should rework the lock handling a bit: I need to have write lock for backend, when loading local files and assigning them to variables in kubeAuthBackend.loadConfig().

backend.go Outdated Show resolved Hide resolved
@tsaarni
Copy link
Contributor Author

tsaarni commented Dec 18, 2021

I wrote my manual test procedures here https://gist.github.com/tsaarni/66b085e0244fe1b0635cc1f806b5c334

Copy link
Contributor

@tomhjp tomhjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the test updates are really nice. Thanks for contributing such a significant feature! I'll let @tvoran have one last look but I think we can land this soon.

Copy link
Member

@tvoran tvoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great and worked well for me when I did some testing. Just one more small suggestion and then I think it'll be good to go.

path_login.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
Copy link
Member

@tvoran tvoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

@tvoran tvoran merged commit d289258 into hashicorp:main Jan 7, 2022
tvoran pushed a commit that referenced this pull request Jan 18, 2022
When defaulting to local JWT token and CA certificate in a pod, always read
them from local filesystem and do not store them persistently with the config.

Token will be re-read periodically to avoid using expired token.

The change allows running Vault on Kubernetes 1.21 and newer, which switched
to ID token that is bound to the pod and will expire.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>

* review comment fix: load only token or ca cert if other is given in config

* changed the reload period to 1 minute

* fixed review comments

* take lock also on alias lookahead path

* more review fixes

* proposal to fix the read/write lock issue

* fixed typo

* cachedFile by value to avoid mutation while not holding log

* acquire lock in same place as in pathLogin

* added debug log entry when local token is not found and falling back to client token
tvoran added a commit that referenced this pull request Jan 19, 2022
When defaulting to local JWT token and CA certificate in a pod, always read
them from local filesystem and do not store them persistently with the config.

Token will be re-read periodically to avoid using expired token.

The change allows running Vault on Kubernetes 1.21 and newer, which switched
to ID token that is bound to the pod and will expire.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>

* review comment fix: load only token or ca cert if other is given in config

* changed the reload period to 1 minute

* fixed review comments

* take lock also on alias lookahead path

* more review fixes

* proposal to fix the read/write lock issue

* fixed typo

* cachedFile by value to avoid mutation while not holding log

* acquire lock in same place as in pathLogin

* added debug log entry when local token is not found and falling back to client token

Co-authored-by: Tero Saarni <tero.saarni@est.tech>
@zohebs341
Copy link

Hi All,

We are facing a similar issue with Vault in our production workloads & and creating downtime. Can someone assist me with the vault part? I mean resolution for this.

A few days back we upgraded our Azure AKS Cluster from 1.20.7 to 1.21.7 and updated "disable_iss_validation=true" in vault config. It worked for some time, but intermittently we are facing issues with application pods which are crashing with permission denied errors and in vault logs, I can see below errors.

After going through multiple issues/docs, what I understood is vault is not able to load/read dynamically the service account/JWT tokens of pods, as it's changing frequently/expiring every hour in K8s version 1.21.7. Please assist me on this
Error From Vault pod logs:

[ERROR] auth.kubernetes.auth_kubernetes_34b3b46a: login unauthorized due to: lookup failed: service account unauthorized; this could mean it has been deleted or recreated with a new token
Error From Application Pod Logs:

Error making API request.

Code: 403. Errors:

  • 1 error occurred:
  • permission denied
    (retry attempt 7 after "16s")

Config File:

Config File:

aultWriteAuthKubeConfig() {

echo && echo "[INFO]:: Writing Kubernetes auth config..."

vault write auth/kubernetes/config \

token_reviewer_jwt="$(cat ${KUBERNETES_SERVICE_TOKEN})" \

kubernetes_host=${KUBERNETES_HOST} \

kubernetes_ca_cert=@${KUBERNETES_CA_CERT} \

disable_iss_validation=true

@krishnamohan987
Copy link

@zohebs341 upgrade vault to V1.9.3 or + , it will support for dynamically reloading short-lived tokens.

@zohebs341
Copy link

@krishnamohan987 Thank you Krishna. We will try upgrading version.

@OneideLuizSchneider
Copy link

@krishnamohan987 Hi, the helm chart is still on v1.9.2, does it take a while to update there?

Screen Shot 2022-02-19 at 23 27 28

.

@krishnamohan987
Copy link

@OneideLuizSchneider you can override image version in values yaml while you apply the chart. it works. downlaod the image ready and push to your registry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vault should support reloading service account token from disk
9 participants