
Vault 1.4.0 won't start when a seal stanza is added in aws eks #8844

Closed
corbesero opened this issue Apr 24, 2020 · 16 comments
Labels: bug (Used to indicate a potential bug), core/seal

Comments

@corbesero

Describe the bug

When I add a seal stanza (awskms) to a Vault configuration via the vault-helm chart (0.5.0), Vault never becomes available in the containers.

To Reproduce

  1. I do a helm install without the seal stanza. Vaults (one active and one standby) will come up and I can unseal manually.
  2. Add the seal stanza to access an existing kms key
  3. Do a helm upgrade
  4. Delete the pods so they get recreated with the new configuration.
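The steps above, sketched as CLI commands (the release name, chart reference, and namespace here are assumptions, not taken from the report):

```shell
# 1. Initial install without the seal stanza
helm install vault hashicorp/vault -n vault -f values.yaml

# 2-3. Add the seal "awskms" stanza to the config in values.yaml, then upgrade
helm upgrade vault hashicorp/vault -n vault -f values.yaml

# 4. Delete the pods so they are recreated with the new configuration
kubectl delete pod vault-0 vault-1 -n vault
```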

Expected behavior

The vaults should come up so that I can do the unseal migrate.
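For reference, the unseal migration mentioned here is driven with the existing Shamir keys via the `-migrate` flag; a sketch, assuming the pod name and namespace from this setup (the key values are placeholders):

```shell
# Repeat with distinct Shamir keys until the key threshold is reached
kubectl exec -n vault vault-0 -- vault operator unseal -migrate <shamir-key-1>
kubectl exec -n vault vault-0 -- vault operator unseal -migrate <shamir-key-2>
kubectl exec -n vault vault-0 -- vault operator unseal -migrate <shamir-key-3>
```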

Environment:
Vault 1.4.0
AWS EKS 1.15
vault-helm chart at tag 0.5.0

Vault server configuration file(s):

disable_mlock = true
ui = true
log_level = "trace"
listener "tcp" {
  tls_disable = 1
  address = "[::]:8200"
  cluster_address = "[::]:8201"
}
storage "raft" {
  path = "/vault/data"
  retry_join {
    leader_api_addr = "http://vault-0.vault-internal:8200"
  }
  retry_join {
    leader_api_addr = "http://vault-1.vault-internal:8200"
  }
}
seal "awskms" {
 region     = "us-east-1"
 kms_key_id = "7899be0b-3be8-4dd7-a5b5-32b85c02c406"
}

Also, see attached values file for helm

Additional context

No log output is produced.

This is the output of ps on the container:

/ $ ps awx | grep vault
    1 vault     0:00 /bin/sh -ec sed -E "s/HOST_IP/${HOST_IP?}/g" /vault/config/extraconfig-from-values.hcl > /tmp/storageconfig.hcl; sed -Ei "s/POD_IP/${POD_IP?}/g" /tmp/storageconfig.hcl; /usr/local/bin/docker-entrypoint.sh vault server -config=/tmp/storageconfig.hcl  
    8 vault     0:00 {docker-entrypoi} /usr/bin/dumb-init /bin/sh /usr/local/bin/docker-entrypoint.sh vault server -config=/tmp/storageconfig.hcl
    9 vault     0:00 vault server -config=/tmp/storageconfig.hcl
  208 vault     0:00 /bin/sh
  254 vault     0:00 ps awx
  255 vault     0:00 grep vault

I have attached the output of kubectl describe pods -n vault for the vault-0 and vault-1 pods. The vault-0 output is from while the pod is still there, but the vault-1 output shows the state after a while, when the container has completely failed.

If I comment out the seal stanza, I can do a helm upgrade, delete the pods, and the new ones come up and can be unsealed.

kubectl-describe-pod-vault-0.txt
kubectl-describe-pod-vault-1.txt
values.yaml.txt

@calvn
Member

calvn commented Apr 25, 2020

@corbesero thanks for opening a separate issue to follow up on this. We've done some initial investigation on our end, and believe that it might be an issue with the instance profile not being detected correctly.

Can you create IAM Access Keys (with the proper permissions), and provide them directly to test things out?

server:
  # extraSecretEnvironmentVars is a list of extra environment variables to set with the stateful set.
  # These variables take value from existing Secret objects.
  extraSecretEnvironmentVars:
  - envName: AWS_ACCESS_KEY_ID
    secretName: vault-aws
    secretKey: AWS_ACCESS_KEY_ID
  - envName: AWS_SECRET_ACCESS_KEY
    secretName: vault-aws
    secretKey: AWS_SECRET_ACCESS_KEY

  ha:
    enabled: true
    raft:
      enabled: true
      config: |
        ui = true

        listener "tcp" {
          tls_disable = 1
          address = "[::]:8200"
          cluster_address = "[::]:8201"
        }

        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
          }
          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
          }
        }

        service_registration "kubernetes" {}

        seal "awskms" {
          region     = "us-east-1"
          kms_key_id = "<aws-kms-key-id>"
        }

Side note: If you're doing this in a test environment, make sure to delete the volumes that were created on the last attempt (via kubectl delete pvc <name of claim>), since helm does not do that automatically.
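Assuming the `vault-aws` secret name used in the snippet above, creating the secret and cleaning up the old claims might look like this (the namespace and claim names are guesses based on the chart's defaults):

```shell
# Create the secret referenced by the extraSecretEnvironmentVars entries
kubectl create secret generic vault-aws -n vault \
  --from-literal=AWS_ACCESS_KEY_ID=<access-key-id> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<secret-access-key>

# Remove the volumes left over from the previous attempt
kubectl delete pvc -n vault data-vault-0 data-vault-1
```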

@corbesero
Author

I did that. I created a new AWS key pair with a policy that allows KMS access, and Vault did come up. I did not unseal it, but I saw log messages. This is not exactly an identical test, since I didn't first let it come up without the awskms seal. I will try that on Monday.

But this does imply that Vault is not happy when it depends on the instance role profile or the pod OIDC from the service account.

We were really expecting to be able to use that feature since our other EKS services use that mechanism.

@corbesero
Author

I can confirm my original scenario. I have an AWS access/secret key pair set in the helm values and the secret in the namespace. If I create a vault without the seal stanza, it can start up. If I then add the seal stanza, the new containers do come up. I was able to do the migrate, and afterwards the containers did seem to do the auto-unseal correctly.

This seems to strongly imply that vault is having a problem coming up when depending on the instance or service account profile instead of an explicit AWS key configuration.

@calvn
Copy link
Member

calvn commented Apr 27, 2020

Thanks for doing the setup to verify things! I don't want to draw conclusions yet, but it may be related to #8847 (also an issue with instance profile metadata not being picked up).

@calvn added the bug label Apr 27, 2020
@corbesero
Author

@calvn I think it is the same issue. I noticed #8847 recently too. When we were installing vault 1.3, the pods were only getting the instance profile of the worker node, not the role specified in the service account via the OIDC. Switching to 1.4 just didn't expose the underlying profile issue until I added the awskms seal, which completely broke the instance profile being used.

@inkblot

inkblot commented May 2, 2020

I opened #8847. I am also using an AWS KMS seal with 1.4.0, and vault successfully uses the ECS task role and not the EC2 instance profile to acquire AWS credentials for using KMS to unseal. Even so, credential acquisition is not working for the AWS auth backend.

@rubroboletus
Contributor

Any progress here? We have the same issue on AWS EKS 1.15 with OIDC mapped to a serviceaccount: creating a new Vault 1.4.1 cluster from scratch, deploying from the fresh git helm chart. The SA is annotated according to the EKS documentation, the pods have the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables set, tokens are mounted, and access to the EC2 instance profile is disabled (we drop any packets to 169.254.169.254).
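For context, the EKS-documented setup referred to here annotates the service account with the role to assume; that annotation is what causes AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE to be injected into the pods. A minimal sketch (the account ID and role name are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault
  namespace: vault
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/vault-kms-unseal
```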

@kalafut
Contributor

kalafut commented May 15, 2020

My comment on the linked issue might apply here too: #8847 (comment)

@michaeljohnalbers

michaeljohnalbers commented May 20, 2020

I'm seeing pretty much the same problem, but on a manually created Kubernetes cluster (not EKS, kops or anything, just plain EC2 instances with kubeadm). As soon as I add the awskms seal, Vault starts but outputs no logs (regardless of log level) and does not open port 8200.

I've tried attaching a completely wide open IAM role to the EC2 instance as well as using a secret key/access key pair for a role which is constrained to just the KMS operations listed in the docs. I verified these keys work with KMS operations when used with the aws cli.

Kubernetes version: 1.18.2
Vault Version: 1.4.0
Helm Chart version: 0.5.0
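For reference, the KMS operations the docs call for amount to an IAM policy along these lines (the key ARN is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:DescribeKey"
      ],
      "Resource": "arn:aws:kms:us-east-1:123456789012:key/<kms-key-id>"
    }
  ]
}
```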

@chancez

chancez commented Jul 1, 2020

I believe #7738 fixes this

@michaeljohnalbers

@chancez I just tried the new 1.4.3 vault image and it doesn't appear to have fixed the issue. I'm still seeing the exact same symptoms as I described above. I'm also using version 0.6.0 of the Helm chart.

@chancez

chancez commented Jul 11, 2020

I had to set AWS_ROLE_SESSION_NAME to make it work with 1.4.3
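With the vault-helm chart, that variable can be passed through `server.extraEnvironmentVars`, e.g. (the session name itself is arbitrary):

```yaml
server:
  extraEnvironmentVars:
    AWS_ROLE_SESSION_NAME: vault-unseal
```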

@kalafut
Contributor

kalafut commented Jul 11, 2020

This is good info @chancez , and relates to #9415.

@tvoran
Member

tvoran commented Jul 16, 2020

Now that #9416 has been merged to fix #9415, setting AWS_ROLE_SESSION_NAME should no longer be required in vault 1.4.4 and 1.5.0 (when they're released, that is).

@tvoran
Member

tvoran commented Jul 30, 2020

Hi @corbesero, have you had a chance to try vault 1.5.0 to see if that resolves the issue? Or 1.4.3 w/AWS_ROLE_SESSION_NAME set?

@tvoran
Member

tvoran commented Aug 20, 2020

Closing for now.

@tvoran tvoran closed this as completed Aug 20, 2020