Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for non-EKS IRSA #3560

Open
thefirstofthe300 opened this issue Jun 27, 2022 · 15 comments · May be fixed by #4094 or #5109
Open

Support for non-EKS IRSA #3560

thefirstofthe300 opened this issue Jun 27, 2022 · 15 comments · May be fixed by #4094 or #5109
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@thefirstofthe300
Copy link
Contributor

/kind feature

IRSA appears to have been complete for EKS clusters in #2054. My company currently uses non-EKS clusters in combination with IRSA. I'd like the ability to install IRSA using the provider. The main piece that is tricky to coordinate is the certificate authority used to sign and validate the JWTs. If these pieces could be automated as part of the ignition config (we also use Flatcar), our burden of installation/maintenance would be greatly decreased.

Environment:

  • Cluster-api-provider-aws version: v1.4
  • Kubernetes version: (use kubectl version): v1.24
  • OS (e.g. from /etc/os-release): Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 27, 2022
@thefirstofthe300
Copy link
Contributor Author

I managed to POC using the built-in well-known OID configuration endpoint in the API server (kubernetes/kubernetes#98553); however, it currently requires bringing my own infra and knowing the CA cert's fingerprint. Here's my configuration:

  • BYO load balancer listening on port 443 and forwarding traffic to the API server
  • BYO load balancer security group with the proper configuration to allow traffic to the ELB on port 443
  • Add a cluster-role allowing unauthenticated users to fetch the /.well-known/openid-configuration endpoint kubectl create clusterrolebinding oidc-reviewer --clusterrole=system:service-account-issuer-discovery --group=system:unauthenticate
  • Create the OIDC provider with the CA cert's fingerprint

Unfortunately, I don't really have time to take a stab at contributing this feature to CAPA right now or I'd try to add this to CAPA myself.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 28, 2022
@thefirstofthe300
Copy link
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2022
@luthermonson
Copy link
Contributor

I just had to spec this out and will code it into CAPA shortly. I'll add my approach and I would like some more eyes on it before I spend too much time coding. First, I'm basing more of my knowledge on how to even get this working on an example written by @smalltown located here: https://github.com/smalltown/aws-irsa-example and accompanying blog article: https://medium.com/codex/irsa-implementation-in-kops-managed-kubernetes-cluster-18cef84960b6

Approach

  1. Users will have to create an iaas cluster with bucket management turned on, this is a feature used in capa to manage an s3 bucket for ignition userdata and we will reuse this existing functionality to create/delete a bucket per cluster.
  2. Make associateOidcProvider: true work for iaas clusters, it will use cert manager to gen certs and add keys.json and .well-know/openid-configuration and make them public read and then create the oidc provider in IAM and generate a trust policy and store it in a config map which you can use later in your roles which is idtentical to how managed clusters work.
  3. Install https://github.com/aws/amazon-eks-pod-identity-webhook by taking the manifests and moving them into ./config, modify it to use a cert from cert-manager and have capa deploy it to the cluster.
  4. As per expectations in EKS' implementation, capa will not build the irsa roles for you. The user is expected to use the trust policy located in a configmap and do this themselves. The user will create the role with the trust policies they need for their service and add the contents from this configmap to make a working role. The user is also expected to take the arn from that role and use it as an annotation eks.amazonaws.com/role-arn when they make a service account and assign it to a pod.

@thefirstofthe300 please read the above and see if you agree with the approach, while I like your idea of using the ServiceAccountIssuerDiscovery but I feel having capa spin up any extra infra is a bit too much to ask of users just to get IRSA support when the s3 buckets are already being used/managed by capa and have free public URLs. The only pieces capa would need to manage are some certs, iam oidc provider, webhook and the contents of the files in s3.

@thefirstofthe300
Copy link
Contributor Author

Either way will work. I think the main question comes down to how certs should work.

In essence, the way I went about creating things is identical to the way CAPA creates it's infrastructure; however, instead of generating everything using port 6443 for the API server ELB serving port, it uses port 443. I'd think providing the ability to set the serving port to 443 over 6443 would be relatively simple in the CAPA code and remove the need for the BYO infra.

The two things I've had to shove into my configs are adding the root ca cert to the API serving cert chain using Ignition and adding the cluster binding to allow unauthenticated users to view the openid config.

Doing this removes the need to rely on S3 to serve the configuration.

The main piece that I have a question on is whether people want to serve the root cert with their serving chain. Ultimately, I don't think it opens any security holes as the certificates themselves are basically a public key and no change is made to the TLS private key.

I'd be interested in hearing some input from someone more in tune with the security side.

@dlipovetsky
Copy link
Contributor

How much of this can we solve with documenation vs. code and cluster/infrastructure component template changes?

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jan 23, 2023
@luthermonson
Copy link
Contributor

@dlipovetsky fair question but all the pieces together make this a complex feature as it mixes aws resources (s3, oidc provider) and kube resources (cert, deployment) and the one thing which knows about all those things is capa and can only be built at cluster creation time. When I get this written we will have feature parity between iaas and eks for IRSA and i feel it's only fair that capa does the work to make feature parity between the two cluster types

@Skarlso
Copy link
Contributor

Skarlso commented Feb 1, 2023

@dlipovetsky I think I agree with Luther. I don't believe a document change can handle this. The IRSA logic requires a fair amount of work between the pods, assigning things, creating connections, adding the webhook, generating certificates etc.

Honestly, this is quite a large piece of work. @luthermonson and @thefirstofthe300, do you think we could break this down into several more minor issues and use this issue as an epic to track each piece?

Such as creating the configuration, supporting 443, adding the webhook installation process, etc.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 2, 2023
@Skarlso
Copy link
Contributor

Skarlso commented May 6, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 6, 2023
@Skarlso
Copy link
Contributor

Skarlso commented May 6, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 6, 2023
@k8s-triage-robot
Copy link

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels May 5, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2024
@sl1pm4t sl1pm4t linked a pull request Aug 28, 2024 that will close this issue
5 tasks
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.
Projects
None yet
6 participants