vault azure dynamic engine can leak role assignments. #118

Closed
dnozay opened this issue Dec 5, 2022 · 3 comments

Comments

@dnozay

dnozay commented Dec 5, 2022

There are two ways to use the Azure secrets engine with respect to service principals:

  1. Static AAD service principal: you supply the application object ID of an existing principal.
  2. Dynamic AAD service principal: you provide the role information, and Vault creates the principal and its role assignments.

To restate: with dynamic principals, the service principal is created first, and then the role assignment is made.
This sometimes fails, and can fail consistently.
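
The two-step flow described above is where a partial failure can strand resources. As a minimal sketch (not the plugin's actual implementation), one way to avoid that is to roll back everything already created when a later step fails. `FakeAzure` and its methods (`create_service_principal`, `assign_role`, `unassign_role`, `delete_service_principal`) are hypothetical in-memory stand-ins, not the Azure SDK or the plugin API.

```python
class FakeAzure:
    """Hypothetical in-memory stand-in for the Azure calls; not a real SDK."""

    def __init__(self, fail_on_scope=None):
        self.principals = set()
        self.assignments = set()
        self.fail_on_scope = fail_on_scope  # scope that simulates a failure
        self._next_id = 0

    def create_service_principal(self):
        self._next_id += 1
        sp_id = f"sp-{self._next_id}"
        self.principals.add(sp_id)
        return sp_id

    def assign_role(self, sp_id, role, scope):
        if scope == self.fail_on_scope:
            raise RuntimeError(f"role assignment failed for scope {scope}")
        assignment = (sp_id, role, scope)
        self.assignments.add(assignment)
        return assignment

    def unassign_role(self, assignment):
        self.assignments.discard(assignment)

    def delete_service_principal(self, sp_id):
        self.principals.discard(sp_id)


def create_dynamic_credentials(client, roles):
    """Create a principal, then its role assignments; undo everything on failure."""
    sp_id = client.create_service_principal()
    created = []
    try:
        for role, scope in roles:
            created.append(client.assign_role(sp_id, role, scope))
    except Exception:
        # Roll back in reverse order so no assignment or principal is left behind.
        for assignment in reversed(created):
            client.unassign_role(assignment)
        client.delete_service_principal(sp_id)
        raise
    return sp_id, created
```

If the second of two assignments fails here, the first assignment and the principal are both removed before the error is re-raised. The rollback calls can themselves fail in a real system, so this only narrows the leak window.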

How we found this issue:

  • We knew about the AAD propagation delay after a service principal is created and wanted to make that more reliable.
  • We used Terraform to create the static AAD service principal and role assignment, and this failed with:
╷
│ Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="RoleAssignmentLimitExceeded" Message="No more role assignments can be created."
│ 
│   with module.azure-service-principal-xxxxxxxx-xxxxx-xxxx.azurerm_role_assignment.service-principal-built-in-roles["/providers/Microsoft.Management/managementGroups/xxxxx-xxxxx-xxxx-xxxx-xxxxxxxxxx.Name of my Service"],
│   on modules/cross_subscription_service_principal/main.tf line 109, in resource "azurerm_role_assignment" "service-principal-built-in-roles":
│  109: resource "azurerm_role_assignment" "service-principal-built-in-roles" {
│ 
╵

I suspect that we either misconfigured the permanently_delete option, or that the chosen TTL lets us hit our role assignment quota before old objects are deleted or garbage-collected.

However, there is another failure mode: if you use a Kubernetes deployment and that deployment is failing for some reason, each restart creates a new service principal and a new role assignment, which can also lead to resource exhaustion.
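
To put rough numbers on the crash-loop scenario, here is a back-of-the-envelope sketch. All inputs (restart interval, TTL, assignments per credential, remaining quota) are assumptions to be replaced with your own values; the issue does not state them, and the function names are hypothetical.

```python
import math


def live_role_assignments(restart_interval_s, ttl_s, assignments_per_credential):
    """Upper bound on role assignments alive at once while a pod crash-loops."""
    # Every restart within one TTL window still holds an unexpired credential.
    live_credentials = math.ceil(ttl_s / restart_interval_s)
    return live_credentials * assignments_per_credential


def hours_until_quota(restart_interval_s, assignments_per_credential, free_quota):
    """Hours to exhaust the remaining quota if assignments are never cleaned up."""
    restarts_needed = math.ceil(free_quota / assignments_per_credential)
    return restarts_needed * restart_interval_s / 3600
```

For example, with a restart every 300 seconds, a 24-hour TTL, and 2 role assignments per credential, `live_role_assignments(300, 86400, 2)` gives 576 assignments held at once even when revocation works. If revocation leaks instead, `hours_until_quota(300, 2, 600)` gives 25.0 hours until 600 free slots are gone.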

An operator can then fix the leak manually by going to the Azure portal, checking role assignments, and deleting old ones.

When role unassignment is performed and fails, it is not retried:
https://github.com/hashicorp/vault-plugin-secrets-azure/blob/main/path_service_principal.go#L276-L296
This can also be a source of leaks.
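
As an illustration of what retrying could look like, here is a minimal sketch of unassignment with exponential backoff. It is not the plugin's code: `unassign_with_retry`, its parameters, and the `unassign` callable are hypothetical, and a real fix would also need to distinguish retryable errors from permanent ones.

```python
import time


def unassign_with_retry(unassign, assignment_id, attempts=5,
                        base_delay_s=1.0, sleep=time.sleep):
    """Call unassign(assignment_id), retrying with exponential backoff.

    Returns True on success and False if every attempt failed, so the caller
    can record the assignment for later cleanup instead of silently losing it.
    """
    for attempt in range(attempts):
        try:
            unassign(assignment_id)
            return True
        except Exception:
            # Wait 1x, 2x, 4x, ... the base delay, but not after the last try.
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    return False
```

Returning `False` rather than raising lets the caller queue the assignment ID for a later sweep, which is one way to keep a failed revocation from becoming a permanent leak.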

@austingebauer
Member

Hi @dnozay! We recently addressed some of the leaking role assignment concerns in #110. Does this address your concerns? If not, specific steps to reproduce leaking role assignments would be helpful here. Thanks!

@dnozay
Author

dnozay commented May 9, 2023

@austingebauer - thanks for letting me know - I'll inform the team, got back from vacation recently.

@austingebauer
Member

@dnozay - I'm going to close this issue as we believe we've fixed this. Feel free to reopen this or open a new issue if you've discovered otherwise!
