Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auth/mount: fetch single vault resource on read #2145

Merged
merged 5 commits into from
Feb 26, 2024

Conversation

fairclothjm
Copy link
Contributor

@fairclothjm fairclothjm commented Feb 16, 2024

Description

This changes the READ operations in vault_auth_backend and vault_mount to use the GET /sys/auth/:path and GET /sys/mounts/:path APIs respectively. Previously, these resources were calling LIST which could result in a substantially higher CPU and memory footprint for the provider in cases where a given Vault server has a large number of secret/auth mounts.

At this time, there are no helpers for these API paths in the Vault api package. See hashicorp/vault#25499 which proposes to add them.

TF Config used in the performance tests
terraform {
  required_providers {
    vault = {
      source  = "hashicorp/vault"
      version = "~> 3.23.0"
    }
  }
}

provider "vault" {
  address = "http://localhost:8200"
}

variable "mount_name_count" {
  type    = number
  default = 1000
}

resource "vault_mount" "kvv2-example" {
  count = var.mount_name_count
  path  = "kv-mount${count.index}"
  type  = "kv-v2"
  options = {
    version = "2"
    type    = "kv-v2"
  }
  description = "This is an example KV Version 2 secret engine mount"
}

resource "vault_auth_backend" "userpass-example" {
  count = var.mount_name_count
  type  = "userpass"
  path  = "userpass${count.index}"
  tune {
    max_lease_ttl      = "90000s"
    listing_visibility = "unauth"
  }
  description = "This is an example userpass auth mount"
}

Details on the CPU performance improvements

As expected, the CPU profile shows the biggest improvements in the Provider's READ operations which were spending most of the CPU time listing and decoding mount data.

pprof CPU profile flame graph

Before:
before-cpu

After:
after-cpu

Additionally, from the CPU profile we can see a big difference in the time spent in the call to mallocgc:

Before:

      flat  flat%   sum%        cum   cum%
     2.33s  2.94% 49.05%      8.15s 10.29%  runtime.mallocgc

After:

      flat  flat%   sum%        cum   cum%
     0.07s  0.79% 88.25%      0.27s  3.05%  runtime.mallocgc

So let's take a look at the pprof memory profile...

Details on the Memory performance improvements

Interestingly, the pprof memory profile does not shed any light on the performance improvements that we would predict based on the previous CPU profile analysis. However, both before and after point to the AWS SDK init functions as being very memory hungry.

pprof Memory profile

Before:

(pprof) top
Showing nodes accounting for 5680.34kB, 100% of 5680.34kB total
Showing top 10 nodes out of 33
      flat  flat%   sum%        cum   cum%
 1544.18kB 27.18% 27.18%  1544.18kB 27.18%  github.com/aws/aws-sdk-go/aws/endpoints.init
 1026.81kB 18.08% 45.26%  1026.81kB 18.08%  regexp.onePassCopy
  544.67kB  9.59% 54.85%   544.67kB  9.59%  regexp/syntax.(*compiler).inst (inline)
  516.01kB  9.08% 63.93%   516.01kB  9.08%  regexp/syntax.appendRange
  512.20kB  9.02% 72.95%   512.20kB  9.02%  runtime.malg
  512.16kB  9.02% 81.97%   512.16kB  9.02%  github.com/hashicorp/terraform-provider-vault/vault.approleAuthBackendLoginResource
  512.16kB  9.02% 90.98%   512.16kB  9.02%  github.com/hashicorp/terraform-provider-vault/vault.getCommonManagedKeysSchema
  512.16kB  9.02%   100%   512.16kB  9.02%  github.com/hashicorp/terraform-provider-vault/vault.identityOIDCOpenIDConfigDataSource

After:

(pprof) top
Showing nodes accounting for 8218.13kB, 84.25% of 9754.22kB total
Showing top 10 nodes out of 65
      flat  flat%   sum%        cum   cum%
 3075.29kB 31.53% 31.53%  3075.29kB 31.53%  github.com/aws/aws-sdk-go/aws/endpoints.init
 1028.88kB 10.55% 42.08%  1028.88kB 10.55%  regexp.onePassCopy
  520.04kB  5.33% 47.41%   520.04kB  5.33%  go.opencensus.io/stats/view.NewMeter
  517.33kB  5.30% 52.71%   517.33kB  5.30%  regexp/syntax.(*compiler).inst
  514.63kB  5.28% 57.99%   514.63kB  5.28%  cloud.google.com/go/monitoring/apiv3/v2/monitoringpb.init
  512.88kB  5.26% 63.24%   512.88kB  5.26%  unicode.map.init.0
  512.56kB  5.25% 68.50%   512.56kB  5.25%  encoding/json.typeFields
  512.20kB  5.25% 73.75%   512.20kB  5.25%  runtime.malg
  512.16kB  5.25% 79.00%   512.16kB  5.25%  github.com/hashicorp/terraform-provider-vault/vault.azureAuthBackendRoleResource
  512.16kB  5.25% 84.25%   512.16kB  5.25%  github.com/hashicorp/terraform-provider-vault/vault.tokenResource

Relates OR Closes #0000

Checklist

  • Added CHANGELOG entry (only for user-facing changes)
  • Acceptance tests where run against all supported Vault Versions

@fairclothjm
Copy link
Contributor Author

Once hashicorp/vault#25499 is merged and the API is tagged we can pull that in

@fairclothjm fairclothjm added this to the 3.26.0 milestone Feb 20, 2024
@fairclothjm fairclothjm requested a review from a team February 22, 2024 19:50
Copy link
Contributor

@vinay-gopalan vinay-gopalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including the details in the memory profiles! The improvements are visually very clear to understand via the flame graphs, appreciate it

Had some questions around how some of the changes might affect different token policy cases, and wondering if we should document any changes in token policies for LIST vs GET on sys paths. Excited about adding these improvements 😄

vault/resource_mount.go Outdated Show resolved Hide resolved
Copy link
Contributor

@vinay-gopalan vinay-gopalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* add mountutil and GetMount

* fix nomad backend existence check

* fix context arg

* use GET instead of LIST for auth mount fetching

* normalize error response

* changelog

* fix github auth path

* fix auth existence checking

* use contexts and add log

* fix build and rebase

* make golangci-lint happy
@github-actions github-actions bot added size/XL and removed size/S labels Feb 26, 2024
@fairclothjm fairclothjm merged commit 48ef44c into main Feb 26, 2024
12 of 13 checks passed
@fairclothjm fairclothjm deleted the VAULT-21523/get-mount branch February 26, 2024 21:48
@pree
Copy link

pree commented Mar 15, 2024

Just encountered an issue after upgrading the vault provider that my data resource for vault_auth_backend now errors with * permission denied.

I had to extend the policy my policy with sudo now for this to work:

path "sys/auth*" {
  capabilities = ["read", "list", "sudo"]
}

This seems like overkill, especially as it's not documented in the vault API docs.
Also, the lack of a breaking change documentation isn't nice either ...

@fairclothjm
Copy link
Contributor Author

@pree Hello, I am sorry you are having trouble! Thanks for pointing this out. We did mention policy changes in the Changelog and we have published a v4.0.0 Upgrade Guide but it looks like we missed this nuance.

I think we may be able get a patch in to remove the sudo requirement, however you will still need to update your policy path to use sys/mounts/auth/*.

@pree
Copy link

pree commented Mar 15, 2024

Thanks for the reply @fairclothjm. I would love a # Breaking Change header in the Changelog to make this more clear.
Removing the sudo requirement would be the right approach imho.

@kkronenb
Copy link

I have started to play around with this version and so far, it is a night and day difference in our environment. Significantly less CPU (100%->40%) and roughly 3x faster to run though all of our resources.

Thank you so much and kudos @fairclothjm

@fairclothjm
Copy link
Contributor Author

Thanks and glad to hear that @kkronenb !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants