You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When rotating keys, Vault's Identity engine creates a new key and immediately starts using it for new JWTs. This means that all applications that are validating Vault JWTs must refresh their keys immediately, which causes a stampeding herd of requests for public keys. Instead, it should create a new key long before it starts using it. This spreads the refresh load on Vault over time. It also makes applications that use these keys less linked to Vault being available. If the keys are rotated early, then clients can tolerate Vault downtime.
My personal policy suggestion: Vault should always publish 3 keys: the next key, the current key, and the previous one. This is more than strictly necessary, but means that clients can just refresh every rotation_period and always be sure to get the keys that are needed.
Related minor issues:
Vault should probably randomize the Cache-Control header's max-age parameter that it returns to clients to some interval after the next key will be available. Today, it returns the same expiration time to all clients. Again, this causes a thundering herd at the same moment.
Vault returns Cache-Control: no-store between when one key has expired and the next has been generated. This means a "well-behaved" client will need to hit vault on every request to get public keys! I observed two rotations one where this period of time lasted around 2 minutes. Generating the "next" key before it is needed will also mean Vault can always return some "reasonable" max-age, limiting the amount of requests clients need to make to read keys.
Details
I wrote a program that requests JWTs from Vault every minute, and loads the public keys and prints the Cache-Control header. This is what I observed at the rotation boundary:
Before expiration
2020/06/10 23:06:40 vault issued token for role=... kid=d4171f46-17ed-882f-9e2d-c0de753048ac ttl=3600
2020/06/10 23:06:40 public keys Cache-Control:max-age=28 kids:17e89607-4e00-24c1-4465-85d8971d0b83,37ff698e-ae60-7c9b-10f4-597578d7b848,77b0a6a6-7501-40c0-474b-89ade66f7ac2,d4171f46-17ed-882f-9e2d-c0de753048ac
The first line shows that it issued a token using kid=d4171f46-17ed-882f-9e2d-c0de753048ac with a TTL of 3600 seconds. The second link shows the KIDs that were in the public document. This response is only cachable for 28 seconds.
After expiration
2020/06/10 23:07:40 vault issued token for role=... kid=d4171f46-17ed-882f-9e2d-c0de753048ac ttl=3600
2020/06/10 23:07:40 public keys Cache-Control:no-store kids:17e89607-4e00-24c1-4465-85d8971d0b83,37ff698e-ae60-7c9b-10f4-597578d7b848,77b0a6a6-7501-40c0-474b-89ade66f7ac2,d4171f46-17ed-882f-9e2d-c0de753048ac
This shows Vault is still issuing new tokens with kid= kid=d4171f46-17ed-882f-9e2d-c0de753048ac, but has not generated new keys yet. The Cache-Control header says that client must not store this response at all.
2020/06/10 23:08:40 vault issued token for role=... kid=c1c52c48-6ff2-b7d2-5b74-43cb8a7d7b56 ttl=3600
2020/06/10 23:08:40 public keys Cache-Control:max-age=77368 kids:37ff698e-ae60-7c9b-10f4-597578d7b848,77b0a6a6-7501-40c0-474b-89ade66f7ac2,c1c52c48-6ff2-b7d2-5b74-43cb8a7d7b56,d4171f46-17ed-882f-9e2d-c0de753048ac
Vault is now issuing JWTs with the new kid=c1c52c48-6ff2-b7d2-5b74-43cb8a7d7b56. This key now appears in the public list, and kid=17e89607-4e00-24c1-4465-85d8971d0b83 was removed. The max-age is back to a reasonable amount of time.
Environment:
Vault Server Version (retrieve with vault status): 1.4.1
The text was updated successfully, but these errors were encountered:
@evanj - Thanks again for opening this issue. We ended up addressing a number of the concerns here in #12414. There is some additional context in #11438. I'm going to close this issue now. Feel free to reopen if you feel we've missed anything important!
When rotating keys, Vault's Identity engine creates a new key and immediately starts using it for new JWTs. This means that all applications that are validating Vault JWTs must refresh their keys immediately, which causes a stampeding herd of requests for public keys. Instead, it should create a new key long before it starts using it. This spreads the refresh load on Vault over time. It also makes applications that use these keys less linked to Vault being available. If the keys are rotated early, then clients can tolerate Vault downtime.
My personal policy suggestion: Vault should always publish 3 keys: the next key, the current key, and the previous one. This is more than strictly necessary, but means that clients can just refresh every
rotation_period
and always be sure to get the keys that are needed.Related minor issues:
Vault should probably randomize the
Cache-Control
header'smax-age
parameter that it returns to clients to some interval after the next key will be available. Today, it returns the same expiration time to all clients. Again, this causes a thundering herd at the same moment.Vault returns
Cache-Control: no-store
between when one key has expired and the next has been generated. This means a "well-behaved" client will need to hit vault on every request to get public keys! I observed two rotations one where this period of time lasted around 2 minutes. Generating the "next" key before it is needed will also mean Vault can always return some "reasonable" max-age, limiting the amount of requests clients need to make to read keys.Details
I wrote a program that requests JWTs from Vault every minute, and loads the public keys and prints the
Cache-Control
header. This is what I observed at the rotation boundary:Before expiration
The first line shows that it issued a token using
kid=d4171f46-17ed-882f-9e2d-c0de753048ac
with a TTL of 3600 seconds. The second link shows the KIDs that were in the public document. This response is only cachable for 28 seconds.After expiration
This shows Vault is still issuing new tokens with kid=
kid=d4171f46-17ed-882f-9e2d-c0de753048ac
, but has not generated new keys yet. TheCache-Control
header says that client must not store this response at all.Vault is now issuing JWTs with the new
kid=c1c52c48-6ff2-b7d2-5b74-43cb8a7d7b56
. This key now appears in the public list, andkid=17e89607-4e00-24c1-4465-85d8971d0b83
was removed. Themax-age
is back to a reasonable amount of time.Environment:
vault status
): 1.4.1The text was updated successfully, but these errors were encountered: