Bug: Error listing tags for Config Rule in Secure-baselines #6486

Closed
richgreen-moj opened this issue Mar 14, 2024 · 9 comments
Labels
bug Something isn't working


@richgreen-moj
Contributor

richgreen-moj commented Mar 14, 2024

Expected Behavior

The Terraform: Scheduled Baseline workflow has not completed successfully for about a week. In particular, not all of the secure-baselines jobs have been succeeding, because of an error when listing tags for Config rules during a Terraform plan.

Actual Behavior

Example of error:
Error: listing tags for Config Config Rule (arn:aws:config:eu-west-2:767123802783:config-rule/config-rule-mzxknb): operation error Config Service: ListTagsForResource, failed to get rate limit token, retry quota exceeded, 3 available, 5 requested

See Slack thread for more info.
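The tail of this error ("failed to get rate limit token, retry quota exceeded, 3 available, 5 requested") appears to come from the client-side retry token bucket in the AWS SDK for Go v2, which the provider uses, rather than from the Config API itself. Each retry spends tokens from the bucket, and once the bucket cannot cover the cost of another retry the call fails straight away. A minimal Python sketch of that mechanism, assuming the SDK's standard-retryer defaults (500 tokens, 5 per retry, 1 credited back per first-attempt success); RetryQuota and retries_until_exhausted are hypothetical names, and the real SDK handles more cases than this:

```python
# Simplified model of a client-side retry token bucket.
# ASSUMPTION: defaults loosely mirror the AWS SDK for Go v2 standard retryer
# (500-token bucket, 5 tokens per retry, 1 token back per first-attempt success).
# The real SDK has more cases, e.g. timeouts cost more tokens.
RETRY_COST = 5
NO_RETRY_INCREMENT = 1


class RetryQuotaExceeded(Exception):
    """Raised when the bucket cannot pay for another retry."""


class RetryQuota:
    """Hypothetical token bucket; every call that shares it draws from one pool."""

    def __init__(self, capacity: int = 500) -> None:
        self.capacity = capacity
        self.available = capacity

    def acquire(self) -> None:
        """Take tokens for one retry, or fail with the same wording as the log."""
        if self.available < RETRY_COST:
            raise RetryQuotaExceeded(
                f"retry quota exceeded, {self.available} available, {RETRY_COST} requested"
            )
        self.available -= RETRY_COST

    def release(self) -> None:
        """Credit the bucket after a first-attempt success, capped at capacity."""
        self.available = min(self.capacity, self.available + NO_RETRY_INCREMENT)


def retries_until_exhausted(quota: RetryQuota) -> int:
    """Count the retries the bucket pays for if every call keeps being throttled."""
    retries = 0
    while True:
        try:
            quota.acquire()
        except RetryQuotaExceeded:
            return retries
        retries += 1


if __name__ == "__main__":
    quota = RetryQuota()
    print(retries_until_exhausted(quota))  # 100
    for _ in range(3):  # three calls succeed first time and credit 1 token each
        quota.release()
    try:
        quota.acquire()
    except RetryQuotaExceeded as err:
        print(err)  # retry quota exceeded, 3 available, 5 requested
```

In this model a larger capacity only postpones exhaustion under sustained throttling (5,000 tokens pay for 1,000 retries); it does not reduce the rate at which calls are being throttled.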

Steps to Reproduce the Problem

No response

Version

No response

Modules

No response

Account

No response

@richgreen-moj richgreen-moj added the bug Something isn't working label Mar 14, 2024
@richgreen-moj richgreen-moj self-assigned this Mar 14, 2024
@richgreen-moj
Contributor Author

See this TF AWS provider bug, which seems to be related: hashicorp/terraform-provider-aws#34669

@richgreen-moj
Contributor Author

Also see this... hashicorp/terraform-provider-aws#36024

@richgreen-moj
Contributor Author

I have raised #6494 to pin the TF provider to v5.38.0, as we weren't seeing this sort of failure previously. It worked successfully in this run: https://github.com/ministryofjustice/modernisation-platform/actions/runs/8341764086
See v5.38.0 in the TF init output:
https://github.com/ministryofjustice/modernisation-platform/actions/runs/8341764086/job/22828878548#step:6:144

@richgreen-moj
Contributor Author

I've run a job on the main branch (so using the latest TF provider, v5.41.0, e.g. https://github.com/ministryofjustice/modernisation-platform/actions/runs/8342287044/job/22830537939#step:6:145) and it failed again.

In https://github.com/ministryofjustice/modernisation-platform/actions/runs/8342287044, 96 out of 175 secure-baseline jobs failed.

@richgreen-moj
Contributor Author

PR #6494, which pins the provider for the secure-baseline code to v5.38.0, has been merged to main.

This has added some stability to the job; it runs through on a single attempt.

I'm considering whether or not that's enough for now. I think something must have been introduced in the provider; this has been acknowledged in hashicorp/terraform-provider-aws#36024 and will hopefully be addressed in future.

I did try some of the suggested workarounds, e.g. setting retry_mode to adaptive and setting token_bucket_rate_limiter_capacity to a very large number, but neither seemed to help.

@richgreen-moj
Contributor Author

Agreed to keep this ticket open for now to test that this change has added stability and in case there is any progress on the underlying issue with the TF provider. I will take a look again next week.

@richgreen-moj
Contributor Author

richgreen-moj commented Mar 25, 2024

Since the TF AWS Provider bug mentioned above has now been closed and the fix included in v5.42.0, I am testing it out on a scheduled baseline run, https://github.com/ministryofjustice/modernisation-platform/actions/runs/8419490112, on branch fix/remove-pinned-provider-in-secure-baselines

^^^ This test failed with errors similar to before, e.g.

Error: listing tags for Config Config Rule (arn:aws:config:eu-west-2:172753231260:config-rule/config-rule-kwxmg0): operation error Config Service: ListTagsForResource, exceeded maximum number of attempts, 25, https response error StatusCode: 400, RequestID: c17f1153-756e-4968-bd57-e38a51792366, api error ThrottlingException: Rate exceeded

@richgreen-moj
Contributor Author

richgreen-moj commented Mar 25, 2024

In this PR I tried setting max_retries to 100 (the default is 25, which was the limit being hit in the previous run).

It worked ... https://github.com/ministryofjustice/modernisation-platform/actions/runs/8422101284

twice... https://github.com/ministryofjustice/modernisation-platform/actions/runs/8422873826
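Raising max_retries helps because it widens the window in which a throttled ListTagsForResource call can keep retrying before giving up. A rough Python sketch of that window, assuming a capped exponential backoff with a 1-second base and a 20-second cap (the cap is assumed to match the AWS SDK for Go v2 default; the base is illustrative) and treating 25 and 100 as the attempt limits discussed in this thread; backoff_upper_bound and max_total_wait are hypothetical names:

```python
# Worst-case retry window for a throttled call under capped exponential backoff.
# ASSUMPTIONS: 1-second base, 20-second cap, no jitter; the real SDK randomises
# each wait below its bound, so actual totals are usually shorter.


def backoff_upper_bound(retry: int, base: float = 1.0, cap: float = 20.0) -> float:
    """Longest wait before retry number `retry` (1-based): base * 2**(retry-1), capped."""
    return min(cap, base * 2 ** (retry - 1))


def max_total_wait(max_attempts: int, base: float = 1.0, cap: float = 20.0) -> float:
    """Sum of worst-case waits over all retries; the first attempt has no wait."""
    return sum(backoff_upper_bound(r, base, cap) for r in range(1, max_attempts))


if __name__ == "__main__":
    for attempts in (25, 100):
        seconds = max_total_wait(attempts)
        print(f"{attempts} attempts: up to {seconds:.0f}s (~{seconds / 60:.0f} min)")
```

In this model the worst-case window grows from 411 seconds (about 7 minutes) with 25 attempts to 1,911 seconds (about 32 minutes) with 100, giving a throttled call more time to succeed once the API stops throttling it.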

@richgreen-moj
Contributor Author

#6598 has been merged to main and ran successfully:
https://github.com/ministryofjustice/modernisation-platform/actions/runs/8435420976

I think this should be enough to close the issue.

Projects
Archived in project