Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove resource group asynchronously and do not wait for completion #1146

Merged
merged 3 commits into from
Oct 29, 2020
Merged

Remove resource group asynchronously and do not wait for completion #1146

merged 3 commits into from
Oct 29, 2020

Conversation

benbp
Copy link
Member

@benbp benbp commented Oct 27, 2020

Since we have background jobs that poll for zombie resource groups already, we don't need to block on resource group deletion. This should save us a couple minutes in our live test pipelines.

Example improvement on a test pipeline (the step takes 4 seconds now):
https://dev.azure.com/azure-sdk/internal/_build/results?buildId=593112&view=logs&j=011e1ec8-6569-5e69-4f06-baf193d1351e&t=5431112d-2b61-5a5f-7042-ef698f761043

I also verified in the activity log that we had a successful deletion event.

Resolves #754

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:
java - template
js - template
net - template
python - template
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

@mikeharder
Copy link
Member

This will be a huge improvement. Some pipelines like java - cosmos - tests will be shortened by over 10 minutes:

https://dev.azure.com/azure-sdk/internal/_build/results?buildId=592562&view=logs&j=e7056bdf-d539-5f73-0825-e93d19e15e58&t=4e5e84e6-f9ed-56c3-ca81-9d8dc79b00ad

@@ -138,8 +138,8 @@ if (![string]::IsNullOrWhiteSpace($ServiceDirectory)) {
}

Log "Deleting resource group '$ResourceGroupName'"
if (Retry { Remove-AzResourceGroup -Name "$ResourceGroupName" -Force:$Force }) {
Write-Verbose "Successfully deleted resource group '$ResourceGroupName'"
if (Remove-AzResourceGroup -Name "$ResourceGroupName" -Force:$Force -AsJob) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the correct option? Wouldn't this fail most of the time as soon as the pwsh process is closed? Or would it block the PS process from shutting down?

I was hoping there was some async call we could make on the azure side to start this process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You and me both @weshaggard! I was surprised, because the python azure cli has --no-wait, but the powershell cmdlets only have AsJob, so I was frustrated by the inconsistency.

That said, I was worried about this same thing. In my testing I found that the below code was sufficient enough to trigger a resource group deletion. I'll do a little more investigation though, as I'd like to know whether we're just getting lucky with timing, or if it only delegates the async polling to the background job.

Remove-AzResourceGroup -Name <rg> -Force -AsJob ; Exit

What I'm removing is a retry on some 4xx/5xx failures, but I figured it would be sufficient to have the background poller remove those. That said, if we see some persistent ARM issues, it could cause a lot of resources to pile up, though I would argue that's a better case than blocking the pipeline timing anyway (and as @mikeharder suggested somewhere we may want to bump the deletion poller frequency anyway).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice it might not matter, but to be safe I think we should block until Get-AzResourceGroup returns ProvisioningState: Deleting. This is very fast (usually less than one second) and confirms the service has received the request. Here's a timeline:

> New-AzResourceGroup -name foo -location westus2

> Get-AzResourceGroup -name foo
ProvisioningState : Succeeded

> Remove-AzResourceGroup -Name foo -Force -AsJob

> Get-AzResourceGroup -name foo
ProvisioningState : Deleting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add this. There are a couple scenarios to handle re: making sure we don't block on waiting for "Deleting" when it will never reach that state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Please check out the latest changes.

Copy link
Member

@mikeharder mikeharder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should block until Get-AzResourceGroup returns ProvisioningState: Deleting.

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:
java - template
js - template
net - template
python - template
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:
java - template
js - template
net - template
python - template
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

@benbp
Copy link
Member Author

benbp commented Oct 29, 2020

/check-enforcer override

@benbp benbp merged commit 6e902f3 into Azure:master Oct 29, 2020
@benbp benbp deleted the benbp/remove-resources-async branch October 29, 2020 23:26
@weshaggard
Copy link
Member

@benbp for future attempts you should be able to merge the miss-behaving sync PRs nd then re-run the step in sync pipeline and it should turn green. We should try to do that instead of overriding check-enforcer if possible.

@benbp
Copy link
Member Author

benbp commented Oct 30, 2020

@weshaggard thanks, I will do that next time!

benbp added a commit that referenced this pull request Dec 1, 2020
…1146)

* Remove resource group asynchronously and do not wait for completion

* Verify resource group has reached Deleting state before exiting script

* Use $_ instead of $Error[0] for resource group removal handling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource Cleanup Improvements
5 participants