
"coreclr Pri0 Runtime Tests Run windows arm64 check" timeouts #72182

Closed
EgorBo opened this issue Jul 14, 2022 · 6 comments

Comments

@EgorBo
Member

EgorBo commented Jul 14, 2022

This job times out in various PRs, e.g.: #72141, #72168, #72084, #72167

@EgorBo EgorBo added the tenet-performance Performance related issue label Jul 14, 2022
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 14, 2022
@EgorBo EgorBo added the blocking-clean-ci label (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms') and removed the tenet-performance label (Performance related issue) Jul 14, 2022

@ghost

ghost commented Jul 14, 2022

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

This job times out in various PRs, e.g.: #72141, #72168, #72084, #72167

Author: EgorBo
Assignees: -
Labels:

area-Infrastructure-coreclr, blocking-clean-ci, untriaged

Milestone: -

@tlakollo
Contributor

All the timeouts seem to come from one particular machine, DDARM64-104, in the windows.10.arm64v8.open pool. I've contacted first responders to get more information about it.

@MattGal
Member

MattGal commented Jul 18, 2022

The problems with DDARM64-104 were already resolved as of Thursday; the issue it was hitting is dotnet/arcade#10013. We'll introduce a reboot when this is seen in the future to make it more resilient, but the actual root cause (other than "ARM64 machine has slightly too much uptime") is not really known.

That said, the problem DDARM64-104 experienced, while irritating, can't cause timeouts. You can even see in Kusto that there hasn't been a single work item timeout in the last 3000 work items it ran:

WorkItems
| where MachineName == "DDARM64-104"
| top 3000 by Finished desc 
| summarize count() by Status

and that the longest any work item with this exit code has ever run is about 8 minutes:

WorkItems
| where MachineName == "DDARM64-104"
| where ExitCode == -1073741502
| extend rantime = Finished-Started
| summarize avg(rantime), min(rantime), max(rantime)
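The exit code filtered on in the query above is easier to recognize in hex. A minimal sketch, assuming the value is a Windows NTSTATUS code reported as a signed 32-bit integer (the helper names are hypothetical): -1073741502 reinterprets as 0xC0000142, which Windows defines as STATUS_DLL_INIT_FAILED.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned NTSTATUS value in hex."""
    # Masking with 0xFFFFFFFF yields the 32-bit two's-complement bit pattern.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def from_ntstatus_hex(value: int) -> int:
    """Convert an unsigned NTSTATUS value back to the signed form stored in ExitCode."""
    # Values with the top bit set (error severity) map to negative signed ints.
    return value - (1 << 32) if value & 0x80000000 else value


print(to_ntstatus_hex(-1073741502))  # prints 0xC0000142
```

The reverse helper is handy when writing the `ExitCode ==` filter from a hex code seen in a log.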

So, if your jobs are timing out, pick a specific one and share its correlation ID with me, and we can investigate together.

@tlakollo tlakollo removed the blocking-clean-ci (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms') and untriaged (New issue has not been triaged by the area owner) labels Jul 18, 2022
@agocke
Member

agocke commented Jul 19, 2022

Sounds like this has been resolved.

@agocke agocke closed this as completed Jul 19, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 19, 2022
5 participants