Windows 10 Helix machines not heartbeating due to failure while importing _rust #1884

Closed
3 tasks
v-parose opened this issue Jan 24, 2024 · 6 comments

Comments

@v-parose

Hello dotnet team, while checking the Helix machines I noticed that most of the machines in the "windows.10.amd64.android.open" queue are not heartbeating and are in a reboot loop. The affected machines show the error below in the Helix agent cmd window.

[Image: screenshot of the error in the Helix agent cmd window]

Looking at the heartbeat table, I see that all the affected machines have the same "LastJobStartTime" of 2024-01-24T15:07:09Z.

The affected machines are DNCENGWIN-094 through DNCENGWIN-122, in case you need to connect to troubleshoot. Please let me know if you need any more info. Thank you!

Release Note Category

  • Feature changes/additions
  • Bug fixes
  • Internal Infrastructure Improvements

Release Note Description

@garath
Member

garath commented Feb 6, 2024

Current Heartbeat table data shows that only one of the 38 machines in this queue has reported in since Jan 29. That machine, dncengwin-105, has sent a heartbeat recently but doesn't seem to be taking jobs.

@garath
Member

garath commented Feb 6, 2024

I'm not able to KVM to any of the machines listed in this issue. If they're down or in a reboot loop, that makes sense. I can't rule out that I'm just not holding KVM right, though.

@garath
Member

garath commented Feb 6, 2024

@v-parose, without any ability to get to the machines to debug, I think the best next step is to rebuild one of them from scratch and see if it ends up in this state again. Is that something you can start? Do you need me to open an ICM for it?

@v-parose
Author

v-parose commented Feb 6, 2024

Hi @garath, there is a known KVM issue on these machines that MLS is still looking into, so I can't get to them either. Also, I'm not sure this issue needs to be looked at further: I received a separate ticket to start reimaging these machines to Windows 11 and add them to the "windows.11.amd64.android.open" queue. @ilyas1974, can you confirm that the "windows.10.amd64.android.open" queue is being fully deprecated?

My reimage ICM: https://portal.microsofticm.com/imp/v3/incidents/details/457737520/home

@garath
Member

garath commented Feb 6, 2024

> I did receive a separate ticket to start reimaging these machines to Windows 11

Ah! Yes, let's let @ilyas1974 update status!

@garath
Member

garath commented Feb 9, 2024

I spoke with Ilya and indeed all machines are going to the Windows 11 queue. I'm closing this issue as there's no work to do.
