System.Net.Sockets.Tests.SendPacketsAsync.SendPacketsElement_FileZeroCount_Success sometimes fails #60017

Closed
runfoapp bot opened this issue Oct 5, 2021 · 16 comments · Fixed by #63702
Labels: area-System.Net.Sockets, os-windows, test-bug (Problem in test source code, most likely)

@runfoapp

runfoapp bot commented Oct 5, 2021

Runfo Tracking Issue: System.Net.Sockets.Tests.SendPacketsAsync.SendPacketsElement_FileZeroCount_Success sometimes fails

Build Definition Kind Run Name Console Core Dump Test Results Run Client
1553059 runtime PR 63737 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1552952 runtime PR 63441 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1552874 runtime PR 62740 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1550460 runtime PR 63726 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1550007 runtime PR 63546 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1549369 runtime PR 63696 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1548949 runtime PR 63633 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1548671 runtime PR 60368 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1547777 runtime PR 63653 net6.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1547409 runtime PR 62663 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1546282 runtime PR 63200 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1544839 runtime Rolling net6.0-windows-Release-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1543936 runtime PR 63558 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1543759 runtime PR 63556 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1543618 runtime PR 63445 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1543043 runtime PR 63538 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1542765 runtime PR 63320 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1542116 runtime PR 63520 net6.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1541126 runtime PR 63491 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1540432 runtime PR 62663 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1540241 runtime PR 63373 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1539554 runtime PR 62663 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1538109 runtime PR 62663 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1537712 runtime PR 63410 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1534603 runtime PR 62141 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1534132 runtime PR 63200 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1534080 runtime PR 62385 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1533865 runtime PR 63316 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1533865 runtime PR 63316 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1533736 runtime PR 63313 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1533278 runtime PR 63226 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1533023 runtime PR 63112 net6.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1532939 runtime Rolling net6.0-windows-Release-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1531686 runtime PR 63231 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1531595 runtime PR 63259 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1530239 runtime PR 63200 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1530089 runtime PR 62663 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1529646 runtime PR 63200 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1527655 runtime PR 63156 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1526282 runtime PR 63129 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1526233 runtime PR 63128 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1524360 runtime PR 62690 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1522824 runtime PR 63015 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1522110 runtime PR 63032 net6.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1519744 runtime PR 62982 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1518054 runtime PR 61196 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1517707 runtime PR 62934 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
1517293 runtime Rolling net7.0-windows-Release-x86-CoreCLR_release-Windows.7.Amd64.Open console.log runclient.py
1515094 runtime PR 62757 net6.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open console.log runclient.py
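
To see how the failures split across operating systems, the rows can be tallied by the trailing part of the run name. This is a minimal Python sketch, not part of the Runfo tooling. It assumes the text after `CoreCLR_release-` is the queue name, and the helper names `queue_of` and `tally_by_queue` are hypothetical. Only four rows from the table are copied in, so the counts cover that sample, not the full list.

```python
from collections import Counter

# Four rows copied from the table above (console.log / runclient.py columns dropped).
ROWS = """\
1553059 runtime PR 63737 net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open
1550460 runtime PR 63726 net7.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open
1544839 runtime Rolling net6.0-windows-Release-x86-CoreCLR_release-Windows.7.Amd64.Open
1542116 runtime PR 63520 net6.0-windows-Debug-x64-CoreCLR_release-Windows.11.Amd64.ClientPre.Open
"""


def queue_of(run_name):
    """Return the text after 'CoreCLR_release-' (assumed to be the queue name)."""
    return run_name.split("CoreCLR_release-", 1)[1]


def tally_by_queue(rows):
    """Count rows per queue; the run name is the last whitespace-separated field."""
    return Counter(
        queue_of(line.split()[-1]) for line in rows.splitlines() if line.strip()
    )


counts = tally_by_queue(ROWS)
for queue, hits in counts.most_common():
    print(queue, hits)
```

Pasting the full table into `ROWS` would give the per-queue totals for every listed build.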

Build Result Summary

Day Hit Count: 3
Week Hit Count: 17
Month Hit Count: 47
@dotnet-issue-labeler bot added the area-System.Net.Sockets and untriaged (New issue has not been triaged by the area owner) labels Oct 5, 2021
@ghost

ghost commented Oct 5, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Runfo Creating Tracking Issue (data being generated)

Author: runfoapp[bot]
Assignees: -
Labels:

area-System.Net.Sockets, untriaged

Milestone: -

@scalablecory added the test-bug (Problem in test source code, most likely) label and removed the untriaged (New issue has not been triaged by the area owner) label Oct 6, 2021
@scalablecory added this to the 7.0.0 milestone Oct 6, 2021
@agocke
Member

agocke commented Oct 28, 2021

@danmoseley This appears to be flaky. Could we get retry enabled for this test?

@danmoseley
Member

@karelz ? (BTW, at what granularity do we enable retries? I had assumed it was e.g. "all networking tests")

@danmoseley
Member

@antonfirsov just noting they're not all Windows 11.

@antonfirsov
Member

Ah yeah missed that there are a bunch of Win-7 failures too. Interestingly Win7 only started to fail after 10-06. SendPacketsElement_FileStreamMultiPartMixed_MultipleFileStreams_Success (#58898) has the same pattern now, win7 failures appeared after 10-06.

@MattGal are you aware of any OS updates or any other changes to the Win7 queues around that date?

@MattGal
Member

MattGal commented Oct 28, 2021

> Ah yeah missed that there are a bunch of Win-7 failures too. Interestingly Win7 only started to fail after 10-06. SendPacketsElement_FileStreamMultiPartMixed_MultipleFileStreams_Success (#58898) has the same pattern now, win7 failures appeared after 10-06.
>
> @MattGal are you aware of any OS updates or any other changes to the Win7 queues around that date?

Yes, we did change the image that day. You can also tag @dotnet/dnceng for greater visibility if needed, such as when I am out of the office.

On that day we moved from using Server 2008 R2 images from the Azure Gallery to "homemade" Win7-SP1-Enterprise "N" SKU images, because the 2008 ones had been removed from the Azure gallery, which blocks us from regenerating them. We will likely need to change one more time if and when we find the "long term, private support" version of Windows 7 images, but this change was unfortunately not one made by choice.

  • If you need assistance getting a repro machine via the HelixReproVMs DTL, ping us on the .NET Core Engineering "First Responders" channel and we'll do our best to unblock you.
  • If there is a configuration problem with this image that should be addressed (it ostensibly has the same artifacts, but started from a different base image), please file an issue on the dotnet/core-eng repo and tag me so I can make sure it gets addressed.
  • One interesting thing about the "N" SKU of Windows (not a choice, it's the image we had) is the exclusion of Internet Explorer from the image. This does have some impact on networking behavior for .NET / PowerShell and might be a relevant clue.

@danmoseley
Member

@antonfirsov should retries be helping here -- or perhaps it's failing on the retry?

@MattGal
Member

MattGal commented Jan 12, 2022

I will say I'd rather not change things without an underlying theory of why the change makes sense, but there is a slightly different Windows 7 client (non-N) SKU now available in the Azure Gallery (initial attempts to use it failed but this was invariant on the image we're now using, so it may be possible to try). Ideally I'd want to have confirmed using experimental VMs that this solves the problem first; that is, it needs to be expressible as "this OS needs change because..."

@wfurt
Member

wfurt commented Jan 12, 2022

I would not cry if we disable this on Win7. I don't see big risk or value. I think we should debug on Win11 and understand what is going on.

@ghost added the in-pr (There is an active PR which will close this issue when it is merged) label Jan 12, 2022
@antonfirsov
Member

antonfirsov commented Jan 12, 2022

I believe there is some weird threading issue that only repros on the following queues:

windows.7.amd64.open.svc, windows.7.amd64.open.rt, windows.11.amd64.clientpre.open, windows.11.amd64.clientpre.open.svc

If #63702 works out, we will have proof of this theory, and also a method to get clean runs on servicing branches, so I recommend merging that PR ASAP if the SendPackets tests succeed after a couple of CI re-runs.

@dotnet/dnceng is there anything common in these Win7/Win11 queues? (Some hardware characteristics like number/type of CPUs maybe?)

@MattGal
Member

MattGal commented Jan 12, 2022

> @dotnet/dnceng is there anything common in these Win7/Win11 queues? (Some hardware characteristics like number/type of CPUs maybe?)

They're about as far apart code-wise as two Windowses can be. They do, however, all run on the same 2-core AMD EPYC Da_v4-series Azure VMs. You can use the repro machines DTL to get one yourself, since you are unlikely to have an EPYC sitting on your desk; some (rare) native-code cases, especially where SIMD or threading optimizations might occur, have shown slight variance between AMD and Intel.

@antonfirsov
Member

> They're about as far apart code-wise as two Windowses can be. They do, however, all run on the same 2-core AMD EPYC Da_v4-series Azure VMs.

This is what I suspected; it would support the threading issue idea. Are there other queues that run on AMD EPYC Da_v4? (And is there some easy way to list them?)

@MattGal
Member

MattGal commented Jan 12, 2022

> This is what I suspected; it would support the threading issue idea. Are there other queues that run on AMD EPYC Da_v4? (And is there some easy way to list them?)

Oops I was unclear. EVERY Helix test queue uses D2a_v4 currently, with a few rare exceptions like for Android emulators (which can't run there) or ones that use D4as. So, while there could be an AMD EPYC threading difference, it would need to also not reproduce for Windows 8.1 (really Server 2k12 R2) nor Windows 10 on identical hardware.

@danmoseley
Member

Wild brainstorming -- perhaps there's a test that only runs on Win 10 (doesn't apply to Win7 and disabled for Win 11 for whatever reason) that when it runs somehow prevents the test failing?

@wfurt
Member

wfurt commented Jan 12, 2022

I can reproduce this easily on my 2-core Windows 11 VM when running the whole set in a loop.

   System.Net.Sockets.Tests.SendPacketsAsync.SendPacketsElement_FileZeroCount_Success [FAIL]
      Timed out
      Expected: True
      Actual:   False
      Stack Trace:
        C:\Users\test\github\wfurt-runtime\src\libraries\System.Net.Sockets\tests\FunctionalTests\SendPacketsAsync.cs(691,0): at System.Net.Sockets.Tests.SendPacketsAsync.SendPackets(SendPacketsElement[] elements, SocketError expectedResult, Int32 bytesExpected, Byte[] contentExpected)
        C:\Users\test\github\wfurt-runtime\src\libraries\System.Net.Sockets\tests\FunctionalTests\SendPacketsAsync.cs(663,0): at System.Net.Sockets.Tests.SendPacketsAsync.SendPackets(SendPacketsElement element, Int32 bytesExpected, Byte[] contentExpected)
        C:\Users\test\github\wfurt-runtime\src\libraries\System.Net.Sockets\tests\FunctionalTests\SendPacketsAsync.cs(307,0): at System.Net.Sockets.Tests.SendPacketsAsync.SendPacketsElement_FileZeroCount_Success()
      Output:
        Created file C:\Users\test\AppData\Local\Temp\e53bawhk.vnf_.ctor_37 with size: 1024
   System.Net.Sockets.Tests.SendPacketsAsync.SendPacketsElement_FileLargeOffset_Throws [FAIL]
      Timed out
      Expected: True
      Actual:   False
      Stack Trace:
        C:\Users\test\github\wfurt-runtime\src\libraries\System.Net.Sockets\tests\FunctionalTests\SendPacketsAsync.cs(692,0): at System.Net.Sockets.Tests.SendPacketsAsync.SendPackets(SendPacketsElement[] elements, SocketError expectedResult, Int32 bytesExpected, Byte[] contentExpected)
        C:\Users\test\github\wfurt-runtime\src\libraries\System.Net.Sockets\tests\FunctionalTests\SendPacketsAsync.cs(669,0): at System.Net.Sockets.Tests.SendPacketsAsync.SendPackets(SendPacketsElement element, SocketError expectedResult, Int32 bytesExpected)
        C:\Users\test\github\wfurt-runtime\src\libraries\System.Net.Sockets\tests\FunctionalTests\SendPacketsAsync.cs(333,0): at System.Net.Sockets.Tests.SendPacketsAsync.SendPacketsElement_FileLargeOffset_Throws()
      Output:
        Created file C:\Users\test\AppData\Local\Temp\x0vakmxk.ayk_.ctor_37 with size: 1024

Perhaps the timeouts are too aggressive?
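
The "Timed out / Expected: True / Actual: False" output is the shape an assertion on a timed wait would produce: the wait returns false when the completion signal does not arrive in time. The following is a minimal Python sketch of that pattern together with a loop-style repro harness. It is not the code in SendPacketsAsync.cs, and the names `completes_within` and `count_timeouts` are hypothetical.

```python
import threading
import time


def completes_within(operation, timeout_s):
    """Run `operation` on a worker thread; return True if it finishes in time."""
    completed = threading.Event()

    def worker():
        operation()
        completed.set()

    threading.Thread(target=worker, daemon=True).start()
    # Event.wait returns False on timeout, which a test would report as "Timed out".
    return completed.wait(timeout_s)


def count_timeouts(operation, timeout_s, iterations):
    """Repeat the timed wait in a loop and count how many runs time out."""
    return sum(
        1 for _ in range(iterations) if not completes_within(operation, timeout_s)
    )


if __name__ == "__main__":
    # A fast operation should never time out; a slow one always should.
    print(count_timeouts(lambda: None, 1.0, 20))
    print(count_timeouts(lambda: time.sleep(0.2), 0.05, 5))
```

Counting timeouts over many iterations, rather than relying on a single run, makes it possible to compare failure rates before and after a change such as a longer timeout.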

@antonfirsov
Member

@wfurt I wouldn't try to address this with timeouts; if I remember my previous investigation correctly, that did not help. I will come back to it soon and investigate further.

Since we need a quick solution now: if you still have that repro config at hand, can you do me a favor and check whether the changes in #63702 help? If yes, can I get an approval on that PR, so we get clean CI?

@ghost removed the in-pr (There is an active PR which will close this issue when it is merged) label Jan 15, 2022
@ghost locked as resolved and limited conversation to collaborators Feb 14, 2022
6 participants