Work around https://github.com/dotnet/arcade/issues/7371 #7457

Merged: 1 commit, Jun 2, 2021
@@ -1,6 +1,7 @@
import os
import re
import sys
import time
import traceback
import logging
from queue import Queue
@@ -35,11 +36,9 @@ def __print(self, msg):
    def __process(self, batch):
        self.publisher.upload_batch(batch)
        self.total_uploaded = self.total_uploaded + len(batch)
        self.__print('uploaded {} results'.format(self.total_uploaded))

    def run(self):
        global workerFailed, workerFailedLock
        self.__print("starting...")
        while True:
            try:
                item = self.queue.get()
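
For context, the script this diff touches follows a familiar producer/consumer layout: the main thread queues batches of test results and daemon worker threads drain the queue and upload each batch. The following is a minimal, self-contained sketch of that pattern, with a placeholder upload_batch standing in for the script's publisher.upload_batch call; it is not the reporter's actual code.

```python
import threading
from queue import Queue

def upload_batch(batch):
    # Placeholder for the real publisher call (the script's publisher.upload_batch).
    print('uploaded {} results'.format(len(batch)))

def worker(q):
    # Each daemon worker blocks on the queue, uploads one batch at a time,
    # and marks it done so q.join() in the main thread can return.
    while True:
        batch = q.get()
        try:
            upload_batch(batch)
        finally:
            q.task_done()

q = Queue()
for _ in range(4):
    threading.Thread(target=worker, args=(q,), daemon=True).start()

for b in ([1, 2, 3], [4, 5]):
    q.put(b)

q.join()  # returns once every queued batch has been processed
```

The q.join() / task_done() pairing is what lets the main thread know every queued batch was handled before the process exits.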
@@ -146,26 +145,20 @@ def main():
        worker.daemon = True
        worker.start()

    log.info("Beginning to read test results...")
    # https://github.com/dotnet/arcade/issues/7371 - trying to avoid contention for stdout
    time.sleep(5)
Review comments on the added time.sleep(5):

Member:
This sleep feels... very odd. The workers haven't actually done anything yet (we haven't even put anything into the queue). This is just adding 5 seconds to every job for no real reason.

Member:
I had a similar comment here: #7457 (comment). The sleep should permit the threads to start up and wait for work. I think the bug this is working around was due to the threads not all starting, all of the work completing, and the process shutting down while the remaining threads started up and wrote to the console. If the threads are no longer writing to the console except when handling work items, and we're confident in our work item synchronization, then maybe it doesn't help.

Another thing that comes to mind is that we can probably avoid starting up worker threads if we know we're not going to use them. I wonder if a calculation can be done up front to determine the maximum number of worker threads needed?
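
As an illustration of that suggestion (nothing like this exists in the PR): once the results have been read and batched, the pool size could be capped by the number of batches, so no thread is started that will never receive work. A rough sketch, with an assumed default ceiling of 10 workers:

```python
def plan_worker_count(num_batches, max_workers=10):
    # Start at least one worker (so q.join() has a consumer), but never
    # more workers than there are batches to upload.
    return max(1, min(max_workers, num_batches))

# Examples: a tiny job gets a tiny pool, a large job is capped at max_workers.
print(plan_worker_count(3))    # -> 3
print(plan_worker_count(500))  # -> 10
print(plan_worker_count(0))    # -> 1
```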

Member:
All this code is being deleted in about a week, so I wouldn't bother with another PR unless this is critical. I'm pretty sure this sleep is unnecessary, but I'd like to avoid touching this dead-man-walking code any more than is necessary.

Member:
Ah, that conversation didn't make it to my inbox for some reason. Oh well, situation... somethinged?

Member:
It seems like just deleting the "starting" would have been enough, but a bit of waiting isn't the end of the world.

Member Author:
> All this code is being deleted in about a week, so I wouldn't bother with another PR unless this is critical. I'm pretty sure this sleep is unnecessary, but I'd like to avoid touching this dead-man-walking code any more than is necessary.

I had lots of conversations before you joined, and due to the nature of dependency flow in Arcade, it takes a very long time to try one thing, so I'm trying multiple things. Between the amount of deleted logging and the sleep, I think we have a very good chance of not dealing with this any more until it's all replaced on 6/10 ("about a week").
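
On the mechanism being debated: a fixed time.sleep(5) only makes the startup race less likely rather than eliminating it. A deterministic alternative, not used in this PR, would be to have each worker signal that it has started (for example with threading.Barrier) and have the main thread wait on that signal instead of sleeping. A minimal sketch of that idea:

```python
import threading

WORKER_COUNT = 4
# Parties = the workers plus the main thread; wait() returns only once all have arrived.
startup_barrier = threading.Barrier(WORKER_COUNT + 1)

def worker():
    startup_barrier.wait()  # announce "started" before touching the queue or stdout
    # ... pull and upload batches here ...

for _ in range(WORKER_COUNT):
    threading.Thread(target=worker, daemon=True).start()

startup_barrier.wait()  # main thread blocks until every worker is actually running
print("all workers started; no fixed sleep needed")
```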


    # In case the user puts the results in HELIX_WORKITEM_UPLOAD_ROOT for upload, check there too.
    all_results = read_results([os.getcwd(),
                                get_env("HELIX_WORKITEM_UPLOAD_ROOT")])
    all_results = read_results([os.getcwd(), get_env("HELIX_WORKITEM_UPLOAD_ROOT")])

    batch_size = 1000
    batches = batch(all_results, batch_size)

    log.info("Uploading results in batches of size {}".format(batch_size))

    for b in batches:
        q.put(b)

    log.info("Main thread finished queueing batches")

    q.join()

    log.info("Main thread exiting")

    with workerFailedLock:
        if workerFailed:
            if check_passed_to_workaround_ado_api_failure([os.getcwd(), get_env("HELIX_WORKITEM_UPLOAD_ROOT")]):
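
The batch() helper itself is not shown in this diff; a typical implementation simply slices the result list into fixed-size chunks, which is the shape the queueing loop above expects. A sketch of that assumption:

```python
def batch(items, size):
    # Yield consecutive slices of at most `size` items each.
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: 2500 results in batches of 1000 -> batch sizes 1000, 1000, 500.
print([len(b) for b in batch(list(range(2500)), 1000)])
```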