Skip to content
This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

Actively tail worker stdio from supervisor agent #588

Merged
17 commits merged into from
Feb 26, 2021

Conversation

ranweiler
Copy link
Member

@ranweiler ranweiler commented Feb 23, 2021

In the supervisor agent, incrementally read from the running worker agent's redirected stderr and stdout, instead of waiting until it exits.

The worker agent's stderr and stdout are piped to the supervisor when tasks are run. The supervisor's WorkerRunner does not use wait_with_output(), which handles this (at the cost of blocking). Instead, it makes repeated calls to to try_wait() on timer-based state transitions, and does not try to read the pipes until the worker exits. But when one of the child's pipes is full, the child can block forever waiting on a write(2), such as in a log facade implementation.

This bug has not been caught because we control the child worker agent, and until recently, it mostly only wrote to these streams using env_logger at its default log level. But recent work: (1) set more-verbose INFO level default logging, (2) logged stderr/stdout lines of child processes of the worker, and (3) some user targets logged very verbosely for debugging. This surfaced the underlying issue.

@ranweiler
Copy link
Member Author

This is probably the root cause for #542.

@ranweiler ranweiler changed the title Don't pipe worker stdio to supervisor agent Actively tail worker stdio from supervisor agent Feb 23, 2021
@ranweiler ranweiler marked this pull request as draft February 23, 2021 23:43
@ranweiler ranweiler marked this pull request as ready for review February 26, 2021 01:36
@ghost
Copy link

ghost commented Feb 26, 2021

Hello @bmc-msft!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 419ca05 into microsoft:main Feb 26, 2021
@ranweiler ranweiler deleted the worker-stdio branch February 26, 2021 20:17
@bmc-msft bmc-msft linked an issue Mar 1, 2021 that may be closed by this pull request
@ghost ghost locked as resolved and limited conversation to collaborators Apr 16, 2021
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LibFuzzer coverage task failed to synchronize coverage to container
2 participants