-
Notifications
You must be signed in to change notification settings - Fork 8.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Telemetry] Improve isTaskManagerReady checks for telemetry #52446
[Telemetry] Improve isTaskManagerReady checks for telemetry #52446
Conversation
@@ -74,7 +74,9 @@ function addEvents(prevEvents: Record<string, number>, newEvents: Record<string, | |||
} | |||
|
|||
async function isTaskManagerReady(server: Server) { | |||
return (await getLatestTaskState(server)) !== null; | |||
const result = await getLatestTaskState(server); | |||
const runs = get(result, '[0].state.runs', 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have this same line in a separate file, const runs = get(result, '[0].state.runs', 0);
, and it seems specific enough to warrant a helper function with a comment. For example, I'm not entirely sure what it's doing, why, or whether or not it's correct, so a function + comment would be helpful.
isTaskManagerReady
and runs being greater than 0 seem like orthogonal things to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, and honestly, this is an open discussion about a right fix for this.
isTaskManagerReady
could be renamed to something like isReadyToCollectTelemetry
and that can return what we really care about - if we can now query the task manager and it will return us the telemetry data. We don't want to query the task manager too soon (before the task actually ran), but this idea of too soon is something I just recently observed in testing. Perhaps this is not the root issue?
Tests will continue to fail until #52834 is merged |
💔 Build FailedHistory
To update your PR or re-run it, just comment with: |
Related to https://github.com/elastic/elastic-stack-testing/pull/451
In a few plugins, we are using the task manager as a way to asynchronously fetch telemetry data. In order for telemetry collection to know to wait for this asynchronous action, there is an
isReady
check on each usage collector that returns true/false. All of these plugins use a naive check to see if the task manager is running, but this is apparently not enough.It's a race condition, but there are time where
isReady()
is returningtrue
, but the task manager hasn't actually ran the task to fetch the telemetry data.I think we can avoid this by not returning
true
fromisReady()
until the task has actually been ran.As a quick side note, I discovered #52439 while working on this which affects maps telemetry