Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Telemetry] Improve isTaskManagerReady checks for telemetry #52446

Closed

Conversation

chrisronline
Copy link
Contributor

Related to https://github.com/elastic/elastic-stack-testing/pull/451

In a few plugins, we are using the task manager as a way to asynchronously fetch telemetry data. In order for telemetry collection to know to wait for this asynchronous action, there is an isReady check on each usage collector that returns true/false. All of these plugins use a naive check to see if the task manager is running, but this is apparently not enough.

It's a race condition, but there are time where isReady() is returning true, but the task manager hasn't actually ran the task to fetch the telemetry data.

I think we can avoid this by not returning true from isReady() until the task has actually been ran.

As a quick side note, I discovered #52439 while working on this which affects maps telemetry

@@ -74,7 +74,9 @@ function addEvents(prevEvents: Record<string, number>, newEvents: Record<string,
}

async function isTaskManagerReady(server: Server) {
return (await getLatestTaskState(server)) !== null;
const result = await getLatestTaskState(server);
const runs = get(result, '[0].state.runs', 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have this same line in a separate file, const runs = get(result, '[0].state.runs', 0);, and it seems specific enough to warrant a helper function with a comment. For example, I'm not entirely sure what it's doing, why, or whether or not it's correct, so a function + comment would be helpful.

isTaskManagerReady and runs being greater than 0 seem like orthogonal things to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, and honestly, this is an open discussion about a right fix for this.

isTaskManagerReady could be renamed to something like isReadyToCollectTelemetry and that can return what we really care about - if we can now query the task manager and it will return us the telemetry data. We don't want to query the task manager too soon (before the task actually ran), but this idea of too soon is something I just recently observed in testing. Perhaps this is not the root issue?

@chrisronline
Copy link
Contributor Author

chrisronline commented Dec 9, 2019

Tests will continue to fail until #52834 is merged

@kibanamachine
Copy link
Contributor

💔 Build Failed

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@elastic elastic deleted a comment from elasticmachine Oct 28, 2020
@chrisronline chrisronline deleted the telemetry_task_manager_fix branch September 21, 2021 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants