[Telemetry] Usage collectors not using isReady correctly #81944

Closed · 4 tasks done · Fixed by #89109
chrisronline opened this issue on Oct 28, 2020 · 23 comments
Labels: bug (Fixes for quality problems that affect the customer experience), Feature:Telemetry, Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc), v7.11.0

Comments

@chrisronline (Contributor) commented Oct 28, 2020

The following collectors need to change their isReady to properly wait for the task manager state to exist:

They are intentionally catching errors for the case where the task manager hasn't run their task yet, but that condition should instead be used to indicate whether the usage collector is ready or not.
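For illustration, a minimal sketch of what that could look like (the task id, the task manager accessor, and the `runs` counter below are assumptions for the example, not the actual plugin code):

```ts
import type { UsageCollectionSetup } from 'src/plugins/usage_collection/server';
import type { TaskManagerStartContract } from '../../task_manager/server';

const TASK_ID = 'my_usage_task'; // hypothetical task id

export function registerMyUsageCollector(
  usageCollection: UsageCollectionSetup,
  getTaskManager: () => Promise<TaskManagerStartContract>
) {
  usageCollection.registerCollector(
    usageCollection.makeUsageCollector<{ count: number }>({
      type: 'my_usage',
      // Report "not ready" until the task manager has run the task at least once,
      // instead of swallowing the error inside fetch.
      isReady: async () => {
        try {
          const taskManager = await getTaskManager();
          const task = await taskManager.get(TASK_ID);
          return (task.state?.runs ?? 0) > 0; // assumes the task keeps a `runs` counter in its state
        } catch (err) {
          return false; // task not scheduled or not run yet -> collector not ready
        }
      },
      // By the time fetch runs, the task state exists.
      fetch: async () => {
        const taskManager = await getTaskManager();
        const task = await taskManager.get(TASK_ID);
        return { count: task.state.count ?? 0 };
      },
      schema: { count: { type: 'long' } },
    })
  );
}
```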

cc @TinaHeiligers

@elasticmachine (Contributor):

Pinging @elastic/kibana-telemetry (Team:KibanaTelemetry)

@TinaHeiligers (Contributor) commented Oct 28, 2020

Thanks @chrisronline!
Related to #52446

@TinaHeiligers self-assigned this on Oct 28, 2020
@TinaHeiligers added the bug (Fixes for quality problems that affect the customer experience) label on Oct 28, 2020
@timroes added the v7.11.0 label and removed the v7.11 label on Oct 29, 2020
@TinaHeiligers (Contributor):

@chrisronline the actions, alerts and lens collectors explicitly comment that they're fine with not returning any usage data the first time that their collectors' fetch methods are called. I could still follow up with them with draft "fixes" to double check that they're ok with skipping data.

@chrisronline (Contributor, author):

> @chrisronline the actions, alerts and lens collectors explicitly comment that they're fine with not returning any usage data the first time that their collectors' fetch methods are called. I could still follow up with them with draft "fixes" to double check that they're ok with skipping data.

Sure, but we don't have to settle for that and I don't know why we would. We have a way to ensure the telemetry data can always be consistent. My guess is that the authors of those collectors aren't aware that a "miss" might mean there is no telemetry data for that user the entire day.

@TinaHeiligers (Contributor) commented Nov 3, 2020

> My guess is that the authors of those collectors aren't aware that a "miss" might mean there is no telemetry data for that user the entire day.

@chrisronline Gotcha! I'll update the README and change the example code to make it clearer.

@chrisronline (Contributor, author):

FWIW, we've had this ability for some time now and there is some history in the original PR: #36153

@TinaHeiligers (Contributor):

@chrisronline The updated README gives more info on using isReady. Is that sufficient to close this issue or would you prefer to wait on the teams involved?

@chrisronline (Contributor, author):

@TinaHeiligers Yea that's great, thanks for adding that!

cc @mikecote for any thoughts on if the alerting team should update the alerts/actions collectors

@TinaHeiligers (Contributor):

cc @wylieconlon Lens
cc @dgieselaar APM

You might want to reconsider the implementation of isReady in these collectors.

@dgieselaar (Member):

@TinaHeiligers this is intentional - our telemetry collection can take a long time (minutes). The advice from the telemetry team (IIRC) was to not block telemetry collection on startup. Is the recommendation now different?

@chrisronline (Contributor, author):

The telemetry service will wait up to 1 minute for a collector to become ready before attempting to send usage data without it; however, AFAIK, that time period is somewhat arbitrary and could be changed.

@dgieselaar Just curious, can you provide more information on why it takes so long?

@mikecote (Contributor) commented Nov 9, 2020

I created this issue #82947 for the alerting team to explore / fix this.

@dgieselaar (Member):

@chrisronline Heavy queries (aggregations on billions of documents), and we run them sequentially to be on the safe side, so as not to overload the cluster.

@chrisronline (Contributor, author):

Thanks @dgieselaar.

Once this data comes back, is it persisted in memory so that the next telemetry collection cycle picks it up? If so, does it persist in memory until Kibana is restarted? Is it ever re-fetched?

@afharo (Member) commented Nov 18, 2020

@chrisronline AFAIK, they use taskManager to periodically run those queries and store the results in a saved object. Then they retrieve the results from the SOs in the fetch method. Maybe @dgieselaar can confirm. If that's the behaviour, I think it's safe across restarts 🙂
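For reference, a rough sketch of that pattern (the saved object type, id, and attribute names below are made up for the example, not the APM plugin's actual code):

```ts
import type { SavedObjectsClientContract } from 'src/core/server';

const SO_TYPE = 'my-telemetry'; // hypothetical saved object type
const SO_ID = 'my-telemetry';

// The task manager task runs the heavy queries on its own schedule (e.g. every 12h)
// and persists the aggregated results in a saved object:
async function storeResults(
  soClient: SavedObjectsClientContract,
  results: Record<string, number>
) {
  await soClient.create(SO_TYPE, results, { id: SO_ID, overwrite: true });
}

// The usage collector's fetch method only reads the last persisted results,
// so it stays cheap and the data survives Kibana restarts:
async function fetchUsage(soClient: SavedObjectsClientContract) {
  const { attributes } = await soClient.get<Record<string, number>>(SO_TYPE, SO_ID);
  return attributes;
}
```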

@dgieselaar (Member):

That's correct @afharo, thank you - and apologies to @chrisronline for missing this earlier :). We run our queries every 12hrs.

@chrisronline (Contributor, author):

I see, okay. It seems we need to make an exception for APM. I'll update the issue.

@TinaHeiligers (Contributor):

@wylieconlon Have you had a chance to review the Lens usage collector's implementation of isReady yet?
cc @flash1293

@afharo added the Team:Core label and removed the Team:KibanaTelemetry label on Dec 10, 2020
@elasticmachine (Contributor):

Pinging @elastic/kibana-core (Team:Core)

@flash1293 (Contributor):

@TinaHeiligers I think this fell through the cracks, is there a deadline for this?

@TinaHeiligers (Contributor):

@flash1293 There isn't a strict deadline from the Core team as such, since it's up to plugin owners to decide how important their data is for them. As with most things telemetry-related, the sooner we ensure we're capturing consistent data, the better 😄

@flash1293 (Contributor):

I checked the code and it seems like we have the same structure as the other consumers listed (although implemented a little differently).

@gmmorris It seems like you implemented this; could you confirm it doesn't matter whether isReady is async itself (like here https://github.com/elastic/kibana/blob/master/x-pack/plugins/actions/server/usage/actions_usage_collector.ts#L33-L36) or returns false synchronously until it's ready (like here https://github.com/elastic/kibana/blob/master/x-pack/plugins/lens/server/usage/collectors.ts#L19-L23)?

@YulNaumenko (Contributor):

> I checked the code and it seems like we have the same structure as the other consumers listed (although implemented a little differently).
>
> @gmmorris It seems like you implemented this; could you confirm it doesn't matter whether isReady is async itself (like here https://github.com/elastic/kibana/blob/master/x-pack/plugins/actions/server/usage/actions_usage_collector.ts#L33-L36) or returns false synchronously until it's ready (like here https://github.com/elastic/kibana/blob/master/x-pack/plugins/lens/server/usage/collectors.ts#L19-L23)?

@flash1293 You're right, the approach doesn't matter, because both do the same check of whether the Task Manager is ready, and isReady supports both boolean and Promise<boolean>. In your case, though, the cleaner and more straightforward usage of isReady would be to make it async, because your check is itself async.
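In other words, both of these shapes satisfy the collector contract (a trimmed illustration, with `taskManagerStarted` and `taskStateExists` standing in for the plugins' actual checks):

```ts
// Signature accepted for isReady: () => boolean | Promise<boolean>
type IsReady = () => boolean | Promise<boolean>;

// Lens-style: return a plain boolean, flipped to true once task manager has started.
let taskManagerStarted = false;
const syncIsReady: IsReady = () => taskManagerStarted;

// Actions/alerts-style: await the check each time and return a Promise<boolean>.
const taskStateExists = async (): Promise<boolean> => {
  // ...would call taskManager.get(TASK_ID) in a real collector
  return true;
};
const asyncIsReady: IsReady = () => taskStateExists();
```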
