[Search Sessions] Implement cancel on search session monitoring task, fetch and process sessions page by page #96321

Merged: 13 commits, Apr 12, 2021

Conversation

@Dosant Dosant commented Apr 6, 2021

Summary

Partially addressed #96131

  • Cancel sessions monitoring task if it takes too long
  • Also expose trackingTimeout config
  • Fetch and process sessions page by page. Previously, we fetched all the sessions first and only then processed them.

Checklist

  • Unit or functional tests were updated or added to match the most common scenarios
  • After PR is merged, expose new setting in cloud config

For maintainers

@Dosant Dosant closed this Apr 6, 2021
@Dosant Dosant deleted the dev/cancel-session-processing branch April 6, 2021 15:41
@Dosant Dosant reopened this Apr 6, 2021
@Dosant Dosant force-pushed the dev/cancel-session-processing branch from b6f8d1b to 73ad72f Compare April 7, 2021 14:04
@Dosant Dosant changed the title Dev/cancel session processing [Search Sessions] Cancel sessions monitoring task if takes too long Apr 7, 2021
* trackingTimeout controls how long Task Manager waits for the search session monitoring task to complete before considering it timed out.
* If the task times out, it receives a cancel signal and the next task starts after "trackingInterval".
*/
trackingTimeout: schema.duration({ defaultValue: '5m' }),
Contributor Author

'5m' is the current default in Task Manager. I'm not changing the default, only exposing a way to override it.
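
To make the timeout behavior described above concrete, here is a minimal sketch of a scheduler that waits up to a timeout for a task and sends a cancel signal if the task has not finished by then. The `CancellableTask` interface and `runWithTimeout` helper are hypothetical names for illustration; this is not Task Manager's actual implementation.

```typescript
// Hypothetical, simplified stand-in for a scheduler; not Task Manager's code.
interface CancellableTask {
  run(): Promise<void>;
  cancel(): Promise<void>;
}

type Outcome = 'completed' | 'timed_out';

async function runWithTimeout(task: CancellableTask, timeoutMs: number): Promise<Outcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Resolves with 'timed_out' once the timeout elapses.
  const timedOut = new Promise<Outcome>((resolve) => {
    timer = setTimeout(() => resolve('timed_out'), timeoutMs);
  });
  // Resolves with 'completed' once the task finishes on its own.
  const completed: Promise<Outcome> = task.run().then((): Outcome => 'completed');
  try {
    const outcome = await Promise.race([completed, timedOut]);
    if (outcome === 'timed_out') {
      // The task took too long: send it the cancel signal.
      await task.cancel();
    }
    return outcome;
  } finally {
    // Always clear the timer so it cannot fire after we return.
    clearTimeout(timer);
  }
}
```

In this sketch the timeout only decides when `cancel` is called. The task itself has to react to the signal, which is what the rest of this PR implements.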

   deps: CheckRunningSessionsDeps,
   config: SearchSessionsConfig
-): Promise<void> {
+): Observable<void> {
Contributor Author

Just changing from Promise to Observable to be able to abort

@@ -48,12 +52,17 @@ function searchSessionRunner(
       logger,
     },
     sessionConfig
-  );
+  )
+    .pipe(takeUntil(aborted$))
Contributor Author

The cancelation idea is as follows:

We can use pageConfig to process sessions page by page. When abort is triggered, we still process the last page we received but don't fetch the remaining ones.
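
The idea above can be sketched roughly as follows. Note that the PR itself uses an RxJS pipeline with `takeUntil(aborted$)`; this sketch swaps that for a plain `aborted` flag and an async loop to stay dependency-free. `Page`, `AbortFlag`, and `processPageByPage` are hypothetical names and not the code from the PR.

```typescript
// Hypothetical names for illustration; this is not the code from the PR.
interface Page<T> {
  items: T[];
  total: number;
}

// Anything exposing an `aborted` flag works here, including a standard AbortSignal.
interface AbortFlag {
  readonly aborted: boolean;
}

async function processPageByPage<T>(
  fetchPage: (page: number, perPage: number) => Promise<Page<T>>,
  processPage: (items: T[]) => Promise<void>,
  perPage: number,
  abort: AbortFlag
): Promise<number> {
  let page = 1;
  let processed = 0;
  // The flag is checked only before fetching: a page that has already been
  // received is always processed, even if abort fires in the meantime.
  while (!abort.aborted) {
    const { items, total } = await fetchPage(page, perPage);
    await processPage(items);
    processed += items.length;
    // Stop when the source is exhausted.
    if (items.length === 0 || processed >= total) break;
    page += 1;
  }
  return processed;
}
```

The trade-off in this design is that cancelation is not immediate: the task finishes the page it is holding, so the worst-case delay after abort is the time needed to process one page.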

@Dosant Dosant added the bug, Feature:Search, Team:AppServices, v7.12.1, v7.13.0, v8.0.0, and release_note:skip labels Apr 7, 2021
@Dosant Dosant marked this pull request as ready for review April 7, 2021 15:09
@Dosant Dosant requested a review from a team as a code owner April 7, 2021 15:09
@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServices)

@Dosant
Contributor Author

Dosant commented Apr 8, 2021

@elasticmachine merge upstream

@Dosant Dosant requested a review from a team April 8, 2021 08:14
Member

@lukasolson lukasolson left a comment

Couple of minor nits below, but overall LGTM

@@ -29,7 +32,8 @@ interface SearchSessionTaskDeps {
 function searchSessionRunner(
   core: CoreSetup<DataEnhancedStartDependencies>,
   { logger, config }: SearchSessionTaskDeps
-) {
+): TaskRunCreatorFunction {
+  const aborted$ = new Subject<void>();
Member

Since we have one global aborted$ for all instances of this task, doesn't this mean that if one instance hits the timeout, any others in-progress will also be aborted? What if we moved the initialization of aborted$ between lines 37/38?

Contributor Author

@Dosant Dosant Apr 12, 2021

I moved it inside the task factory. Now we are guaranteed to have a single aborted$ instance per task.

I still would like @elastic/kibana-alerting-services to review how I use their API 🙏

Contributor

This doesn't look global to me, actually.
A TaskRunner is created whenever a task is picked up by Kibana, so each task will have its own abort observable, which will be GCed when that task completes.

Looking at this usage, this PR LGTM 👍
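
For readers following this thread, the distinction being discussed (abort state shared across task instances versus abort state owned by each instance) can be illustrated with a small sketch. The `TaskRunner` shape and both factory functions are hypothetical simplifications, not Task Manager's real types, and the sketch does not claim which variant the original revision of this PR corresponded to.

```typescript
// Hypothetical, simplified shapes; not Task Manager's real types.
interface TaskRunner {
  run(): string;
  cancel(): void;
}

// Shared state: the flag is declared once, outside the per-task factory,
// so canceling one runner affects every runner created afterwards or in flight.
function makeSharedFactory(): (id: string) => TaskRunner {
  let aborted = false;
  return (id) => ({
    run: () => (aborted ? `${id}: skipped` : `${id}: ran`),
    cancel: () => {
      aborted = true;
    },
  });
}

// Per-task state: the flag is declared inside the factory,
// so each runner owns its flag and cancelation stays local.
function makePerTaskFactory(): (id: string) => TaskRunner {
  return (id) => {
    let aborted = false;
    return {
      run: () => (aborted ? `${id}: skipped` : `${id}: ran`),
      cancel: () => {
        aborted = true;
      },
    };
  };
}
```

With the shared factory, canceling runner `a` makes runner `b` skip as well. With the per-task factory, `b` still runs after `a` is canceled.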

@Dosant
Contributor Author

Dosant commented Apr 12, 2021

@elasticmachine merge upstream

@Dosant Dosant changed the title [Search Sessions] Cancel sessions monitoring task if takes too long [Search Sessions] Implement cancel on search session monitoring task, fetch and process sessions page by page Apr 12, 2021
Contributor

@lizozom lizozom left a comment

I tested and I see the behavior is correct, mostly that pages are fetched and updated correctly, but I didn't test any negative scenarios. I don't have time to dive deeper today. If this is urgent, could you ask @lukasolson to take another look?

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@Dosant Dosant merged commit c4b3dfd into elastic:master Apr 12, 2021
Dosant added a commit to Dosant/kibana that referenced this pull request Apr 12, 2021
Dosant added a commit to Dosant/kibana that referenced this pull request Apr 12, 2021
Dosant added a commit that referenced this pull request Apr 12, 2021
Dosant added a commit that referenced this pull request Apr 12, 2021
@@ -39,6 +43,8 @@ function searchSessionRunner(
       logger.debug('Search sessions are disabled. Skipping task.');
       return;
     }
+    if (aborted$.getValue()) return;
Contributor

This looks OK, but it's worth clarifying that if (for some reason) you return here before Task Manager actually calls cancel, then you'll be wiping out your State as TM will think you've just completed the task without returning any state.

It doesn't look like you're using task state, so I think this is fine, but I felt it was worth flagging

Contributor Author

Thanks for the flag!
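
The state concern flagged in this thread can be modeled with a small sketch. It assumes, as the comment describes, that a run which returns nothing is treated as completed with no state. `TaskState`, `RunResult`, `nextState`, and `runTask` are hypothetical names, and this is a simplified model rather than Task Manager's actual code.

```typescript
// Hypothetical, simplified model of the behavior described above;
// not Task Manager's actual code.
type TaskState = Record<string, unknown>;
type RunResult = { state: TaskState } | undefined;

// Assumption: the scheduler stores whatever state the run returns and
// treats a missing result as "completed with empty state".
function nextState(result: RunResult): TaskState {
  return result?.state ?? {};
}

function runTask(previous: TaskState, aborted: boolean, keepState: boolean): RunResult {
  if (aborted) {
    // Early return: either hand the previous state back, or return nothing.
    return keepState ? { state: previous } : undefined;
  }
  const prevRuns = previous['runs'];
  const runs = typeof prevRuns === 'number' ? prevRuns : 0;
  return { state: { ...previous, runs: runs + 1 } };
}
```

Under that assumption, an early `return;` discards `{ runs: 3 }`, while returning `{ state: previous }` preserves it. As noted in the thread, this task does not use task state, so the bare early return is harmless here.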
