Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] Add a tie breaker field to alerts #62002

Closed
FrankHassanabad opened this issue Mar 31, 2020 · 22 comments
Closed

[Alerting] Add a tie breaker field to alerts #62002

FrankHassanabad opened this issue Mar 31, 2020 · 22 comments
Labels
discuss Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework Feature:Alerting NeededFor:Detections and Resp Project:ImproveAlertingManagementUX Alerting team project for improving the management experience of alerting. Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@FrankHassanabad
Copy link
Contributor

Describe the feature:

Provide a tie_breaker_id by copying the _id field so we can have stable sort/export in batches.

Describe a specific use case for the feature:

For Saved objects and for SIEM rules we ran into issues with sorting and paging where we ended up with duplicates and in some cases missing data. We solved this by loading all records into memory at once and then exporting. For our UI tables we allow the user to view 300 records at a time to minimize the chances of duplicates or misses when doing paging but it is possible when they have above 300.

We have replicated this issue by paging through rules 10 at a time and looking at each rule and seeing we have found duplicates. We have replicated this issue also using CURL and eliminating any chances it was a UI issue.

There are docs such as this one which highlight the problem and a solution (In the important section)
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-search-after

If the alerting framework provide a copy of _id as a tiebreaker field called

tie_breaker_id

as suggested there for us to sort on, export on, etc... that would be a small mapping change utilizing copy_to:
https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html

@FrankHassanabad FrankHassanabad added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Mar 31, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@FrankHassanabad FrankHassanabad changed the title [Alerting] Add a tie breaker to alerts [Alerting] Add a tie breaker field to alerts Mar 31, 2020
@mikecote
Copy link
Contributor

@elastic/kibana-platform Is there any plans to have an alternative to sorting by _id now that Elasticsearch doesn't support this? some sort of tie breaker field?

@FrankHassanabad
Copy link
Contributor Author

FrankHassanabad commented Apr 15, 2020

FWIW, I created a tie_breaker_id keyword field for large list support API endpoints I am working on @mikecote, and then during document inserts I use a uuid for it. It's not going to be the fastest and I have an extra field consuming disk space compared to just being able to copy over the _id but that's what I'm doing now in my code.

@pgayvallet
Copy link
Contributor

Is there any plans to have an alternative to sorting by _id now that Elasticsearch doesn't support this? some sort of tie breaker field?

Not that I'm aware of. @rudolf?

@rudolf
Copy link
Contributor

rudolf commented Apr 17, 2020

I wasn't aware of this duplicating behaviour, but I think we should treat it as a bug in Core. @FrankHassanabad suggestion for using copy_to sounds like a good suggestion.

@FrankHassanabad What is the impact on users, could showing a duplicate mislead a user to take the wrong actions or is this mostly an annoyance? What's the priority for getting this fixed? It doesn't sound like a lot of work, but our roadmap is quite full.

Theoretically, generating a saved object ndjson export with duplicates could lead to failure on import if overwrite=false which then leads to an id collision. I will have to go through the source code, but I think we use a Set which would remove duplicates and mitigate this problem. We don't use search_after directly for saved object queries, so this should only impact custom ES queries against the saved object index that uses search_after.

@rudolf rudolf added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Apr 17, 2020
@rudolf
Copy link
Contributor

rudolf commented Apr 17, 2020

I read to quickly and didn't see that data could be either missing or duplicated, we can probably sometimes live with duplicates but the missing data wouldn't be acceptable.

@rudolf
Copy link
Contributor

rudolf commented Apr 17, 2020

copy_to isn't possible on the _id field, so we will have to either use an ingest pipeline or do it inside the saved objects repository. I'm not sure if we can rely on ingest processing always being enabled on ES so doing it client-side might be safer, especially for a minor release.

There's no immediate requirement to implement this in Core/Saved Objects, we could only implement this in task manager for now since Saved Objects only uses from + to for paging. I'm not aware of any saved object types that currently expect to be able to page over more than 10k results, but there are plans to move larger indexes to saved objects in which case we might have to switch to search_after for paging find results.

@FrankHassanabad
Copy link
Contributor Author

@rudolf, I generate a UUID client side at the moment for the a tie breaker field before pushing the record into the index within Kibana/NodeJS

. I'm not aware of any saved object types that currently expect to be able to page over more than 10k results

For large lists since they can be above 10k we had to switch from SO to another data index so I could use search after for our SIEM large list support. Also for a few other Elastic Search abilities such as delete by query, etc...

If SO is adding support for above 10k and a few other features and everyone feels comfortable we can add large volumes of data into it I would be 👍 for moving back to it. I started with it, but quickly ran into issues with the 10k support and missing things such as delete by query where I had to move to a data index.

@rudolf rudolf removed the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label May 6, 2020
@FrankHassanabad
Copy link
Contributor Author

"Reviewed by Frank Hassanabad on 7/29/2020, still valid as of this date"

@yctercero
Copy link
Contributor

Just wanted to double back on this ticket. We're adding auto refresh to our all rules table and the inconsistencies in sorting without a tiebreaker_id are now very evident. Would be much appreciated if we could bump this up :)

@pmuellr
Copy link
Member

pmuellr commented Nov 6, 2020

Is there any chance you could use an existing alerting SO field like createdAt as the tie breaker? Of course, that's not guaranteed to be unique, but is maybe unique enough? Or maybe that's already your primary search term. Is there another unique-ish field that is stable for your uses, like name, that you could use as an additional sort to get the uniqueness (ie, is the name something a user can change ad-hoc, or are you creating these programmatically and they don't change over time)? How stable do these sorts need to be over time? Just during a user session, paging through UI's? Seems like something like sorting by [createdAt, name] would probably be good enough in that case.

It sounds like the only other option is for us to add a new field to the alert SO, which would presumably be a UUID, as the tie breaker.

I'm worried about the cardinality explosion here, as the internal index on that new field will literally be as large as the # of alerts, so reusing an existing field(s) would help out there (I think!). We could potentially make this field optional, and probably not available for editing in the UI, so that alertsClient users could specify a value if they want, but the default would be that we don't use the field.

Also wondering if this will bite us in the end, for things like the alerts list UI. No matter how we end up sorting the alerts in that list (eg, status, name, etc), seems like we may also be in the same situation - with an unstable sort, so missing/dup entries while paging - but perhaps that's a bit more survivable for customers than Frank's use case.

@pmuellr
Copy link
Member

pmuellr commented Nov 17, 2020

@FrankHassanabad , any thoughts on my comment ^^^

@FrankHassanabad
Copy link
Contributor Author

Thanks, I did a sampling of data from the prepackaged rules install and the createdAt field could help keep us unique or mostly unique to lessen the problems and impacts:

2020-11-05T19:46:41.552Z
2020-11-05T19:46:41.406Z
2020-11-05T19:46:41.753Z
2020-11-05T19:46:41.716Z
2020-11-05T19:46:42.533Z

It's not guaranteed as you can see some of those are very close to being non-unique and we might end up with some that are maybe not unique but I think at the very least for a workaround we can use them to help keep things more stable or as close to stable as we can.

@yctercero
Copy link
Contributor

@FrankHassanabad thanks for checking that!

I think we do still need a way to pass multiple fields to sort by though right? So for our rules example, when the user selects to sort by enabled, we pass that through as the sort_field but can't specify any secondary field.

@FrankHassanabad
Copy link
Contributor Author

Yeah, we need to allow comma separated values to sort by within the find_rules_route and then we can send multiples down and sort in the order given to us.

@rudolf
Copy link
Contributor

rudolf commented Nov 20, 2020

It sounds like the only other option is for us to add a new field to the alert SO, which would presumably be a UUID, as the tie breaker.

I'm worried about the cardinality explosion here, as the internal index on that new field will literally be as large as the # of alerts, so reusing an existing field(s) would help out there (I think!).

Adding another field is the only way to guarantee no duplicates (ES team might add a virtual tie breaker field at some point in the future elastic/elasticsearch#56828).

Although every field adds overhead, I don't think the "cost" of this is prohibitively high, instead of N it will now be 2N.

We might have to add a tie breaker field to all saved objects for #77961

@pmuellr
Copy link
Member

pmuellr commented Nov 23, 2020

Ah, adding a tie breaker to SO's themselves seems like the way to do this, as this is a generic SO issue IIRC, not specific to alerts. I'd certainly hate to add one to alerts in vX only to have it added to SO's in v>X, since then we'd have two tie breakers (and more wasted index space).

@lukeelmers
Copy link
Member

With the changes in elastic/elasticsearch#68833, a tiebreaker no longer needs to be provided to Elasticsearch (as of 7.12) as long as you are searching using a point-in-time... ES now automatically applies an internally-created tiebreaker for you.

In #89915, we introduced the ability to use SavedObjects.find with the point-in-time API and search_after, so you can now query over large numbers of SOs with consistent results. When #91175 lands, we'll have a helper exposed on the SO client to make this process easier.

I think this means this issue can be closed, but I'll let @FrankHassanabad confirm.

@pmuellr
Copy link
Member

pmuellr commented Feb 23, 2021

Not entirely clear to me if the tiebreaker needs to be consistent over time, or just over the "initial load" of the list being generated. If it needs to be consistent over time, then the per-PIT tiebreaker won't help.

We also recently made the alert params a flattened mapping type, instead of object enabled:false. Which looks like can be used for sorting. So it seems like alert types could use a param as a tie-breaker; perhaps it would be some "hidden" param, not shown in the UI.

For a case like this, you'd like the process of alert creation to call into the alert type to explicitly set such a field to some random value. But we don't really have a way of doing that now. We've talked about "hooks", where an alert type could be involved in processes like this, which seems like would be a good way to handle it. Eg, there would be a "pre-save" hook that an alert type could implement which would be passed the alert data just before it's created/updated, with a chance to modify it.

For now though, setting such a field in the alert type UI or via the alerts client is probably good enough.

@gmmorris gmmorris added NeededFor:Detections and Resp Project:ImproveAlertingManagementUX Alerting team project for improving the management experience of alerting. Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework labels Jun 30, 2021
@ymao1
Copy link
Contributor

ymao1 commented Jul 29, 2021

@FrankHassanabad @yctercero Is a tie breaker field still needed given the changes to the core's SavedObjects.find api detailed in #62002 (comment)?

@yctercero
Copy link
Contributor

@FrankHassanabad @yctercero Is a tie breaker field still needed given the changes to the core's SavedObjects.find api detailed in #62002 (comment)?

I don't believe so. We're working on moving over to PIT, though don't have details on timeline.

@ymao1
Copy link
Contributor

ymao1 commented Aug 3, 2021

Closing as don't need.

@ymao1 ymao1 closed this as completed Aug 3, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework Feature:Alerting NeededFor:Detections and Resp Project:ImproveAlertingManagementUX Alerting team project for improving the management experience of alerting. Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
None yet
Development

No branches or pull requests