[Actions] Add action group for notifying users of alert execution failure #83748

Closed
spong opened this issue Nov 19, 2020 · 7 comments
Labels: discuss, enhancement, estimate:medium, Feature:Alerting/RuleActions, Feature:Alerting/RulesFramework, Feature:Alerting, Feature:Rule Actions, NeededFor:Detections and Resp, NeededFor:ML, Team:Detections and Resp, Team:ResponseOps

Comments

@spong
Member

spong commented Nov 19, 2020

This is an enhancement request coming from user feedback for the option to be notified on alert/rule failure so they can immediately investigate what went wrong and resolve the issue.

The granularity of configuration wasn't specified. However, it would make sense to support both a single onError action configured for all rules and additional onError actions applied to individual alerts/rules (e.g. a rule writer may only want to notify themselves via Slack when their specific rule fails).
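To make the two levels of granularity concrete, here is a minimal TypeScript sketch of how a global onError list and a per-rule onError list might be combined. Every name in it (`Action`, `Rule`, `AlertingConfig`, `resolveOnErrorActions`) is an assumption made up for illustration; none of it is an existing Kibana API or a proposed design.

```typescript
// Hypothetical types for illustration only; none of these are Kibana APIs.
interface Action {
  connector: string; // e.g. "slack" or "email"
  target: string; // channel, address, etc.
}

interface Rule {
  id: string;
  onError?: Action[]; // optional failure actions specific to this rule
}

interface AlertingConfig {
  globalOnError: Action[]; // failure actions applied to every rule
}

// Collect the actions to run when a rule execution fails:
// global actions first, then rule-specific ones, skipping duplicates.
function resolveOnErrorActions(config: AlertingConfig, rule: Rule): Action[] {
  const seen = new Set<string>();
  const resolved: Action[] = [];
  for (const action of [...config.globalOnError, ...(rule.onError ?? [])]) {
    const key = `${action.connector}:${action.target}`;
    if (!seen.has(key)) {
      seen.add(key);
      resolved.push(action);
    }
  }
  return resolved;
}

// Example: admins hear about every failure, the rule writer only about their own rule.
const config: AlertingConfig = {
  globalOnError: [{ connector: "email", target: "admins@example.com" }],
};
const rule: Rule = {
  id: "rule-1",
  onError: [{ connector: "slack", target: "#my-rule-failures" }],
};
const actions = resolveOnErrorActions(config, rule);
```

A rule without its own `onError` list falls back to the global actions alone, which covers the "single action for all rules" case.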

cc @pmuellr @marrasherrier

@spong spong added enhancement New value added to drive a business result Feature:Alerting Feature:Actions Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team:Detections and Resp Security Detection Response Team Feature:Rule Actions Security Solution Rule Actions feature labels Nov 19, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Member

pmuellr commented Nov 19, 2020

My immediate thought on this was that it probably makes sense to have an 'error' action group "built-in" like we now have "resolved". But of course, it would be painful to have to add that to every rule/alert. A "global" one would make more sense. But "global" is a problem, as in RBAC issues. Even a "space-specific" one seems like it would have RBAC issues.

We've certainly mentioned things like this as part of "meta-alerting" - #49410 - and maybe this should also involve the new health bits - #79056

Which then maybe makes this feel like maybe a new alert, but maybe you'd need admin access to use. Perhaps the alert could just be to notify that there are problems, without indicating what they are specifically - just a link to a page that would provide relevant info given their roles.

@spong
Member Author

spong commented Nov 19, 2020

> Which then maybe makes this feel like maybe a new alert, but maybe you'd need admin access to use. Perhaps the alert could just be to notify that there are problems, without indicating what they are specifically - just a link to a page that would provide relevant info given their roles.

That seems reasonable to me. I imagine most use cases will either be an individual user wanting to keep tabs on rules/alerts they created, or a small set of admin/manager users needing to know of any failure, in which case a link to a page with details should be sufficient for their needs.

@pmuellr
Member

pmuellr commented Nov 23, 2020

Those use cases ^^^ sound right to me.

@darnautov
Contributor

This would definitely be useful for the Anomaly detection alert type. Many things might happen after the alert has been created, e.g. the datafeed has been stopped or the anomaly detection job has been deleted. Throwing an error and showing it in the Alert and Action UI doesn't suffice because, depending on the significance of this alert, the user should take action to resolve it, hence receiving a notification is critical.
We can create a special action group for this case (see #93009), but it seems like having a notification about the Error state makes more sense and is better UX in general.

@pmuellr
Member

pmuellr commented Mar 2, 2021

@darnautov Another thing you might want to look into (if you haven't already) is alert navigation. This lets an alert type provide links back to another Kibana application from the centralized alerting UIs.

Some doc here: https://github.com/elastic/kibana/blob/master/x-pack/plugins/alerts/README.md#alert-navigation

If you run Kibana with --run-examples, there's an example alert - people in space - which uses the alert navigation; source for that here: https://github.com/elastic/kibana/tree/master/x-pack/examples/alerting_example

I'm thinking this would be useful in the error cases, when a user ends up in the alerts UI, at least they will have a link into your app, without having to separately find and bring up that UI.
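As a rough illustration of the idea, here is a self-contained TypeScript sketch of a navigation handler registry. It deliberately does not use the real Kibana plugin API (see the README linked above for that); `NavigationRegistry`, `AlertSummary`, the alert type id, and the URL are all hypothetical names invented for this example.

```typescript
// Hypothetical, simplified stand-in for an alert navigation registry.
// This is NOT the Kibana API; names and shapes are assumptions.
interface AlertSummary {
  id: string;
  alertTypeId: string;
}

// A handler maps an alert to a path inside the owning application.
type NavigationHandler = (alert: AlertSummary) => string;

class NavigationRegistry {
  private handlers = new Map<string, NavigationHandler>();

  // Each alert type may register at most one handler.
  register(alertTypeId: string, handler: NavigationHandler): void {
    if (this.handlers.has(alertTypeId)) {
      throw new Error(`Navigation already registered for "${alertTypeId}"`);
    }
    this.handlers.set(alertTypeId, handler);
  }

  // Returns a link into the owning app, or undefined if none was registered.
  linkFor(alert: AlertSummary): string | undefined {
    const handler = this.handlers.get(alert.alertTypeId);
    return handler ? handler(alert) : undefined;
  }
}

const registry = new NavigationRegistry();
registry.register(
  "example.anomaly-detection", // hypothetical alert type id
  (alert) => `/app/example-ml/jobs?alertId=${encodeURIComponent(alert.id)}`
);

// The central alerting UI could render this link next to a failed alert.
const link = registry.linkFor({ id: "abc 123", alertTypeId: "example.anomaly-detection" });
// link === "/app/example-ml/jobs?alertId=abc%20123"
```

With something like this in place, a user who lands on an alert in the error state in the central UI would still have a one-click path into the application that owns it.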

@gmmorris gmmorris added Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework Feature:Alerting/RuleActions Issues related to the Actions attached to Rules on the Alerting Framework NeededFor:Detections and Resp NeededFor:ML and removed Feature:Actions labels Jul 1, 2021
@gmmorris gmmorris added the loe:large Large Level of Effort label Jul 14, 2021
@gmmorris gmmorris added the estimate:medium Medium Estimated Level of Effort label Aug 18, 2021
@gmmorris gmmorris removed the loe:large Large Level of Effort label Sep 2, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
@mikecote
Contributor

Closing in favour of #49410.
