Implemented Alerting health status pusher by using task manager and status pooler for Kibana status plugins 'kibanahost/api/status' #79056

Merged

Conversation

@YulNaumenko YulNaumenko commented Oct 1, 2020

This PR includes the following features:

  1. Exposed the AlertsClient method getHealth, which uses the executionStatus alert property to check whether any alert in the system has an error with a specific reason.
  2. Added a task 'alerting_health_check' that checks for framework decryption failures. It runs every hour and uses the AlertsClient method getHealth().hasDecryptionFailures.
  3. Extended the alerts plugin setup to update the core status through the core.status.set API. The new core status takes the latest health state from the 'alerting_health_check' task execution result.
  4. Extended api/alerts/_health with the new property alertingFrameworkHeath.
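
For readers who want a concrete picture of items 1 to 3, here is a minimal sketch of how a health result could be derived from executionStatus and mapped to a status level. It is not the code from this PR: the types `AlertSummary`, `HealthResult`, `ServiceStatus` and the helper `toServiceStatus` are hypothetical stand-ins, and only the names `getHealth`, `executionStatus` and `hasDecryptionFailures` come from the description above.

```typescript
// Illustrative stand-ins; these are NOT the actual Kibana types.
type ErrorReason = 'read' | 'decrypt' | 'execute' | 'unknown';

interface AlertSummary {
  id: string;
  executionStatus: {
    status: 'ok' | 'active' | 'error' | 'pending';
    lastExecutionDate: string; // ISO timestamp
    error?: { reason: ErrorReason; message: string };
  };
}

interface HealthResult {
  hasDecryptionFailures: boolean; // name taken from the PR description
  hasExecutionFailures: boolean; // assumed extra field
  hasReadFailures: boolean; // assumed extra field
}

// True if at least one alert's last execution failed with the given reason.
function hasErrorWithReason(alerts: AlertSummary[], reason: ErrorReason): boolean {
  return alerts.some((alert) => {
    const { status, error } = alert.executionStatus;
    return status === 'error' && error !== undefined && error.reason === reason;
  });
}

// Sketch of a getHealth-style check over already-loaded alerts.
function getHealth(alerts: AlertSummary[]): HealthResult {
  return {
    hasDecryptionFailures: hasErrorWithReason(alerts, 'decrypt'),
    hasExecutionFailures: hasErrorWithReason(alerts, 'execute'),
    hasReadFailures: hasErrorWithReason(alerts, 'read'),
  };
}

// Hypothetical, simplified status shape; the real core status type may differ.
type StatusLevel = 'available' | 'degraded';

interface ServiceStatus {
  level: StatusLevel;
  summary: string;
}

// Assumed mapping: decryption failures degrade the plugin status.
function toServiceStatus(health: HealthResult): ServiceStatus {
  return health.hasDecryptionFailures
    ? { level: 'degraded', summary: 'Alerting framework has decryption failures' }
    : { level: 'available', summary: 'Alerting framework is available' };
}
```

In this sketch the hourly task would call `getHealth` and store the result in its task state, and the plugin setup would convert the latest stored result with `toServiceStatus` before handing it to core.status.set.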

img1
img2

Resolve #75042

…tatus pooler for Kibana status plugins 'kibanahost/api/status'
@YulNaumenko YulNaumenko added Feature:Alerting, v8.0.0, release_note:skip, Team:ResponseOps, v7.10.0 labels Oct 1, 2020
@YulNaumenko YulNaumenko requested a review from a team as a code owner October 1, 2020 06:12
@YulNaumenko YulNaumenko self-assigned this Oct 1, 2020
@elasticmachine

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

…k-health-status

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
@gmmorris gmmorris self-requested a review October 8, 2020 10:17

@pmuellr pmuellr left a comment

I left a bunch of comments, which suggest some different shape changes on things - which may not be right, not sure :-). Perhaps would be worth a call to go over, I may be misunderstanding the design.

x-pack/plugins/alerts/server/alerts_client.ts (outdated)
x-pack/plugins/alerts/server/alerts_client.ts (outdated)
x-pack/plugins/alerts/server/health/get_state.ts (outdated)
x-pack/plugins/alerts/common/index.ts
x-pack/plugins/alerts/server/plugin.ts
@pmuellr pmuellr added v7.11.0 and removed v7.10.0 labels Oct 8, 2020
@@ -464,7 +465,7 @@ describe('utils', () => {
       status: 'error',
       lastExecutionDate: foundRule.executionStatus.lastExecutionDate,
       error: {
-        reason: 'read',
+        reason: AlertExecutionStatusErrorReasons.Read,
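
The hunk above replaces the string literal 'read' with an enum member. A minimal sketch of what such a string enum could look like follows. Only `Read` with the value 'read' is visible in the diff; the members `Decrypt`, `Execute` and `Unknown`, and the helpers `parseReason` and `isDecryptionFailure`, are assumptions added for illustration.

```typescript
// Only `Read = 'read'` is confirmed by the diff; the other members are assumed.
enum AlertExecutionStatusErrorReasons {
  Read = 'read',
  Decrypt = 'decrypt', // assumed
  Execute = 'execute', // assumed
  Unknown = 'unknown', // assumed
}

// Hypothetical error shape stored on an alert's executionStatus.
interface ExecutionStatusError {
  reason: AlertExecutionStatusErrorReasons;
  message: string;
}

// Hypothetical helper: convert a raw stored string into an enum member,
// falling back to Unknown for values this sketch does not recognise.
function parseReason(raw: string): AlertExecutionStatusErrorReasons {
  switch (raw) {
    case 'read':
      return AlertExecutionStatusErrorReasons.Read;
    case 'decrypt':
      return AlertExecutionStatusErrorReasons.Decrypt;
    case 'execute':
      return AlertExecutionStatusErrorReasons.Execute;
    default:
      return AlertExecutionStatusErrorReasons.Unknown;
  }
}

// Hypothetical helper: does this error represent a decryption failure?
function isDecryptionFailure(error?: ExecutionStatusError): boolean {
  return error !== undefined && error.reason === AlertExecutionStatusErrorReasons.Decrypt;
}
```

Because this is a string enum, the serialized value stays 'read', so a change like the one in the diff alters the source code but not the data that tests or stored documents compare against.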

…k-health-status

# Conflicts:
#	x-pack/plugins/triggers_actions_ui/public/application/sections/alerts_list/components/alerts_list.test.tsx

@pmuellr pmuellr left a comment

LGTM, left a few comments/questions

) {
try {
const interval = (await config).healthCheck.interval;
await taskManager.ensureScheduled({

I was anxious to see this running, so hacked an alert executor to throw an error, and then the config to run every minute :-). Makes me wonder if we want to schedule an additional one-time call at startup (maybe 1 minute after startup), so we can get the latest data shortly after startup, without having to wait for the hourly task to run. (could be another issue/PR)
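
To make the suggestion above concrete, here is a rough sketch of scheduling a one-time check shortly after startup alongside the recurring one. It is not part of this PR. `TaskInstance` and `TaskScheduler` are simplified, hypothetical stand-ins for the task manager types, the task ids are invented, and the `runAt` field and the 1 minute delay are assumptions; only `ensureScheduled`, the interval setting and the task type 'alerting_health_check' appear in the PR.

```typescript
// Simplified, hypothetical stand-ins for task manager types.
interface TaskInstance {
  id: string;
  taskType: string;
  schedule?: { interval: string }; // set for recurring tasks
  runAt?: Date; // set for one-off tasks (assumed field)
}

interface TaskScheduler {
  ensureScheduled(task: TaskInstance): Promise<unknown>;
}

const HEALTH_TASK_TYPE = 'alerting_health_check';

// Pure planning step: build the recurring task plus a one-off task that
// runs `startupDelayMs` after `now` (1 minute by default, an assumption).
function planHealthCheckTasks(
  interval: string,
  now: Date,
  startupDelayMs: number = 60 * 1000
): { recurring: TaskInstance; startup: TaskInstance } {
  return {
    recurring: {
      id: 'alerting-health-recurring', // hypothetical id
      taskType: HEALTH_TASK_TYPE,
      schedule: { interval },
    },
    startup: {
      id: 'alerting-health-startup', // hypothetical id
      taskType: HEALTH_TASK_TYPE,
      runAt: new Date(now.getTime() + startupDelayMs),
    },
  };
}

// Hand both planned tasks to the scheduler, recurring first.
function scheduleHealthChecks(scheduler: TaskScheduler, interval: string): Promise<unknown> {
  const plan = planHealthCheckTasks(interval, new Date());
  return scheduler
    .ensureScheduled(plan.recurring)
    .then(() => scheduler.ensureScheduled(plan.startup));
}
```

Keeping the planning step pure makes the timing easy to unit test without a running task manager. Whether a one-off task with a fixed id is scheduled again on every restart depends on task manager behavior that this sketch does not model.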

@gmmorris gmmorris left a comment

We're getting much closer :)

There are a few things that still bug me (some duplication of code and nit. stuff) but the thing that is still keeping me from approving is that I don't understand why getHealth is on the AlertsClient instead of the Plugin Contract.

It doesn't use any user specific privileges, requires a fake request to be used in the task, and feels like a Plugin level API, so I think it should be moved there...

x-pack/plugins/alerts/server/health/task.ts (outdated)
x-pack/plugins/alerts/server/health/get_state.ts (outdated)
x-pack/plugins/alerts/server/health/task.ts (outdated)
x-pack/plugins/alerts/server/health/get_state.ts (outdated)
x-pack/plugins/alerts/server/health/get_state.ts (outdated)
x-pack/plugins/alerts/server/health/get_state.ts (outdated)
x-pack/plugins/alerts/server/health/task.ts (outdated)
x-pack/plugins/alerts/server/health/get_state.ts (outdated)
x-pack/plugins/alerts/server/plugin.ts (outdated)

@gmmorris gmmorris left a comment

LGTM!
I think some dead code has slipped through in plugin.ts, but other than that this is looking great! Well done :)

@kibanamachine

💚 Build Succeeded

Metrics [docs]

distributable file count

| id | before | after | diff |
| --- | --- | --- | --- |
| default | 42731 | 42736 | +5 |

page load bundle size

| id | before | after | diff |
| --- | --- | --- | --- |
| alerts | 87.8KB | 88.4KB | +665.0B |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@YulNaumenko YulNaumenko merged commit 802c6dc into elastic:master Nov 7, 2020
YulNaumenko added a commit to YulNaumenko/kibana that referenced this pull request Nov 7, 2020
…tatus pooler for Kibana status plugins 'kibanahost/api/status' (elastic#79056)

* Implemented Alerting health status pusher by using task manager and status pooler for Kibana status plugins 'kibanahost/api/status'

* Exposed health task registration to alerts plugin

* Fixed type error

* Extended health API endpoint with info about decryption failures, added correct health task implementation

* adjusted query

* Tested locally and got it working as expected, fixed tests and type check

* Added unit tests

* Changed AlertExecutionStatusErrorReasons to be enum

* Uppercase the enum

* Replaced string values to enum

* Fixed types

* Extended AlertsClient with getHealth method

* added return type to healthStatus$

* Added configurable health check interval and timestamps

* Extended update core status interval to 5mins

* Fixed failing tests

* Registered alerts config

* Fixed date for ok health state

* fixed jest test

* fixed task state

* Fixed due to comments, moved getHealth to a plugin level

* fixed type checks

* Added sorting to the latest Ok state last update

* adjusted error queries

* Fixed jest tests

* removed unused

* fixed type check
YulNaumenko added a commit that referenced this pull request Nov 7, 2020
…tatus pooler for Kibana status plugins 'kibanahost/api/status' (#79056) (#82907)

gmmorris added a commit to gmmorris/kibana that referenced this pull request Nov 9, 2020
* master: (68 commits)
  [Fleet] Make stream id unique in agent policy (elastic#82447)
  skip flaky suite (elastic#82915)
  skip flaky suite (elastic#75794)
  Copy `dateAsStringRt` to observability plugin (elastic#82839)
  [Maps] rename connected_components/map folder to mb_map (elastic#82897)
  [Security Solution] Fix EventsViewer DnD cypress tests (elastic#82619)
  [Security Solution] Adds logging and performance fan out API for threat/Indicator matching (elastic#82546)
  Implemented Alerting health status pusher by using task manager and status pooler for Kibana status plugins 'kibanahost/api/status' (elastic#79056)
  [APM] Adds new configuration 'xpack.apm.maxServiceEnvironments' (elastic#82090)
  Move single use function in line (elastic#82885)
  [ML] Add unsigned_long support to data frame analytics and anomaly detection (elastic#82636)
  Add flot_chart dependency from shared_deps to Shareable Runtime (elastic#81649)
  [Security Solution][Detections] - Auto refresh all rules/monitoring tables (elastic#82062)
  [APM] Fix apm e2e runner script commands (elastic#82798)
  [Ingest Manager] Move cache functions to from registry to archive (elastic#82871)
  Update webpack-dev-server and webpack-cli (elastic#82844)
  [Uptime] Migrate to new es client (elastic#82003)
  Move parseAndVerify* functions to validation.ts (elastic#82845)
  Remove yeoman & yo (elastic#82825)
  [Canvas] Fix elements not being updated properly when filter is changed on workpad (elastic#81863)
  ...
phillipb added a commit to phillipb/kibana that referenced this pull request Nov 10, 2020
…e-details-overlay

* 'master' of github.com:elastic/kibana: (201 commits)
  Added `defaultActionMessage` to index threshold alert UI type definition (elastic#80936)
  [ILM] Migrate Delete phase and name field to Form Lib (elastic#82834)
  skip flaky suite (elastic#57426)
  [Alerting] adds an Run When field in the alert flyout to assign the action to an Action Group (elastic#82472)
  [APM] Expose APM event client as part of plugin contract (elastic#82724)
  [Logs UI] Fix errors during navigation (elastic#78319)
  Enable send to background in TSVB (elastic#82835)
  SavedObjects search_dsl: add match_phrase_prefix clauses when using prefix search (elastic#82693)
  [Ingest Manager] Unify install* under installPackage (elastic#82916)
  [Fleet] Make stream id unique in agent policy (elastic#82447)
  skip flaky suite (elastic#82915)
  skip flaky suite (elastic#75794)
  Copy `dateAsStringRt` to observability plugin (elastic#82839)
  [Maps] rename connected_components/map folder to mb_map (elastic#82897)
  [Security Solution] Fix EventsViewer DnD cypress tests (elastic#82619)
  [Security Solution] Adds logging and performance fan out API for threat/Indicator matching (elastic#82546)
  Implemented Alerting health status pusher by using task manager and status pooler for Kibana status plugins 'kibanahost/api/status' (elastic#79056)
  [APM] Adds new configuration 'xpack.apm.maxServiceEnvironments' (elastic#82090)
  Move single use function in line (elastic#82885)
  [ML] Add unsigned_long support to data frame analytics and anomaly detection (elastic#82636)
  ...
Labels
Feature:Alerting, needs_docs, release_note:skip, Team:ResponseOps, v7.11.0, v8.0.0
Development

Successfully merging this pull request may close these issues.

Users of alerting are not notified when the framework is failing
7 participants