Skip to content

Commit

Permalink
[Alerting] Initial implementation of alerting task cancel() (elasti…
Browse files Browse the repository at this point in the history
…c#114289)

* Added cancel() to alerting task runner and writing event log document

* Updating rule saved object with timeout execution status

* Skip scheduling actions and logging event log for alerts if rule execution is cancelled

* Adding config for disabling skipping actions

* Fixing types

* Adding flag for rule types to opt out of skipping acitons

* Using task runner uuid to differentiate between task instances

* Adding functional test

* Default to timestamp when startedAt is not available

* Reverting previous change and updating task pool filter instead

* Fixing functional test

* Adding debug logging

* Fixing unit tests

* Fixing unit tests

* Adding rule name to event log doc and rule type timeout to log messages

* Simplifying register logic and adding check to see if already cancelled

* Updating task uuid based on PR comments

* Removing observable

* Fixing functional test

* Adding to docs

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
  • Loading branch information
2 people authored and mbondyra committed Nov 16, 2021
1 parent 00d2f53 commit 53cdde4
Show file tree
Hide file tree
Showing 26 changed files with 1,278 additions and 67 deletions.
3 changes: 3 additions & 0 deletions docs/settings/alert-action-settings.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -191,3 +191,6 @@ Specifies the default timeout for the all rule types tasks. The time is formatte
`<count>[ms,s,m,h,d,w,M,Y]`
+
For example, `20m`, `24h`, `7d`, `1w`. Default: `60s`.

`xpack.alerting.cancelAlertsOnRuleTimeout`::
Specifies whether to skip writing alerts and scheduling actions if rule execution is cancelled due to timeout. Default: `true`. This setting can be overridden by individual rule types.
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,7 @@ kibana_vars=(
xpack.alerting.invalidateApiKeysTask.interval
xpack.alerting.invalidateApiKeysTask.removalDelay
xpack.alerting.defaultRuleTaskTimeout
xpack.alerting.cancelAlertsOnRuleTimeout
xpack.alerts.healthCheck.interval
xpack.alerts.invalidateApiKeysTask.interval
xpack.alerts.invalidateApiKeysTask.removalDelay
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/alerting/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ The following table describes the properties of the `options` object.
|producer|The id of the application producing this rule type.|string|
|minimumLicenseRequired|The value of a minimum license. Most of the rules are licensed as "basic".|string|
|ruleTaskTimeout|The length of time a rule can run before being cancelled due to timeout. By default, this value is "5m".|string|
|cancelAlertsOnRuleTimeout|Whether to skip writing alerts and scheduling actions if a rule execution is cancelled due to timeout. By default, this value is set to "true".|boolean|
|useSavedObjectReferences.extractReferences|(Optional) When developing a rule type, you can choose to implement hooks for extracting saved object references from rule parameters. This hook will be invoked when a rule is created or updated. Implementing this hook is optional, but if an extract hook is implemented, an inject hook must also be implemented.|Function
|useSavedObjectReferences.injectReferences|(Optional) When developing a rule type, you can choose to implement hooks for injecting saved object references into rule parameters. This hook will be invoked when a rule is retrieved (get or find). Implementing this hook is optional, but if an inject hook is implemented, an extract hook must also be implemented.|Function
|isExportable|Whether the rule type is exportable from the Saved Objects Management UI.|boolean|
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/alerting/common/alert.ts
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ export enum AlertExecutionStatusErrorReasons {
Execute = 'execute',
Unknown = 'unknown',
License = 'license',
Timeout = 'timeout',
}

export interface AlertExecutionStatus {
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/alerting/server/config.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ describe('config validation', () => {
const config: Record<string, unknown> = {};
expect(configSchema.validate(config)).toMatchInlineSnapshot(`
Object {
"cancelAlertsOnRuleTimeout": true,
"defaultRuleTaskTimeout": "5m",
"healthCheck": Object {
"interval": "60m",
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/alerting/server/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ export const configSchema = schema.object({
defaultValue: DEFAULT_MAX_EPHEMERAL_ACTIONS_PER_ALERT,
}),
defaultRuleTaskTimeout: schema.string({ validate: validateDurationSchema, defaultValue: '5m' }),
cancelAlertsOnRuleTimeout: schema.boolean({ defaultValue: true }),
});

export type AlertsConfig = TypeOf<typeof configSchema>;
36 changes: 35 additions & 1 deletion x-pack/plugins/alerting/server/plugin.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 10,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
plugin = new AlertingPlugin(context);

Expand Down Expand Up @@ -73,6 +74,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 10,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
plugin = new AlertingPlugin(context);

Expand Down Expand Up @@ -145,14 +147,43 @@ describe('Alerting Plugin', () => {
});
});

it('should apply default config value for ruleTaskTimeout', async () => {
it('should apply default config value for ruleTaskTimeout if no value is specified', async () => {
const ruleType = {
...sampleAlertType,
minimumLicenseRequired: 'basic',
} as AlertType<never, never, never, never, never, 'default', never>;
await setup.registerType(ruleType);
expect(ruleType.ruleTaskTimeout).toBe('5m');
});

it('should apply value for ruleTaskTimeout if specified', async () => {
const ruleType = {
...sampleAlertType,
minimumLicenseRequired: 'basic',
ruleTaskTimeout: '20h',
} as AlertType<never, never, never, never, never, 'default', never>;
await setup.registerType(ruleType);
expect(ruleType.ruleTaskTimeout).toBe('20h');
});

it('should apply default config value for cancelAlertsOnRuleTimeout if no value is specified', async () => {
const ruleType = {
...sampleAlertType,
minimumLicenseRequired: 'basic',
} as AlertType<never, never, never, never, never, 'default', never>;
await setup.registerType(ruleType);
expect(ruleType.cancelAlertsOnRuleTimeout).toBe(true);
});

it('should apply value for cancelAlertsOnRuleTimeout if specified', async () => {
const ruleType = {
...sampleAlertType,
minimumLicenseRequired: 'basic',
cancelAlertsOnRuleTimeout: false,
} as AlertType<never, never, never, never, never, 'default', never>;
await setup.registerType(ruleType);
expect(ruleType.cancelAlertsOnRuleTimeout).toBe(false);
});
});
});

Expand All @@ -169,6 +200,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 10,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
const plugin = new AlertingPlugin(context);

Expand Down Expand Up @@ -210,6 +242,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 10,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
const plugin = new AlertingPlugin(context);

Expand Down Expand Up @@ -265,6 +298,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
const plugin = new AlertingPlugin(context);

Expand Down
47 changes: 25 additions & 22 deletions x-pack/plugins/alerting/server/plugin.ts
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,7 @@ export const EVENT_LOG_ACTIONS = {
newInstance: 'new-instance',
recoveredInstance: 'recovered-instance',
activeInstance: 'active-instance',
executeTimeout: 'execute-timeout',
};
export const LEGACY_EVENT_LOG_ACTIONS = {
resolvedInstance: 'resolved-instance',
Expand Down Expand Up @@ -285,14 +286,13 @@ export class AlertingPlugin {
if (!(alertType.minimumLicenseRequired in LICENSE_TYPE)) {
throw new Error(`"${alertType.minimumLicenseRequired}" is not a valid license type`);
}
if (!alertType.ruleTaskTimeout) {
alertingConfig.then((config) => {
alertType.ruleTaskTimeout = config.defaultRuleTaskTimeout;
ruleTypeRegistry.register(alertType);
});
} else {

alertingConfig.then((config) => {
alertType.ruleTaskTimeout = alertType.ruleTaskTimeout ?? config.defaultRuleTaskTimeout;
alertType.cancelAlertsOnRuleTimeout =
alertType.cancelAlertsOnRuleTimeout ?? config.cancelAlertsOnRuleTimeout;
ruleTypeRegistry.register(alertType);
}
});
},
getSecurityHealth: async () => {
return await getSecurityHealth(
Expand Down Expand Up @@ -375,21 +375,24 @@ export class AlertingPlugin {
return alertingAuthorizationClientFactory!.create(request);
};

taskRunnerFactory.initialize({
logger,
getServices: this.getServicesFactory(core.savedObjects, core.elasticsearch),
getRulesClientWithRequest,
spaceIdToNamespace,
actionsPlugin: plugins.actions,
encryptedSavedObjectsClient,
basePathService: core.http.basePath,
eventLogger: this.eventLogger!,
internalSavedObjectsRepository: core.savedObjects.createInternalRepository(['alert']),
executionContext: core.executionContext,
ruleTypeRegistry: this.ruleTypeRegistry!,
kibanaBaseUrl: this.kibanaBaseUrl,
supportsEphemeralTasks: plugins.taskManager.supportsEphemeralTasks(),
maxEphemeralActionsPerAlert: this.config.then((config) => config.maxEphemeralActionsPerAlert),
this.config.then((config) => {
taskRunnerFactory.initialize({
logger,
getServices: this.getServicesFactory(core.savedObjects, core.elasticsearch),
getRulesClientWithRequest,
spaceIdToNamespace,
actionsPlugin: plugins.actions,
encryptedSavedObjectsClient,
basePathService: core.http.basePath,
eventLogger: this.eventLogger!,
internalSavedObjectsRepository: core.savedObjects.createInternalRepository(['alert']),
executionContext: core.executionContext,
ruleTypeRegistry: this.ruleTypeRegistry!,
kibanaBaseUrl: this.kibanaBaseUrl,
supportsEphemeralTasks: plugins.taskManager.supportsEphemeralTasks(),
maxEphemeralActionsPerAlert: config.maxEphemeralActionsPerAlert,
cancelAlertsOnRuleTimeout: config.cancelAlertsOnRuleTimeout,
});
});

this.eventLogService!.registerSavedObjectProvider('alert', (request) => {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ const createExecutionHandlerParams: jest.Mocked<
stateVal: 'My other {{state.value}} goes here',
},
supportsEphemeralTasks: false,
maxEphemeralActionsPerAlert: Promise.resolve(10),
maxEphemeralActionsPerAlert: 10,
};

beforeEach(() => {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ export interface CreateExecutionHandlerOptions<
request: KibanaRequest;
alertParams: AlertTypeParams;
supportsEphemeralTasks: boolean;
maxEphemeralActionsPerAlert: Promise<number>;
maxEphemeralActionsPerAlert: number;
}

interface ExecutionHandlerOptions<ActionGroupIds extends string> {
Expand Down Expand Up @@ -157,7 +157,7 @@ export function createExecutionHandler<
const alertLabel = `${alertType.id}:${alertId}: '${alertName}'`;

const actionsClient = await actionsPlugin.getActionsClientWithRequest(request);
let ephemeralActionsToSchedule = await maxEphemeralActionsPerAlert;
let ephemeralActionsToSchedule = maxEphemeralActionsPerAlert;
for (const action of actions) {
if (
!actionsPlugin.isActionExecutable(action.id, action.actionTypeId, { notifyUsage: true })
Expand Down
15 changes: 8 additions & 7 deletions x-pack/plugins/alerting/server/task_runner/task_runner.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,8 @@ describe('Task Runner', () => {
ruleTypeRegistry,
kibanaBaseUrl: 'https://localhost:5601',
supportsEphemeralTasks: false,
maxEphemeralActionsPerAlert: new Promise((resolve) => resolve(10)),
maxEphemeralActionsPerAlert: 10,
cancelAlertsOnRuleTimeout: true,
};

function testAgainstEphemeralSupport(
Expand Down Expand Up @@ -285,7 +286,7 @@ describe('Task Runner', () => {
expect(call.services).toBeTruthy();

const logger = taskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(2);
expect(logger.debug).toHaveBeenCalledTimes(3);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -432,7 +433,7 @@ describe('Task Runner', () => {
`);

const logger = customTaskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(3);
expect(logger.debug).toHaveBeenCalledTimes(4);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -648,7 +649,7 @@ describe('Task Runner', () => {
expect(actionsClient.ephemeralEnqueuedExecution).toHaveBeenCalledTimes(0);

const logger = taskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(4);
expect(logger.debug).toHaveBeenCalledTimes(5);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -848,7 +849,7 @@ describe('Task Runner', () => {
expect(enqueueFunction).toHaveBeenCalledTimes(1);

const logger = customTaskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(4);
expect(logger.debug).toHaveBeenCalledTimes(5);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -1537,7 +1538,7 @@ describe('Task Runner', () => {
`);

const logger = customTaskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(4);
expect(logger.debug).toHaveBeenCalledTimes(5);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -4339,7 +4340,7 @@ describe('Task Runner', () => {
expect(call.services).toBeTruthy();

const logger = taskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(2);
expect(logger.debug).toHaveBeenCalledTimes(3);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down
Loading

0 comments on commit 53cdde4

Please sign in to comment.