[BUG] SPAM from bucket monitor when number of results over 50 #6710

Open
NiFeuu opened this issue May 3, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@NiFeuu

NiFeuu commented May 3, 2024

Describe the bug

When a monitor is a bucket-level ("by bucket") monitor and throttling is set (60 minutes in my case), the throttling period is no longer respected once the monitor returns more than 50 results: the action is executed on every run.
For example, I have a monitor that checks whether CPU usage is over 70% (value 0.7). When 49 servers have CPU over 70%, I get one message per host every 60 minutes, which is correct.
But when 51 hosts are over 70%, I get one message per minute (the monitor is scheduled every minute).
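
To make the expected behaviour concrete, here is a minimal Python sketch of per-bucket throttling, assuming the throttle is tracked per bucket key (here, the host.hostname value). The function `actions_to_send` and the `last_sent` dictionary are hypothetical illustrations, not the Alerting plugin's implementation; the point is that whether a bucket's action fires should depend only on when that bucket last fired, not on how many buckets matched.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of per-bucket action throttling; NOT the Alerting plugin's code.
THROTTLE = timedelta(minutes=60)  # throttle period used in the report


def actions_to_send(triggered_keys, last_sent, now, throttle=THROTTLE):
    """Return the bucket keys whose action should fire at `now`.

    `last_sent` maps a bucket key (e.g. a host.hostname value) to the time
    its action last fired; it is updated in place for keys that fire.
    """
    fired = []
    for key in triggered_keys:
        previous = last_sent.get(key)
        if previous is None or now - previous >= throttle:
            fired.append(key)
            last_sent[key] = now
    return fired


if __name__ == "__main__":
    start = datetime(2024, 5, 3, 12, 0)
    hosts = [f"Host-Test{j}" for j in range(1, 52)]  # 51 hosts over the threshold
    last_sent = {}
    # Simulate a monitor that runs once a minute for two hours.
    for minute in range(120):
        now = start + timedelta(minutes=minute)
        fired = actions_to_send(hosts, last_sent, now)
        if fired:
            print(now.time(), len(fired))  # prints only at 12:00:00 and 13:00:00
```

In this model, 49, 51 or 510 hosts all produce one batch at 12:00 and the next at 13:00; the report describes the real monitor instead firing every minute once more than 50 buckets match.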

To Reproduce
The index I am using in this example is named metricbeat-ng. It indexes metrics from Metricbeat.
1. Create a monitor that runs every minute with a throttling time of 60 minutes. Set the action to send a message to a destination you can log (for example, an email destination).
2. The monitor's query is:

{
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "@timestamp": {
                            "from": "{{period_end}}||-2m",
                            "to": "{{period_end}}",
                            "include_lower": true,
                            "include_upper": true,
                            "format": "epoch_millis",
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "aggregations": {
        "composite_agg": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "host.hostname": {
                            "terms": {
                                "field": "host.hostname",
                                "missing_bucket": false,
                                "order": "asc"
                            }
                        }
                    }
                ]
            },
            "aggregations": {
                "avg_system_cpu_total_norm_pct": {
                    "avg": {
                        "field": "system.cpu.total.norm.pct"
                    }
                }
            }
        }
    }
}

3. The trigger condition is:

{
    "buckets_path": {
        "avg_system_cpu_total_norm_pct": "avg_system_cpu_total_norm_pct"
    },
    "parent_bucket_path": "composite_agg",
    "script": {
        "source": "params.avg_system_cpu_total_norm_pct > 0.7",
        "lang": "painless"
    },
    "gap_policy": "skip"
}
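
The trigger above is evaluated once per composite bucket. As a hedged illustration, assuming the usual OpenSearch response shape for a composite aggregation with an avg sub-aggregation, the following Python sketch applies the same condition (`avg_system_cpu_total_norm_pct > 0.7`) locally, which can help confirm how many buckets should trigger for a given search response. `triggered_buckets`, `fake_response` and the sample data are hypothetical.

```python
# Hypothetical local check of the bucket-level trigger condition
# "params.avg_system_cpu_total_norm_pct > 0.7"; illustration only.
THRESHOLD = 0.7


def triggered_buckets(search_response, threshold=THRESHOLD):
    """Return the host.hostname keys of buckets whose average CPU exceeds threshold."""
    buckets = search_response["aggregations"]["composite_agg"]["buckets"]
    hits = []
    for bucket in buckets:
        value = bucket["avg_system_cpu_total_norm_pct"]["value"]
        if value is None:  # no value: skip the bucket, similar in spirit to gap_policy "skip"
            continue
        if value > threshold:
            hits.append(bucket["key"]["host.hostname"])
    return hits


def fake_response(values):
    """Build a minimal composite-aggregation response from {hostname: avg}."""
    return {
        "aggregations": {
            "composite_agg": {
                "buckets": [
                    {
                        "key": {"host.hostname": host},
                        "doc_count": 1,
                        "avg_system_cpu_total_norm_pct": {"value": avg},
                    }
                    for host, avg in values.items()
                ]
            }
        }
    }


if __name__ == "__main__":
    sample = {f"Host-Test{j}": 0.75 for j in range(1, 52)}
    sample["Host-Idle"] = 0.10
    print(len(triggered_buckets(fake_response(sample))))  # 51
```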

4. Run this script, which indexes a CPU value of 0.75 (75%) for each of 60 hosts, with timestamps 10 seconds apart spanning 5 minutes (31 steps, from now to now + 300 seconds):

for i in {0..30}; do
    for j in {1..60}; do
        # Timestamp i*10 seconds after the current time (UTC)
        ts=$(date -u -d "+ $((i*10)) seconds" +"%Y-%m-%dT%H:%M:%S.000Z")
        json_content="{\"@timestamp\":\"${ts}\",\"host.hostname\":\"Host-Test${j}\",\"system.cpu.total.norm.pct\":0.75}"
        echo "$json_content"
        curl -k -i -u 'smartpulse:#Smartpulse' -d "$json_content" -H 'Content-Type: application/json' -X POST https://localhost:9200/metricbeat-ng/_doc
    done
done
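
Indexing 1,860 documents with one curl call each is slow. As an alternative sketch (not part of the original reproduction), the Python script below builds the same documents as an NDJSON body for the standard `_bulk` endpoint. Unlike the shell loop, it derives all timestamps from a single start time instead of calling `date` for every document; `bulk_payload` is a hypothetical helper.

```python
import json
import sys
from datetime import datetime, timedelta, timezone

INDEX = "metricbeat-ng"


def bulk_payload(start, steps=31, hosts=60, cpu=0.75, index=INDEX):
    """Build an NDJSON _bulk body with one document per host per 10-second step."""
    lines = []
    for i in range(steps):
        ts = (start + timedelta(seconds=i * 10)).strftime("%Y-%m-%dT%H:%M:%S.000Z")
        for j in range(1, hosts + 1):
            lines.append(json.dumps({"index": {"_index": index}}))  # action line
            lines.append(json.dumps({                                # document line
                "@timestamp": ts,
                "host.hostname": f"Host-Test{j}",
                "system.cpu.total.norm.pct": cpu,
            }))
    return "\n".join(lines) + "\n"  # the _bulk body must end with a newline


if __name__ == "__main__":
    sys.stdout.write(bulk_payload(datetime.now(timezone.utc)))
```

For example, save it as a hypothetical `make_bulk.py`, run `python3 make_bulk.py > docs.ndjson`, then send it with `curl -k -u '<user>:<password>' -H 'Content-Type: application/x-ndjson' -X POST https://localhost:9200/_bulk --data-binary @docs.ndjson`. `--data-binary` is needed so curl keeps the newlines that separate the NDJSON lines.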

Expected behavior
The throttling period is respected regardless of the number of results (buckets).

OpenSearch Version
Opensearch 2.12

Dashboards Version
Opensearch Dashboard 2.12

Plugins

Plugin Alerting

Screenshots

Host/Environment:

  • OS: Linux Red Hat 8.8
  • Browser and version: any browser, any version

Additional context

I have also noticed that above 500 buckets, the number of alerts increases on each execution of the monitor, when it should stay the same. For example, with 510 hosts, the first run reports 510 alerts, the second 520, the third 530, and so on.
This is just an example with Metricbeat, but we have seen the same behaviour on other indices.
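
The expectation here is that a bucket-level monitor keeps one active alert per bucket key, so repeated runs over the same 510 hosts should stay at 510. A minimal Python model of that bookkeeping, assuming alerts are keyed by bucket key (the dictionary layout and `update_active_alerts` are hypothetical, not the plugin's data model):

```python
# Hypothetical model: one active alert per bucket key; illustration only.

def update_active_alerts(active, triggered_keys, execution):
    """Upsert one alert per triggered key, drop keys no longer triggered,
    and return the number of active alerts after this execution."""
    triggered = set(triggered_keys)
    for key in triggered:
        alert = active.setdefault(key, {"first_seen": execution})
        alert["last_seen"] = execution
    for key in list(active):
        if key not in triggered:
            del active[key]  # treated as completed in this model
    return len(active)


if __name__ == "__main__":
    hosts = [f"Host-Test{j}" for j in range(1, 511)]  # 510 hosts
    active = {}
    for execution in range(1, 4):
        print(execution, update_active_alerts(active, hosts, execution))  # 510 each time
```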

@NiFeuu NiFeuu added bug Something isn't working untriaged labels May 3, 2024
@NiFeuu
Author

NiFeuu commented May 3, 2024

Sorry, I've posted this issue in the OpenSearch Dashboards project. Maybe it should go to the alerting repository?

@kavilla
Member

kavilla commented May 3, 2024

@NiFeuu Thanks for opening!

@opensearch-project/admin please redirect to alerting dashboards repo
