
Otel Processor metric filter is not working as expected #28585

Closed
prateekpatel opened this issue Oct 24, 2023 · 16 comments
Labels: processor/filter (Filter processor), question (Further information is requested)

Comments

@prateekpatel

Component(s)

processor/filter

What happened?

I have the following configuration for my OpenTelemetry (otel) filter processor. Within the receiver configuration, a kubelet scrape job collects node metrics, and the filter is intended to selectively include or exclude metrics based on the filter configuration.

Here's the scrape configuration job in the receiver:

job_name: kubelet
scrape_interval: 1m
scheme: https
tls_config:
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
  - role: node
    api_server: null
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    action: replace
    target_label: __metrics_path__
    regex: (ip-.*)
    replacement: /api/v1/nodes/$${1}/proxy/metrics

In the filter configuration:

filter/metrics:
  error_mode: ignore
  metrics:
    include:
      match_type: regexp
      metric_names:
        - otelcol_receiver_accepted_metric_points_total$
        - jvm_memory_used_bytes$
        - jvm_threads_states_threads$
        - container_cpu_usage_seconds_total$
        - kubelet_runtime_operations_duration_seconds_bucket$

However, I'm encountering an issue. While I expect to see only the metric kubelet_runtime_operations_duration_seconds_bucket, I also observe metrics like kubelet_runtime_operations_duration_seconds_sum and kubelet_runtime_operations_duration_seconds_total in Grafana. Any guidance or insights on resolving this would be greatly appreciated.
[Screenshot 2023-10-24 at 16:34:24]

I tried the configurations below with no success:

filter/metrics:
  error_mode: ignore
  metrics:
    include:
      match_type: strict
      metric_names:
        - otelcol_receiver_accepted_metric_points_total
        - jvm_memory_used_bytes
        - jvm_threads_states_threads
        - container_cpu_usage_seconds_total
        - kubelet_runtime_operations_duration_seconds

filter/metrics:
  error_mode: ignore
  metrics:
    include:
      match_type: regexp
      metric_names:
        - otelcol.*
        - jvm_memory_used_bytes
        - jvm_threads_states_threads
        - container_cpu_usage_seconds_total
        - kubelet_runtime_operations_duration_seconds_buck.*
        - volume_operation_total_seconds_bucket.*

Collector version

refinery version v0.17.0

Environment information

Environment

OpenTelemetry Collector configuration

receivers:

  # This section below is used to collect the metric
  prometheus/internal:
    config:
      scrape_configs:
        - job_name: cadvisor
          scrape_interval: 1m
          scheme: https
          metrics_path: /metrics/cadvisor
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
            - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            action: replace
            target_label: __metrics_path__
            regex: (ip-.*)
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor


        - job_name: kubelet
          scrape_interval: 1m
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
            - role: node
              api_server: null
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            action: replace
            target_label: __metrics_path__
            regex: (ip-.*)
            replacement: /api/v1/nodes/$1/proxy/metrics
processors:
  memory_limiter/with-settings:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20

 
  batch:
    timeout: 5s
    send_batch_size: 8192
    send_batch_max_size: 0

  # Detect if the collector is running on a cloud system. Overrides resource attributes set by receivers.
  # Detector order is important: the `system` detector goes last, so it can't preclude cloud detectors from setting host/os info.
  resourcedetection/internal:
    detectors: [ gcp, ecs, ec2, azure, system ]
    override: true            
  filter/metrics:
    error_mode: propagate
    metrics:
      include:
        match_type: regexp
        metric_names:
          - otel.*
          - container_cpu_.*
          - jvm_gc.*
          - container_mem.*
          - container_tasks_state.*
          - container_blkio_device_usage_total.*
          - jvm.Memory.pools.Eden-Space.*
          - kubelet_runtime_operations_duration_seconds_*
          - kubelet_volume_stat*
exporters:
 

  logging:
    verbosity: normal
    sampling_initial: 2
    sampling_thereafter: 500
    
  prometheusremotewrite:
    endpoint: {remote_write}     
    auth:
      authenticator: sigv4auth
pipelines:
    metrics/general:
      receivers: 
        - prometheus/internal
      processors:
        - memory_limiter/with-settings
        - filter/metrics
        - batch
      exporters:
        - logging
        - prometheusremotewrite

Log output

No response

Additional context

No response

@prateekpatel prateekpatel added bug Something isn't working needs triage New item requiring triage labels Oct 24, 2023
@github-actions github-actions bot added the processor/filter Filter processor label Oct 24, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member

crobert-1 commented Oct 27, 2023

Hello @prateekpatel, just want to make sure I understand: From your configuration it looks like you're explicitly including metrics that match the filter kubelet_runtime_operations_duration_seconds_*. Then from your description you say that you expect kubelet_runtime_operations_duration_seconds_bucket, but don't expect kubelet_runtime_operations_duration_seconds_sum and kubelet_runtime_operations_duration_seconds_total. Am I understanding properly?

Can you share exactly which metric names you want, and which ones you don't want? Then for the configurations that aren't working, can you share what their output is, and how it's not matching what you'd expect?

@kizitonzeka

Hi @crobert-1, I am working together with @prateekpatel on this.
When kubelet_runtime_operations_duration_seconds_* is used, we are able to see all of kubelet_runtime_operations_duration_seconds_(bucket|count|sum), but when we try to get kubelet_runtime_operations_duration_seconds_bucket by itself, we don't see the metric. In the case above we are using the regexp match type.
I also tried the strict match type to include kubelet_runtime_operations_duration_seconds_bucket, and I don't see the metric either.

I made a Slack post on this: https://cloud-native.slack.com/archives/C01N6P7KR6W/p1698844433792289

@crobert-1
Member

Copying current status and question from the Slack thread

The goal is to keep only the kubelet_runtime_operations_duration_seconds_bucket metric. Here's the config being used:

filter/metrics:
  error_mode: propagate
  metrics:
    include:
      match_type: strict
      metric_names:
        - kubelet_runtime_operations_duration_seconds_bucket

With this filter in place, all metrics end up being filtered out, including the kubelet_runtime_operations_duration_seconds_bucket metric that is supposed to be included.

@kizitonzeka

@crobert-1 could this behaviour be because this is a histogram metric type, and _bucket, _sum and _count are considered datapoints, whereas filtering is done based on the metric name, which in this case is kubelet_runtime_operations_duration_seconds?
I have also observed the same behaviour with other histogram metrics such as otelcol_processor_batch_batch_send_size and storage_operation_duration_seconds.

@TylerHelmuth
Member

@kizitonzeka have you confirmed, via the debug exporter using detailed verbosity, that the metric name you're using is the name of the metric as it appears in the collector?
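For reference, a minimal sketch of that check, reusing the logging exporter already present in the configuration above (the newer debug exporter accepts the same verbosity setting):

exporters:
  logging:
    # detailed verbosity prints every metric, with its name and datapoints,
    # exactly as it exists inside the collector
    verbosity: detailed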

@kizitonzeka

@TylerHelmuth Yes, the metric name is kubelet_runtime_operations_duration_seconds as shown in the screenshot. Just to confirm, as in my previous comment: should I not expect to be able to filter based on bucket, sum or count for this type of metric?

[Screenshot 2023-11-06 at 16:02:23]

@TylerHelmuth
Member

Correct, if you filter by metric name it would drop (or keep) the entire metric and all of its datapoints.
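A minimal sketch of what that looks like for the metric name confirmed above (this keeps the whole histogram, so the _bucket, _sum and _count series all survive; they cannot be kept or dropped individually by metric name):

filter/metrics:
  error_mode: ignore
  metrics:
    include:
      match_type: strict
      metric_names:
        # the name as it exists inside the collector, without the
        # _bucket/_sum/_count suffixes that Prometheus adds at export time
        - kubelet_runtime_operations_duration_seconds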

@crobert-1 crobert-1 added question Further information is requested and removed bug Something isn't working needs triage New item requiring triage labels Nov 7, 2023
@crobert-1
Member

Is there anything else that we can do to help here, or are you able to get filtering working the way you need now? We can close this issue if there's nothing left to do here.

@kizitonzeka

Thank you so much @crobert-1 @TylerHelmuth, we now have a better understanding of how filtering works and have adjusted it to suit our needs. This issue can be closed.

@karvounis-form3

@kizitonzeka I am interested to hear how you made it work. I am also facing issues with it. We transform Prometheus metrics from Delta to Cumulative temporality and we have metrics like:

  - metricA_bucket
  - metricA_sum
  - metricA_count

However, when I try to use:

processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - metricA_bucket

metricA_bucket still appears in Grafana

@kizitonzeka

@karvounis-form3 as far as I know, the filtering is done based on metric names, and metricA_bucket is a datapoint of metricA. In my case I wasn't able to filter those datapoints (_bucket, _sum, _count), so I used the strict match type and allowed the metric by name.
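A sketch of how that translates to the exclude case above, assuming metricA is the name of the metric inside the collector (note that this drops the whole histogram, not just the _bucket series):

processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          # excluding the base name drops every datapoint of the histogram:
          # buckets, sum, and count together
          - metricA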

@VJ1313

VJ1313 commented May 10, 2024

Can someone help here? I am trying to drop the entire bucket metric, but it's not working. Basically I don't need any data from this bucket: http_server_duration_milliseconds_bucket

receivers:
  otlp:
    protocols:
      grpc: null
      http: null
processors:
  memory_limiter:
    limit_mib: 20
    check_interval: 5s
  batch: null
  filter/uri:
    error_mode: ignore
    metrics:
      metric:
        - IsMatch(name, "http_server_requests_.*")
      datapoint:
        - IsMatch(attributes["uri"], ".*/health.*")
        - IsMatch(attributes["uri"], ".*/version.*")
        - IsMatch(attributes["uri"], ".*/actuator.*")
  filter/http:
    error_mode: ignore
    metrics:
      - name == "http_server_duration_milliseconds_bucket"
    datapoint:
      - attributes["net_protocol_name"] == "http"

@crobert-1
Member

Hello @VJ1313, can you please open another issue with your question? You can include a reference to this issue to help give some more context as well.

@VJ1313

VJ1313 commented May 10, 2024

Thanks. Done.

@Cairry

Cairry commented Aug 13, 2024

@karvounis-form3 As far as I know, the filtering is done based on metric names, and metricA_bucket is a datapoint of metricA. In my case, I was not able to filter those datapoints (_bucket, _sum, _count), so I used the strict match type and allowed the metric by name.

Hello, I have the same problem. Since I cannot filter out the two series xx_count and xx_bucket individually, how can I stop them from being collected? I only need xx_sum.
