
Otel Processor metric filter is not working as expected #28585

Closed
prateekpatel opened this issue Oct 24, 2023 · 16 comments
Labels: processor/filter (Filter processor), question (Further information is requested)

Comments

@prateekpatel

Component(s)

processor/filter

What happened?

I have the following configuration for my OpenTelemetry (otel) filter processor. Within the receiver configuration, a kubelet scrape job collects node metrics, and the filter is intended to selectively include or exclude metrics based on the filter configuration.

Here's the scrape configuration job in the receiver:

job_name: kubelet
scrape_interval: 1m
scheme: https
tls_config:
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
  - role: node
    api_server: null
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    action: replace
    target_label: __metrics_path__
    regex: (ip-.*)
    replacement: /api/v1/nodes/$${1}/proxy/metrics

In the filter configuration:

filter/metrics:
  error_mode: ignore
  metrics:
    include:
      match_type: regexp
      metric_names:
        - otelcol_receiver_accepted_metric_points_total$
        - jvm_memory_used_bytes$
        - jvm_threads_states_threads$
        - container_cpu_usage_seconds_total$
        - kubelet_runtime_operations_duration_seconds_bucket$

However, I'm encountering an issue. While I expect to see only the metric kubelet_runtime_operations_duration_seconds_bucket, I also observe metrics like kubelet_runtime_operations_duration_seconds_sum and kubelet_runtime_operations_duration_seconds_total in Grafana. Any guidance or insights on resolving this would be greatly appreciated.
[Screenshot 2023-10-24 at 16:34:24]

I tried the configurations below with no success:

filter/metrics:
  error_mode: ignore
  metrics:
    include:
      match_type: strict
      metric_names:
        - otelcol_receiver_accepted_metric_points_total
        - jvm_memory_used_bytes
        - jvm_threads_states_threads
        - container_cpu_usage_seconds_total
        - kubelet_runtime_operations_duration_seconds

filter/metrics:
  error_mode: ignore
  metrics:
    include:
      match_type: regexp
      metric_names:
        - otelcol.*
        - jvm_memory_used_bytes
        - jvm_threads_states_threads
        - container_cpu_usage_seconds_total
        - kubelet_runtime_operations_duration_seconds_buck.*
        - volume_operation_total_seconds_bucket.*

Collector version

refinery version v0.17.0

Environment information

Environment

OpenTelemetry Collector configuration

receivers:

  # This section below is used to collect the metric
  prometheus/internal:
    config:
      scrape_configs:
        - job_name: cadvisor
          scrape_interval: 1m
          scheme: https
          metrics_path: /metrics/cadvisor
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
            - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            action: replace
            target_label: __metrics_path__
            regex: (ip-.*)
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor


        - job_name: kubelet
          scrape_interval: 1m
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
            - role: node
              api_server: null
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            action: replace
            target_label: __metrics_path__
            regex: (ip-.*)
            replacement: /api/v1/nodes/$1/proxy/metrics
processors:
  memory_limiter/with-settings:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20

 
  batch:
    timeout: 5s
    send_batch_size: 8192
    send_batch_max_size: 0

  # Detect if the collector is running on a cloud system. Overrides resource attributes set by receivers.
  # Detector order is important: the `system` detector goes last, so it can't preclude cloud detectors from setting host/os info.
  resourcedetection/internal:
    detectors: [ gcp, ecs, ec2, azure, system ]
    override: true            
  filter/metrics:
    error_mode: propagate
    metrics:
      include:
        match_type: regexp
        metric_names:
          - otel.*
          - container_cpu_.*
          - jvm_gc.*
          - container_mem.*
          - container_tasks_state.*
          - container_blkio_device_usage_total.*
          - jvm.Memory.pools.Eden-Space.*
          - kubelet_runtime_operations_duration_seconds_*
          - kubelet_volume_stat*
exporters:
 

  logging:
    verbosity: normal
    sampling_initial: 2
    sampling_thereafter: 500
    
  prometheusremotewrite:
    endpoint: {remote_write}     
    auth:
      authenticator: sigv4auth
pipelines:
    metrics/general:
      receivers: 
        - prometheus/internal
      processors:
        - memory_limiter/with-settings
        - filter/metrics
        - batch
      exporters:
        - logging
        - prometheusremotewrite

Log output

No response

Additional context

No response

@prateekpatel prateekpatel added bug Something isn't working needs triage New item requiring triage labels Oct 24, 2023
@github-actions github-actions bot added the processor/filter Filter processor label Oct 24, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member

crobert-1 commented Oct 27, 2023

Hello @prateekpatel, just want to make sure I understand: From your configuration it looks like you're explicitly including metrics that match the filter kubelet_runtime_operations_duration_seconds_*. Then from your description you say that you expect kubelet_runtime_operations_duration_seconds_bucket, but don't expect kubelet_runtime_operations_duration_seconds_sum and kubelet_runtime_operations_duration_seconds_total. Am I understanding properly?

Can you share exactly which metric names you want, and which ones you don't want? Then for the configurations that aren't working, can you share what their output is, and how it's not matching what you'd expect?

@kizitonzeka

Hi @crobert-1, I am working together with @prateekpatel on this.
When kubelet_runtime_operations_duration_seconds_* is used, we are able to see all of kubelet_runtime_operations_duration_seconds_(bucket|count|sum), but when we try to get kubelet_runtime_operations_duration_seconds_bucket by itself, we don't see the metric. In the case above we are using the regexp match type.
I also tried the strict match type to include kubelet_runtime_operations_duration_seconds_bucket, and I don't see the metric either.

I made a Slack post on this: https://cloud-native.slack.com/archives/C01N6P7KR6W/p1698844433792289

@crobert-1
Member

Copying current status and question from the Slack thread

The goal is to keep only the kubelet_runtime_operations_duration_seconds_bucket metric. Here's the config being used:

filter/metrics:
  error_mode: propagate
  metrics:
    include:
      match_type: strict
      metric_names:
        - kubelet_runtime_operations_duration_seconds_bucket

With this filter in place, all metrics end up being filtered out, including the kubelet_runtime_operations_duration_seconds_bucket metric that is supposed to be included.

@kizitonzeka

@crobert-1 could this behaviour be because this is a histogram metric type, and _bucket, _sum and _count are considered datapoints, whereas filtering is done based on the metric name, which in this case is kubelet_runtime_operations_duration_seconds?
I have also observed the same behaviour with other histogram metrics such as otelcol_processor_batch_batch_send_size and storage_operation_duration_seconds.

@TylerHelmuth
Member

@kizitonzeka have you confirmed, via the debug exporter using detailed verbosity, that the metric name you're using is the name of the metric as it appears in the collector?
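For reference, a minimal sketch of that check, reusing the logging exporter already present in the configuration above (the newer debug exporter accepts the same verbosity setting):

exporters:
  logging:
    # detailed verbosity prints every metric, with its name and datapoints,
    # exactly as it exists inside the collector
    verbosity: detailed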

@kizitonzeka

@TylerHelmuth Yes, the metric name is kubelet_runtime_operations_duration_seconds as shown in the screenshot. Just to confirm, as in my previous comment: should I not expect to be able to filter based on bucket, sum or count for this type of metric?

[Screenshot 2023-11-06 at 16:02:23]

@TylerHelmuth
Member

Correct, if you filter by metric name it would drop (or keep) the entire metric and all of its datapoints.
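A minimal sketch of what that looks like for the metric name confirmed above (this keeps the whole histogram, so the _bucket, _sum and _count series all survive; they cannot be kept or dropped individually by metric name):

filter/metrics:
  error_mode: ignore
  metrics:
    include:
      match_type: strict
      metric_names:
        # the name as it exists inside the collector, without the
        # _bucket/_sum/_count suffixes that Prometheus adds at export time
        - kubelet_runtime_operations_duration_seconds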

@crobert-1 crobert-1 added question Further information is requested and removed bug Something isn't working needs triage New item requiring triage labels Nov 7, 2023
@crobert-1
Member

Is there anything else that we can do to help here, or are you able to get filtering working the way you need now? We can close this issue if there's nothing left to do here.

@kizitonzeka

Thank you so much @crobert-1 @TylerHelmuth, we now have a better understanding of how filtering works and have adjusted it to suit our needs. This issue can be closed.

@karvounis-form3

@kizitonzeka I am interested to hear how you made it work. I am also facing issues with it. We transform Prometheus metrics from Delta to Cumulative temporality and we have metrics like:

  - metricA_bucket
  - metricA_sum
  - metricA_count

However, when I try to use:

processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - metricA_bucket

metricA_bucket still appears in Grafana

@kizitonzeka

@karvounis-form3 as far as I know, the filtering is done based on metric names, and metricA_bucket is a datapoint of metricA. In my case I wasn't able to filter those datapoints (_bucket, _sum, _count), so I used the strict match type and allowed the metric by name.
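A sketch of how that translates to the exclude case above, assuming metricA is the name of the metric inside the collector (note that this drops the whole histogram, not just the _bucket series):

processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          # excluding the base name drops every datapoint of the histogram:
          # buckets, sum, and count together
          - metricA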

@VJ1313

VJ1313 commented May 10, 2024

Can someone help here? I am trying to drop the entire bucket metric, but it's not working. Basically I don't need any data from this bucket: http_server_duration_milliseconds_bucket

receivers:
  otlp:
    protocols:
      grpc: null
      http: null
processors:
  memory_limiter:
    limit_mib: 20
    check_interval: 5s
  batch: null
  filter/uri:
    error_mode: ignore
    metrics:
      metric:
        - IsMatch(name, "http_server_requests_.*")
      datapoint:
        - IsMatch(attributes["uri"], ".*/health.*")
        - IsMatch(attributes["uri"], ".*/version.*")
        - IsMatch(attributes["uri"], ".*/actuator.*")
  filter/http:
    error_mode: ignore
    metrics:
      - name == "http_server_duration_milliseconds_bucket"
    datapoint:
      - attributes["net_protocol_name"] == "http"

@crobert-1
Member

Hello @VJ1313, can you please open another issue with your question? You can include a reference to this issue to help give some more context as well.

@VJ1313

VJ1313 commented May 10, 2024

Thanks. Done.

@Cairry

Cairry commented Aug 13, 2024

@karvounis-form3 As far as I know, the filtering is done based on metric names, and metricA_bucket is a datapoint of metricA. In my case, I was not able to filter those datapoints (_bucket, _sum, _count), so I used the strict match type and allowed the metric by name.

Hello, I have the same problem. Since I cannot filter out the two series xx_count and xx_bucket individually, how can I stop them from being collected? I only need xx_sum.
