
[exporterhelper] Add exporter dropped metrics #11077

Closed

Conversation

TylerHelmuth
Member

Description

I've been working under the assumption that we had a metric for when the collector dropped data, corresponding to when we log "Exporting failed. Dropping data." I was wrong, so I played around with adding such a metric.
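
For context, this is roughly how such a counter could be declared in exporter/exporterhelper/metadata.yaml under the mdatagen telemetry schema; the entry below is an illustrative sketch, not copied from the diff:

telemetry:
  metrics:
    exporter_dropped_spans:
      enabled: true
      description: Number of spans dropped after failing to export.
      unit: "{spans}"
      sum:
        value_type: int
        monotonic: true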

Link to tracking issue

Related to #5056

Testing

Tested locally using the otlp exporter and a bogus endpoint:

receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  otlp:
    endpoint: "localhost:1111"
    retry_on_failure:
      max_elapsed_time: 30s
service:
  telemetry:
    metrics:
      address: localhost:9090
  pipelines:
    traces:
      receivers:
        - otlp
      exporters:
        - otlp

Using telemetrygen, I sent 1 trace: telemetrygen traces --otlp-insecure --traces 1.
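
End-to-end, the local reproduction looks roughly like this; the install step is an assumption (telemetrygen lives in the opentelemetry-collector-contrib repository) and is not part of this PR:

# install telemetrygen (path assumed)
go install github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen@latest
# send a single trace to the locally running collector
telemetrygen traces --otlp-insecure --traces 1
# after max_elapsed_time (30s) has passed, scrape the collector's own metrics
curl -s http://localhost:9090/metrics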

This generated the following metrics after 30 seconds:

# HELP otelcol_exporter_dropped_spans Number of spans dropped after failing to export.
# TYPE otelcol_exporter_dropped_spans counter
otelcol_exporter_dropped_spans{service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 2
# HELP otelcol_exporter_queue_capacity Fixed capacity of the retry queue (in batches)
# TYPE otelcol_exporter_queue_capacity gauge
otelcol_exporter_queue_capacity{exporter="otlp",service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 1000
# HELP otelcol_exporter_queue_size Current size of the retry queue (in batches)
# TYPE otelcol_exporter_queue_size gauge
otelcol_exporter_queue_size{data_type="traces",exporter="otlp",service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 0
# HELP otelcol_exporter_send_failed_spans Number of spans in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_spans counter
otelcol_exporter_send_failed_spans{exporter="otlp",service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 2
# HELP otelcol_exporter_sent_spans Number of spans successfully sent to destination.
# TYPE otelcol_exporter_sent_spans counter
otelcol_exporter_sent_spans{exporter="otlp",service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 0
# HELP otelcol_process_cpu_seconds Total CPU user and system time in seconds
# TYPE otelcol_process_cpu_seconds counter
otelcol_process_cpu_seconds{service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 0.06
# HELP otelcol_process_memory_rss Total physical memory (resident set size)
# TYPE otelcol_process_memory_rss gauge
otelcol_process_memory_rss{service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 2.6591232e+07
# HELP otelcol_process_runtime_heap_alloc_bytes Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc')
# TYPE otelcol_process_runtime_heap_alloc_bytes gauge
otelcol_process_runtime_heap_alloc_bytes{service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 3.677368e+06
# HELP otelcol_process_runtime_total_alloc_bytes Cumulative bytes allocated for heap objects (see 'go doc runtime.MemStats.TotalAlloc')
# TYPE otelcol_process_runtime_total_alloc_bytes counter
otelcol_process_runtime_total_alloc_bytes{service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 4.508216e+06
# HELP otelcol_process_runtime_total_sys_memory_bytes Total bytes of memory obtained from the OS (see 'go doc runtime.MemStats.Sys')
# TYPE otelcol_process_runtime_total_sys_memory_bytes gauge
otelcol_process_runtime_total_sys_memory_bytes{service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 1.2536848e+07
# HELP otelcol_process_uptime Uptime of the process
# TYPE otelcol_process_uptime counter
otelcol_process_uptime{service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 60.676046
# HELP otelcol_receiver_accepted_spans Number of spans successfully pushed into the pipeline.
# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp",service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev",transport="grpc"} 2
# HELP otelcol_receiver_refused_spans Number of spans that could not be pushed into the pipeline.
# TYPE otelcol_receiver_refused_spans counter
otelcol_receiver_refused_spans{receiver="otlp",service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev",transport="grpc"} 0
# HELP target_info Target metadata
# TYPE target_info gauge
target_info{service_instance_id="511f426c-d533-47a1-83ec-7cce65b7a487",service_name="otelcorecol",service_version="0.108.1-dev"} 1

Documentation

TylerHelmuth requested review from a team, codeboten, and dmitryax on September 6, 2024.

codecov bot commented Sep 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.30%. Comparing base (2c0941f) to head (1825974).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11077      +/-   ##
==========================================
+ Coverage   92.28%   92.30%   +0.01%     
==========================================
  Files         413      413              
  Lines       19766    19793      +27     
==========================================
+ Hits        18242    18269      +27     
  Misses       1152     1152              
  Partials      372      372              


Contributor

@codeboten left a comment


please add a changelog for this

exporter/exporterhelper/metadata.yaml (review thread, outdated, resolved)
exporter/exporterhelper/metadata.yaml (review thread, outdated, resolved)
TylerHelmuth requested a review from a team as a code owner on September 19, 2024.
@dmitryax
Member

> I was wrong, so I played around with adding such a metric.

Why is this wrong? telemetrygen sends 2 spans, which are reflected in the logs and in the otelcol_exporter_send_failed_spans metric. The name might be confusing, but I'm not sure why we need another metric.

@TylerHelmuth
Member Author

TylerHelmuth commented Sep 19, 2024

@dmitryax otelcol_exporter_send_failed_spans increments on any failure to export, even when the data will be retried.

If my test was set up to fail once and then succeed on the second try, otelcol_exporter_send_failed_spans would be 2 and otelcol_exporter_dropped_spans would be 0.

I'd like to add these metrics to record when data is finally and totally dropped. At the moment our only way to communicate this situation is the error logs "Exporting failed. Dropping data." and "Exporting failed. Rejecting data."

@dmitryax
Member

> @dmitryax otelcol_exporter_send_failed_spans increments on any failure to export, even when the data will be retried.
>
> If my test was set up to fail once and then succeed on the second try, otelcol_exporter_send_failed_spans would be 2 and otelcol_exporter_dropped_spans would be 0.

That's not true. The retry mechanism is past the point where we record the failed exports, so otelcol_exporter_send_failed_spans only increments once retries are exhausted.
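
A minimal alerting sketch built on that behavior, assuming Prometheus scrapes the collector's telemetry endpoint configured above (the rule name, window, and labels are illustrative, not from this PR):

groups:
  - name: otelcol-exporter
    rules:
      - alert: ExporterDroppingSpans
        # otelcol_exporter_send_failed_spans only increments after retries are exhausted,
        # so any increase means data was actually dropped
        expr: increase(otelcol_exporter_send_failed_spans[5m]) > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Exporter {{ $labels.exporter }} dropped spans in the last 5 minutes"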

@TylerHelmuth
Copy link
Member Author

@dmitryax you're right, this PR is not needed.
