
Add size in bytes to batch processor #4761

Closed
pavolloffay opened this issue Jun 18, 2020 · 4 comments

@pavolloffay
Member

Is your feature request related to a problem? Please describe.

The batch processor controls the batch size via send_batch_size, which is the number of spans or metrics. However, spans can vary in size, so it would be useful to instruct the batch processor to send the batch once a given size in bytes is reached.

Describe the solution you'd like

Add a send_batch_size_bytes option to the batch processor: https://github.com/open-telemetry/opentelemetry-collector/blob/master/processor/batchprocessor/README.md#batch-processor

The batch processor will send the batch once the configured size in bytes is reached.
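A minimal Go sketch of the proposed dual trigger, assuming a hypothetical `Config` type whose `SendBatchSizeBytes` field stands in for the proposed send_batch_size_bytes option (it is not an existing collector setting). The batch is flushed as soon as either the item count or the accumulated byte size reaches its limit, and a zero value disables that limit. Timeout-based flushing is left out of the sketch.

```go
package main

import "fmt"

// Config is a hypothetical batch-processor configuration. SendBatchSizeBytes
// models the send_batch_size_bytes option proposed in this issue; it is an
// assumption, not an existing collector setting. Zero disables a limit.
type Config struct {
	SendBatchSize      int // flush once this many items are buffered
	SendBatchSizeBytes int // flush once this many encoded bytes are buffered
}

// ShouldFlush reports whether a batch holding count items that encode to
// size bytes has reached either enabled threshold.
func (c Config) ShouldFlush(count, size int) bool {
	if c.SendBatchSize > 0 && count >= c.SendBatchSize {
		return true
	}
	return c.SendBatchSizeBytes > 0 && size >= c.SendBatchSizeBytes
}

func main() {
	cfg := Config{SendBatchSize: 8192, SendBatchSizeBytes: 1 << 20}
	fmt.Println(cfg.ShouldFlush(100, 2<<20)) // true: byte limit reached
	fmt.Println(cfg.ShouldFlush(100, 1024))  // false: neither limit reached
}
```

Keeping the count limit alongside the byte limit means existing send_batch_size behaviour would be unchanged for users who never set the new option.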

Describe alternatives you've considered

Additional context
In Jaeger, we would like to put the batch processor in front of the Elasticsearch (ES) exporter, which uses the Elasticsearch bulk API (sending multiple requests in one batch). Hence we would like to control how much data is sent to the storage at once.

Related to jaegertracing/jaeger#2295.

@jmacd
Contributor

jmacd commented Jun 18, 2020

This is a tricky problem. I've seen one library attempt to solve it in Go: https://godoc.org/google.golang.org/api/support/bundler

It's an irritating problem because the natural solution is to repeatedly invoke proto.Size() on the batch object as it is assembled, but since each call re-measures every item already in the batch, this leads to an O(N^2) algorithm. Working around this requires being conservative about the cost of joining two protocol messages into one, in terms of the additional tag and length bytes of overhead.
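One way to avoid the O(N^2) cost, sketched in Go under stated assumptions: measure each item once (for example with proto.Size() on the individual span) and add the framing bytes that appending it to a repeated, length-delimited protobuf field costs, namely the field tag plus a varint length prefix. The helpers `varintSize` and `elementSize` are hypothetical, and field number 2 is an assumed value. The sum is exact for one level of nesting; each enclosing message (e.g. the resource-level and scope-level wrappers in OTLP) adds its own tag and length prefix, whose varint can grow as the batch grows, which is why the estimate has to be conservative.

```go
package main

import "fmt"

// varintSize returns the number of bytes protobuf uses to encode v as a
// base-128 varint (7 payload bits per byte).
func varintSize(v uint64) int {
	n := 1
	for v >= 0x80 {
		v >>= 7
		n++
	}
	return n
}

// elementSize returns the bytes one embedded message of payloadLen bytes
// occupies when appended to a repeated field with the given field number:
// the tag, the length prefix, and the payload itself.
func elementSize(fieldNum uint64, payloadLen int) int {
	const wireTypeLen = 2 // length-delimited wire type
	tag := fieldNum<<3 | wireTypeLen
	return varintSize(tag) + varintSize(uint64(payloadLen)) + payloadLen
}

func main() {
	// Hypothetical per-span sizes, each measured once (e.g. via proto.Size).
	sizes := []int{120, 300, 95}
	total := 0
	for _, s := range sizes {
		total += elementSize(2, s) // field number 2 is an assumption
	}
	fmt.Println(total) // 522
}
```

Here the three spans occupy 522 bytes once framed (122 + 303 + 97): the 300-byte span needs a 2-byte length prefix, the others need 1 byte. The running total is updated in O(1) per span instead of re-measuring the whole batch.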

The same issue affects the span batcher in every exporter across OpenTelemetry. Limiting a batch by the number of units leaves its size in bytes unbounded, which immediately causes trouble when used with gRPC, since gRPC enforces a maximum message size.

@pavolloffay
Member Author

but this leads to an O(N^2) algorithm

Perhaps the size could be cached to some degree, so that the algorithm works on individual span sizes: spans would be added to the batch, and once the accumulated size exceeds the threshold, flush would be called. The same could apply to other data types.
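The caching idea can be sketched in Go as a hypothetical `byteBatcher` (not the collector's batch processor). Each item carries a size computed once on arrival, the batcher keeps a running total, and it flushes before an addition would cross `maxBytes`, so no batch is ever re-measured. Flushing before rather than after the threshold is a design choice assumed here so that emitted batches stay under the limit; a single item larger than the limit still goes out on its own.

```go
package main

import "fmt"

// item pairs a payload with its encoded size, computed once when the item
// arrives (for spans this could be proto.Size plus framing overhead).
type item struct {
	payload string
	size    int
}

// byteBatcher accumulates items and flushes before the running byte total
// would exceed maxBytes. Hypothetical type for illustration only.
type byteBatcher struct {
	maxBytes int
	buf      []item
	bytes    int
	flush    func([]item)
}

// Add appends it, flushing the current batch first if it would overflow.
func (b *byteBatcher) Add(it item) {
	if len(b.buf) > 0 && b.bytes+it.size > b.maxBytes {
		b.Flush()
	}
	b.buf = append(b.buf, it)
	b.bytes += it.size
}

// Flush hands the buffered items to the callback and resets the counters.
func (b *byteBatcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	b.flush(b.buf)
	b.buf = nil // start a fresh slice so the flushed batch is not reused
	b.bytes = 0
}

func main() {
	var batches [][]item
	b := &byteBatcher{maxBytes: 100, flush: func(items []item) { batches = append(batches, items) }}
	for _, s := range []int{40, 50, 30, 80, 10} {
		b.Add(item{payload: fmt.Sprint("span-", s), size: s})
	}
	b.Flush()
	for _, batch := range batches {
		fmt.Println(len(batch))
	}
}
```

With a 100-byte limit, items of 40, 50, 30, 80 and 10 bytes are emitted as batches of 2, 1 and 2 items.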

@gramidt
Member

gramidt commented Jan 25, 2021

This is a rather tricky problem, and I would love to work with others on coming up with a few solutions to evaluate.

@bogdandrutu bogdandrutu transferred this issue from open-telemetry/opentelemetry-collector Aug 20, 2021
@alolita alolita added the processor/batch Batch processor label Sep 30, 2021
@github-actions
Contributor

github-actions bot commented Nov 4, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 4, 2022
@atoulme atoulme closed this as completed Mar 8, 2023

5 participants