
[processor/interval]: time-based batching #34906

Open
sh0rez opened this issue Aug 28, 2024 · 5 comments
Labels
enhancement New feature or request processor/interval

Comments

@sh0rez
Contributor

sh0rez commented Aug 28, 2024

Component(s)

processor/interval

Is your feature request related to a problem? Please describe.

intervalprocessor exports all metrics strictly on the interval. At sufficient scale this poses challenges: metrics are collected over e.g. 60 seconds and then flushed all at once, leading to spikes and silence instead of a constant load on the network and the receiving side.

Describe the solution you'd like

Distribute metrics export over the entire interval.

I suggest this "sharding" is done at the stream level, grouping the streams as follows (pseudocode):

const interval = 60 * time.Second

// one map of streams per second of the interval
var streams [60]map[identity.Stream]metric.DataPoint

func ingest(in map[identity.Stream]metric.DataPoint) {
  for id, dp := range in {
    k := id.Hash() % 60
    streams[k][id] = dp
  }
}

func export() {
  for ts := range time.Tick(time.Second) {
    k := ts.Second() // 0..59: flush the shard matching the current second
    next.ConsumeMetrics(streams[k])
  }
}
@sh0rez added enhancement New feature or request and needs triage New item requiring triage labels Aug 28, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@ArthurSens
Contributor

I wonder if we could discard old samples during ingestion. Like, do not store an array of datapoints, just replace the old one if we receive another with a more recent timestamp 🤔

@sh0rez
Contributor Author

sh0rez commented Aug 29, 2024

oh that's absolutely the case here, sorry if my pseudocode wasn't clear enough.

we are only storing the last datapoint per stream, but sharding our stored streams into 60 maps, so that we can flush one every second instead of all of them every minute, distributing load more evenly over the course of a minute.

note we store streams in a map[identity.Stream]metric.DataPoint, of which we have 60 (as an array).

say you have a stream with id cbf29ce484222325; that would go into streams[17] (because 0xcbf29ce484222325 % 60 = 17). once the clock hits xx:xx:17, that set would be sent to the next pipeline step and cleared

@RichieSams
Contributor

Interesting... I like the idea. For any given data stream, we're still aggregating at the given interval, but overall we're doing flushes at an interval / 60 rate (which could be configurable), to reduce the spikiness.

@crobert-1
Copy link
Member

Issue filed by code owner, and another has voiced support. Removing needs triage with the understanding that discussion is still happening here.
