
Proposal: selective and fast logs deduplicator #1900

Open
pracucci opened this issue May 20, 2022 · 3 comments

Comments

@pracucci
Collaborator

Problem

A high-traffic Mimir cluster can log a lot. For example, I've analysed the log rate of a medium-size cluster (with a good percentage of requests returning 4xx because of limits being hit or out-of-order/out-of-bounds samples being written) running with -log.level=info, and the vast majority of logs come from 2 sources: grpc_logging.go and push.go.

All other logging callers are orders of magnitude less noisy.

Data has been queried from Loki:

sum by(caller) (rate({namespace="REDACTED"} | logfmt | __error__="" [5m]))

[Screenshot (2022-05-20): per-caller log rates from the Loki query above]

Logs are very important and useful when debugging, but repeating the same log hundreds or thousands of times per second is not very useful, other than adding pressure to the system.

Proposal

I propose to build a logs deduplicator in Mimir, following these design principles:

  • Selective: deduplicate only logs from grpc_logging.go and push.go (in the future it can be plugged into other places, if required).
  • Intelligent: deduplicate logs of the same "type" but not necessarily with the exact same log message (in our use cases, log messages are rarely exactly equal).
  • Fast: ideally, it should be a positive-sum change: the overhead introduced by the deduplicator should be absorbed by the reduced pressure on the downstream logging pipeline.
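To make the "selective" principle concrete, here's a minimal sketch assuming a go-kit-style Logger interface (the logging abstraction Mimir uses). All names here (withDedup, keyFunc, etc.) are hypothetical, not part of any actual implementation:

```go
package main

import "fmt"

// Logger mirrors the go-kit log.Logger interface. Only the loggers handed
// to the noisy call sites (grpc_logging.go, push.go) would be wrapped with
// the deduplicator; all other callers keep the plain logger.
type Logger interface {
	Log(keyvals ...interface{}) error
}

// keyFunc derives a dedup key from the key/value pairs; an empty key means
// the line is not a dedup candidate and is passed through unchanged.
type keyFunc func(keyvals ...interface{}) string

type dedupLogger struct {
	next Logger
	key  keyFunc
	seen map[string]int // dedup key -> occurrences since last flush
}

func withDedup(next Logger, key keyFunc) *dedupLogger {
	return &dedupLogger{next: next, key: key, seen: map[string]int{}}
}

func (d *dedupLogger) Log(keyvals ...interface{}) error {
	k := d.key(keyvals...)
	if k == "" {
		return d.next.Log(keyvals...) // not a dedup candidate: pass through
	}
	if d.seen[k]++; d.seen[k] > 1 {
		return nil // suppressed; a periodic flush would re-emit it with a count
	}
	return d.next.Log(keyvals...)
}

// countingLogger is a trivial Logger used only to demonstrate the wrapper.
type countingLogger struct{ lines int }

func (c *countingLogger) Log(keyvals ...interface{}) error { c.lines++; return nil }

func main() {
	c := &countingLogger{}
	l := withDedup(c, func(keyvals ...interface{}) string { return "same-key" })
	for i := 0; i < 100; i++ {
		l.Log("msg", "gRPC", "err", "out of order sample")
	}
	fmt.Println("lines actually logged:", c.lines) // prints: lines actually logged: 1
}
```

Because the wrapper is applied per call site, the cost of key extraction is paid only where the noise actually is.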

Intelligent

An example log:

level=warn ts=2022-05-19T14:38:39.174301048Z caller=grpc_logging.go:38 method=/cortex.Ingester/Push duration=1.174705ms err="rpc error: code = Code(400) desc = user=tenant-1: err: out of order sample. timestamp=2022-05-19T14:38:21.89Z, series={__name__=\"loki_ingester_chunk_size_bytes_sum\", pod=\"ingester-1\"}" msg=gRPC

The deduplication key for the example log above should be composed only of:

  • code=400
  • user=tenant-1
  • err: out of order sample
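As a purely illustrative sketch of how such a key could be derived (the regular expressions and function name are hypothetical, not Mimir's actual parsing):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// extractDedupKey builds a deduplication key from a gRPC error string,
// keeping only the stable parts (status code, tenant, error kind) and
// dropping variable parts such as duration, timestamps and series labels.
var (
	codeRe = regexp.MustCompile(`code = Code\((\d+)\)`)
	userRe = regexp.MustCompile(`user=(\S+?):`)
	errRe  = regexp.MustCompile(`err: ([^.]+)`)
)

func extractDedupKey(errMsg string) string {
	var parts []string
	if m := codeRe.FindStringSubmatch(errMsg); m != nil {
		parts = append(parts, "code="+m[1])
	}
	if m := userRe.FindStringSubmatch(errMsg); m != nil {
		parts = append(parts, "user="+m[1])
	}
	if m := errRe.FindStringSubmatch(errMsg); m != nil {
		parts = append(parts, strings.TrimSpace(m[1]))
	}
	return strings.Join(parts, "|")
}

func main() {
	msg := `rpc error: code = Code(400) desc = user=tenant-1: err: out of order sample. timestamp=2022-05-19T14:38:21.89Z`
	fmt.Println(extractDedupKey(msg)) // prints: code=400|user=tenant-1|out of order sample
}
```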

Discussion on actual proposed implementation will follow.

@bboreham
Contributor

In the past we failed by not excluding the duration from dedupe.

@replay
Contributor

replay commented May 30, 2022

Would a de-duplicated log line have a field which indicates how many lines have been de-duplicated into one? I think that can still be important to know in some cases.

@pracucci
Collaborator Author

Would a de-duplicated log line have a field which indicates how many lines have been de-duplicated into one?

Definitely yes!
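One possible shape for this (a hypothetical sketch, not an actual implementation): the deduplicator keeps a counter per key, and a periodic flush re-emits one representative line per key with an added count field, so the volume information is preserved:

```go
package main

import (
	"fmt"
	"sync"
)

// dedupState counts suppressed lines per dedup key between flushes.
type dedupState struct {
	mu      sync.Mutex
	entries map[string]*entry
}

type entry struct {
	count int
	line  string // last line seen for this key
}

func newDedupState() *dedupState {
	return &dedupState{entries: map[string]*entry{}}
}

// Observe records one occurrence of a line under its dedup key.
func (d *dedupState) Observe(key, line string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	e, ok := d.entries[key]
	if !ok {
		d.entries[key] = &entry{count: 1, line: line}
		return
	}
	e.count++
	e.line = line
}

// Flush returns one line per key, annotated with the number of deduplicated
// occurrences, and resets the state. A real implementation would run this on
// a ticker.
func (d *dedupState) Flush() []string {
	d.mu.Lock()
	defer d.mu.Unlock()
	out := make([]string, 0, len(d.entries))
	for _, e := range d.entries {
		out = append(out, fmt.Sprintf("%s count=%d", e.line, e.count))
	}
	d.entries = map[string]*entry{}
	return out
}

func main() {
	d := newDedupState()
	for i := 0; i < 3; i++ {
		d.Observe("code=400|user=tenant-1|out of order sample", `level=warn msg=gRPC err="out of order sample"`)
	}
	for _, l := range d.Flush() {
		fmt.Println(l) // prints the line once, with count=3 appended
	}
}
```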
