Log record mutations are visible in next registered processors #4065

pellared · 2024-05-31T12:32:37Z

What are you trying to achieve?

Define if log record mutations done in record processor are visible in next registered processors.

I believe the spec is not clear enough and can be interpreted in different ways leading to very different implementations (e.g. chained processors that can mutate data and see mutations of previous processor or parallel processors that don't see mutations of other processors).

We likely need to fix the spec. It is in a stable doc but is vague enough that I think it is a bug and should be fixed.

Since the spec is vague I am also worried that existing implementations in different language SDKs may have interpreted it differently and have made different choices. IMO, we need to start with auditing existing implementations and see where we stand currently and then decide how exactly we want to fix the spec (one of the options is that both "chained-allow-mutation" and "fanout-copy-to-parallel" should be possible like they are in the Collector pipelines).

Additional context.

Issue originates from Logs: Allow the registered processor to be an independent pipeline #4010
Add isolating log record processor #4062 (comment)

The undocumented assumption was the log processor will follow the pattern that mutations will be visible to the next processor to be consistent with span processors. However, the C++ Logs SDK is not designed this way (log records are passed as a copy to subsequent registered processors). Other Logs SDKs follow the pattern or are not stable.

pellared · 2024-05-31T12:35:18Z

I will check if it is possible to follow the pattern in Go even though it's pitfalls I outlined #4010 (comment). My main goal is to keep a possibility to have zero allocations when the log records are not modified (as this the the usual use case).

If we succeed to find a satisfying design for Go (and Rust), we could update the specification to make this requirement explicit.
However, what are we going to do with the fact that the stable C++ Logs SDK already has taken a different route? Would it have to be a "SHOULD" instead of a "MUST"?

CC @open-telemetry/rust-maintainers @open-telemetry/cpp-maintainers @open-telemetry/technical-committee

jack-berg · 2024-05-31T15:26:40Z

If a log processor's mutations were not visible to the next processor, then there would be no (or very little) point of onEmit receiving a ReadWriteLogRecord. If mutations are only visible to the processor being called, then the SDK might as well call onEmit with a ReadableLogRecord, since that processor can easily mutate with copy-on-write semantics.

Other evidence that the "mutations are not visible to subsequent processors" is the wrong interpretation:

The log SDK was modeled after the trace SDK, and SpanProcessor#onStart modifications are visible to subsequent processors. It would be a very strange design if they weren't. .
"LogRecordProcessors are invoked in the same order they were registered" (see spec). There's no point of this if mutations aren't visible to subsequent processors.
"A LogRecordProcessor may freely modify logRecord for the duration of the OnEmit call. If logRecord is needed after OnEmit returns (i.e. for asynchronous processing) only reads are permitted" (see spec). The intent of this line is clarify that log processors can only mutate log records synchronously, since if they hold onto the record and mutate asynchronously, the mutation may conflict with one from a subsequent processor. See the Resolve lingering question of LogRecord mutability in LogRecordProcessor #2955 for more context, which was resolved by this statement.

However, what are we going to do with the fact that the stable C++ Logs SDK already has taken a different route?

Wondering if you can elaborate on this. I can't seem to find the LogRecordProcessor interface definition in c++ (I don't know c++ so I'm very naive navigating the code). Do log record processors call subsequent processors in c++? Is it possible to write a log processor which extracts and includes baggage attributes on log records in c++?

pellared · 2024-05-31T15:47:28Z

If a log processor's mutations were not visible to the next processor, then there would be no (or very little) point of onEmit receiving a ReadWriteLogRecord.

The modified record can be passed to a decorated processor. This is how mutations can be currently done in C++, Rust, and Go. E.g. mutatingProcessor found here: #4010 (comment)

Notice that currently there is nothing in the spec that allows creating a new/custom ReadableLogRecord.

Wondering if you can elaborate on this.

Sure. Regarding C++ my information is based on #4010 (comment).

jack-berg · 2024-05-31T16:03:21Z

The modified record can be passed to a decorated processor. This is how mutations can be currently done in C++, Rust, and Go.

There's no mention of a decorated processor in the spec, so it sounds like C++, Rust and Go invented new concepts. To be clear, I support an effort to evolve the spec to include accommodations which allow for higher performance implementations. However, I feel strongly that the current language of the spec does imply that log record mutations are visible to the subsequent processors. The design juggles several priorities, including performance and conceptual symmetry with prior art in otel (the trace signal in particular). A design featuring chaining or decorating processors might pave a clear path for a high performance implementation, it does so at the expense of conceptual symmetry.

My rust, c++, and go skills are pretty poor, but there may be creative language-specific ways to keep the door open to zero allocations while also staying within the spirit of the spec. For example, a processor might have a method indicating whether or not it mutates data. If none of the processors mutate data, or if there is only one processor registered, then the SDK could use an alternative implementation of ReadWriteLogRecord which avoids allocations.

pellared · 2024-05-31T16:20:10Z

Actually, it may occur that it won't be a problem for Go. I am in progress of evaluating it. I remember from my previous prototypes for most cases the Go escape analysis was doing good job. Reference https://github.com/open-telemetry/opentelemetry-go/blob/main/log/DESIGN.md#passing-record-as-pointer-to-loggeremit

Rust Logs SDK is also not stable and there was a discussion whether to use the same pattern as for span processors. See #4010 (comment)

It may be a good compromise to follow span processor pattern to achieve better consistency across languages as well as with tracing.

pellared · 2024-06-03T12:20:33Z

FYI open-telemetry/opentelemetry-go#5469

pellared · 2024-06-03T16:43:35Z

For sake of transparency I want to share some thoughts based on feedback I received from @MrAlias.

The specification does not require log records to be concurrent safe (notice that the spans are required to be concurrent safe by the spec here). What happens when a subsequent processor modifies a log record that was passed to a batch processor? Does the batch processor need to make a deep copy of processed records?

How will we make sure that users will not modify the log record asynchronously? In Go, I thought about adding a comment. However, commenting something like this is not going to be a safe solution. There are going to be many user use cases that will violate this and having a comment is not an adequate design choice. We could make log record concurrent safe, but this would decrease the performance even further.

jack-berg · 2024-06-03T17:45:42Z

A LogRecordProcessor may freely modify logRecord for the duration of the OnEmit call. If logRecord is needed after OnEmit returns (i.e. for asynchronous processing) only reads are permitted

From the spec). A processor which asynchronously modifies the ReadWriteLogRecord passed to LogRecordProcessor#onEmit is violating the spec and the behavior is undefined. In practice, some processors will need to make changes asynchronously. These implementations should make a local copy of the log record. I imagine an implementation could take stronger measures to prevent this by doing something to lock the log record after a log record processor has returned, which may be at the expense of performance (not sure because I haven't tried).

This appears to be one of those areas of the spec where we have a possible foot gun. And personally, I'm fine with it. We should have a well paved path that works out of the box for most of the things that users want to do. But there will be advanced users who want to step off that path, and they should have the ability to do things and take responsibility for those things. Its the "we're all adults here" argument.

pellared · 2024-06-03T17:57:00Z

A processor which asynchronously modifies the ReadWriteLogRecord passed to LogRecordProcessor#onEmit is violating the spec and the behavior is undefined.

A processor is synchronously modifying a record which is retained by another processor which retains is and only makes asynchronous reads. This could happen if e.g. one registers (e.g. by mistake) a "retracting processor" after a batch processor.

I think that the specification should say something like:

If the log record is not concurrent safe then OnEmit implementations:

must not modify the record asynchronously

must not be retain the record

must copy the record if it is needed after OnEmit returns

Currently the spec makes it easy to have implementations with race conditions.

jack-berg · 2024-06-03T18:04:50Z

I'm supportive of clarifying the language. But it wouldn't effectively change any implementation would it? At the end of the day, the user is still responsible for not making asynchronous changes to the record. This is just providing better language explaining "why".

pellared · 2024-06-03T18:07:13Z

the user is still responsible for not making asynchronous changes to the record.

The user can make a synchronous change and another processor makes an asynchronous read. This leads to a race condition.

But it wouldn't effectively change any implementation would it?

E.g. you are not making a clone of record here: https://github.com/open-telemetry/opentelemetry-java/blob/d9cef81e298f93b2379aa87d96edacf834fd7fcc/sdk/logs/src/main/java/io/opentelemetry/sdk/logs/export/BatchLogRecordProcessor.java#L92 and the next registered processor can make log record modifications that would lead to a race condition.

MrAlias · 2024-06-03T18:24:40Z

Keep in mind that a user defined processor is not constrained by this specification. Saying they should or shouldn't do something is a meaningless statement. We can only make prescriptive statements about OTel processor behaviors.

Having designs the syntactically restrict or promote good user use or experience is the way you influence 3rd-party use cases.

jack-berg · 2024-06-03T18:26:44Z

E.g. you are not making a clone of record here:

Nah, we're always going to make that clone in otel java. We have different representations for SDK processors vs. exporters. But it could save a non-stable implementation from needing to do something like that. Although I would argue that an implementation doesn't need to make that distinction today - they can safely at the record to the batch processor queue. Any issues with the record being mutated at the same time as the exporter is serializing it is the fault of the processor which violated the spec.

pellared · 2024-06-03T21:07:16Z

If we want consistency with tracing (over performance) for log processors then we should consider requiring the SDK's log record record to be concurrent safe.

EDIT: For reference open-telemetry/opentelemetry-go#5478

pellared · 2024-06-04T13:08:15Z

Leaving some thoughts why I still think that requiring log record mutations to be visible in next registered processors may be a performance+design footgun and why it may be undesired in some languages.

The design juggles several priorities, including performance and conceptual symmetry with prior art in otel (the trace signal in particular).

We should answer what is more important.

Performance or symmetry with tracing?
Symmetry with OTel tracing or language's standard logging library?

A design featuring chaining or decorating processors might pave a clear path for a high performance implementation, it does so at the expense of conceptual symmetry.

In my opinion, there can be application which emits "tons" of logs and performance is more important. Especially that having a similar design to tracing involves overhead that would not be possible to overcome. Some users are already complaining that tracing and metrics involves to many heap allocations (e.g. here) and adding allocations to logs would bring the problem to the next level. Reference: open-telemetry/opentelemetry-go#5478

But there will be advanced users who want to step off that path, and they should have the ability to do things and take responsibility for those things.

More advanced users may want to squeeze as much performance as possible.

From https://opentelemetry.io/community/mission/#we-value-_performance_:

OpenTelemetry users should not have to choose between high-quality telemetry and a performant application. High performance is a requirement for OpenTelemetry, and unexpected interference effects in the host application are unacceptable.

I think that in Go ecosystem having a design which allows zero-heap allocation logging (especially in the hot path) is important. Reference: https://go.dev/blog/slog#performance

A "decorator-based" design would be Go idiomatic as it would follow the design of the standard Go structures logging library: https://pkg.go.dev/log/slog#Handler. Creating pipelines for OTel Go Logs SDK and slog (Go standard library) would be very similar. Notice that it is possible to make a processor which affect others by making a "fan-out" processor.

My rust, c++, and go skills are pretty poor, but there may be creative language-specific ways to keep the door open to zero allocations while also staying within the spirit of the spec.

This is how I believe it can be done in Go: open-telemetry/opentelemetry-go#5470
However, I feel that this design is ugly. It gives the user several options how to propagate changes to other processors (as well how to isolate the changes).

For example, a processor might have a method indicating whether or not it mutates data. If none of the processors mutate data, or if there is only one processor registered, then the SDK could use an alternative implementation of ReadWriteLogRecord which avoids allocations.

In Go, representing a record as interface (to give a possibility to have several implementation) would already decrease the performance (slower calls and more heap allocations; more; I was doing some benchmarks months ago but I hope you will just trust me).

To sum up, I lean towards favoring logging performance and symmetry with structured logging package from the Go standard library over symmetry with OTel tracing design.

pellared · 2024-06-04T15:38:10Z

SIG meeting summary

We have a common agreement that for logs the performance is extremely important as we might expect more load than for e.g. tracing.

It looks that we are leaning towards changing the spec so that SDKs can require chaining processors for making mutations. Processors are responsible for calling the next in the chain which allows the records to stay on the stack. This is also how C++, Rust, Go Logs SDKs are currently designed.

pellared · 2024-06-04T15:40:21Z

changing the spec so that SDKs can require chaining processors for making mutations.

I will be unavailable for about next 3 weeks. I try to make some PR once I am back.

Thanks a lot for your feedback and support.

## Why Fixes #4010 (read the issue for more details) The SDK SHOULD offer a way to make independent log processing. I find it as a missing feature that should be covered by by the specification. ## What This PR tries to achieve it in the following way: - SDK SHOULD provide an isolating processor allowing independent log processing pipelines. Source: #4010 (comment) This approach should be solving the issue at least for Java, .NET, Python, JavaScript, Go. This proposal should not make any existing log SDKs not compliant with the specification. I think that the same approach could be also adopted in tracing for span processors (but I have not yet investigated it in depth). ## Remarks I created a separate issue for specifying whether (by default) a change to a record made in a processor is propagated to next registered processor. See #4065

pellared added the spec:logs Related to the specification/logs directory label May 31, 2024

pellared mentioned this issue May 31, 2024

Add isolating log record processor #4062

Merged

pellared changed the title ~~Define if log record mutations done in record processor SHOUDD be visible to the next registered processors~~ Define if log record mutations done in record processor MUST (or SHOULD) be visible to the next registered processors May 31, 2024

pellared changed the title ~~Define if log record mutations done in record processor MUST (or SHOULD) be visible to the next registered processors~~ Define if log record mutations done in record processor are visible to the next registered processors May 31, 2024

pellared changed the title ~~Define if log record mutations done in record processor are visible to the next registered processors~~ Log record mutations are visible in next registered processors May 31, 2024

pellared self-assigned this May 31, 2024

pellared mentioned this issue May 31, 2024

Log Records passed to log processor OnEmit is non-mutable open-telemetry/opentelemetry-go#5219

Closed

danielgblanco added the triage:deciding:tc-inbox label Jun 3, 2024

This was referenced Jun 3, 2024

[PoC] Change Processor.OnEmit to accept a record pointer open-telemetry/opentelemetry-go#5469

Draft

[PoC] Change Processor.OnEmit to return record passed to next registered processor open-telemetry/opentelemetry-go#5470

Draft

pellared changed the title ~~Log record mutations are visible in next registered processors~~ Proposal: Log record mutations are visible in next registered processors Jun 4, 2024

pellared changed the title ~~Proposal: Log record mutations are visible in next registered processors~~ Log record mutations are visible in next registered processors Jun 4, 2024

pellared mentioned this issue Jun 4, 2024

Log record mutations are visible in next registered processors #4067

Merged

tigrannajaryan added triage:accepted:ready-with-sponsor and removed triage:deciding:tc-inbox labels Jun 5, 2024

carlosalberto closed this as completed in #4067 Sep 4, 2024

carlosalberto closed this as completed in 42dd11a Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log record mutations are visible in next registered processors #4065

Log record mutations are visible in next registered processors #4065

pellared commented May 31, 2024 •

edited

Loading

pellared commented May 31, 2024 •

edited

Loading

jack-berg commented May 31, 2024

pellared commented May 31, 2024 •

edited

Loading

jack-berg commented May 31, 2024

pellared commented May 31, 2024 •

edited

Loading

pellared commented Jun 3, 2024

pellared commented Jun 3, 2024

jack-berg commented Jun 3, 2024

pellared commented Jun 3, 2024 •

edited

Loading

jack-berg commented Jun 3, 2024

pellared commented Jun 3, 2024 •

edited

Loading

MrAlias commented Jun 3, 2024

jack-berg commented Jun 3, 2024

pellared commented Jun 3, 2024 •

edited

Loading

pellared commented Jun 4, 2024 •

edited

Loading

pellared commented Jun 4, 2024

pellared commented Jun 4, 2024 •

edited

Loading

Log record mutations are visible in next registered processors #4065

Log record mutations are visible in next registered processors #4065

Comments

pellared commented May 31, 2024 • edited Loading

pellared commented May 31, 2024 • edited Loading

jack-berg commented May 31, 2024

pellared commented May 31, 2024 • edited Loading

jack-berg commented May 31, 2024

pellared commented May 31, 2024 • edited Loading

pellared commented Jun 3, 2024

pellared commented Jun 3, 2024

jack-berg commented Jun 3, 2024

pellared commented Jun 3, 2024 • edited Loading

jack-berg commented Jun 3, 2024

pellared commented Jun 3, 2024 • edited Loading

MrAlias commented Jun 3, 2024

jack-berg commented Jun 3, 2024

pellared commented Jun 3, 2024 • edited Loading

pellared commented Jun 4, 2024 • edited Loading

pellared commented Jun 4, 2024

pellared commented Jun 4, 2024 • edited Loading

pellared commented May 31, 2024 •

edited

Loading

pellared commented May 31, 2024 •

edited

Loading

pellared commented May 31, 2024 •

edited

Loading

pellared commented May 31, 2024 •

edited

Loading

pellared commented Jun 3, 2024 •

edited

Loading

pellared commented Jun 3, 2024 •

edited

Loading

pellared commented Jun 3, 2024 •

edited

Loading

pellared commented Jun 4, 2024 •

edited

Loading

pellared commented Jun 4, 2024 •

edited

Loading