Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement receiver/processor/exporter "group" components #18509

Open
tigrannajaryan opened this issue Feb 10, 2023 · 21 comments
Open

Implement receiver/processor/exporter "group" components #18509

tigrannajaryan opened this issue Feb 10, 2023 · 21 comments
Labels
cmd/otelcontribcol otelcontribcol command enhancement New feature or request never stale Issues marked with this label will be never staled and automatically removed

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Feb 10, 2023

Is your feature request related to a problem? Please describe.

Collector configuration requires listing receivers at least in 2 places: once in the top-level receivers section and one more time the receivers setting of the pipeline.

This gives flexibility to use the same receiver in more than one pipeline, but complicates the most common use case when the receiver is used in one pipeline only.

Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone (and prevents an interesting use case described below).

Describe the solution you'd like

I would like to have a new group receiver component that can instantiate other receivers. The config for example can look like this:

receivers:
  group/all:
    hostmetrics: ...
    otlp: ....
    redis: ...

service:
  pipelines:
    metrics:
      receivers: [group/all]
      processors: ...
      exporters: ...

With a config like this if I need to add or remove a receiver I only need to do it in one place, under the group/all section.

Functionality the group component will be a significantly simplified version of the existing receivercreator component.

The exact same problem applies to processors and exporters for which equivalent processor group and exporter group component would be also highly desirable.

Additional context

Here is a use case.

We can break down Collector config file into smaller, more manageable config files, where receivers, processors and exporters are in their own files. For example:

# this is the main config file

receivers:
  group/all: ${file: receivers.yaml}

processors:
  group/all: ${file: processors.yaml}

exporters:
  group/all: ${file: exporters.yaml}

service:
  pipelines:
    metrics:
      receivers: [group/all]
      processors: [group/all]
      exporters: [group/all]
# this is receivers.yaml

hostmetrics: ...

otlp: ...

redis: ...
# this is exporters.yaml

otlphttp:
  endpoint: ...
@tigrannajaryan tigrannajaryan added enhancement New feature or request needs triage New item requiring triage labels Feb 10, 2023
@Aneurysm9
Copy link
Member

I like the concept. Would it be possible to achieve this as a config convertor that rewrites the pipeline configuration to expand a set of components? That could make it a zero-cost abstraction at runtime with low startup overhead.

@tigrannajaryan
Copy link
Member Author

I like the concept. Would it be possible to achieve this as a config convertor that rewrites the pipeline configuration to expand a set of components? That could make it a zero-cost abstraction at runtime with low startup overhead.

This is zero-cost abstraction for the receivergroup and a tiny cost for processorgroup and exportergroup (just one for loop going over the list of nested component and a single extra function call per nested component - I think it will be negligible). We can benchmark, but my gut feeling is that the a straightforward, trivial implementation should be good enough.

@atoulme
Copy link
Contributor

atoulme commented Feb 10, 2023

I think this is a great idea. I believe @rmfitzpatrick implemented this approach in our config to allow to create config modules. We can port over/adopt this approach.

@djaglowski
Copy link
Member

I like the idea a lot. What do you think about using group as the component name?

receivers:
  group/all:
    hostmetrics: ...
    otlp: ....
    redis: ...

service:
  pipelines:
    metrics:
      receivers: [group/all]
      processors: ...
      exporters: ...

@atoulme atoulme added cmd/otelcontribcol otelcontribcol command and removed needs triage New item requiring triage labels Feb 12, 2023
@tigrannajaryan
Copy link
Member Author

I like the idea a lot. What do you think about using group as the component name?

I like the shorter name.

Can we have a component named group which is a receiver, a processor or an exporter depending on the context it is used in?

@djaglowski
Copy link
Member

Can we have a component named group which is a receiver, a processor or an exporter depending on the context it is used in?

Yes, all are always in context of one type (receiver/processor/exporter). We have existing examples of overlap such as otlp, though of course this is only receiver/exporter, but the same concept applies to processors.

@tigrannajaryan tigrannajaryan changed the title Implement receivergroup, processorgroup, exportergroup components Implement receiver/processor/exporter "group" components Feb 13, 2023
@tigrannajaryan
Copy link
Member Author

I updated the PR title/description to use group as the component name.

@bogdandrutu
Copy link
Member

Lets avoiding jumping to the implementation details Converter/Proper Component/etc. I think the idea is very good and helpful. Let'a agree for the configuration.

For some reasons I don't like the name "group". It does not signal to me that this is just a "config grouping". I have some questions, maybe that will clarify:

  • Can we reference a receiver from a group in a pipeline or only the whole group?
  • For processors, if we don't allow individual references then what is the order of execution?

@rmfitzpatrick
Copy link
Contributor

rmfitzpatrick commented Feb 15, 2023

Is it fair to say this proposal attempts to solve issues arising from component definitions and in-sequence reference limitations with anchors, aliases, and merge keys? I wonder if a config provider that offers starlark or something like ytt* support could solve this without requiring new components or converter adoption.

@tigrannajaryan
Copy link
Member Author

tigrannajaryan commented Feb 16, 2023

  • Can we reference a receiver from a group in a pipeline or only the whole group?

Only the whole group. The group is just a regular component from Service's perspective. The inner structure of the group and the fact that it instantiates other components is invisible from the outside. This is very similar to existing receivercreator component.

  • For processors, if we don't allow individual references then what is the order of execution?

Ahh, that's a good point. I started with receivers and forgot to specify ordering for processors. We can make the config a sequence for group processor, e.g.:

processors:
  group/myprocessors:
    - batch:
        # batch settings
    - attributes:
        # attributes settings

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [group/myprocessors]
      exporters: [otlp]

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Apr 18, 2023
@github-actions
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 17, 2023
@tigrannajaryan
Copy link
Member Author

@open-telemetry/collector-contrib-approvers we had some interest from approvers/maintainers in the past. Should we reopen this or keep closed?

@andrzej-stencel
Copy link
Member

In my experience I haven't felt the need for this kind of grouping of components, so I don't see the benefits. On the other hand, I see the disadvantage of making the config more complex and hard to read by regular users, by introducing another level of complexity to it, making it easier to make mistakes and harder to understand resulting error messages. As I said, the benefits might justify it, but I personally don't see them. 😄

If I ever wanted to split configs, it is by functionality. For example, here's how the config is split in the Sumo distro:

Of course I don't aspire to know about all useful use cases in the world. Apparently there are people who find this useful, so this might be implemented, and that's OK. 😄 Just wanted to make sure all opinions are aired.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 5, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Sep 5, 2023
@dmitryax dmitryax removed the Stale label Sep 5, 2023
@mx-psi
Copy link
Member

mx-psi commented Sep 6, 2023

Is using the forward connector enough here? For example, rewriting the first post example:

receivers:
  hostmetrics: ...
  otlp: ....
  redis: ...

connector:
  forward/all:

service:
  pipelines:
    metrics/all:
       receivers: [hostmetrics, otlp, redis]
       exporters: [forward/all]
    metrics:
      receivers: [forward/all]
      processors: ...
      exporters: ...

A similar approach can be taken for processors and exporters (processors is a bit more complex). It is a bit uglier/needs a bit more config, but it still gets 80% of the way down what we want (having a way to manage group of components easily in pipelines)

@andrzej-stencel
Copy link
Member

Is using the forward connector enough here?

This doesn't seem to solve the original problem stated in the opening comment:

Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone

@mx-psi
Copy link
Member

mx-psi commented Sep 8, 2023

I was focusing on this part

With a config like this if I need to add or remove a receiver I only need to do it in one place, under the group/all section.

Not saying it gets everything fixed, but it does address this and makes configuration a bit less repetitive

@tigrannajaryan
Copy link
Member Author

Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone

This was indeed my primary problem and the forward connector does not seem to help with that.

@dmitryax dmitryax added the never stale Issues marked with this label will be never staled and automatically removed label Sep 20, 2023
@dmitryax
Copy link
Member

This approach can also be applicable to extensions to solve a use case described in open-telemetry/opentelemetry-collector#8394.

I believe we need a way to define order for all component types, not just processors. Even if the order for other components is not that important, it still defines the collector's behavior like the order of start/shutdown and data exporting. So I think there should be a way to set an order, maybe optional, applying alphabetical order by default.

Defining order as a list makes the overriding of the group difficult. For example, if I want to extend the group by applying another config or by including multiple files with globs, I would need to override the whole list.

Not sure what would be the best solution for defining the order. Maybe another field for that in every group's component, the components will be sorted alphabetically by a value of that field. Or we can use the name of components for that purpose, e.g. processors list would be compiled into [memory_limiter/1, k8sattributes/2, batch/3, transform/4_add_attr_a, transform/5_remove_attr_b]

@djaglowski
Copy link
Member

djaglowski commented Sep 20, 2023

I believe we need a way to define order for all component types, not just processors. Even if the order for other components is not that important, it still defines the collector's behavior like the order of start/shutdown and data exporting. So I think there should be a way to set an order, maybe optional, applying alphabetical order by default.

Pipeline components are currently ordered according to a topological sort. I do not think we can easily or necessarily should worry about ordering the start/stop sequence of pipeline components (except processors, which are naturally ordered in each pipeline due to topological ordering).

I don't understand the motivation for this. What is the reason why the user needs to control this beyond what the service graph already does?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmd/otelcontribcol otelcontribcol command enhancement New feature or request never stale Issues marked with this label will be never staled and automatically removed
Projects
None yet
Development

No branches or pull requests

9 participants