[receiver/k8seventsreceiver] support kubernetes leader election #17369

Closed
newly12 opened this issue Jan 4, 2023 · 21 comments
Labels: enhancement, help wanted, never stale, receiver/k8sevents

Comments

@newly12
Contributor

newly12 commented Jan 4, 2023

Component(s)

receiver/k8seventsreceiver

Is your feature request related to a problem? Please describe.

The use case is that we need to collect federated k8s cluster events, but the otel collector pods are deployed in the sub cluster(s). The idea is to support leader election so that pods can be deployed in multiple sub clusters and run in active-standby mode.

Describe the solution you'd like

Support kubernetes leader election

Describe alternatives you've considered

n/a

Additional context

No response

@newly12 newly12 added enhancement New feature or request needs triage New item requiring triage labels Jan 4, 2023
@github-actions
Contributor

github-actions bot commented Jan 4, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

github-actions bot commented Mar 6, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Mar 6, 2023
@dmitryax
Member

dmitryax commented Mar 6, 2023

Hi @newly12, thanks for reporting the issue.

The k8sevents receiver will likely be deprecated in favor of the k8sobjects receiver.

The leader election mechanism is a pretty complicated addition to the otel collector. The k8s receivers that fetch data from the control plane (k8scluster and k8sobjects) are expected to be deployed as a single-replica deployment. Can you please elaborate on why that won't work for federated clusters?

@dmitryax dmitryax removed the Stale label Mar 6, 2023
@atoulme atoulme added receiver/k8sevents and removed needs triage New item requiring triage labels Mar 8, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Mar 8, 2023

Pinging code owners for receiver/k8sevents: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@newly12
Contributor Author

newly12 commented Mar 13, 2023

Hi @dmitryax, it works for federated clusters as well. I guess the real question is whether we should support HA (or active-standby mode) in the otel collector, and whether it makes sense to.

In a regular cluster, the collector is usually deployed as a single-replica deployment running in in-cluster mode, and there will be a few seconds of data loss during any rollout, e.g. an image upgrade.

In the federated cluster case, the collector can currently be deployed in only one sub cluster, with the federated cluster's API server address, service account token, etc. configured either via the configuration file or env vars. If that sub cluster has an outage (a network issue or something else), there will be data loss.

I think leader election makes sense in both cases. In the first case, it can be a two-replica deployment: when the active pod is shut down for an upgrade or evicted, the standby pod can start receiving events immediately, reducing data loss as much as possible. In the second case, HA support makes even more sense, to avoid a single-cluster failure.

One available solution is the k8s controller-runtime framework, which leverages a Lease object (or a ConfigMap, IIRC) for leader election. I was thinking a similar approach could be used, or we could leverage the framework directly. I haven't looked into the k8sevents or k8sobjects receiver code yet, so I can't tell how many changes are needed.

@dmitryax
Member

Sounds good to me. I agree that having HA option is great. We need this functionality to be reused across all the receivers fetching data from the k8s cluster API. Currently, those are k8scluster, k8sevents, and k8sobjects receivers. Let me know if you have a chance to work on that.

@newly12
Contributor Author

newly12 commented Mar 14, 2023

Thanks for the confirmation. I may not have bandwidth to work on this recently, how long could the issue remain open?

@dmitryax
Member

Up to half a year. I can also add the Help Wanted label, and maybe we will find someone willing to help sooner.

@newly12
Contributor Author

newly12 commented Apr 22, 2023

Sounds good, could you please add the tag? Thanks.

@dmitryax dmitryax added the help wanted Extra attention is needed label Apr 22, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jun 22, 2023
@dmitryax dmitryax removed the Stale label Jun 26, 2023
@JaredTan95
Member

JaredTan95 commented Jul 27, 2023

@dmitryax I recently planned to replace our existing kube-event-exporter component with k8sevents, but k8sevents is deprecated and is to be replaced by k8sobjectsreceiver, so I'd love to add leader election support to k8sobjectsreceiver for multi-instance/HA otel-col deployments.

@subithal

@JaredTan95 @newly12 @dmitryax Are there any plans to implement HA for the k8sclusterreceiver? Another use case I can think of: we run these workloads as DaemonSets and are seeing a lot of duplicate data being gathered. Also, could you please suggest a workaround until then?

Thanks in advance.

@JaredTan95
Member

I think that once this PR gets merged, we will be able to move forward quickly with leader election support in the other receivers 😄

@subithal

@JaredTan95 Any idea when this PR will be merged and available to users?

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Contributor

github-actions bot commented Jan 1, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 1, 2024
@JaredTan95 JaredTan95 added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Jan 2, 2024
@TylerHelmuth
Member

With #24242 still very much a goal for the future of the k8seventsreceiver, I'd prefer not to add new features like this and instead focus our efforts on the k8sobjectsreceiver. With that in mind I'm going to close this issue as Not Planned. Please ping me if you object.

@TylerHelmuth TylerHelmuth reopened this Jan 8, 2024
@TylerHelmuth TylerHelmuth closed this as not planned Jan 8, 2024
@dmitryax
Member

The leader election mechanism should be reusable across any k8s component (and possibly beyond k8s). There is another issue for supporting it in the k8s cluster receiver, which is more appropriate: #24678. We need to keep at least one issue about this open. If anyone is interested, please feel free to help.

@TylerHelmuth
Member

I'd prefer we track the feature via the k8sobjectsreceiver instead of the k8seventsreceiver. I've opened #32994.

@JaredTan95
Member

JaredTan95 commented May 11, 2024

+1. It would be more appropriate to close this issue in order to focus attention on k8sobjectsreceiver.

@TylerHelmuth TylerHelmuth closed this as not planned May 13, 2024
@a-thaler
Contributor

FYI: A more generic solution was proposed in #34460.

7 participants