
Provide operational insights on # of triggers per trigger type #3663

Closed
tomkerkhove opened this issue Sep 14, 2022 · 14 comments · Fixed by #3695
Labels
feature, operations

@tomkerkhove
Member

Proposal

Based on the current docs, we provide the following metrics today:

  • keda_metrics_adapter_scaler_error_totals - The total number of errors encountered for all scalers.
  • keda_metrics_adapter_scaled_object_error_totals - The number of errors that have occurred for each scaled object.
  • keda_metrics_adapter_scaler_errors - The number of errors that have occurred for each scaler.
  • keda_metrics_adapter_scaler_metrics_value - The current value for each scaler’s metric that would be used by the HPA in computing the target average.

Based on these, I don't see a good way to extend them with suitable labels. Another reason is that ScaledObjects/ScaledJobs with multiple triggers of the same type would not be reflected correctly.

That's why I propose introducing a keda_metrics_adapter_trigger_totals metric with a type label that reflects the type of trigger being used, for example cron.
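For illustration, a minimal sketch (in Go, using client_golang) of how such a gauge could be declared; the namespace/subsystem split, variable name, and help text are assumptions rather than the final implementation:

```go
package prommetrics

import "github.com/prometheus/client_golang/prometheus"

// triggerTotals tracks how many triggers of each type are configured across
// all ScaledObjects/ScaledJobs. The "type" label would carry values such as
// "cron", "kafka", or "prometheus".
var triggerTotals = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "keda",
		Subsystem: "metrics_adapter",
		Name:      "trigger_totals",
		Help:      "Total number of triggers per trigger type",
	},
	[]string{"type"},
)
```

With this shape, a ScaledObject that defines two cron triggers simply contributes 2 to the cron series, which sidesteps the multiple-triggers-of-the-same-type problem mentioned above.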

Use-Case

Gain insights on how many triggers are using a given trigger type.

This helps you better understand your autoscaling landscape and its dependencies.

Anything else?

No response

@tomkerkhove added the needs-discussion and feature-request labels on Sep 14, 2022
@v-shenoy
Contributor

Would like to hear some thoughts on this from @JorTurFer @zroubalik

@JorTurFer
Member

I think we can add these metrics about adoption. It's true that from a monitoring point of view they may not give a lot of information, but from an adoption point of view they do. I mean, as a cluster operator I might want to know the aggregated counts per trigger type to apply some improvements, e.g. if I only have 1 Azure scaler, maybe it doesn't make sense to use managed identities.
This also gives useful information about what the cluster is used for.

In general, I don't have any problem with improving the metrics we offer, because they stay private inside your cluster and we don't collect them; each admin can decide whether or not to collect them.

@v-shenoy
Contributor

I can work on implementing this.

@tomkerkhove added the feature and operations labels and removed the needs-discussion and feature-request labels on Sep 14, 2022
@v-shenoy
Contributor

Reposting a query from Slack -

The value for this metric might change every time a ScaledObject / ScaledJob is reconciled, as the triggers for them might be changed by the user. Considering the reconciliation happens in the operator, I am not sure how best to expose the Prometheus metrics through the metrics adapter. Need some guidance on this.

@zroubalik @tomkerkhove @JorTurFer

@tomkerkhove
Member Author

I'd expose them on the operator instead of the metrics adapter, as this isn't related to the metrics adapter at all. So I'd introduce a new endpoint there.

@v-shenoy
Contributor

Well, if we're okay with exposing a new endpoint on the operator, then it shouldn't be a big task (I think?). Should the metric be named keda_operator_trigger_totals then?

@JorTurFer
Member

Please don't expose another endpoint in the operator; we already expose a single endpoint with the runtime metrics (operator-sdk does it). We recently talked about unifying them in the metrics server, so let's do it that way directly in this case.

@JorTurFer
Member

JorTurFer commented Sep 26, 2022

Basically, instead of starting another server, you should register the metrics in the already existing server through the Prometheus global registry:
https://book.kubebuilder.io/reference/metrics.html#publishing-additional-metrics
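(For reference, a rough sketch of what that registration could look like via controller-runtime's global registry; the package layout and the keda_operator_trigger_totals name are assumptions based on the discussion above:)

```go
package prommetrics

import (
	"github.com/prometheus/client_golang/prometheus"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Hypothetical gauge; the final metric name was still under discussion
// (keda_metrics_adapter_trigger_totals vs keda_operator_trigger_totals).
var triggerTotals = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "keda_operator_trigger_totals",
		Help: "Total number of triggers per trigger type",
	},
	[]string{"type"},
)

func init() {
	// Registering against controller-runtime's global registry exposes the
	// metric on the /metrics endpoint the operator already serves, so no
	// extra HTTP server or port is needed.
	ctrlmetrics.Registry.MustRegister(triggerTotals)
}
```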

@v-shenoy
Contributor

v-shenoy commented Sep 26, 2022

Basically, instead of starting another server, you should register the metrics in the already existing server through the Prometheus global registry: https://book.kubebuilder.io/reference/metrics.html#publishing-additional-metrics

The link mentions this -

You may then record metrics to those collectors from any part of your reconcile loop. These metrics can be evaluated from anywhere in the operator code.

But the reconcile loop is running within a separate pod, so this won't work in our case? Or am I misunderstanding something from the docs?

@JorTurFer
Member

You asked about adding this metric to the operator, and there you have reconciliation loops xD
In the metrics server we already have 2 Prometheus metric servers (which we should unify).
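(As a rough illustration of recording this from the reconciliation loop, reusing the hypothetical triggerTotals gauge from the sketch above; the helper name and the bookkeeping of previously seen trigger types are assumptions:)

```go
// recordTriggerTotals would be called from the ScaledObject/ScaledJob
// reconciler with the trigger types of the previous and the current
// generation of the resource, so that edits and deletions keep the
// per-type counts accurate.
func recordTriggerTotals(previous, current []string) {
	for _, t := range previous {
		triggerTotals.WithLabelValues(t).Dec()
	}
	for _, t := range current {
		triggerTotals.WithLabelValues(t).Inc()
	}
}
```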

@v-shenoy
Contributor

v-shenoy commented Sep 26, 2022

Oh, so you're suggesting that we should reuse the Prometheus endpoint used for the runtime metrics in the operator pod, instead of exposing another one.

I just got confused and thought you didn't want to expose the metrics in the operator at all, but in the adapter instead.

@tomkerkhove
Member Author

Please don't expose another endpoint in the operator; we already expose a single endpoint with the runtime metrics (operator-sdk does it). We recently talked about unifying them in the metrics server, so let's do it that way directly in this case.

Agreed, sorry I was more referring to exposing them on the operator rather than introducing a new endpoint.

@JorTurFer
Member

JorTurFer commented Sep 26, 2022

Lost in translation xD
I meant, let's reuse the current Prometheus endpoint in the operator and avoid duplication like we have in the metrics server.

@v-shenoy
Contributor

Each one of us misunderstood the others 😄
