
Provide operational insights on # of triggers per trigger type #3663

Closed
tomkerkhove opened this issue Sep 14, 2022 · 14 comments · Fixed by #3695
Labels
feature, operations

@tomkerkhove
Member

Proposal

Based on the current docs, we provide the following metrics today:

  • keda_metrics_adapter_scaler_error_totals - The total number of errors encountered for all scalers.
  • keda_metrics_adapter_scaled_object_error_totals - The number of errors that have occurred for each scaled object.
  • keda_metrics_adapter_scaler_errors - The number of errors that have occurred for each scaler.
  • keda_metrics_adapter_scaler_metrics_value - The current value for each scaler’s metric that would be used by the HPA in computing the target average.

Based on these, I don't see a good way to extend them with suitable labels. Another reason is that ScaledObjects/ScaledJobs with multiple triggers of the same type would not be reflected correctly.

That's why I propose introducing a keda_metrics_adapter_trigger_totals metric with a type label that reflects the type of trigger being used, for example cron.
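For illustration, a minimal sketch (in Go, using client_golang) of how such a gauge could be declared; the namespace/subsystem split, variable name, and help text are assumptions rather than the final implementation:

```go
package prommetrics

import "github.com/prometheus/client_golang/prometheus"

// triggerTotals tracks how many triggers of each type are configured across
// all ScaledObjects/ScaledJobs. The "type" label would carry values such as
// "cron", "kafka", or "prometheus".
var triggerTotals = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "keda",
		Subsystem: "metrics_adapter",
		Name:      "trigger_totals",
		Help:      "Total number of triggers per trigger type",
	},
	[]string{"type"},
)
```

With this shape, a ScaledObject that defines two cron triggers simply contributes 2 to the cron series, which sidesteps the multiple-triggers-of-the-same-type problem mentioned above.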

Use-Case

Gain insights on how many triggers are using a given trigger type.

This helps you better understand your autoscaling landscape and its dependencies.

Anything else?

No response

@tomkerkhove added the needs-discussion and feature-request labels on Sep 14, 2022
@v-shenoy
Contributor

Would like to hear some thoughts on this from @JorTurFer @zroubalik

@JorTurFer
Member

I think we can add these metrics about adoption. It's true that from a monitoring point of view they may not give a lot of information, but from an adoption point of view they do. I mean, as a cluster operator I might want to know the aggregated counts per trigger type to apply some improvements, e.g. if I only have 1 Azure scaler, maybe it doesn't make sense to use managed identities.
This also gives useful information about what the cluster is used for.

In general, I don't have any problem with improving the metrics we offer, because they stay private inside your cluster and we don't collect them; each admin can decide whether or not to collect them.

@v-shenoy
Contributor

I can work on implementing this.

@tomkerkhove added the feature and operations labels and removed the needs-discussion and feature-request labels on Sep 14, 2022
@v-shenoy
Contributor

Reposting a query from Slack -

The value for this metric might change every time a ScaledObject / ScaledJob is reconciled, as the triggers for them might be changed by the user. Considering the reconciliation happens in the operator, I am not sure how best to expose the Prometheus metrics through the metrics adapter. Need some guidance on this.

@zroubalik @tomkerkhove @JorTurFer

@tomkerkhove
Member Author

I'd expose them on the operator instead of the metrics adapter, as this isn't related to the metrics adapter at all. So I'd introduce a new endpoint there.

@v-shenoy
Contributor

Well, if we're okay with exposing a new endpoint on the operator, then it shouldn't be a big task (I think?). Should the metric be named keda_operator_trigger_totals then?

@JorTurFer
Member

Please don't expose another endpoint in the operator; we already expose a single endpoint with the runtime metrics (operator-sdk does it). We recently talked about unifying them in the metrics server, so let's do it that way directly in this case.

@JorTurFer
Member

JorTurFer commented Sep 26, 2022

Basically, instead of starting another server, you should register the metrics in the already existing server through the Prometheus global registry:
https://book.kubebuilder.io/reference/metrics.html#publishing-additional-metrics
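(For reference, a rough sketch of what that registration could look like via controller-runtime's global registry; the package layout and the keda_operator_trigger_totals name are assumptions based on the discussion above:)

```go
package prommetrics

import (
	"github.com/prometheus/client_golang/prometheus"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Hypothetical gauge; the final metric name was still under discussion
// (keda_metrics_adapter_trigger_totals vs keda_operator_trigger_totals).
var triggerTotals = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "keda_operator_trigger_totals",
		Help: "Total number of triggers per trigger type",
	},
	[]string{"type"},
)

func init() {
	// Registering against controller-runtime's global registry exposes the
	// metric on the /metrics endpoint the operator already serves, so no
	// extra HTTP server or port is needed.
	ctrlmetrics.Registry.MustRegister(triggerTotals)
}
```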

@v-shenoy
Contributor

v-shenoy commented Sep 26, 2022

Basically, instead of starting another server, you should register the metrics in the already existing server through the Prometheus global registry: https://book.kubebuilder.io/reference/metrics.html#publishing-additional-metrics

The link mentions this -

You may then record metrics to those collectors from any part of your reconcile loop. These metrics can be evaluated from anywhere in the operator code.

But the reconcile loop is running within a separate pod, so this won't work in our case? Or am I misunderstanding something from the docs?

@JorTurFer
Member

You asked about adding this metric to the operator, and there you have reconciliation loops xD
In the metrics server we already have 2 Prometheus metric servers (which we should unify).
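(As a rough illustration of recording this from the reconciliation loop, reusing the hypothetical triggerTotals gauge from the sketch above; the helper name and the bookkeeping of previously seen trigger types are assumptions:)

```go
// recordTriggerTotals would be called from the ScaledObject/ScaledJob
// reconciler with the trigger types of the previous and the current
// generation of the resource, so that edits and deletions keep the
// per-type counts accurate.
func recordTriggerTotals(previous, current []string) {
	for _, t := range previous {
		triggerTotals.WithLabelValues(t).Dec()
	}
	for _, t := range current {
		triggerTotals.WithLabelValues(t).Inc()
	}
}
```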

@v-shenoy
Contributor

v-shenoy commented Sep 26, 2022

Oh, so you're suggesting that we should reuse the Prometheus endpoint used for the runtime metrics in the operator pod, instead of exposing another one.

I just got confused and thought you didn't want to expose the metrics in the operator at all, but in the adapter instead.

@tomkerkhove
Member Author

Please don't expose another endpoint in the operator; we already expose a single endpoint with the runtime metrics (operator-sdk does it). We recently talked about unifying them in the metrics server, so let's do it that way directly in this case.

Agreed, sorry I was more referring to exposing them on the operator rather than introducing a new endpoint.

@JorTurFer
Member

JorTurFer commented Sep 26, 2022

Lost in translation xD
I meant, let's reuse the current Prometheus endpoint in the operator and avoid duplication like we have in the metrics server.

@v-shenoy
Contributor

Each one of us misunderstood the others 😄
