Prometheus Metrics #1008

Open
dtp263 opened this issue Aug 4, 2022 · 11 comments

dtp263 commented Aug 4, 2022

Proposed Change or Idea

Add Prometheus metrics for monitoring the health of a Sidecar instance.

Steps:

  1. Add a Prometheus client (possible client lib)
  2. Add a Counter increment to the error middleware (see the sketch below)
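
A minimal sketch of what these two steps could look like, assuming the prom-client library on top of Sidecar's Express app (the metric name `sidecar_errors_total` and the middleware wiring are illustrative assumptions, not the actual implementation):

```ts
import express, { NextFunction, Request, Response } from 'express';
import client from 'prom-client';

const app = express();

// Step 1: a Prometheus client metric. Name and label set are assumptions for illustration.
const errorCounter = new client.Counter({
  name: 'sidecar_errors_total',
  help: 'Total number of errors returned by Sidecar',
  labelNames: ['route'],
});

// Step 2: error-handling middleware that counts the error, then passes it on.
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
  errorCounter.inc({ route: req.path });
  next(err);
});
```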

First Metrics

Additional Information

In order to provide a clean, consistent service, you need to be able to easily monitor the health and status of each instance.

TarikGul added the I8 - Enhancement (Additional feature request) label Aug 4, 2022

TarikGul (Member) commented Aug 8, 2022

It's definitely possible to integrate Prometheus into Sidecar, as the helm setup is already intrinsic to the repo. I am not totally sure what the integration workload would look like, though - @ArshamTeymouri, any thoughts on this?

ArshamTeymouri (Contributor) commented:

Hi Tarik!

That's a great idea, to have the metrics exposed by the application. For this, we need a new endpoint on the application, say /metrics, that exposes application details in a Prometheus-compatible format. Please check the Prometheus docs for more details about how to integrate those metrics into your application.
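
(A rough sketch of what such an endpoint could look like, assuming prom-client on top of Express; the port and wiring below are illustrative assumptions, not how Sidecar currently does it:)

```ts
import express from 'express';
import client from 'prom-client';

const app = express();

// Collect the default Node.js process metrics (CPU, memory, event loop lag, ...).
client.collectDefaultMetrics();

// Expose everything registered with prom-client in the Prometheus text format.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});

app.listen(9100); // example port only
```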

Besides that, two additional endpoints should also be introduced: /readiness and /liveness. Those endpoints will expose the general health of the application. For example, if the app is crashing, /liveness should return a non-200 HTTP response, and if the application is working fine but is just waiting for its backing database to sync, /readiness should return non-200. This way Kubernetes knows whether it needs to restart the pod (application) because of a crash, or wait until the application becomes ready to serve requests.
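
(Again, just a sketch of how those two probes might look in Express; what "ready" actually means for Sidecar, and the isConnectedToNode helper, are placeholders:)

```ts
import express, { Request, Response } from 'express';

const app = express();

// Liveness: only fails when the process itself is broken, so Kubernetes restarts the pod.
app.get('/liveness', (_req: Request, res: Response) => {
  res.status(200).json({ status: 'alive' });
});

// Readiness: fails while the app cannot serve traffic yet (e.g. still syncing with its
// backing service), so Kubernetes keeps the pod out of rotation until it returns 200.
app.get('/readiness', (_req: Request, res: Response) => {
  const ready = isConnectedToNode();
  res.status(ready ? 200 : 503).json({ status: ready ? 'ready' : 'not ready' });
});

// Placeholder: whatever readiness actually means for the application goes here.
function isConnectedToNode(): boolean {
  return true;
}
```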

We have already done the same thing for one of the internal tools, Gitspiegel. You can reference its corresponding issue and PR for more details: https://github.com/paritytech/gitspiegel/issues/84

Once those endpoints are in place, I can proceed with configuring Prometheus to scrape the metrics and store them in its database. Later, you would be able to create your own dashboards on Grafana and visualize your application performance with fancy panels.

Feel free to reach me on Matrix if you need anything.

dtp263 (Author) commented Aug 8, 2022

@ArshamTeymouri I love the idea of adding the readiness and liveness endpoints. But I wonder if it would make more sense to do that in a separate effort.

Implementing and exposing the basic metrics at /metrics seems like a nice, objective block of work, while determining liveness/readiness could be a touch more subjective, opening up the possibility of one blocking the other.

Imod7 self-assigned this Feb 13, 2023

Imod7 (Contributor) commented Feb 15, 2023

@ArshamTeymouri I was looking into the issue and I was wondering if it is recommended to run the /metrics endpoint in a different instance of express (different from the one the API is running on) and on a different port.
Maybe this would be a good idea so that the metrics are not exposed to the internet? I am thinking of cases like the public instances of Sidecar, or if another team uses Sidecar and also has public instances; maybe they would not want their metrics to be exposed?
Another advantage might be that if Sidecar has issues running on its server instance/port, the /metrics endpoint (if it is on a different port) will still be running, right? So the fact that Sidecar is experiencing issues will be reflected in the corresponding metrics. (A rough sketch of the idea is below.)
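
(Sketch of the two-instance idea, assuming prom-client and Express; the ports 8080/9090 and the example route are only placeholders:)

```ts
import express from 'express';
import client from 'prom-client';

// Main Sidecar API on its usual, possibly public-facing, port.
const api = express();
api.get('/blocks/head', (_req, res) => res.json({ ok: true })); // stand-in for the real routes
api.listen(8080);

// Separate Express instance that only serves Prometheus metrics, on a port that
// can stay private to the cluster / scraping infrastructure.
const metricsApp = express();
client.collectDefaultMetrics();
metricsApp.get('/metrics', async (_req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});
metricsApp.listen(9090);
```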

ArshamTeymouri (Contributor) commented:

@Imod7 sorry for the late reply, I was on a long vacation.
I'm not sure it's possible to run the metrics endpoint in a different API instance; I don't think a sidecar process could gather metrics from another process. Most of the applications I know expose their metrics on the same API/port, and there are always ways for sysadmins/DevOps to secure the /metrics endpoint with ACLs at the webserver level, so no worries there.
That said, I'd still suggest checking enterprise applications and their implementations, as this topic sounds more developer-side than infrastructure.

Imod7 (Contributor) commented May 31, 2023

Current Status of this Issue

  • The Prometheus metrics endpoint is available on a dedicated server & port [ status: DONE with PR #1232 ]
  • Add a liveness endpoint [ status: PENDING ]
  • Add a histogram of response times for each HTTP endpoint [ status: DONE with PR feat: Add route based metrics across API #1465 ]

PierreBesson (Contributor) commented:

I integrated metrics support into our substrate node helm-chart (paritytech/helm-charts#284); however, I don't see a lot of useful metrics exposed beyond the default Node.js CPU/memory ones. For example, it would be useful to have a histogram of response times for each HTTP endpoint.
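
(A rough sketch of what such a per-route histogram could look like with prom-client; the metric name, buckets, and label set below are illustrative assumptions, not what any Sidecar PR actually shipped:)

```ts
import express, { NextFunction, Request, Response } from 'express';
import client from 'prom-client';

const app = express();

// Histogram of response times, labelled by method, route, and status code.
const httpDuration = new client.Histogram({
  name: 'sidecar_http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

// Middleware that starts a timer per request and records it when the response finishes.
app.use((req: Request, res: Response, next: NextFunction) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    end({
      method: req.method,
      route: req.route?.path ?? req.path,
      status_code: String(res.statusCode),
    });
  });
  next();
});
```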

Imod7 (Contributor) commented Aug 10, 2023

@PierreBesson thank you so much for the feedback! 💯 🙏 Yes, indeed, it does not have a lot of useful metrics right now, since my goal for the 1st iteration was to build the base on which we could later add any metrics we think are useful. Also, I think it needs proper research on what metrics would be useful in a REST API, and I have not done this (yet!), so your suggestion is a huge help. Please share if you have any other ideas/suggestions/recommendations/links on what I should check. For now, I will add the histogram metric to the pending list (my previous comment above).

IkerAlus (Contributor) commented:

what is the status on this one? @Imod7

Imod7 (Contributor) commented Nov 14, 2023

> what is the status on this one? @Imod7

The status is reflected in this comment, which shows:

  • what was implemented: the addition of the metrics endpoint (which gives us the ability to add any metrics we want) and one error counter metric.
  • what is pending: the two metrics suggested in previous comments (if they are still relevant).

Also, this comment shows that further research can be done to come up with other metrics (besides the two suggested above) that might be useful.

filvecchiato (Contributor) commented:

This PR integrated HTTP metrics for all routes: PR #1465.
