Prometheus Metrics #1008

Open
dtp263 opened this issue Aug 4, 2022 · 11 comments

dtp263 commented Aug 4, 2022

Proposed Change or Idea

Add Prometheus metrics for monitoring the health of a Sidecar instance.

Steps:

  1. Add a Prometheus client (possible client lib)
  2. Add a Counter increment to the error middleware (see the sketch below)
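
A minimal sketch of what these two steps could look like, assuming the prom-client library on top of Sidecar's Express app (the metric name `sidecar_errors_total` and the middleware wiring are illustrative assumptions, not the actual implementation):

```ts
import express, { NextFunction, Request, Response } from 'express';
import client from 'prom-client';

const app = express();

// Step 1: a Prometheus client metric. Name and label set are assumptions for illustration.
const errorCounter = new client.Counter({
  name: 'sidecar_errors_total',
  help: 'Total number of errors returned by Sidecar',
  labelNames: ['route'],
});

// Step 2: error-handling middleware that counts the error, then passes it on.
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
  errorCounter.inc({ route: req.path });
  next(err);
});
```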

First Metrics

Additional Information

In order to provide a clean, consistent service, you need to be able to easily monitor the health and status of each instance.

TarikGul added the I8 - Enhancement (Additional feature request) label Aug 4, 2022

TarikGul (Member) commented Aug 8, 2022

It's definitely possible to integrate Prometheus into Sidecar, as the helm setup is already intrinsic to the repo. I am not totally sure what the integration workload would look like, though - @ArshamTeymouri, any thoughts on this?

ArshamTeymouri (Contributor) commented:

Hi Tarik!

That's a great idea, to have the metrics exposed by the application. For this, we need a new endpoint on the application, say /metrics, that exposes application details in a Prometheus-compatible format. Please check the Prometheus docs for more details about how to integrate those metrics into your application.
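
(A rough sketch of what such an endpoint could look like, assuming prom-client on top of Express; the port and wiring below are illustrative assumptions, not how Sidecar currently does it:)

```ts
import express from 'express';
import client from 'prom-client';

const app = express();

// Collect the default Node.js process metrics (CPU, memory, event loop lag, ...).
client.collectDefaultMetrics();

// Expose everything registered with prom-client in the Prometheus text format.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});

app.listen(9100); // example port only
```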

Besides that, two additional endpoints should also be introduced: /readiness and /liveness. Those endpoints will expose the general health of the application. For example, if the app is crashing, /liveness should return a non-200 HTTP response, and if the application is working fine but is just waiting for its backing database to sync, /readiness should return non-200. This way Kubernetes knows whether it needs to restart the pod (application) because of a crash, or wait until the application becomes ready to serve requests.
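
(Again, just a sketch of how those two probes might look in Express; what "ready" actually means for Sidecar, and the isConnectedToNode helper, are placeholders:)

```ts
import express, { Request, Response } from 'express';

const app = express();

// Liveness: only fails when the process itself is broken, so Kubernetes restarts the pod.
app.get('/liveness', (_req: Request, res: Response) => {
  res.status(200).json({ status: 'alive' });
});

// Readiness: fails while the app cannot serve traffic yet (e.g. still syncing with its
// backing service), so Kubernetes keeps the pod out of rotation until it returns 200.
app.get('/readiness', (_req: Request, res: Response) => {
  const ready = isConnectedToNode();
  res.status(ready ? 200 : 503).json({ status: ready ? 'ready' : 'not ready' });
});

// Placeholder: whatever readiness actually means for the application goes here.
function isConnectedToNode(): boolean {
  return true;
}
```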

We have already done the same thing for one of the internal tools, Gitspiegel. You can reference its corresponding issue and PR for more details: https://github.com/paritytech/gitspiegel/issues/84

Once those endpoints are in place, I can proceed with configuring Prometheus to scrape the metrics and store them in its database. Later, you would be able to create your own dashboards on Grafana and visualize your application performance with fancy panels.

Feel free to reach me on Matrix if you need anything.

dtp263 (Author) commented Aug 8, 2022

@ArshamTeymouri I love the idea of adding the readiness and liveness endpoints. But I wonder if it would make more sense to do that in a separate effort.

Implementing and exposing the basic metrics at /metrics seems like a nice, objective block of work, while determining liveness/readiness could be a touch more subjective, opening up the possibility of one blocking the other.

Imod7 self-assigned this Feb 13, 2023

Imod7 (Contributor) commented Feb 15, 2023

@ArshamTeymouri I was looking into the issue and I was wondering if it is recommended to run the /metrics endpoint in a different instance of express (different from the one the API is running on) and on a different port.
Maybe this would be a good idea so that the metrics are not exposed to the internet? I am thinking of cases like the public instances of Sidecar, or if another team uses Sidecar and also has public instances; maybe they would not want their metrics to be exposed?
Another advantage might be that if Sidecar has issues running on its server instance/port, the /metrics endpoint (if it is on a different port) will still be running, right? So the fact that Sidecar is experiencing issues will be reflected in the corresponding metrics. (A rough sketch of the idea is below.)
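
(Sketch of the two-instance idea, assuming prom-client and Express; the ports 8080/9090 and the example route are only placeholders:)

```ts
import express from 'express';
import client from 'prom-client';

// Main Sidecar API on its usual, possibly public-facing, port.
const api = express();
api.get('/blocks/head', (_req, res) => res.json({ ok: true })); // stand-in for the real routes
api.listen(8080);

// Separate Express instance that only serves Prometheus metrics, on a port that
// can stay private to the cluster / scraping infrastructure.
const metricsApp = express();
client.collectDefaultMetrics();
metricsApp.get('/metrics', async (_req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});
metricsApp.listen(9090);
```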

ArshamTeymouri (Contributor) commented:

@Imod7 sorry for the late reply, I was on a long vacation.
I'm not sure it's possible to run the metrics endpoint in a different API instance; I don't think a sidecar process could gather metrics from another process. Most of the applications I know expose their metrics on the same API/port, and there are always ways for sysadmins/DevOps to secure the /metrics endpoint with ACLs at the webserver level, so no worries there.
That said, I'd still suggest checking enterprise applications and their implementations, as this topic sounds more developer-side than infrastructure.

Imod7 (Contributor) commented May 31, 2023

Current Status of this Issue

  • The Prometheus metrics endpoint is available on a dedicated server & port [ status: DONE with PR #1232 ]
  • Add a liveness endpoint [ status: PENDING ]
  • Add a histogram of response times for each HTTP endpoint [ status: DONE with PR feat: Add route based metrics across API #1465 ]

PierreBesson (Contributor) commented:

I integrated metrics support into our substrate node helm-chart (paritytech/helm-charts#284); however, I don't see a lot of useful metrics exposed beyond the default Node.js CPU/memory ones. For example, it would be useful to have a histogram of response times for each HTTP endpoint.
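
(A rough sketch of what such a per-route histogram could look like with prom-client; the metric name, buckets, and label set below are illustrative assumptions, not what any Sidecar PR actually shipped:)

```ts
import express, { NextFunction, Request, Response } from 'express';
import client from 'prom-client';

const app = express();

// Histogram of response times, labelled by method, route, and status code.
const httpDuration = new client.Histogram({
  name: 'sidecar_http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

// Middleware that starts a timer per request and records it when the response finishes.
app.use((req: Request, res: Response, next: NextFunction) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    end({
      method: req.method,
      route: req.route?.path ?? req.path,
      status_code: String(res.statusCode),
    });
  });
  next();
});
```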

Imod7 (Contributor) commented Aug 10, 2023

@PierreBesson thank you so much for the feedback! 💯 🙏 Yes, indeed, it does not have a lot of useful metrics right now, since my goal for the 1st iteration was to build the base on which we could later add any metrics we think are useful. Also, I think it needs proper research on what metrics would be useful in a REST API, and I have not done this (yet!), so your suggestion is a huge help. Please share if you have any other ideas/suggestions/recommendations/links on what I should check. For now, I will add the histogram metric to the pending list (my previous comment above).

IkerAlus (Contributor) commented:

what is the status on this one? @Imod7

Imod7 (Contributor) commented Nov 14, 2023

> what is the status on this one? @Imod7

The status is reflected in this comment, which shows:

  • what was implemented: the addition of the metrics endpoint (which gives us the ability to add any metrics we want) and one error counter metric.
  • what is pending: the two metrics suggested in previous comments (if they are still relevant).

Also, this comment shows that further research can be done to come up with other metrics (besides the two suggested above) that might be useful.

filvecchiato (Contributor) commented:

This PR integrated HTTP metrics for all routes: PR #1465.
