
Verify behaviour of prometheus grafana functionality with old and new metrics endpoint #5501

Closed · 3 tasks
mmedenjak opened this issue Jul 18, 2022 · 7 comments · Fixed by #5526
@mmedenjak
Contributor

With the recent changes to metrics in 22.2, we have two metrics endpoints:

  • /metrics - the "old" metrics endpoint, mostly unchanged, but it now aggregates some metrics, which decreases the total number of metrics exposed. It should be backwards compatible.
  • /public_metrics - the "new" metrics endpoint: fewer metrics and better documented, but not yet complete enough for a full monitoring solution. We'll be adding new metrics in future releases.

For further info, see the PRD, the RFC for cardinality reduction, and the RFC for the new endpoint.

In terms of rpk generate prometheus-config and rpk generate grafana-dashboard, we should:

  • check they are not broken by the recent metrics changes on the "old" endpoint (must-have for 22.2.1); a quick manual check is sketched after this list
  • see what it would take to add scraping of the new endpoint as well (some small changes can go in 22.2.1, but can also go after and be backported)
  • add new dashboard sections consuming both endpoints (some small changes can go in 22.2.1, but can also go after and be backported)
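
As a quick manual check for the first item, both endpoints can be hit directly on the admin API port (9644 by default; the host and port here are illustrative):

curl -s localhost:9644/metrics | head
curl -s localhost:9644/public_metrics | head

Both should return Prometheus text-format output, and the first should still contain the metric names the existing dashboard queries rely on.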
@mmedenjak mmedenjak changed the title Verify behaviour of prometheus grafana functionality with old and new metrics endpoing Verify behaviour of prometheus grafana functionality with old and new metrics endpoint Jul 18, 2022
@r-vasquez
Contributor

r-vasquez commented Jul 19, 2022

rpk generate prometheus-config shouldn't be a problem, since the command doesn't hit either metrics endpoint (/metrics or /public_metrics).

It uses 3 flags (job-name, node-addrs, seed-addrs) to generate a YAML scrape_config that looks like the following (an example invocation follows the snippet):

- job_name: redpanda-metrics-test
  static_configs:
  - targets:
    - localhost:9644
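
For reference, an invocation along these lines produces the snippet above (the flag names are those listed in the comment; the seed address is illustrative):

rpk generate prometheus-config --job-name redpanda-metrics-test --seed-addrs localhost:9092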

@r-vasquez
Contributor

Reading the original Slack thread, it seems we wanted to change the endpoint in generate prometheus-config, but I'm not entirely sure what's needed here. What do you think, @BenPope?

@BenPope
Member

BenPope commented Jul 19, 2022

I don't really know how the scrape config works, but I guess it needs to output something like:

- job_name: redpanda-node
  static_configs:
  - targets:
    - localhost:9644
  metrics_path: /metrics
- job_name: redpanda-node-public
  static_configs:
  - targets:
    - localhost:9644
  metrics_path: /public_metrics

I think @VladLazar had a play with the scrape config on a test cluster.
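
For anyone trying this out, those jobs would sit under scrape_configs in a minimal prometheus.yml, which can then be sanity-checked with promtool; a sketch, with an illustrative scrape interval and target address:

global:
  scrape_interval: 10s

scrape_configs:
- job_name: redpanda-node
  metrics_path: /metrics
  static_configs:
  - targets:
    - localhost:9644
- job_name: redpanda-node-public
  metrics_path: /public_metrics
  static_configs:
  - targets:
    - localhost:9644

and then:

promtool check config prometheus.yml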

@BenPope
Member

BenPope commented Jul 19, 2022

It might be useful, for customers with multiple clusters, for the command to take another argument that attaches a label:

rpk generate prometheus-config --seed-addr localhost:9092 --job-name redpanda-node --add-label cluster-id:cluster-a

To get

- job_name: redpanda-node
  static_configs:
  - targets:
    - localhost:9644
    labels:
      cluster-id: cluster-a
  metrics_path: /metrics
- job_name: redpanda-node-public
  static_configs:
  - targets:
    - localhost:9644
    labels:
      cluster-id: cluster-a
  metrics_path: /public_metrics

@VladLazar
Contributor

I think @VladLazar had a play with the scrape config on a test cluster.

I managed to set this up in a k8s environment, but you don't edit the scrape config directly for that.
What Ben suggested looks about right to me.
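
For the k8s case, assuming the Prometheus Operator is in use (the thread doesn't say which setup was used), the public-metrics job would typically be expressed as a ServiceMonitor rather than a raw scrape config, roughly along these lines (the selector labels and port name are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redpanda-public-metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: redpanda  # illustrative selector for the Redpanda Service
  endpoints:
  - port: admin                         # illustrative name of the admin API port on the Service
    path: /public_metrics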

@r-vasquez
Contributor

r-vasquez commented Jul 19, 2022

To summarize (give me a 👍 if it's correct), 2 changes are required for generate prometheus-config:

  1. Include a flag --metrics-path (defaulting to /metrics) that adds the metrics_path property to the generated YAML.
  2. Include a flag --add-label that receives a list of 'key:value' pairs to add under the labels YAML property (an illustrative combined invocation is sketched below).

CC: @VladLazar @BenPope
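
Purely for illustration, a combined invocation under those two proposed flags might look like this (flag names and syntax are not final; the label key uses an underscore since Prometheus label names may not contain '-'):

rpk generate prometheus-config --job-name redpanda-node-public --seed-addrs localhost:9092 --metrics-path /public_metrics --add-label cluster_id:cluster-a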

@BenPope
Member

BenPope commented Jul 19, 2022

Sounds reasonable to me. I think people will also want to scrape both, but they can concatenate the results of two calls, I guess. It'll simplify the way we document things and, over time, let us deprecate the old metrics endpoint.

Alternatively, we could have flags --internal-metrics and --public-metrics, which could be combined and would produce scrape configs for /metrics and /public_metrics respectively.
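
If that alternative were taken, the combined form might be invoked as follows (again just a sketch, flag names not final), producing one job per selected endpoint along the lines of the two-job config sketched earlier in the thread:

rpk generate prometheus-config --internal-metrics --public-metrics --seed-addrs localhost:9092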
