Implementing Prometheus Exporter and documentation #534

flaviostutz · 2020-07-24T20:30:50Z

Background

This is a complete initial implementation of Prometheus Exporter support, as discussed in #477.

Checklist

Git commit messages conform to community standards.
Each Git commit represents meaningful milestones or atomic units of work.
Changed or added code is covered by appropriate tests.

flaviostutz · 2020-07-24T20:31:36Z

This is a continuation of PR #526. I decided to create a new PR to clean things up for a new review.

flaviostutz · 2020-08-05T23:24:49Z

@tsenart Could you please take a look at this PR?

tsenart · 2020-08-06T09:19:23Z

@flaviostutz: I've been swamped with work but saved some time to look at this on Saturday! Sorry.

flaviostutz · 2020-08-06T12:07:11Z

No worries... I know how it is... here with me it's the same... thanks!

tsenart

First round of review done! Thanks for working on this 🙇

Dockerfile

README.md

lib/prom/prom.go

lib/results_test.go

flaviostutz · 2020-08-28T23:10:05Z

@tsenart could you please take a look at this PR when you have some time? I think I managed to do all the requested changes.

tsenart

Looks good! Just a few leftover changes / inconsistencies. Thank you 🙇

Dockerfile

README.md

lib/prom/prom_test.go

lib/prom/prom.go

tsenart · 2020-09-13T14:04:32Z

lib/prom/prom.go

+ vegeta "github.com/tsenart/vegeta/v12/lib"
+)
+
+//PrometheusMetrics vegeta metrics observer with exposition as Prometheus metrics endpoint


Please do this.

flaviostutz · 2020-10-09T21:35:21Z

@tsenart please take a look at the requested changes.
Sorry for the late reply...

tsenart · 2020-10-11T09:46:31Z

go.sum

+github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
+github.com/stretchr/objx v0.1.1/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
+github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
+github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=


@flaviostutz: Why is this still here?

flaviostutz · 2023-07-22

I removed all dependencies on testify from the prom lib, but go.sum on master already contains a testify entry and I don't know where it comes from. go.mod doesn't declare it, so it is probably an indirect dependency somewhere (it already existed before my PR).

onelapahead · 2020-11-03T13:45:56Z

README.md

@@ -803,6 +807,62 @@ $ ulimit -u # processes / threads

 Just pass a new number as the argument to change it.

+## Prometheus Support
+
+Vegeta has a built-in Prometheus Exporter that may be enabled during "attacks" so that you can point any Prometheus instance to Vegeta instances and get some metrics about http requests performance and about the Vegeta process itself.


While I like this for a lot of reasons, I think vegeta should be exporting metrics to a Prometheus Pushgateway rather than being scraped directly by Prometheus.

Since vegeta isn't a long lived process you'll have race conditions where Prometheus might not scrape the last bit of attack data before vegeta shuts down.

Prometheus best practices docs say you should use push gateway for "service-level batch jobs" which is what I think vegeta would qualify as:

Usually, the only valid use case for the Pushgateway is for capturing the outcome of a service-level batch job. A "service-level" batch job is one which is not semantically related to a specific machine or job instance (for example, a batch job that deletes a number of users for an entire service). Such a job's metrics should not include a machine or instance label to decouple the lifecycle of specific machines or instances from the pushed metrics. This decreases the burden for managing stale metrics in the Pushgateway. See also the best practices for monitoring batch jobs.

I bring this up because I think it should instead have, or at least additionally offer, a report type of prom that exports metrics from a report to a Prometheus Pushgateway.

I am currently writing that for datadog rather than Prometheus but the idea is the same. The advantage of the report is that one could take an existing Vegeta test result and push into a metrics store rather than having to run a new test.

And the user could publish the results to a Prometheus and a DataDog and wherever else they needed with the reporting decoupled from the attacking.

tsenart · 2023-07-22

Note to self: When -prometheus-addr is set and we start an HTTP server for Prometheus to scrape, make sure that all metrics are scraped before the program exits, even after the attack itself has finished.

ghost · 2020-12-03T04:28:26Z

Hi guys,

Hope you are all well !

Just wanted to know the current status of this PR as it sounds awesome for load testing any webapp.

What is missing to merge it ?

Cheers,
Luc

leosunmo · 2020-12-16T00:51:15Z

As someone who uses Vegeta as a library for testing/benchmarking/loadtesting, this is fantastic! Having to manually implement OpenCensus middleware for HTTP is a bit of a pain.

Just a note, is there a way to customise the latency histogram buckets? It looked like they were hardcoded to me, and since requestSecondsHistogram is not exposed I can't change it.

I can think of two immediate ways of making it customisable. Exposing the prom metrics types to the user, or enabling the user to set it as a "param" upon creating the PrometheusMetrics using NewPrometheusMetricsWithParams.

dntosas · 2021-01-23T06:37:00Z

hola people!

any news on this one? seems this functionality is widely needed and a lot of thanks for your nice job cc @flaviostutz @tsenart

spacez320 · 2021-05-23T23:05:45Z

@hfuss @flaviostutz I'm wondering what the state of the PR is and if it's still possible to implement Prometheus support.

I agree that normally an ad-hoc job like a Vegeta process could benefit from just delivering results to Pushgateway. I also think creating an exporter could be useful if you're going to run Vegeta for a long time and want a simpler integration.

I'm preparing to use Vegeta for some long-running (hours-days) load-testing jobs and wondered if I could help at all with Prometheus integration one way or the other.

Do we think it's still possible to create the exporter?
Should we instead (or also) make it easy to transmit to Pushgateway via vegeta report or something?

daluu · 2023-07-11T01:08:42Z

Depending on the use case, I would advise against the Pushgateway approach except as a last resort. In my efforts to use it, I observed that metrics pushed to the gateway have no TTL and persist forever, and the metric's value only updates if you send new values over time with the exact same labels/dimensions. If the labels change over time (different instance, host, etc.), you'll just get new metrics instead, and both old and new metrics persist in the system over time.

So when viewing the metrics in Grafana, for example, you get a never-ending set of time series, rather than time series with filled lines for whenever metrics are sent to the gateway and missing data on the chart for intervals where nothing is sent. I wanted and expected the latter but got the former. For ad-hoc and periodic load testing, I'd assume people would want the latter. The former is more suitable if, say, you ran vegeta to monitor a test instance that never changes, i.e. the same service monitored over time.

For what I want to do, if you go with push gateway, you may need something like this forked version instead that offers TTL to the push gateway metrics: https://github.com/dinumathai/pushgateway

So I think the exporter route is good, I think it doesn't have the TTL issue that the push gateway has.

Another option to consider, as an alternative to both the exporter and the Pushgateway, is the Prometheus remote write protocol/feature as a way to push metrics to Prometheus. We're looking into and testing remote write at our organization; I'm not aware of the specific implementation or how well it's performing at the moment. I also don't know which approach would be better for vegeta from a performance standpoint, since vegeta needs to both generate the load (test) and expose or push the metrics. https://last9.io/blog/what-is-prometheus-remote-write/

tsenart · 2023-07-11T06:51:09Z

@daluu: Makes sense. I think the Prometheus exporter should then be implemented as a vegeta sub command, like vegeta prom-export, which you'd use with vegeta attack ... | tee results.bin | vegeta prom-export

flaviostutz · 2023-07-18T23:49:07Z

As discussed in #477 we decided to instrument attack because then we don't need to parse stdin and it would be simpler/more performant.

If we decide to change this at this point of the PR actually we should cancel it and start a new branch because a lot of things will be different.

I would recommend us to finish this PR with the initial requirements because it's almost there and if necessary in the future we discuss better about creating a reporter/push gateway/direct write version of it.

What do you think?

tsenart · 2023-07-19T07:03:04Z

Ah, I had forgotten that discussion. It was a long time ago!

Thinking about it now, having the attack command expose a web server handler that a Prometheus instance can scrape is good for performance and reducing moving pieces in distributed attacks.

But for interactive use and debugging past attacks I think we should still have a sub command that exports saved results as Prometheus metrics.

So, the answer is I think we need both, and they should share as much code as possible. The only difference between doing it in attack and doing it in another sub-command is that in attack we can observe the metric without decoding the result first.

I suggest we introduce a lib/prom package where all of this is encapsulated and modularized.

Also, very much up for hacking on this together. So let me know how you'd want to go about it.

flaviostutz · 2023-07-22T15:02:30Z

Ah, I had forgotten that discussion. It was a long time ago!

Thinking about it now, having the attack command expose a web server handler that a Prometheus instance can scrape is good for performance and reducing moving pieces in distributed attacks.

But for interactive use and debugging past attacks I think we should still have a sub command that exports saved results as Prometheus metrics.

So, the answer is I think we need both, and they should share as much code as possible. The only difference between doing it in attack and doing it in another sub-command is that in attack we can observe the metric without decoding the result first.

I suggest we introduce a lib/prom package where all of this is encapsulated and modularized.

Also, very much up for hacking on this together. So let me know how you'd want to go about it.

This PR is already creating a lib/prom package. I just reviewed and fixed all comments that were open (after 3y lol!). Please do another review round and mark comments as resolved if they are ok.

Signed-off-by: Flávio Stutz <[email protected]>

flaviostutz · 2023-07-22T16:14:19Z

I squashed the messy commits (various upstream merges and small changes) from my branch into one and applied the PGP signatures so you can merge it to master 😁

tsenart

Thank you for moving this forward! Out of all my comments, let me know what you want to work on. I'm happy to address my own feedback on top of your changes. I'll also implement the prom sub-command that I mentioned.

docker-compose.yml

grafana.json

lib/prom/prom.go

prometheus-sample.png

lib/results_test.go

lib/prom/prom_test.go

…es with P90; minor changes Signed-off-by: Flávio Stutz <[email protected]>

flaviostutz · 2023-07-23T21:32:56Z

@tsenart Pushed everything I changed here. In summary I tried to resolve all comments, so please do another round of checks.

As this PR is too big already, I would advise us to merge it as soon as possible and build the "prom" command in another one.

Closes #477, #534 Signed-off-by: Flávio Stutz <[email protected]> Signed-off-by: Tomás Senart <[email protected]>

Signed-off-by: Tomás Senart <[email protected]>

tsenart · 2023-07-24T17:30:03Z

Thank you for your great effort on this <3 Landed in 81403a6. Opened #637 which I'll work on next!

fasibio · 2024-02-28T17:03:43Z

@daluu, regarding this point from #534 (comment):
have no TTL and persist forever, and the metric's value only updates if you send new values over time with the exact same labels/dimensions

For information: I use a Pushgateway and delete all metrics created by vegeta at the end, so I see no problem with the Pushgateway:

import (
	"log"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
	vegeta "github.com/tsenart/vegeta/v12/lib"
	"github.com/tsenart/vegeta/v12/lib/prom"
)

// PushGatewayRegister adapts a push.Pusher to prometheus.Registerer so that
// prom.Metrics can register its collectors directly on the pusher.
type PushGatewayRegister struct {
	Pusher *push.Pusher
}

// MustRegister implements prometheus.Registerer.
func (p *PushGatewayRegister) MustRegister(c ...prometheus.Collector) {
	for _, v := range c {
		p.Pusher = p.Pusher.Collector(v)
	}
}

// Register implements prometheus.Registerer.
func (p *PushGatewayRegister) Register(c prometheus.Collector) error {
	p.Pusher = p.Pusher.Collector(c)
	return nil
}

// Unregister implements prometheus.Registerer. It is a no-op here.
func (p *PushGatewayRegister) Unregister(c prometheus.Collector) bool {
	return true
}

func LoadTest() {
	metrics := prom.NewMetrics()

	pusher := push.New("url", "job")
	if err := metrics.Register(&PushGatewayRegister{Pusher: pusher}); err != nil {
		log.Fatalf("register error: %v", err)
	}

	// Example attack parameters (placeholders, not from the original setup).
	rate := vegeta.Rate{Freq: 100, Per: time.Second}
	duration := 4 * time.Second
	targeter := vegeta.NewStaticTargeter(vegeta.Target{Method: "GET", URL: "http://localhost:9100/"})
	attacker := vegeta.NewAttacker()

	for res := range attacker.Attack(targeter, rate, duration, "Big Bang!") {
		metrics.Observe(res)
		// Push the current state of all collectors after every result.
		if err := pusher.Push(); err != nil {
			log.Printf("push error: %v", err)
		}
	}
	// That's the point: all metrics created by this pusher are removed from the Pushgateway.
	_ = pusher.Delete()
}

but change my mind

daluu · 2024-02-28T19:05:55Z

@fasibio

For information: I use a Pushgateway and delete all metrics created by vegeta at the end, so I see no problem with the Pushgateway:

As long as that works out in the end, then yes, it should be fine. I would advise testing that this works as anticipated: send some metrics, stop sending for some time (e.g. 15 minutes to an hour or more), and verify that Grafana (or an equivalent data viewer) renders the metric as discrete data points for when metrics were sent, rather than as an extrapolated line that continues through the current time even though no more metrics were sent.

Expecting how something works in theory is different from actually validating it with some testing. So as long as this special handling in push gateway has been confirmed to work as expected with testing, then sounds good.

Note that this approach may work better when you have more access to the Pushgateway interface. With the library/interface we were using at the time, I'm not aware whether there was a way to "delete" metrics on the Pushgateway; we only sent metrics to it.

but change my mind

I'm not sure what is meant here, it is a little vague for interpretation. Did you mean to say, unless someone can convince you to change your mind otherwise, pushgateway works fine for you? Or did you mean despite what you mentioned, you have decided to change your mind about using push gateway approach?

fasibio · 2024-03-06T09:37:09Z

@daluu
I use "github.com/prometheus/client_golang/prometheus/push" and call pusher.Delete() at the end of the test.
That removes all metrics created by the same pusher object.

My Grafana graph looks like this:

As you can see, there is data only at the time of the attack. So I think advising against the Pushgateway (see README) is incorrect.

daluu · 2024-03-06T18:19:06Z

So I think advising against the Pushgateway (see README) is incorrect.

Yes, makes sense. I take back my prior advice, with the caveat that the user must use the Pushgateway properly: if you omit the delete step at the end, I believe you will run into the concern I previously mentioned (anyone can try to confirm it). So if we're building a solution and documentation here, we need to account for that to ensure a successful deployment.

@fasibio, I'm curious what made you issue a delete at the end of pushing metrics. Were you somehow aware of the need for it (or of the issue when you don't delete), did you come across it by trial and error, or did you find it documented somewhere? Unless I overlooked it, I don't recall seeing documentation or example code telling the user to issue a delete after pushing metrics. The Pushgateway's design is not so intuitive to me. One would think the push (and pull) model is discrete: you send/poll data, you get data; when no push or poll is occurring, there should be no data. But the Pushgateway still holds on to the data and keeps forwarding it to Prometheus, even when the metric values don't change, unless you clear it out explicitly.

fasibio · 2024-03-21T09:18:09Z

@daluu Simple: I followed the Pushgateway "Use it" section (not Go-specific), and the curl delete command is part of it.
https://github.com/prometheus/pushgateway?tab=readme-ov-file#use-it .
And that is all.

@tsenart see discussion, might make sense to update readme.

flaviostutz requested review from tsenart and xla as code owners July 24, 2020 20:30

flaviostutz mentioned this pull request Jul 24, 2020

Issue/#477 prometheus #526

Closed

3 tasks

tsenart reviewed Aug 8, 2020

flaviostutz added a commit to flaviostutz/vegeta that referenced this pull request Aug 28, 2020

making changes discussed in PR tsenart#534

6c8cf51

xla removed their request for review September 3, 2020 18:06

tsenart previously approved these changes Sep 13, 2020

flaviostutz added a commit to flaviostutz/vegeta that referenced this pull request Oct 9, 2020

making changes related to PR tsenart#534 review

c0cae43

flaviostutz dismissed tsenart’s stale review via a4f7d77 October 9, 2020 21:19

tsenart reviewed Oct 11, 2020

Jmainguy mentioned this pull request Oct 23, 2020

Dockerfile for helping devs #525

Closed

onelapahead reviewed Nov 3, 2020

flaviostutz force-pushed the issue/#477-promlib branch 2 times, most recently from 44c66f3 to 7962a87 Compare July 22, 2023 16:00

Implementing Prometheus Exporter and documentation as in tsenart#477

a998c8e

Signed-off-by: Flávio Stutz <[email protected]>

flaviostutz force-pushed the issue/#477-promlib branch from 7962a87 to a998c8e Compare July 22, 2023 16:12

tsenart reviewed Jul 22, 2023

Separating prom http metrics logic from server; updating query exampl…

3b5de89

…es with P90; minor changes Signed-off-by: Flávio Stutz <[email protected]>

tsenart pushed a commit that referenced this pull request Jul 24, 2023

Prometheus Exporter in attack (#534)

81403a6

Closes #477, #534 Signed-off-by: Flávio Stutz <[email protected]> Signed-off-by: Tomás Senart <[email protected]>

tsenart added a commit that referenced this pull request Jul 24, 2023

lib/prom: follow-up fixes to #534

3fd94ee

Signed-off-by: Tomás Senart <[email protected]>

tsenart closed this Jul 24, 2023
