Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebSockets Next: OTel integration #39143

Open
mkouba opened this issue Mar 4, 2024 · 14 comments · May be fixed by #41956
Open

WebSockets Next: OTel integration #39143

mkouba opened this issue Mar 4, 2024 · 14 comments · May be fixed by #41956
Assignees
Labels

Comments

@mkouba
Copy link
Contributor

mkouba commented Mar 4, 2024

Description

Follow-up of #39142.

Implementation ideas

No response

@michalvavrik
Copy link
Member

hey @brunobat, is this WIP or can I have a look later this / next week? Feel free to share high-level plan if you have one or we could discuss this via DM.

@mkouba
Copy link
Contributor Author

mkouba commented Jun 4, 2024

hey @brunobat, is this WIP or can I have a look later this / next week? Feel free to share high-level plan if you have one or we could discuss this via DM.

The plan is to wait a bit until it's 100% clear what API should we actually use. See also Bruno's Micrometer, Observation API and OpenTelemetry roadmap email.

@michalvavrik
Copy link
Member

michalvavrik commented Jun 4, 2024

The plan is to wait a bit until it's 100% clear what API should we actually use. See also Bruno's Micrometer, Observation API and OpenTelemetry roadmap email.

The email says nothing about WS and neither says description of this issue about waiting. I'm not sure why should integration differ to any other integration with Quarkus REST, gRPC etc. but I'll leave it to experts and let @brunobat handle this issue. Thank you for info

@michalvavrik
Copy link
Member

hey @brunobat in case you want to help with this one when the time is right, ping me

@brunobat
Copy link
Contributor

Created a meeting for next monday with @mkouba and @michalvavrik. If anyone else want to be there, please let me know.

@brunobat
Copy link
Contributor

brunobat commented Jun 17, 2024

About tracing:
We will be creating spans for:

  • Open connection. It can be linked to the request opening the connection.
  • Close connection with the open connection as a link because this is not related with the request that originated the open.

About Metrics:

  • Count open connections
  • Count per target (channel):
    • Server messages count received
    • Server messages count received errors
    • Server messages count bytes sent
    • Server messages count bytes received
    • Client messages count sent
    • Client messages count sent errors
    • Client messages count bytes sent
    • Client messages count bytes received

In relation to the architecture of the solution we should proceed in a similar way we did for reactive messaging.

@geoand
Copy link
Contributor

geoand commented Jul 11, 2024

@brunobat can you explain the tracing part a little more as I don't really understand it?

Thanks

@brunobat
Copy link
Contributor

Sure @geoand. The idea for tracing is that creating spans for individual messages does not make much sense because:

  • They can be really small and each span could be 10x larger...
  • Keep track of the parent for each sent message can be hard
  • The connection is opened in a request and might be kept open for a very long time... Better use a link to correlate with close.

@geoand
Copy link
Contributor

geoand commented Jul 11, 2024

Gotcha, thanks

@michalvavrik
Copy link
Member

status update: got OTel, Micrometer needs more tests. Will continue early next week.

Here is how I understood and implemented Micrometer metrics so far:

Messages count sent

count all responses from client and server endpoints, each broadcasted message is counted as well

Messages count received

counts all received messages by client and server endpoints, each multi item is accounted

Messages count sent errors

counts all errors for all client endpoints, collerate to OnError (I call the metric client errors); I had a problem with original definition:

  • problem with "sent errors" is what does it mean, is it error that occurred when codes are applied on response? or when client connection is processing message and sending fails? or when
  • I think it would require a lot of hooks to define something like that, while what I have ATM is not invasive
  • I am not sure users would understand what that means, while OnError they do understand

Messages count received errors

same as above but for server endpoints (I call the metric server errors)

Messages count bytes sent

counts bytes for messages returned from client / server endpoints

Messages count bytes received

counts bytes for messages received by client / server endpoints

These metrics are global - per all endpoints. It's not hard to add metrics per endpoint at all, but I didn't want to add something that wasn't agreed above. If you understand desired metrics differently than I do, I suggest short meeting would be more effective.

Following metrics I suggest to leave out at the first stage. Maybe we can follow-up with them later:

Messages count bytes received errors

Unclear how to define this metrics. They could be bytes received that we failed to decode, they could be bytes passed to the endpoint that thrown an error. I think it ignores that failure could happen before all the bytes were received (or any). Also that there are other errors when no bytes were received.

Messages count bytes sent errors

Analogous issues.

@geoand
Copy link
Contributor

geoand commented Jul 15, 2024

Thanks for the update @michalvavrik!

I personally don't understand Messages count bytes received errors and Messages count bytes sent errors at all.

@michalvavrik
Copy link
Member

#39143 (comment) wants messages per connection and destination, missed it till I re-read the description now, so my above-mentioned comment is not valid anymore.

@brunobat
Copy link
Contributor

About tracing: We will be creating spans for:

  • Open connection. It can be linked to the request opening the connection.
  • Close connection with the open connection as a link because this is not related with the request that originated the open.

About Metrics:

  • Count open connections

  • Count per target (channel):

    • Server messages count received
    • Server messages count received errors
    • Server messages count bytes sent
    • Server messages count bytes received
    • Client messages count sent
    • Client messages count sent errors
    • Client messages count bytes sent
    • Client messages count bytes received

In relation to the architecture of the solution we should proceed in a similar way we did for reactive messaging.

Updated the required metrics comment

@michalvavrik
Copy link
Member

sounds good, thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: In Progress
Development

Successfully merging a pull request may close this issue.

4 participants