-
Notifications
You must be signed in to change notification settings - Fork 644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add delegate for collecting eventloop tick metrics #2608
Conversation
2bb268d
to
e8301e1
Compare
} | ||
|
||
/// Implement this delegate to receive information about the EventLoop, such as each tick | ||
public protocol EventLoopDelegate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- needs a
NIO
prefix to satisfy the public API guarantees NIO has - I'd call this
NIOEventLoopMetricsDelegate
or something
@@ -76,6 +76,7 @@ public final class MultiThreadedEventLoopGroup: EventLoopGroup { | |||
canEventLoopBeShutdownIndividually: Bool, | |||
selectorFactory: @escaping () throws -> NIOPosix.Selector<NIORegistration>, | |||
initializer: @escaping ThreadInitializer, | |||
delegate: EventLoopDelegate?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this this should be metricsDelegate
or similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's also odd is that we seem to have one delegate which gets metrics from multiple EventLoops in that EventLoopGroup.
To make this useful to users we should aggregate the metrics from all loops in a group, no? If we don't give the user to stats from all loops collectively, then the user would have to perform the aggregation themselves -- likely with locks across all loops -- which isn't ideal.
@@ -30,6 +30,27 @@ internal func withAutoReleasePool<T>(_ execute: () throws -> T) rethrows -> T { | |||
#endif | |||
} | |||
|
|||
/// Information about an EventLoop tick | |||
public struct EventLoopTickInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- needs
NIO
prefix - lacks identifier for EventLoop
- as said above, I think we should probably periodically give the user one value which contains information from all loops in a group together. Something like
public struct NIOEventLoopMetrics {
public struct LoopMetric {
var loopID: ...
var numberOfTasks: Int
var startTime: NIODeadline
}
public var loopMetrics: [LoopMetric]
}
I know this raises a lot of questions but I think we should think through how a user would use this and then settle on what NIO should provide. Just calling the delegate on each loop for every tick is easy but requires a lot of heavy-lifting on the user's part which seems bound to go wrong, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the trade-off falls the other way: we should offer the lowest level thing, which lets expert users get what they need, first. We can then do the engineering to produce something better. But frankly we've spent years not exposing these metrics because we wanted to do it right, and I'm a bit sick of stumbling around in the dark. So if we can get something good enough cheaply, we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Cory here. Let's do the simplest approach and just expose a delegate that gets called every tick and users have to do the aggregation. Over time we will see what kind of aggregation users are doing and provide pre-backed delegates that give them an even simpler interface
0786fc6
to
1c86a03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally happy with this approach @hamzahrmalik. I've left a few notes in the diff but I think it's a good start. Now we just need some tests!
selectorFactory: @escaping () throws -> NIOPosix.Selector<NIORegistration>) { | ||
precondition(numberOfThreads > 0, "numberOfThreads must be positive") | ||
let initializers: [ThreadInitializer] = Array(repeating: { _ in }, count: numberOfThreads) | ||
self.init(threadInitializers: initializers, | ||
canBeShutDown: canBeShutDown, | ||
threadNamePrefix: threadNamePrefix, | ||
metricsDelegate: metricsDelegate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: indentation.
@@ -30,6 +30,30 @@ internal func withAutoReleasePool<T>(_ execute: () throws -> T) rethrows -> T { | |||
#endif | |||
} | |||
|
|||
/// Information about an EventLoop tick | |||
public struct NIOEventLoopTickInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this Sendable
at the very least. Maybe Hashable
too.
} | ||
|
||
/// Implement this delegate to receive information about the EventLoop, such as each tick | ||
public protocol NIOEventLoopMetricsDelegate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type needs to be Sendable
.
@@ -526,6 +557,7 @@ Further information: | |||
} | |||
|
|||
// Execute all the tasks that were submitted | |||
tasksProcessedInTick += self.tasksCopy.count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be a little careful here. It's not really practically possible, but conceptually this addition can trap. We probably want to replace this with an addingReportingOverflow
that, if we overflow, saturates instead.
@@ -547,6 +579,8 @@ Further information: | |||
// Drop everything (but keep the capacity) so we can fill it again on the next iteration. | |||
self.tasksCopy.removeAll(keepingCapacity: true) | |||
} | |||
let tickInfo = NIOEventLoopTickInfo(eventLoopID: ObjectIdentifier(self), numberOfTasks: tasksProcessedInTick, startTime: tickStartTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can optimize this and create the loop ID only once. The optimizer might do this for us, but it's nicer not to rely on it.
e2a4a9a
to
599aeae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically good, just a doc note.
/// Implement this delegate to receive information about the EventLoop, such as each tick | ||
public protocol NIOEventLoopMetricsDelegate: Sendable { | ||
/// Called after a tick has run | ||
/// This function is called after every tick - avoid long-running tasks here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be really clear in this API, add a big warning block that says something like: "This function is called after every event loop tick and on the event loop thread. Any non-trivial work in this function will block the event loop and cause latency increases and performance degradation."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you have a flaky test here:
14:00:14 /swift-nio/Tests/NIOPosixTests/EventLoopMetricsDelegateTests.swift:55: error: EventLoopMetricsDelegateTests.testMetricsDelegateTickInfo : XCTAssertEqual failed: ("2") is not equal to ("1") -
14:00:14 /swift-nio/Tests/NIOPosixTests/EventLoopMetricsDelegateTests.swift:57: error: EventLoopMetricsDelegateTests.testMetricsDelegateTickInfo : XCTAssertEqual failed: ("Optional(1)") is not equal to ("Optional(2)") -
Add delegate for collecting eventloop tick metrics
Motivation:
Users need a way to monitor how eventloops are running, to detect problems such as the eventloop being blocked
Modifications:
Add a delegate to SelectableEventLoop and multithreadedEventLoop which is called on every tick. The delegate is given information about the tick