Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework AFSelector to normalize and simplify its locking #146

Conversation

ThePumpingLemma
Copy link
Contributor

Note that this PR is split into three commits; the first two commits are mainly renames to make the implementation more consistent with the Java standard library's implementations. For reviewing, strongly recommend ignoring whitespace changes.

Changes:

  • Place bulk of select0 operations under this and
    publicSelectedKeys locks to reduce locking/unlocking thrash given
    frequency of locking operations
  • Lock publicSelectedKeys instead of selectedKeys since that is what
    clients of AFSelector are locking on
  • Eschew uses of synchronized methods in favor of locking this via
    synchronized blocks
  • Use assertions to check lock invariants

With our PoC at
https://github.com/kevink-sq/jetty-concurrency-issue-poc/tree/575469b3453e7b09d9d60f6460cdab8117b8e773,
we noticed that there is unbounded growth of file descriptors when
running our simple load test:

```bash
seq 1 15000 | xargs -P1500 -I{} curl --unix-socket ./junix-ingress.sock http://localhost/process
```

Leading to timeouts from curl and timeout exceptions in jetty:

```
java.util.concurrent.TimeoutException: Idle timeout expired: 32422/30000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:170)
        at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:112)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
```

This patch fixes the issue by ensuring that keys with the `OP_INVALID`
bit are still tracked in the selected key set and cancelled accordingly

Changes:
* Add invalid keys to `selectedKeysSet` in `setOpsReady` and mark as
  cancelled, ensuring that the fd is cleaned up timely
* Switch from `cancelNoRemove` to `cancel` now that the registered keys
  are modified separately in the select implementation
* Prefer `!key.isValid()` over `key.hasOpInvalid()` since the former is
  a superset of the latter
* In `implCloseSelector`, cancel the keys before clearing
  `keysRegistered` since `keys()` is backed by `keysRegistered`
* Store cancelled keys in a deque instead of a hash set, which should
  be a little more efficient
* Remove dead code
`processDeregisterQueue`, respectively

This naming convention more closely aligns with the selector
implementations in Java's standard library
* Place bulk of `select0` operations under `this` and
  `publicSelectedKeys` locks to reduce locking/unlocking thrash given
  frequency of locking operations
* Lock `publicSelectedKeys` instead of `selectedKeys` since that is what
  clients of AFSelector are locking on
* Eschew uses of synchronized methods in favor of locking `this` via
  synchronized blocks
* Use assertions to check lock invariants
}
}

private synchronized void consumeAllBytesAfterPoll() throws IOException {
private void consumeAllBytesAfterPoll() throws IOException {
assert Thread.holdsLock(this);
Copy link
Member

@kohlschuetter kohlschuetter Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be assert Thread.holdsLock(publicSelectedKeys) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is interesting. It doesn't actually need to lock on publicSelectedKeys for correctness, but then we would have to lock, drop, then immediately lock on publicSelectedKeys again if we didn't want to consume bytes under the lock. I think we can reorder some of this to reduce what is under the lock.

@@ -110,47 +111,51 @@ public int select() throws IOException {

@SuppressWarnings("PMD.CognitiveComplexity")
private int select0(int timeout) throws IOException {
PollFd pfd;
synchronized (this) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why lock on this when we now also lock on publicSelectedKeys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. In the standard lib, they lock on both for most of the implementation, but I think we can push down the lock for selected keys to leave initPollFd only locked under this.

@kohlschuetter
Copy link
Member

kohlschuetter commented Nov 8, 2023

There's a lot going on in AFSelector for sure, and it's not the most readable code. I'm not sure that this PR resolves this issue, though.

As a general rule, try simplifying your change so it's easy to follow what is actually changed. Please do not rename methods/fields just because, it really makes things hard to follow, prevents unrelated changes from merging cleanly, and honestly in this case, it is a matter of consistency and maybe taste, so I don't think we need these particular changes.

I can't remember exactly why, but the synchronized-locking was split around the JNI poll method for a reason. That reason may have been wrong in the first place, or things have changed, but I would want to make sure we don't introduce yet another regression.

What if we focus the change on locking on publicSelectedKeys instead of selectedKeys?

@ThePumpingLemma
Copy link
Contributor Author

There's a lot going on in AFSelector for sure, and it's not the most readable code. I'm not sure that this PR resolves this issue, though.

This feedback is fair.

As a general rule, try simplifying your change so it's easy to follow what is actually changed. Please do not rename methods/fields just because, it really makes things hard to follow, prevents unrelated changes from merging cleanly, and honestly in this case, it is a matter of consistency and maybe taste, so I don't think we need these particular changes.

Okay, I am working on a separate PR with only the locking changes.

I can't remember exactly why, but the synchronized-locking was split around the JNI poll method for a reason. That reason may have been wrong in the first place, or things have changed, but I would want to make sure we don't introduce yet another regression.

This part is odd to me as well. In Java's standard library, calls to native implementations like epoll and kqueue don't drop the locks and, as far as I can tell, it doesn't negatively impact junixsocket either. One thing I did notice is that Java's variants omit the begin/end synchronization when invoking selectNow.

What if we focus the change on locking on publicSelectedKeys instead of selectedKeys?

On it!

@ThePumpingLemma
Copy link
Contributor Author

Okay, I took another pass and this PR only adjusts locking: #147

I think it also addresses your comments with locking on this versus selectedKeysPublic by making what is under the locks meaningfully different.

@kohlschuetter
Copy link
Member

Continuing discussion in #147.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants