Remove arbitrary concurrency limiter for uploads and replace with a configurable limit that is disabled by default. #642

Open

TheGrizzlyDev wants to merge 1 commit into main from remove-arbitrary-upload-concurrency-limiter
Conversation

TheGrizzlyDev
Contributor

The existing limit doesn't really achieve its stated goals, since it cannot help when multiple clients are using RE at the same time, which are the common operating conditions. Since the flag can still be used to quickly throttle uploads if RBE misbehaves, I opted to keep it in some form by making it a setting, though it likely isn't very useful, so it's disabled by default. Instead we should rely on other control mechanisms: the number of connections, TCP or HTTP/2 flow control, or, as in this case, explicit errors coming from the RE protocol itself (such as RESOURCE_EXHAUSTED).
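For illustration, a minimal sketch of the kind of protocol-level backpressure this refers to, assuming tonic as the gRPC layer. This is not buck2's actual retry code; `retry_delay` and the backoff constants are made up for the example:

```rust
use std::time::Duration;

use tonic::{Code, Status};

/// Decide whether an RE upload error warrants a backoff-and-retry.
/// The server signals overload with RESOURCE_EXHAUSTED, so the client
/// can slow down reactively instead of capping concurrency up front.
fn retry_delay(status: &Status, attempt: u32) -> Option<Duration> {
    match status.code() {
        // The server explicitly asked us to slow down: back off exponentially.
        Code::ResourceExhausted | Code::Unavailable => {
            Some(Duration::from_millis(100 * 2u64.pow(attempt.min(6))))
        }
        // Other errors are not load-related; surface them to the caller.
        _ => None,
    }
}
```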

Remove arbitrary concurrency limiter for uploads and replace with a configurable limit that is disabled by default.
@facebook-github-bot added the CLA Signed label May 9, 2024
@TheGrizzlyDev TheGrizzlyDev deleted the remove-arbitrary-upload-concurrency-limiter branch May 9, 2024 20:07
@TheGrizzlyDev TheGrizzlyDev restored the remove-arbitrary-upload-concurrency-limiter branch May 10, 2024 10:05
@TheGrizzlyDev TheGrizzlyDev reopened this May 10, 2024
@JakobDegen
Contributor

cc @benbrittain who originally added this

The existing limit doesn't really achieve its stated goals, since it cannot help when multiple clients are using RE at the same time, which are the common operating conditions

Yeah, but even if they're the common operating conditions, it's still pretty easy to imagine that they might not be the failure conditions. I.e., you could imagine some server that has one thread per client, or something like that, and falls over if a client shows up trying to write too many big files at once.

I think there's a nice in-between option to go for, though, which both preserves this property and sounds closer to what you actually want: can we apply this concurrency limit only to the non-batched uploads, and let batched uploads proceed at unlimited concurrency?
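To make the proposal concrete, here's a rough sketch of what that split could look like, using a tokio semaphore. The `Upload` enum and the `send_*` helpers are hypothetical stand-ins, not buck2's API:

```rust
use std::path::PathBuf;
use std::sync::Arc;

use tokio::sync::Semaphore;

enum Upload {
    Batched(Vec<u8>),
    NonBatched(PathBuf),
}

async fn send_batched(_blob: Vec<u8>) -> anyhow::Result<()> {
    Ok(()) // stand-in for a BatchUpdateBlobs call
}

async fn send_bytestream(_path: PathBuf) -> anyhow::Result<()> {
    Ok(()) // stand-in for a ByteStream Write call
}

async fn upload(item: Upload, limiter: Arc<Semaphore>) -> anyhow::Result<()> {
    match item {
        // Small in-memory blobs: trust the network stack, no permit needed.
        Upload::Batched(blob) => send_batched(blob).await,
        // Large per-file writes: bound how many run at once.
        Upload::NonBatched(path) => {
            let _permit = limiter.acquire().await?;
            send_bytestream(path).await
        }
    }
}
```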

@TheGrizzlyDev
Contributor Author

Yeah, but even if they're the common operating conditions, it's still pretty easy to imagine that they might not be the failure conditions.

Which is why there is an optional configuration to limit concurrency. Any default value would bake in likely-incorrect assumptions about the implementation and operating conditions of an RE server, and that's just wrong: there are multiple implementations, all wildly different, and they can run on extremely large distributed clusters with a wide array of hardware and software specifications. A default cap invariably lowers performance with no real benefit. On the other hand, if such a limit is necessary for a particular RE server, its docs can recommend the setting. I think this is already a middle ground: it keeps the potential benefit of limiting concurrency while staying performant by default. Wdyt?
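A minimal sketch of the "disabled by default" semantics being described, where `None` means no client-side cap at all; the `UploadConfig` type and field name are hypothetical:

```rust
use std::sync::Arc;

use tokio::sync::Semaphore;

struct UploadConfig {
    /// Hypothetical setting: `None` (the default) means no client-side cap;
    /// a server whose docs require a limit can be accommodated with `Some(n)`.
    upload_concurrency_limit: Option<usize>,
}

impl Default for UploadConfig {
    fn default() -> Self {
        Self { upload_concurrency_limit: None }
    }
}

impl UploadConfig {
    /// Build a limiter only when the user opted in.
    fn limiter(&self) -> Option<Arc<Semaphore>> {
        self.upload_concurrency_limit
            .map(|n| Arc::new(Semaphore::new(n)))
    }
}
```

Callers would acquire a permit only when `limiter()` returns `Some`, so the default path carries no throttling overhead at all.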

@TheGrizzlyDev
Contributor Author

friendly ping @JakobDegen :)

@TheGrizzlyDev
Contributor Author

Ping :)

@valadez-abel

I tested this on our RE setup. Without it, the client seemed to be stuck for 40+ minutes on re_upload stages; with this PR, it's no longer blocked on re_upload.

@zjturner
Contributor

zjturner commented Jun 6, 2024

Agree that it would be great to see some forward progress on this. @JakobDegen is there anything you're waiting for?

@TheGrizzlyDev
Contributor Author

ping @JakobDegen

@JakobDegen
Contributor

@TheGrizzlyDev yeah, sorry, I realized that in my previous message I managed not to write any of the things I was actually most worried about.

The reason I think this kind of concurrency limiter makes sense isn't so much the server side of things, but rather the client side. For batched uploads, or other uploads that come from memory and not from disk, I'm willing to trust the system network stack to make sure that resources are used effectively. But non-batched uploads generally read from disk as part of the request, and limiting the concurrency of disk accesses in buck2 does seem like something we should do, if for no other reason than that we don't want to run out of file descriptors. Thoughts?
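A small sketch of that client-side concern, assuming tokio (the shape and names are illustrative, not buck2's code). The point is that the permit is held across the entire open-and-read, so the semaphore bounds open file descriptors rather than just in-flight requests:

```rust
use std::path::PathBuf;
use std::sync::Arc;

use tokio::{fs, sync::Semaphore};

async fn read_for_upload(
    path: PathBuf,
    fd_limiter: Arc<Semaphore>,
) -> anyhow::Result<Vec<u8>> {
    // Acquire before the file is opened; the permit is released only after
    // the read finishes and the handle is closed, so at most N files are
    // open for upload at any moment, regardless of how many uploads queue up.
    let _permit = fd_limiter.acquire().await?;
    Ok(fs::read(&path).await?)
}
```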
