
Improve the way LLM eval runs in the background #6

Open
kasnerz opened this issue Jun 10, 2024 · 1 comment
Labels: enhancement (New feature or request), low priority (Tasks which can be postponed)

Comments

kasnerz (Collaborator) commented Jun 10, 2024

We currently have no specialized solution whatsoever for running LLM evals in the background.

After receiving the request, the backend simply starts iterating over the examples to annotate. At each iteration of the loop, we check the running flag and stop if the flag is set to False.
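
For context, a minimal sketch of that pattern (the helpers and attributes below, e.g. get_examples, annotate_example, and campaign.running, are illustrative placeholders, not the actual factgenie code):

def run_llm_eval(app, campaign_id):
    # illustrative sketch of the current flag-checking loop, not the actual factgenie code
    campaign = app.db.get_campaign(campaign_id)   # placeholder accessor for the campaign state
    for example in get_examples(campaign_id):     # placeholder iterator over examples to annotate
        if not campaign.running:                  # flag flipped to False by a "stop" request
            break
        annotate_example(example, campaign_id)    # placeholder: one LLM call + saving the annotation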

Surprisingly, this seems to work quite OK so far, probably because Flask takes care of the threading.

However, this approach seems too YOLO, and I don't expect it to work robustly, especially if users start launching multiple tasks at once.

At first, I also tried using Python threads manually in the code, something along the lines of:

from threading import Thread

# run the evaluation in a background daemon thread so the request handler can return immediately
thread = Thread(target=utils.run_llm_eval, args=(app, campaign_id))
thread.daemon = True
thread.start()

But that actually rendered the frontend unresponsive (I might have just messed it up, though). In any case, implementing a more principled solution would be much appreciated.
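
One possible direction for a more principled solution would be a shared worker pool with explicit per-campaign cancellation, along these lines (just a sketch; it assumes utils.run_llm_eval is changed to accept and periodically check a stop event):

from concurrent.futures import ThreadPoolExecutor
from threading import Event

executor = ThreadPoolExecutor(max_workers=4)   # one shared pool for all background evals
stop_events = {}                               # campaign_id -> Event used for cooperative cancellation

def start_llm_eval(app, campaign_id):
    stop_events[campaign_id] = Event()
    # assumes utils.run_llm_eval takes the stop event and checks event.is_set() each iteration
    return executor.submit(utils.run_llm_eval, app, campaign_id, stop_events[campaign_id])

def stop_llm_eval(campaign_id):
    stop_events[campaign_id].set()

This way, multiple campaigns can run concurrently without spawning a raw daemon thread per request, and each one can be stopped cleanly.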

kasnerz added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Jun 10, 2024
kasnerz added the low priority (Tasks which can be postponed) label and removed the help wanted (Extra attention is needed) label on Jul 25, 2024
oplatek (Member) commented Aug 1, 2024

I would rather switch to asyncio / async HTTP, where we can wait for thousands of responses without affecting server performance. We always delegate the heavy work to another server, and I think this works well for us.
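
As a rough illustration of that direction (a sketch only; the /generate endpoint and payload format are placeholders, not the actual factgenie or LLM server API):

import asyncio
import aiohttp

async def annotate_all(examples, url="http://llm-server:8000/generate"):
    # cap concurrency so thousands of pending requests don't overwhelm the LLM server
    sem = asyncio.Semaphore(16)

    async def annotate_one(session, example):
        async with sem:
            async with session.post(url, json={"text": example}) as resp:
                return await resp.json()

    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(annotate_one(session, ex) for ex in examples))

The server only keeps lightweight coroutines waiting on I/O, so thousands of in-flight requests do not tie up worker threads.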

If you are worried about scaling the worker nodes, I would start solving that only once the waiting times for users become too long in some use case. Somebody would have to simulate it first. Personally, I don't think I will ever need it in factgenie.
