Speed up entity linking process #90
Comments
Hi @manandey, sorry for not getting back to you sooner. We are actually investigating how to tag a (different) large corpus, so the question is relevant. Did you make any progress on this yourself?
Hi @arjenpdevries, thanks a lot for your response! Batching did increase the speed to a certain extent, but it would still be quite slow for a large corpus. I will keep the issue open since you are already investigating it. :) Any suggestions you come up with would be a great help. Thanks again!
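For anyone landing here, this is a minimal sketch of the kind of batching discussed above, with an optional worker pool on top. It is not this project's API: `batched`, `process_corpus`, and `toy_link_batch` are hypothetical names, and `toy_link_batch` is only a stand-in for whatever call runs the real mention detection and linking on a list of texts. Rows are streamed, so the corpus never has to fit in memory.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional


def batched(rows: Iterable, batch_size: int) -> Iterator[list]:
    """Yield consecutive lists of up to batch_size rows, reading lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def toy_link_batch(texts: List[str]) -> List[List[str]]:
    """Hypothetical stand-in for the real linker: capitalised tokens per text."""
    return [[tok for tok in text.split() if tok[:1].isupper()] for text in texts]


def process_corpus(
    rows: Iterable[str],
    link_batch: Callable[[List[str]], List[list]],
    batch_size: int = 32,
    executor: Optional[Executor] = None,
) -> Iterator[list]:
    """Yield one result per input row, in input order.

    Without an executor, batches are processed one after another.
    With an executor, batches are handed to its workers. Note that
    Executor.map typically submits every batch up front, so a very
    large corpus should be fed to this function in chunks.
    """
    batches = batched(rows, batch_size)
    if executor is None:
        results = map(link_batch, batches)
    else:
        results = executor.map(link_batch, batches)
    for batch_result in results:
        yield from batch_result


if __name__ == "__main__":
    rows = ["Paris is in France", "no entities here", "Alan Turing"]
    with ThreadPoolExecutor(max_workers=2) as pool:
        for row, entities in zip(rows, process_corpus(rows, toy_link_batch, 2, pool)):
            print(row, "->", entities)
```

A thread pool is used in the demo only because it needs no pickling. For CPU-bound linking, a `ProcessPoolExecutor` with one model loaded per worker is probably the better fit, assuming the model fits in memory several times over. On a single GPU, a larger `batch_size` is likely to matter more than extra workers.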
This issue has not seen recent activity
Hi,
I want to do entity linking on a large subset of the `c4/en` dataset. With the current settings, I can extract entities for roughly 1000 rows/hr on CPU and around 5000 rows/hr on GPU. Given the size of the c4 dataset, is there any way to speed up the process, whether batching, multiprocessing, or something else you would suggest? Any advice would be highly appreciated. I would also like to know whether the data is currently stored in memory. Thanks!