Is there a snorkel_labels_train.xlsx file anywhere? #108
Comments
This folder only contains sentences that were manually hand-labeled for this project. The train version isn't available, as it is supposed to consist of all the remaining documents within PubTator. The resulting file would be too big for GitHub to host on LFS (max file size is 2GB). Currently, the main way to get those sentences is to download a snapshot of PubTator Central and extract the sentences into a database. Alternatively, I have a snapshot of the database used for this project that you could import (118GB); however, we would need to figure out how to transfer a file that large. My overall recommendation is the first option, since it gives you the most current version for whichever project you are going to work on.
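For anyone taking the first option, here is a rough sketch of the "extract sentences into a database" step. It assumes the snapshot is in the plain PubTator text format, where title and abstract lines look like `PMID|t|...` and `PMID|a|...`. The function names, the `sentence` table layout, and the naive regex sentence splitter are all hypothetical choices for illustration and are not the pipeline used in this project. The sample record is synthetic.

```python
import re
import sqlite3

# Title/abstract lines in PubTator text format: "PMID|t|..." or "PMID|a|..."
PASSAGE_RE = re.compile(r"^(\d+)\|([ta])\|(.*)$")


def parse_pubtator(text):
    """Yield (pmid, passage) for each title or abstract line; skip annotation lines."""
    for line in text.splitlines():
        match = PASSAGE_RE.match(line)
        if match:
            yield match.group(1), match.group(3)


def split_sentences(passage):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def load_sentences(text, conn):
    """Store one row per sentence in a hypothetical `sentence` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentence (pmid TEXT, position INTEGER, text TEXT)"
    )
    next_position = {}  # running sentence index per document
    for pmid, passage in parse_pubtator(text):
        for sentence in split_sentences(passage):
            position = next_position.get(pmid, 0)
            conn.execute(
                "INSERT INTO sentence VALUES (?, ?, ?)", (pmid, position, sentence)
            )
            next_position[pmid] = position + 1
    conn.commit()


# Synthetic example record (not real data from the project).
sample = (
    "123|t|Aspirin treats headache.\n"
    "123|a|Aspirin is a common drug. It inhibits COX enzymes.\n"
    "123\t0\t7\tAspirin\tChemical\tMESH:D001241\n"
)

conn = sqlite3.connect(":memory:")
load_sentences(sample, conn)
rows = conn.execute("SELECT pmid, position, text FROM sentence ORDER BY position").fetchall()
```

For a full snapshot you would likely want to stream the file line by line and use a proper sentence splitter rather than the regex above.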
I was after the hand-labelled train/dev/test sentences to bolster my dataset for a similar RE project, not the entire PubTator database. Would it be okay for me to use these and, if so, is there a straightforward way to download just the sentences with hand labels?
Sure. Can't guarantee that train.xlsx exists or has a lot of sentences annotated, but here are the quick links to the available data atm: Compound Treats Disease Train, Disease Associates Gene Dev, Gene interacts Gene Train. Compound binds Gene would take a bit for me to get to you, so if you need that let me know.
So do there not exist handcrafted labels for
I forgot to upload it to this repository, but here is your requested file:
I'd like to utilise these labels for another project. It seems the folder should also have snorkel_labels_train.xlsx to go along with its test and dev files. Does this exist and, if so, is there any chance of getting access?