This workflow integrates all the steps needed to generate weekly lists of bioRxiv and medRxiv preprints and to download their PDFs and full text for screening with different tools.
- run_scripts.R creates folders for the weekly batch and results and runs all the necessary subscripts
- Download the current JSON metadata file from bioRxiv and generate the weekly filtered list
- Download PDFs of preprints
- Download full text from the HTML version of bioRxiv preprints (code forked from https://github.com/PeterEckmann1/biorxiv-extractor)
- Run ODDPub on preprint full texts
- Retrieve data availability statement (DAS) fields from medRxiv
- Run ODDPub on the DAS and combine the results
- Run Barzooka - not yet fully integrated into the workflow, as it needs some additional components to run
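
The first two steps (creating the weekly folders and filtering the metadata to one week) could look roughly like the sketch below. This is not the code in run_scripts.R: the helper names `create_week_folders` and `filter_week`, the folder layout, and the `doi` and `date` column names are assumptions made for illustration, and the toy data frame stands in for the downloaded JSON metadata.

```r
# Hypothetical helper (not from this repo): create the batch and results
# folders for one weekly run. The folder layout is an assumption.
create_week_folders <- function(base_dir, week_start) {
  week_label <- format(as.Date(week_start), "%Y-%m-%d")
  batch_dir <- file.path(base_dir, week_label, "batch")
  results_dir <- file.path(base_dir, week_label, "results")
  dir.create(batch_dir, recursive = TRUE, showWarnings = FALSE)
  dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)
  list(batch = batch_dir, results = results_dir)
}

# Hypothetical helper: keep only records posted in the 7 days
# starting at week_start. Assumes a `date` column in YYYY-MM-DD format.
filter_week <- function(metadata, week_start) {
  week_start <- as.Date(week_start)
  posted <- as.Date(metadata$date)
  metadata[posted >= week_start & posted < week_start + 7, , drop = FALSE]
}

# Toy metadata with placeholder identifiers, standing in for the parsed JSON.
metadata <- data.frame(
  doi = c("placeholder-1", "placeholder-2", "placeholder-3"),
  date = c("2021-03-01", "2021-03-05", "2021-03-09"),
  stringsAsFactors = FALSE
)

folders <- create_week_folders(tempdir(), "2021-03-01")
weekly <- filter_week(metadata, "2021-03-01")
print(weekly$doi)
```

Keeping each week in its own dated folder means a failed run can be repeated without touching earlier batches; the actual script may organise its output differently.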