Allow running DROP without reference samples #152

Closed
Jakob37 opened this issue Aug 14, 2024 · 7 comments · May be fixed by #147
Labels
enhancement New feature or request

Comments


Jakob37 commented Aug 14, 2024

Description of feature

Right now it looks like the drop_sample_annot.py expects a reference.

In this case, I have a large run with 100 samples, which would serve as their own reference.

It does not seem to be supported currently. If not supplying an external reference, DROP will not run.

I am working around this by supplying an empty reference and patching drop_sample_annot.py so that it does not crash when the df contains no data. It would be helpful to have an "official" way to do this.
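
To illustrate the kind of change meant here, below is a minimal sketch of merging a run's samples with an optional reference table. It is not the actual code in drop_sample_annot.py: the function names `read_tsv_rows` and `merge_with_reference` are hypothetical, it uses the standard `csv` module rather than a DataFrame, and it assumes rows are keyed by an `RNA_ID` column.

```python
import csv
import io


def read_tsv_rows(text):
    """Parse TSV text into a list of dicts; empty or header-only input gives []."""
    if not text.strip():
        return []
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge_with_reference(sample_rows, reference_rows):
    """Append reference rows to the run's samples, tolerating an empty reference.

    Hypothetical helper: assumes every row has an "RNA_ID" key.
    """
    if not reference_rows:
        # No external reference: the samples in this run act as their own reference.
        return list(sample_rows)
    sample_ids = {row["RNA_ID"] for row in sample_rows}
    # Skip reference entries whose ID collides with a sample from this run.
    extra = [row for row in reference_rows if row["RNA_ID"] not in sample_ids]
    return list(sample_rows) + extra
```

With an empty or header-only reference file, `read_tsv_rows` returns an empty list and `merge_with_reference` simply passes the run's own samples through.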

What do you think?

Sorry about the issue bombardment today 🫣

@Jakob37 Jakob37 added the enhancement New feature or request label Aug 14, 2024
Collaborator

Lucpen commented Aug 14, 2024

No worries, we are happy to get issues and improve the pipeline 😄
At the moment Tomte is designed to run only a few samples against an already existing database, not to create one. However, we have been working on modifying the code so that an actual database can be created. Here is the PR. We still need to test it thoroughly, but if you want to give it a try, feel free to run it and suggest any improvements.

@Lucpen Lucpen linked a pull request Aug 14, 2024 that will close this issue
10 tasks
Author

Jakob37 commented Aug 14, 2024

OK, great! I'll give it a go. That sounds exactly like what we will need ahead.

Managed to get the DROP run started anyway without any reference db. We will see how that goes ...

@Jakob37 Jakob37 closed this as completed Aug 14, 2024
Collaborator

Lucpen commented Aug 14, 2024

Please let me know if it works. If it doesn't, it is better to restart DROP outside of the pipeline, as explained here

Author

Jakob37 commented Aug 15, 2024

Thanks for the tips! That will be very helpful.

It made it pretty far (edit: not super far, just a little bit) into the Counting_Summary step. I will see if I can figure that out today 🤔

Author

Jakob37 commented Aug 16, 2024

The aberrant expression run went through 🎉 I needed to remove the following columns from the produced sample_annot.tsv file: GENE_COUNTS_FILE and SEX. Otherwise both were written out filled with NA values, which DROP could not handle downstream. It seems its R parsing 'cleverly' translates the string "NA" to nan.

I raised an issue in DROP about it: gagneurlab/drop#568
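
The column clean-up described above can be sketched as a small post-processing step. This is only an illustration under stated assumptions: the function name `drop_all_na_columns` is hypothetical, the file is assumed to be plain tab-separated text, and a column is removed only when every value in it is empty or the literal string "NA".

```python
import csv
import io

# Values treated as missing for the purpose of this sketch.
NA_TOKENS = {"", "NA"}


def drop_all_na_columns(tsv_text, candidates=("GENE_COUNTS_FILE", "SEX")):
    """Return TSV text without candidate columns whose values are all NA or empty."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    header, body = rows[0], rows[1:]
    drop = set()
    for idx, name in enumerate(header):
        # Only the named candidate columns are considered; short rows count as empty.
        if name in candidates and all(
            (row[idx] if idx < len(row) else "") in NA_TOKENS for row in body
        ):
            drop.add(idx)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([cell for i, cell in enumerate(row) if i not in drop])
    return out.getvalue()
```

A column that holds at least one real value (for example a known sex for one sample) is left untouched, so the step is safe to apply to annotation files that do carry this information.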

Still running the splicing run. It filled our RAM when running with 64 threads, but seems to be doing fine on a smaller number of threads (12). Might require some further fiddling with the sample_annot.tsv file to get it through downstream steps I guess, we will see.

Author

Jakob37 commented Aug 16, 2024

Have you guys btw considered running OUTRIDER and FRASER2 outside the DROP pipeline? It seems the handful of steps could be lifted over from Snakemake to one or two Nextflow subworkflows. This would make things much cleaner: easier debugging, resuming and caching, and fewer dependencies on DROP.

I realize this would mean considerable extra work to set up, and it might not be feasible. Just a thought!

Author

Jakob37 commented Aug 16, 2024

The FRASER2 pipeline ran pretty far but crashed in one of the final FRASER calculation steps. It seems to be a bug that only appears when not using external counts: gagneurlab/drop#558

This issue was closed.

2 participants