In the absence of a bed file the pipeline should run on the full reference genome #39

Closed
mfoll opened this issue Oct 6, 2015 · 3 comments

mfoll commented Oct 6, 2015

bedtools has a makewindows command that can create a BED file from a FASTA index (.fai) file, with windows of a given size. For example, to split the whole genome into 10 Mb regions:

bedtools makewindows -g reference.fasta.fai -w 10000000
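
For readers without bedtools at hand, the windowing itself can be sketched in plain awk. This is only an illustration of the idea, not the bedtools implementation: the helper name make_windows and the toy index are made up, and it assumes that column 1 of the .fai holds the contig name and column 2 its length.

```shell
# Hypothetical helper (not part of bedtools): emit fixed-size BED windows
# from a FASTA index, assuming column 1 = contig name, column 2 = length.
make_windows() {
  fai="$1"; size="$2"
  awk -v w="$size" 'BEGIN { OFS = "\t" }
    {
      # 0-based, half-open intervals; the last window of a contig is truncated
      for (start = 0; start < $2; start += w) {
        end = (start + w < $2) ? start + w : $2
        print $1, start, end
      }
    }' "$fai"
}

# Toy index with two contigs of 25 bp and 8 bp (made-up values).
printf 'chr1\t25\t6\t60\t61\nchr2\t8\t40\t60\t61\n' > toy.fasta.fai
make_windows toy.fasta.fai 10
```

On the toy index this yields three windows for chr1 (the last one covering 20 to 25) and a single window for chr2.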

I could also use this command to split the BED file, instead of or in combination with my own R script.

mfoll added this to the v0.3 milestone Oct 6, 2015
mfoll self-assigned this Oct 6, 2015

mfoll commented Oct 6, 2015

We could use conditional processes like here


mfoll commented Oct 8, 2015

The window size should be calculated automatically from nsplit and the total size of the genome. The total is easy to get from the fai file with Rscript in bash. For example, to cut the genome into 500 pieces:

windows_size=$(Rscript -e "cat(sum(as.numeric(read.table(\"reference.fasta.fai\")[,2]))/500,\"\n\")")
echo $windows_size
6191388
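
The same calculation can be sketched without R, using awk only. This is an alternative under stated assumptions rather than what the pipeline uses: the helper name window_size and the toy index are hypothetical, and the result is rounded up to a whole number of bases, which the Rscript one-liner above does not do explicitly.

```shell
# Hypothetical helper: window size = ceil(total genome length / nsplit),
# assuming column 2 of the .fai holds each contig's length.
window_size() {
  fai="$1"; nsplit="$2"
  awk -v n="$nsplit" '{ total += $2 }
    END {
      w = int(total / n)
      if (w * n < total) w++   # round up so the windows cover the whole genome
      print w
    }' "$fai"
}

# Toy index: 1000 bp + 505 bp = 1505 bp in total (made-up values).
printf 'chr1\t1000\t6\t60\t61\nchr2\t505\t1030\t60\t61\n' > toy.fasta.fai
windows_size=$(window_size toy.fasta.fai 500)
echo "$windows_size"   # prints 4
```

Because windows are cut per contig, the final number of regions can be somewhat larger than nsplit when the genome has many contigs.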

So in the Nextflow script it will appear as:

windows_size=$(Rscript -e "cat(sum(as.numeric(read.table(\"!{fasta_ref_fai}\")[,2]))/!{params.nsplit},\"\n\")")

mfoll assigned tdelhomme and unassigned mfoll Oct 8, 2015

mfoll commented Dec 11, 2015
