Change the way chromosome length is calculated #34

Closed
mfoll opened this issue Oct 1, 2015 · 2 comments

mfoll commented Oct 1, 2015

I just discovered that the length of each chromosome/contig is already stored in the faidx index (.fai) of the FASTA reference file (see HTSlib). So my script that calculates these lengths is unnecessary, and needlessly slow. I should simply read them from the faidx index instead.
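For reference, a .fai index (created for example with `samtools faidx reference.fasta`, which writes reference.fasta.fai) has one tab-separated line per sequence: the sequence name, its length, then byte-offset and line-layout fields. The length is therefore just column 2. Below is a minimal sketch of a lookup, assuming the index already exists; `contig_length` is a hypothetical helper name, not part of any existing tool.

```shell
# Hypothetical helper: print the length of one contig from a faidx index.
# Column 1 of each .fai line is the sequence name and column 2 its length,
# so this reads only the small index, never the FASTA itself.
# Returns a non-zero status if the contig is not in the index.
contig_length() {
  # $1: path to the .fai file, $2: contig name
  awk -F'\t' -v c="$2" '
    $1 == c { print $2; found = 1; exit }
    END { exit !found }
  ' "$1"
}
```

Usage, with a contig name that depends on your reference: `contig_length reference.fasta.fai chr1`.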

mfoll added this to the v0.2 milestone on Oct 1, 2015
mfoll self-assigned this on Oct 1, 2015

mfoll commented Oct 1, 2015

This command generates the same VCF header lines as fasta2contigvcf.awk reference.fasta.fai, but much faster (instantaneous instead of ~4 min):

cut -f1,2 reference.fasta.fai | sed -e 's/^/##contig=<ID=/' -e 's/\t/,length=/' -e 's/$/>/'

Note that '\t' is not recognized by the BSD sed shipped with Mac OS (see here), so use a literal tab character in the script instead.
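One way to sidestep the sed portability issue entirely is to let awk do both the field split and the formatting, since awk itself interprets "\t" in both GNU and BSD implementations. This is a sketch of an alternative, not the command used in this issue; `fai_to_contig_header` is a hypothetical name.

```shell
# Hypothetical helper: build VCF ##contig header lines from a faidx index.
# awk handles the "\t" escape itself, so no literal tab is needed and the
# output is the same as the cut | sed pipeline on both Linux and Mac OS.
fai_to_contig_header() {
  # $1: path to the .fai file; prints one ##contig line per sequence
  awk -F'\t' '{ printf "##contig=<ID=%s,length=%s>\n", $1, $2 }' "$1"
}
```

Usage: `fai_to_contig_header reference.fasta.fai`. If you prefer to keep the sed version, bash's ANSI-C quoting can also supply the real tab, e.g. `-e $'s/\t/,length=/'`.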

mfoll commented Oct 1, 2015

I also no longer need the less package in the Dockerfile.