Add non-coding regions #589
Comments
The plan is to first use minimap2 to align the entire consensus sequence with the whole-genome nucleotide reference, then cut out the sections that align to each gene region, and use the current Gotoh technique to align each of the three reading frames and choose the best match. That lets us keep the benefit of aligning in amino acid space, while minimap2 handles the huge deletions we see in the assembled proviral samples. The sections that minimap2 can't align will appear in the genome coverage plots, but not in any gene coverage plots. Coordinate regions that are nucleotide references will just use the minimap2 alignment, without the extra Gotoh step.
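A rough sketch of the reading-frame step, in Python: translate a nucleotide section in each of the three frames and keep the frame that scores best against the amino acid reference. The names `translate` and `best_reading_frame` are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for the Gotoh alignment score, not what the pipeline actually uses.

```python
from difflib import SequenceMatcher

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINOS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(nuc_seq, frame):
    """Translate starting at a 0-based frame offset; unknown codons become X."""
    codons = (nuc_seq[pos:pos + 3]
              for pos in range(frame, len(nuc_seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def best_reading_frame(nuc_seq, amino_ref):
    """Return (frame, amino_seq, score) for the best-scoring reading frame.

    The score is a simple similarity ratio, standing in for a real
    alignment score. Ties go to the lowest frame.
    """
    scored = []
    for frame in range(3):
        amino_seq = translate(nuc_seq, frame)
        matcher = SequenceMatcher(None, amino_seq, amino_ref, autojunk=False)
        scored.append((matcher.ratio(), frame, amino_seq))
    score, frame, amino_seq = max(scored, key=lambda item: item[0])
    return frame, amino_seq, score
```

For example, `best_reading_frame("C" + "ATGAAAACTGCTTATATTGCTAAA", "MKTAYIAK")` picks frame 1, because the extra leading base shifts the coding sequence by one position.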
I discussed the problem of overlaps and gaps between minimap2 matches with @cbrumme, and we agreed to use the same arbitrary rule that I currently use in the plots. Any gaps between matches will be labelled as yellow in the coverage plots, and won't be reported in any gene region.
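As an illustration of the gap part only, here is a minimal Python sketch that finds the reference ranges between matches. The `find_gaps` name and the `(ref_start, ref_end)` tuple layout are assumptions for this example. The rule for resolving overlaps isn't spelled out here, so the sketch just treats overlapping matches as covered, which may differ from what the plots do.

```python
def find_gaps(matches):
    """Return the (start, end) reference ranges between matches.

    matches: iterable of (ref_start, ref_end) pairs, 0-based and
    end-exclusive (an assumed layout, not the pipeline's own).
    """
    gaps = []
    covered_end = None
    for start, end in sorted(matches):
        if covered_end is not None and start > covered_end:
            # Nothing matched between the previous coverage and this match.
            gaps.append((covered_end, start))
        covered_end = end if covered_end is None else max(covered_end, end)
    return gaps
```

With matches at `(0, 300)`, `(250, 400)`, and `(500, 900)`, the only gap is `(400, 500)`: the first two matches overlap, so nothing is missing between them.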
Also stop reporting rows without data in nuc.csv and amino.csv. Report query position when a row comes from a single contig.
Stop highlighting partial-match contigs as unaligned.
Add sorting and filtering to genome alignment.
Still have broken tests in test_aln2counts.py.
Still have more broken tests in test_aln2counts.py.
Updated plan, after discussing with @cbrumme:
New version allows larger deletions within a single alignment.
Also upgrade to Ubuntu 20.04 in GitHub Actions, because 16.04 is losing support.
Needed after upgrading Ubuntu to 20.04.
Old Singularity was incompatible with Ubuntu 20.04.
When minimap2 matches don't align to codon boundaries, report the exact nucleotides in nuc.csv that are included in the minimap2 match.
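To make the codon boundary case concrete, here is a small Python sketch that splits a match into a leading partial codon, the whole codons, and a trailing partial codon. The `split_codons` name and the coordinate convention (0-based, end-exclusive, match inside the region) are assumptions for illustration, not the pipeline's code.

```python
def split_codons(match_start, match_end, region_start):
    """Split a match into (leading partial, whole codons, trailing partial).

    All positions are 0-based reference coordinates, end-exclusive, and
    the match is assumed to lie inside a region starting at region_start.
    Each part is a (start, end) range, and may be empty.
    """
    # First codon boundary at or after the match start.
    first_whole = match_start + (region_start - match_start) % 3
    # Last codon boundary at or before the match end.
    last_whole = match_end - (match_end - region_start) % 3
    if last_whole < first_whole:
        # The whole match sits inside a single codon.
        empty = (match_end, match_end)
        return (match_start, match_end), empty, empty
    return ((match_start, first_whole),
            (first_whole, last_whole),
            (last_whole, match_end))
```

For a region starting at 100 and a match covering 104 to 111, this gives `(104, 106)`, `(106, 109)`, and `(109, 111)`: one whole codon, with two matched nucleotides on each side that would still be reported at the nucleotide level.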
Add some coordinate sequences that are not coding for proteins. The sequences can be marked as nucleotide sequences, so they are reported in nuc.csv, but not amino.csv.