Look for large deletions when aligning contigs #480
donkirkby added commits that referenced this issue:
Jul 13, 2020
Jul 15, 2020: Add a spring tension simulator for deciding where to display contigs.
Jul 16, 2020: Add link type to genome coverage file.
Jul 16, 2020
Jul 17, 2020
Jul 20, 2020
Jul 21, 2020
Jul 21, 2020
Some samples have large deletions, and those make it hard to align their assembled contigs to HXB2 or whatever reference we try to insert the contig into. Sample HIV3428P100IN200-A24-HIV_S11 from the 20 Sep 2019 run is a good example: it seems to have a deletion from rev to GP41. An even more challenging example is JRCCC2-GP160NEF-HIV_S78 from the 27 Sep 2019 run. It looks like a mixture of reads with and without a large deletion from before V3LOOP to the 3'LTR. There are two contigs, but one of them aligns to the 5'LTR, where we didn't amplify anything. Winnie says that the 5'LTR and 3'LTR contain similar sequences, so I think the large deletion makes the 5'LTR more attractive to the aligner, even though the 3'LTR is a better local match.
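One way to spot these: an aligner such as minimap2 may split a contig with a large deletion into separate hits, so the deletion shows up as two hits that sit next to each other in the contig but far apart in the reference. Here's a minimal sketch of that check, assuming forward-strand hits with 0-based, end-exclusive coordinates. The Hit class, find_large_deletions, and the min_size and max_query_gap thresholds are all hypothetical; the field names just mirror mappy's alignment attributes.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical forward-strand alignment block (0-based, end-exclusive).

    Field names mirror mappy's Alignment attributes.
    """
    q_st: int  # start in the contig
    q_en: int  # end in the contig
    r_st: int  # start in the reference
    r_en: int  # end in the reference


def find_large_deletions(hits: List[Hit],
                         min_size: int = 100,
                         max_query_gap: int = 10) -> List[Tuple[int, int]]:
    """Return (ref_start, ref_end) spans that the contig skips between hits.

    Consecutive hits (sorted by contig position) that nearly touch in the
    contig, but skip at least min_size more reference bases than contig
    bases, are treated as flanking a deletion.
    """
    deletions = []
    ordered = sorted(hits, key=lambda hit: hit.q_st)
    for prev, curr in zip(ordered, ordered[1:]):
        query_gap = curr.q_st - prev.q_en  # negative if the hits overlap
        ref_gap = curr.r_st - prev.r_en
        if query_gap <= max_query_gap and ref_gap - query_gap >= min_size:
            deletions.append((prev.r_en, curr.r_st))
    return deletions


hits = [Hit(0, 500, 1000, 1500), Hit(500, 900, 6000, 6400)]
print(find_large_deletions(hits))  # [(1500, 6000)]
```

Deletions that the aligner keeps inside a single hit appear as D operations in its CIGAR string instead, so a fuller check would look there too.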
The incorrect alignments can also make some coverage scores higher than they should be. For example, D56730-HCV_S12 from 06-Mar-2015.M01841 reports good coverage for NS2 in the assembled version and terrible coverage in the remapped version. That's because it has a huge deletion from core to NS2, and the core coverage gets counted as NS2 in the assembled version. At the very least, we shouldn't be counting the same coverage twice.
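A minimal sketch of counting each contig position only once: process hits from longest to shortest and skip contig positions that an earlier hit already claimed, so the same reads can't be credited to both core and NS2. It assumes ungapped hits and a per-position read depth for the contig. Hit, reference_coverage, and contig_depths are hypothetical names, and longest-first is just one possible priority.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical ungapped alignment block (0-based, end-exclusive)."""
    q_st: int  # start in the contig
    q_en: int  # end in the contig
    r_st: int  # reference position aligned to q_st


def reference_coverage(hits: List[Hit],
                       contig_depths: List[int]) -> Dict[int, int]:
    """Project per-position contig depths onto reference positions.

    Longer hits claim contig positions first; a position already claimed is
    skipped, so its reads are never counted under a second reference region.
    """
    claimed = set()
    coverage: Dict[int, int] = {}
    for hit in sorted(hits, key=lambda h: h.q_en - h.q_st, reverse=True):
        for q_pos in range(hit.q_st, hit.q_en):
            if q_pos in claimed:
                continue
            claimed.add(q_pos)
            r_pos = hit.r_st + (q_pos - hit.q_st)
            coverage[r_pos] = coverage.get(r_pos, 0) + contig_depths[q_pos]
    return coverage


# The first four contig bases also match a second reference region.
depths = [5] * 10
cov = reference_coverage([Hit(0, 4, 900), Hit(0, 10, 100)], depths)
print(900 in cov, sum(cov.values()))  # False 50
```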
A couple of tools worth investigating are BinaryPartialAlign and Joint Read Aligner.
There's a related problem: contigs that wrap around, like INTDEL-HIV_S111 from 7 Dec 2018.
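A wrap-around should show up as a later part of the contig mapping to an earlier part of the reference. Here's a minimal sketch of that check, assuming forward-strand hits; Hit and wraps_around are hypothetical names (the fields mirror mappy's), and min_backtrack is an arbitrary threshold to ignore small overlaps between neighbouring hits.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical forward-strand alignment block (0-based, end-exclusive)."""
    q_st: int  # start in the contig
    q_en: int  # end in the contig
    r_st: int  # start in the reference
    r_en: int  # end in the reference


def wraps_around(hits: List[Hit], min_backtrack: int = 100) -> bool:
    """Return True if a later part of the contig maps well before an earlier part.

    Hits are sorted by contig position; a wrap-around shows up as the next hit
    starting at least min_backtrack bases before the previous hit's reference end.
    """
    ordered = sorted(hits, key=lambda hit: hit.q_st)
    return any(prev.r_en - curr.r_st >= min_backtrack
               for prev, curr in zip(ordered, ordered[1:]))


print(wraps_around([Hit(0, 800, 8800, 9600), Hit(800, 1500, 0, 700)]))  # True
print(wraps_around([Hit(0, 500, 1000, 1500), Hit(500, 900, 1500, 1900)]))  # False
```

Repeated regions such as the two LTRs could produce the same pattern, so a flagged contig would still need a closer look.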
Another related problem: some contigs can get combined in the reverse direction. Reads are recorded in both directions, so contigs are built up without knowing which direction they run. Contigs are combined when their ends have matching regions, but that match may coincidentally be in the reverse direction. See sample 3428P1Y04602-2E5-HIV_S27 from the 21 Jun 2019.M04401 run. The final direction is chosen by checking which direction has the longest open reading frame (a section without stop codons).
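The orientation rule above could be sketched like this: find the longest stop-free stretch in all three reading frames of the contig and of its reverse complement, and keep whichever direction has the longer one. The function names are hypothetical, ambiguous bases such as N are passed through unchanged, and ties keep the original direction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement; characters other than ACGT (e.g. N) pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_stop_free_run(seq: str) -> int:
    """Length in bases of the longest stop-free codon run in any of 3 frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0
            else:
                run += 1
                best = max(best, run)
    return best * 3


def orient_contig(seq: str) -> str:
    """Return seq or its reverse complement, whichever has the longer run.

    Ties keep the original direction.
    """
    rev = reverse_complement(seq)
    if longest_stop_free_run(rev) > longest_stop_free_run(seq):
        return rev
    return seq


orf = "ATG" + "CTA" * 9  # stop-free forward; its reverse complement repeats TAG
print(orient_contig(reverse_complement(orf)) == orf)  # True
```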
alignment.strand?
Minimap2 has been working very well, but here are some of the other tools we could also try in the future: