Look for large deletions when aligning contigs #480

Closed
5 tasks done
donkirkby opened this issue Oct 2, 2019 · 0 comments

donkirkby commented Oct 2, 2019

Some samples have large deletions, and that makes it hard to align their assembled contigs to HXB2 or whatever reference we try to insert the contig into. Sample HIV3428P100IN200-A24-HIV_S11 from the 20 Sep 2019 run is a good example. That sample seems to have a deletion from rev to GP41. An even more challenging example is JRCCC2-GP160NEF-HIV_S78 from the 27 Sep 2019 run. It looks like a mixture of reads with and without a large deletion from before V3LOOP to the 3'LTR. There are two contigs, but one of them aligns to the 5'LTR, where we didn't amplify anything. Winnie says that there are similar sequences in the 5'LTR and 3'LTR, so I think that the large deletion is making the 5'LTR more attractive to the aligner, even though the 3'LTR is a better local match.

The incorrect alignments can also make some coverage scores higher than they should be. For example, D56730-HCV_S12 from 06-Mar-2015.M01841 reports good coverage for NS2 in the assembled version and terrible coverage in the remapped version. That's because it has a huge deletion from core to NS2, and the core coverage gets counted as NS2 in the assembled version. At the very least, we shouldn't be counting the same coverage twice.

A couple of tools worth investigating are BinaryPartialAlign and Joint Read Aligner.

There's a related problem: contigs that wrap around, like INTDEL-HIV_S111 from the 7 Dec 2018 run.

Another related problem: some contigs can get combined in the reverse direction. Reads are recorded in both directions, so contigs are built up without knowing which direction they run. Contigs are combined with other contigs when they have matching regions at the ends, but that match may coincidentally be in the reverse direction. See sample 3428P1Y04602-2E5-HIV_S27 from the 21 Jun 2019.M04401 run. The final direction is chosen by looking at which direction has the longest open reading frame (section without stop codons).
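As a rough illustration of the direction choice described above, here is a minimal sketch that compares the longest stop-free stretch in each direction. The function names are made up for this example, and it assumes that only the three frames of each strand are scanned and that ties go to the forward direction; the pipeline's actual rules may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length in nucleotides of the longest stop-free run of codons,
    checked over the three reading frames of the given strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0  # a stop codon ends the current open stretch
            else:
                run += 3
                best = max(best, run)
    return best


def choose_direction(contig):
    """Return (sequence, label) for whichever direction has the longer
    open reading frame. Ties keep the forward direction (an assumption)."""
    flipped = reverse_complement(contig)
    if longest_orf_length(flipped) > longest_orf_length(contig):
        return flipped, "reverse"
    return contig, "forward"
```

A contig that was assembled backwards would come back flipped, with the label `"reverse"`, so the caller can record that the direction was changed.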

  • try minimap2 / mappy
  • try it on contigs that wrap around
  • highlight the regions that didn't actually map to the reference
  • try displaying alignment arrows based on minimap2 instead of BLAST
  • make sure reversed alignment arrow is shown reversed (check alignment.strand?)
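The tasks above could be sketched roughly as follows. The `Hit` tuple is a stand-in that copies the `q_st`, `q_en`, `r_st`, `r_en`, and `strand` attributes of the alignments returned by mappy's `Aligner.map()`, so the example runs without mappy installed. The helper names and the `min_size` threshold are assumptions for illustration, not code from this project.

```python
from collections import namedtuple

# Stand-in for a mappy alignment: contig (query) and reference coordinates,
# plus strand (+1 forward, -1 reverse).
Hit = namedtuple("Hit", "q_st q_en r_st r_en strand")


def unmapped_regions(contig_length, hits):
    """Return [start, end) contig ranges that no hit covers."""
    gaps = []
    position = 0
    for hit in sorted(hits, key=lambda h: h.q_st):
        if hit.q_st > position:
            gaps.append((position, hit.q_st))
        position = max(position, hit.q_en)
    if position < contig_length:
        gaps.append((position, contig_length))
    return gaps


def reference_jumps(hits, min_size=100):
    """Return reference ranges skipped between forward-strand hits that are
    neighbours in the contig: candidate large deletions."""
    forward = sorted((h for h in hits if h.strand == 1),
                     key=lambda h: h.q_st)
    jumps = []
    for left, right in zip(forward, forward[1:]):
        ref_gap = right.r_st - left.r_en
        contig_gap = right.q_st - left.q_en
        if ref_gap - contig_gap >= min_size:
            jumps.append((left.r_en, right.r_st))
    return jumps


def arrow_ends(hit):
    """Return (tail, head) reference positions for an alignment arrow,
    flipped when the hit is on the reverse strand."""
    if hit.strand == 1:
        return hit.r_st, hit.r_en
    return hit.r_en, hit.r_st
```

For example, two forward hits that are adjacent in the contig but 1,810 bases apart in the reference would be reported as one candidate deletion, while a wrapped-around contig (second hit earlier in the reference than the first) would not be flagged by this check and would need separate handling.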

Minimap2 has been working very well, but here are some of the other tools we could also try in the future:

  • BinaryPartialAlign
  • Joint Read Aligner
  • see if BWA has any related features
  • Use the BLAST results and Hooke's law to find the general area where we should align before doing a detailed alignment with one of the other tools. (We might need to swap out overlapping BLAST results in the contig to decide which is most consistent with the rest of the contig.)
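One way to read the Hooke's law idea in the last item is to treat each BLAST hit as a spring that pulls the contig toward the reference offset the hit implies. The sketch below assumes a stiffness equal to the hit length and a simple `(query_start, ref_start, length)` tuple for each hit. Both are illustrative choices, not the simulator that was later committed.

```python
def spring_equilibrium(hits):
    """Return the contig offset where the spring forces balance.

    hits: list of (query_start, ref_start, length) tuples.
    Each hit pulls toward offset = ref_start - query_start with a force
    of stiffness * displacement, so the balance point is the
    stiffness-weighted mean of the offsets.
    """
    total_stiffness = sum(length for _, _, length in hits)
    if total_stiffness == 0:
        raise ValueError("Need at least one hit with nonzero length.")
    weighted = sum(length * (ref_start - query_start)
                   for query_start, ref_start, length in hits)
    return weighted / total_stiffness


def spring_tensions(hits):
    """Return each hit's force on the contig at equilibrium.

    A hit with a large tension disagrees most with the others, so it is a
    candidate to swap out for an overlapping hit.
    """
    rest = spring_equilibrium(hits)
    return [length * ((ref_start - query_start) - rest)
            for query_start, ref_start, length in hits]
```

With two hits that agree on an offset of 1000 and a shorter hit that implies 5000, the equilibrium lands between them and the shorter hit carries the largest tension. A large deletion shows up the same way, so the equilibrium only suggests a general area, not a final placement.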
donkirkby added this to the near future milestone Nov 4, 2019
donkirkby modified the milestones: near future, 7.13 Feb 7, 2020
donkirkby added a commit that referenced this issue Jul 15, 2020
Add a spring tension simulator for deciding where to display contigs.
donkirkby added a commit that referenced this issue Jul 16, 2020
Add link type to genome coverage file.