softclip reads PacBio WGS #6

Closed
indapa opened this issue Aug 16, 2024 · 1 comment

indapa commented Aug 16, 2024

Hi @santiago-es, I'm new to telomere biology and am trying to connect it to what I see in my reads in IGV.

I extracted some telomere-region reads from the HG002 WGS data available on the PacBio website. One of the processed reads has the following CIGAR:

m84011_220902_175841_s1/252383346/ccs	2048	chr1	10249	25	3048S6=1D14=1I6=1I15=1X5=1X5=1X5=1X5=1X6=1I2=1I6=1D22=1I16=1I22=49I64=2I14=26528S

As you can see, this is a supplementary alignment (flag 2048), and the CIGAR starts with ~3k soft-clipped bases. I aligned these reads to the T2T reference with the additional FASTAs as described in the README. Do the soft-clipped bases represent the difference in telomere length between HG002 and the reference? I was able to follow the logic of the code with the help of the Python debugger in VSCode and understand how the regular expression code works, but I'm just trying to reconcile what I see in the BAM file with the biology.

Thanks for the great work!

@santiago-es
Owner

Good question! My understanding of soft-clipped bases is that they ARE part of the read sequence in the BAM alignment file but do NOT contribute to the alignment as viewed in IGV.

Given that interpretation, I think that in the case of telomere differences with the reference, it would make sense that soft-clipped bases beyond the telomere terminus are "extra" telomere in the read relative to the reference. That said, soft clipping can occur for many reasons and is not restricted to telomere or satellite regions; a deeper explanation than that is above my head.
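
For anyone who wants to check the numbers, here is a minimal sketch that pulls the clip lengths out of a CIGAR string like the one quoted in the question. It is not code from this repository: `summarize_cigar` and `is_supplementary` are hypothetical helper names, and the operator handling is based only on the SAM CIGAR definitions. Because flag 2048 marks a supplementary record, the clipped bases may also be aligned elsewhere by the same read's other records, so a clip length on its own should probably not be read as a telomere-length difference.

```python
import re

# SAM CIGAR operators: M, I, D, N, S, H, P, =, X
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")

FLAG_SUPPLEMENTARY = 0x800  # decimal 2048


def is_supplementary(flag):
    """Return True if the SAM flag has the supplementary bit set."""
    return bool(flag & FLAG_SUPPLEMENTARY)


def summarize_cigar(cigar):
    """Summarize clip lengths and aligned spans for one CIGAR string."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # Hard clips (H) sit outside soft clips, so drop them before
    # looking at the two ends of the alignment.
    core = [(n, op) for n, op in ops if op != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0

    return {
        "leading_soft_clip": leading,
        "trailing_soft_clip": trailing,
        # Read bases that take part in the alignment (M, I, =, X).
        "aligned_query_bases": sum(n for n, op in core if op in "MI=X"),
        # Reference bases covered by the alignment (M, D, N, =, X).
        "reference_span": sum(n for n, op in core if op in "MDN=X"),
    }


if __name__ == "__main__":
    cigar = (
        "3048S6=1D14=1I6=1I15=1X5=1X5=1X5=1X5=1X6=1I2=1I6=1D22=1I16=1I22="
        "49I64=2I14=26528S"
    )
    print("supplementary:", is_supplementary(2048))
    for key, value in summarize_cigar(cigar).items():
        print(f"{key}: {value}")
```

Run on the record above, this reports the ~3k bases clipped at the start and the much larger clip at the end, alongside the short stretch that actually aligns at chr1:10249. Comparing those numbers with the other alignment records for the same read name is one way to see where the clipped sequence went.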
