FASTq output files containing unmapped reads are out-of-order #222
Hi Christian, this does not happen in my tests, so it has to be parameter or system specific. Cheers |
Possibly it occurs only when multi-threading. I will look for the log file and send it to you.
On Dec 16, 2016 4:23 PM, "alexdobin" <notifications@github.com> wrote:
Hi Christian,
this does not happen in my tests, so it has to be parameter or system specific.
Please send me the Log.out file first.
Cheers
Alex
|
Hi Christian thanks for the files. Nothing suspicious in them, unfortunately.
Could you send me an example of the Unmapped reads with wrong ordering? If the files are too big, please try to reproduce this problem on a small subset of reads (~300k to 1M). Cheers |
1. yes
2. yes
3. yes
I'll send you a subset of the scrambled reads later today.
On Dec 21, 2016 12:52 PM, "alexdobin" <notifications@github.com> wrote:
Hi Christian
thanks for the files. Nothing suspicious in them, unfortunately.
Could you please check a few more things:
1. Does this happen every time you map?
2. Is the number of lines the same in the two Unmapped files?
3. Does it look like the order is screwed up in blocks?
Could you send me an example of the Unmapped reads with wrong ordering? If the files are too big, please try to reproduce this problem on a small subset of reads (~300k to 1M).
Cheers
Alex
|
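Alex's second and third questions (whether the two Unmapped files have the same number of records, and whether the order is scrambled in blocks) can be checked with a short script. The sketch below is a hypothetical helper, not part of STAR. It assumes plain 4-line FASTQ records and takes the read ID to be the first whitespace-separated token of each header, ignoring the second field that follows the ID in these files (e.g. `00`).

```python
from itertools import groupby


def read_id(header):
    """Read ID from a FASTQ header: first token, '@' and any /1 or /2 removed."""
    token = header.split()[0].lstrip("@")
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def check_mates(lines1, lines2):
    """Compare two mate files given as lists of lines (4 lines per record).

    Returns (records_in_mate1, records_in_mate2, blocks), where blocks lists
    (start, end) 0-based record ranges, end exclusive, in which the read IDs
    at the same position differ. Only the common prefix is compared.
    """
    ids1 = [read_id(h) for h in lines1[0::4]]  # header lines: 0, 4, 8, ...
    ids2 = [read_id(h) for h in lines2[0::4]]
    blocks = []
    pos = 0
    # Group consecutive positions by whether the two IDs agree or differ.
    for differs, run in groupby(a != b for a, b in zip(ids1, ids2)):
        length = sum(1 for _ in run)
        if differs:
            blocks.append((pos, pos + length))
        pos += length
    return len(ids1), len(ids2), blocks


# Usage with real files (hypothetical paths):
# with open("Unmapped.out.mate1") as f1, open("Unmapped.out.mate2") as f2:
#     print(check_mates(f1.read().splitlines(), f2.read().splitlines()))
```

Equal record counts with an empty block list mean the files are in sync; a few long blocks rather than scattered single mismatches would point to the block-wise scrambling Alex asked about.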
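Here is a brief excerpt of the two FASTq files. Read IDs start to diverge on line 3,138,677 of the original files (indicated in yellow in the original message): after ERR030880.72819237, Mate1 continues with ERR030880.625834 while Mate2 continues with ERR030880.937712.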
*Mate1.fastq*
@ERR030880.72819202 00
GTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAG
+
HHHHHHHHHHHHHHHHHEHHHHHHFHHHHHHHHEHHHFHEADA@A#####
@ERR030880.72819207 00
GCTTGAGTAAGCATTTGGCGCATAATCTCGGAAACCTGCTGTTGCTTGGA
+
HHHHGHHHHHHHHHF44554HHFHHHHHHHBFBFBHEHEHHEHEHGEDG:
@ERR030880.72819210 00
CTAAGGCCCAGGCCAGGGCATCTGGAGTCTGAAGGACCCTAGTTCCTAGA
+
HHGHHHHHHEHHHHHHHHHHHHHHHCHEHHHHHHHEHEHHHHGHHHHEHE
@ERR030880.72819219 00
CCGCGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCGGGGCTG
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
@ERR030880.72819220 00
CGAGAGCTCACCGGACGCCGCCGGAACCGCGACGCTTTCCAAGGCACGGG
+
HHHHHHHHHHHHHHHHHHHDHHHHHHHHHGHHHHDHHHHHHHHHHHHGIH
@ERR030880.72819226 00
CCGACCTTAGCTCTCACCATCGCTCTTCTACTATGAACCCCCCTCCCCAT
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGFHHHBEGEED
@ERR030880.72819237 00
CTCTTTTTGAGTCTCATTTTGCATCTCGGCAATCTCTTTCTGATTGTCCA
+
HHHHHHHGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH@HGHHHHHF
@ERR030880.625834 00
CTTGCTCGAAGGAGCCAAAACTTTCTTCAGGGACTTGAGAGCTGTATACG
+
HHHFHHHFHEHHHHH@?A8845544HHHHEHGHHHHHDC>44554;A=DD
@ERR030880.625836 00
AGGTTACCCAAGGCACCCCTCTGACATCCGGCCTGCTTCTTCTCACATGA
+
HHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHEHHHHHHHHHH
@ERR030880.625843 00
CCGAAGCGTTTACTTTGACAAAATTAGAGTGTTCAAACCAGGCCCCAGCC
+
FDFBBFGFFGHHHHHHHHHHHHBHH5544344445@D4DCFFECAGGGEE
@ERR030880.625845 00
GCTAAGATTTTGCGTCGCTGGGTTTGGTTTAATCCACCTCAACTGGCTGC
+
DEFGGEGGCB########################################
@ERR030880.625855 00
TTCTTGGGCAGTGAGAGTTAGTAGTAGAATGTTTAGTAAACCTAGTGGGT
+
HHHHHIIIECGFGCG###################################
@ERR030880.625882 00
GGCCCTTCCCAGACGACCCTAAAGAAGTGAGTCAGACTCTTCTCCCGATT
+
<@@@<>@***@***.***=44544FECAFHDHHHHHHHA4DDDD
@ERR030880.625883 00
CCTCCCCATAAATAGCCCAGGACAGGCGAGGGGCTGCTCAGCCCAACAAA
+
HHHHHHEEHHGCDCGEHHHHGGCAG=><8@A###################
*Mate2.fastq*
@ERR030880.72819202 00
GCAACTTTAATATACGCTATTGGAGCTGGAATTACCGCGGCTGCTGGCAC
+
HHHHHHHHHHHHHHHHHHHHHFHH=GGFGGHHHHHHIHHH##########
@ERR030880.72819207 00
TACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGAGATGCAAAATG
+
H@HHFHFFHHHHEGHH@HHHFHHHCHHGIHHHHGE;@=>@ABBDDHHHEG
@ERR030880.72819210 00
GTCAGTTGCCAAAGCCTCCGATTATGATGGGTATTACTATGAAGAAGATT
+
55555BFFBFD9DF>DADD@DGEGGF>FFFFFBFF<@A@@FFFFB14445
@ERR030880.72819219 00
CCTTACTAAACCATCCAATCGGTAGTAGCGACGGGCGGTGTGTACAAAGG
+
HHHHHHHHHHHHHHHHHHHHHHHHHDHHHHHHHHHHHHDH:AA;A:AA@A
@ERR030880.72819220 00
CGGGCGATGGCCTCCGTTGCCCTCGGCCGATCGAAAGGGAGTCGGGTTCA
+
HHHHHHHHHHFHHHHHHHHHHHHHHIHHHDHHHHHHHHHAHGHHHD/A>D
@ERR030880.72819226 00
GTTCATAGTAGAAGAGCGATGGTGAGAGCTAAGGTCGGGGCGGGGTTCCT
+
HHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHFCGG#####
@ERR030880.72819237 00
CGCCGCTGATAAAGGAAAGGATACTCGTGATTATCTTGCTGCTGCATTTC
+
=3F?1FFBFB8;@;DFFEFHEEFA=DDDDDBBBFBHHHHDGDGDB52544
@ERR030880.937712 00
GGGCATCACAGACCTGTTATTGCTCAATCTCGGGTGGCTGAACGCCACTT
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHGFHGHHHHHEHHHHHHHHH
@ERR030880.937718 00
CACGGACTTACATCCTCATTACTATTCTGCCTAGCAAACTCAAACTACCC
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHDHHHHHH
@ERR030880.937720 00
TAGAAACCGTCTGAACTATCCTGCCCGCCATCATCCTAGTCCTCATCGCC
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
@ERR030880.937743 00
ATGGCCTCGCACCCCCAGGCGCGGGGGCCCCCGGGGGGGGGGGGGCAACC
+
HHIHHDD###########################################
@ERR030880.937745 00
TCCCCCCGGAACCCAAAGACTTTGGTTTCCCGGAAGATGCCCGGCGGGTC
+
FHHGHHHHHHHHHHHHHGHGHHHHHGFHHHHGHGHHHBFHHHHHHHHHGH
@ERR030880.937748 00
AGCAGAACATTATTTCCCCATCTTGCTGTTTTCTAGCCTTGAGTTGGGGA
+
HHHHHHHHHHHHHHHH@HHHHHHIIHHHHHHHHHHHHHHHHHHFHGGFGF
@ERR030880.937754 00
CCAAGGCACCCCTCTGACATCCGGCCTGCTTCTTCTCACATGACAAAAAC
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGG
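Because every FASTQ record spans four lines, the header of the record with 0-based index k sits on line 4k + 1; line 3,138,677 is therefore the header of record index 784,669 (the 784,670th read pair). The sketch below, a hypothetical helper assuming plain 4-line records, locates that first diverging header line directly in two files.

```python
def read_id(header):
    """Read ID from a FASTQ header: first token, '@' and any /1 or /2 removed."""
    token = header.split()[0].lstrip("@")
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def header_line_number(record_index):
    """1-based file line of the header of the record with this 0-based index."""
    return 4 * record_index + 1


def first_divergent_line(path1, path2):
    """1-based line number of the first header whose read ID differs between
    the two files, or None if every header pair agrees.

    zip() stops at the end of the shorter file, so a difference in record
    counts alone is not reported here.
    """
    with open(path1) as f1, open(path2) as f2:
        for lineno, (a, b) in enumerate(zip(f1, f2), start=1):
            if lineno % 4 == 1 and read_id(a) != read_id(b):
                return lineno
    return None
```

Streaming both files line by line keeps memory use constant, which matters for full-size Unmapped files.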
On 21 December 2016 at 20:00, Christian Frech <frech.christian@gmail.com>
wrote:
|
I have the same issue when mapping with STAR_2.5.3a |
Hi @yuxinghai, I could not reproduce this problem on my system. If you could send me your FASTQ files, Log.out file, and the links to the genome, I can try running it on my system; maybe we will be lucky enough to catch the error this time. Cheers |
We are seeing the same behavior in some samples. It is unclear why. I attach one example Log.out file here. We are mapping human samples, so the fq files are reasonably big; however, if you want them, I can get them to you for the example below. The heads of the fq inputs look like this:
$ zcat /media/seb/Data/crc/201712/samples_star/CBMFTANXX-2706-219-12-1_S123_L001_R1_001.fastq.trimmed.paired.gz | head -8
@7001326F:122:CBMFTANXX:1:2310:9932:95316 1:N:0:CGCTCATT+CCTATCCT
CTGAAGTCCTTTAGGAGCTTGGACATTTAACTATATCTGCTAGTGTGCAAATCCCCTGACATCCTGGATATTAGTGATGGTTTTGTTGCTCTTCAAATTCAAGGATAAGGATGCACAAGTTACCA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@7001326F:122:CBMFTANXX:1:2310:9970:95337 1:N:0:CGCTCATT+CCTATCCT
GCTCTCCCACCCTGGTCCCTCTTCCTTCAA
+
FFFFFBFFFFFFFFFFBFFFBBBFFFFFFF
$ zcat samples_star/CBMFTANXX-2706-219-12-1_S123_L001_R2_001.fastq.trimmed.paired.gz | head -8
@7001326F:122:CBMFTANXX:1:2310:9932:95316 2:N:0:CGCTCATT+CCTATCCT
ATTCAGCATTATTTCATTGTGATCCAGTTTTTATATGCTTCAGTTAAGCCAGTGAGTTTTTAAATGCGACCAGCATCTGGCAAAATTGTTTCCAGGAAAAATGTTTCCATTGTTGGAAGGATGGT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFBFFBFFFFB
@7001326F:122:CBMFTANXX:1:2310:9970:95337 2:N:0:CGCTCATT+CCTATCCT
GAAGAATAGAGGTCCTCATGGGTCCCTTGAAGGAAGAGGGACCAGG
+
<7FFFFFFFFFFFBBFFFFFFBFBBBFFFFFFFFFFFFBF<FFFFF
The heads of the unmapped reads look like this:
$ cat CBMFTANXX-2706-219-12-1_S123_L001_Unmapped.out.mate1 | head -8
@7001326F:122:CBMFTANXX:1:2310:10067:95410 01
GCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGCCCAGAACTCCTGGACTCAAGCGATCCTCCAGCCTC
+
BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@7001326F:122:CBMFTANXX:1:2310:12168:95410 01
TGATAGCTTTGCACAGGAAGATTGTGAGTTATTTGCACAGGAGGGCTATGTGTCCTGGACCATAAAGAAAGGCAGACTTACAGCTTATCCACTTTCT
+
B<FFBFFFFFFFBFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFBFFFFFF
$ cat CBMFTANXX-2706-219-12-1_S123_L001_Unmapped.out.mate2 | head -8
@7001326F:122:CBMFTANXX:1:2210:17208:15941 01
TGAGGTCAGGAGCTTGAGACCAGCCTGGCCAACATGGTGAAACCT
+
FFFF<FBFFFFFFFBFFFFFFFFFFFFFFFFFFBFFF7F<FFF<F
@7001326F:122:CBMFTANXX:1:2210:20971:15817 01
CCAGAAATGGTTCTGTGCCAGCTCACTCACTCCCGCTTTCTGGAAAAATGATTGCTTGGCCCGAGGGCTCTGCTCCCTCCCCCAACCCCTC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Any ideas are welcome. Please shout if you want to follow this up and need/want the original files. Great work, by the way! Cheers, |
I just ran into this problem with STAR_2.5.2b. It only affects 2 of 40 aligned samples (multi-threaded). |
Hi Seb, Michael, I could never reproduce this problem in my tests. Cheers |
I will compile something, but I need some time. Hopefully by the end of the week. |
Hi all, I came across the same problem. Best, |
Hi @zhoujj2013 I have not been able to reproduce the problem on my system. Cheers |
Hi @alexdobin Thanks for your reply. I use the static pre-compiled executable with 6 threads (--outReadsUnmapped Fastx, other parameters at their defaults) for my analysis. Thanks again. Best, |
Hi @zhoujj2013 I think I may have found the bug causing the problem. Cheers |
Hi @alexdobin Thanks a lot. Cheers |
Hi Zhoujj thanks a lot - I will release a new tagged version shortly. Cheers |
Hi @alexdobin, Sorry to bother you about an issue you already fixed, but we would like to know a bit more about this bug to troubleshoot a problem on our calculation nodes. We were using version 2.5.2b and had the same problem as described above. When troubleshooting (before reading this issue), we used a single FASTQ file and ran the same script several times. We have 4 calculation nodes that should be identical, but we noticed that the problem always occurred on 3 of the nodes and never on the 4th one. We tested each node about 10 times, so it seems unlikely to be just random. After reading this post, we updated to version 2.6.1e and that solved the problem. However, our sysadmin is really worried that our 4 nodes didn't behave identically, and he thought that if you could give us some indications about the bug, he would have a better idea of what to look for. Thank you very much! |
Hi Alice, this problem was fixed - please try one of the latest releases 2.6.1d or 2.7.2a. Cheers |
Hi Alex, |
Was the fix commit a4fadc5 (Fixed the bug causing inconsistent output for mate1/2 in the Unmapped files.)? |
Thank you very much, Paul. If I understand properly, the fix was to move the … I guess we have to look for some delay in writing to /local, which means that 2 threads are more likely to want to write at the same time on these nodes (?). Alternatively, it may have been a purely stochastic phenomenon, and it's only by chance that we never had a problem on 3 of the nodes and always on 3 other nodes (I've tested each node 5 to 10 times, which may not be powerful enough to reach statistical significance)... Thank you very much for your help anyway! All the best, |
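The thread names the fix commit (a4fadc5, "Fixed the bug causing inconsistent output for mate1/2 in the Unmapped files") but does not describe the code change, so the following is only a generic illustration of this class of race, not STAR's implementation. When several worker threads append to two shared output files, each pair's mate1 and mate2 writes must happen as one atomic step; if they are guarded separately (or not at all), another thread can write between them and the files drift out of step. `PairedWriter` and `run_workers` are hypothetical names. Because such races depend on thread timing, they can plausibly show up more often on some machines than others, consistent with the guess above, though the thread does not confirm this.

```python
import io
import threading


class PairedWriter:
    """Hypothetical writer shared by worker threads (not STAR's code).

    One lock guards BOTH mate writes, so another thread cannot append its
    records between this thread's mate1 and mate2 writes.
    """

    def __init__(self, out1, out2):
        self.out1 = out1
        self.out2 = out2
        self._lock = threading.Lock()

    def write_pair(self, rec1, rec2):
        with self._lock:
            self.out1.write(rec1)
            self.out2.write(rec2)


def run_workers(writer, n_threads=8, pairs_per_thread=500):
    """Start n_threads workers that each write pairs_per_thread read pairs."""

    def work(tid):
        for k in range(pairs_per_thread):
            name = f"@t{tid}_r{k}"
            writer.write_pair(f"{name} 00\nACGT\n+\nIIII\n",
                              f"{name} 00\nTGCA\n+\nIIII\n")

    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


out1, out2 = io.StringIO(), io.StringIO()
run_workers(PairedWriter(out1, out2))
heads1 = out1.getvalue().splitlines()[0::4]
heads2 = out2.getvalue().splitlines()[0::4]
print(heads1 == heads2, len(heads1))  # prints: True 4000
```

The records from different threads may interleave in any order, but because both mates of a pair are written under the same lock, that order is always identical in the two files.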
When I configure STAR to output unmapped paired-end reads into two FASTq files using the "--outReadsUnmapped Fastx" option, the resulting files are out of order, i.e. mates of the same pair are not always found at the same line number in the two files.
If I take these output FASTq files as-is and align them with STAR again, this results in a very high percentage of "reads unmapped: too short" (>80%). If the FASTq files are sorted before alignment, this percentage goes back to normal (~5%).
I'm using STAR version 2.5.1b.
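The workaround described above (sorting both Unmapped files before re-aligning) can be sketched as follows. This is a hypothetical helper, not a STAR feature. It assumes 4-line records and that both files contain exactly the same read IDs, which Christian confirmed in answer to Alex's second question; it raises an error otherwise. It holds all records in memory, so it suits subsets such as the ~300k to 1M reads Alex suggested; full-size files would likely need an external sort.

```python
def read_id(header):
    """Read ID from a FASTQ header: first token, '@' and any /1 or /2 removed."""
    token = header.split()[0].lstrip("@")
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def _sorted_records(lines):
    """Split a list of FASTQ lines into 4-line records sorted by read ID."""
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    records.sort(key=lambda rec: read_id(rec[0]))
    return records


def _flatten(records):
    return [line for rec in records for line in rec]


def resync(lines1, lines2):
    """Sort both mate files by read ID so that mates share a position again.

    Raises ValueError if the two files do not contain the same read IDs.
    """
    recs1, recs2 = _sorted_records(lines1), _sorted_records(lines2)
    if [read_id(r[0]) for r in recs1] != [read_id(r[0]) for r in recs2]:
        raise ValueError("mate files contain different read IDs")
    return _flatten(recs1), _flatten(recs2)
```

The sort key is plain string order rather than numeric order, which is fine here: what matters is only that both files are sorted with the same key.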