Hi,

I'm busy with short-read analysis using Galaxy. Today I noticed strange output from the FASTQgroomer tool. I uploaded a file in FASTQ format that looks like this:

@HWI-EAS422_3_1_9_1306#0/1
GGAGAAACCCACATCCTTCTCANTAACCACATTGTG
+
ab`ba`aa```aaa`]aa`aaaB_`a``a^`a__]]
@HWI-EAS422_3_1_9_1544#0/1
ATCAAAGCAGACAAAGCACCTTNTATGTCTGCATTT
+
aabbba^ab``ab``[aaaa]\B__X__aY`a[`^a
@HWI-EAS422_3_1_9_767#0/1
GGAAGTGGACTGTGGAACACATNGTCTATAAAGCCT
+
a`]]_Xaa^]a]ZVW^_]a`ZaBY^_`^`^^`^[^`
@HWI-EAS422_3_1_9_415#0/1
ATACATTATAGCCCTGCTGTCTNGGTGGTTCTGACT
+
aaaaaaab`aaaaa`_a`YZ`_B`]Y__]___V^^a
@HWI-EAS422_3_1_9_303#0/1
TGAGACTCACTTGAACCTGGGANGCAGAGGCTGCAG
+
`]\]bbbbbbabaabba`]\]aBabb_aX^```Z_[
@HWI-EAS422_3_1_9_1554#0/1
ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG
+
aabaaaaab_aaaaabaa_\PaBaaaaaa`a`SZZY
@HWI-EAS422_3_1_9_1845#0/1
AGAAACCTCAGGAATCACAAACNTTAGTTTTTACAG
+
aaaa`aaaaabT``b_b_aa`YBa[b`P_aa`aa`_

When I run FASTQgroomer on it, I get the following output:

@HWI-EAS422_3_1_9_1306#0/1
GGAGAAACCCACATCCTTCTCANTAACCACATTGTG
+
BCACBABBAAABBBA>BBABBB#@ABAAB?AB@@>>
@HWI-EAS422_3_1_9_1544#0/1
ATCAAAGCAGACAAAGCACCTTNTATGTCTGCATTT
+
BBCCCB?BCAABCAA=#@@9@@B:AB>@9BB?>B>;78?@>BA;B#:?@A?A??A?:@@>@@@7??B
@HWI-EAS422_3_1_9_303#0/1
TGAGACTCACTTGAACCTGGGANGCAGAGGCTGCAG
+
A>=>CCCCCCBCBBCCBA>=>B#BCC@B9?AAA;@<
@HWI-EAS422_3_1_9_1554#0/1
ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG
+
BBCBBBBBC@BBBBBCBB@=1B#BBBBBBABA4;;:
@HWI-EAS422_3_1_9_1845#0/1
AGAAACCTCAGGAATCACAAACNTTAGTTTTTACAG
+
BBBBABBBBBC5AAC@C@BBA:#B<=BBBB@BCB023CA:?=4#=B>?B<759(:=5

So the quality scores seem to get longer after the conversion. Next, I selected the string "78?@>BA;B#:?@A?A??A?:@@>@@@7??B" and converted it back to the Solexa FASTQ format. The resulting string was the same as the quality scores of the read named "@HWI-EAS422_3_1_9_767" in the input file. So my guess is that the FASTQgroomer script messes up the quality scores. Am I right about this? And if so, is there a way to solve it?
I hope to hear from you soon.

Sincerely,
Freerk van Dijk
UMC Groningen Department of Genetics

The contents of this message are confidential and only intended for the eyes of the addressee(s). Others than the addressee(s) are not allowed to use this message, to make it public or to distribute or multiply this message in any way. The UMCG cannot be held responsible for incomplete reception or delay of this transferred message.
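The check described in the message above (re-encode a quality string and compare it with the input, and compare quality length with read length) can be sketched in Python. This is only a minimal sketch under stated assumptions, not the FASTQgroomer implementation: the function names `shift_quality` and `lengths_match` are hypothetical, and the code assumes the input uses an ASCII offset of 64 and the output an offset of 33, which is consistent with the first sample record (for example `a` becoming `B`). A file with true Solexa-scaled scores would need an additional score conversion that is not shown here.

```python
def shift_quality(qual, from_offset=64, to_offset=33):
    """Re-encode an ASCII quality string from one offset to another.

    Assumption: both encodings hold the same kind of score, so only the
    ASCII offset changes (hypothetical helper, not a Galaxy function).
    """
    shifted = []
    for ch in qual:
        score = ord(ch) - from_offset
        if score < 0:
            raise ValueError(
                "character %r is below offset %d" % (ch, from_offset)
            )
        shifted.append(chr(score + to_offset))
    return "".join(shifted)


def lengths_match(seq, qual):
    """A valid FASTQ record has exactly one quality character per base."""
    return len(seq) == len(qual)


# First record of the sample input quoted in the message.
seq = "GGAGAAACCCACATCCTTCTCANTAACCACATTGTG"
qual_in = "ab`ba`aa```aaa`]aa`aaaB_`a``a^`a__]]"

qual_out = shift_quality(qual_in)            # offset 64 -> offset 33
qual_back = shift_quality(qual_out, 33, 64)  # and back again

print(qual_out)
print(lengths_match(seq, qual_out), qual_back == qual_in)
```

Because a pure offset shift maps each character to exactly one character, a converted quality line that is longer than its read points to a problem outside the per-character conversion itself, for instance in how records are split or displayed.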
Hi Freerk,

This is indeed strange. However, I have been unable to duplicate this behavior with your sample file. The output I am getting has all sequences, and the qualities are the correct length. Did you find that this is reproducible?

Also, could you tell me the exact settings you used, and is this on your own Galaxy server, or one of our public servers?

Regards,
Kelly

On Dec 10, 2009, at 9:48 AM, Dijk, F van wrote:
Hi,
I'm busy with short-read analysis using Galaxy. Today I noticed strange output from the FASTQgroomer tool. I uploaded a file in FASTQ format that looks like this:

@HWI-EAS422_3_1_9_1306#0/1
GGAGAAACCCACATCCTTCTCANTAACCACATTGTG
+
ab`ba`aa```aaa`]aa`aaaB_`a``a^`a__]]
@HWI-EAS422_3_1_9_1544#0/1
ATCAAAGCAGACAAAGCACCTTNTATGTCTGCATTT
+
aabbba^ab``ab``[aaaa]\B__X__aY`a[`^a
@HWI-EAS422_3_1_9_767#0/1
GGAAGTGGACTGTGGAACACATNGTCTATAAAGCCT
+
a`]]_Xaa^]a]ZVW^_]a`ZaBY^_`^`^^`^[^`
@HWI-EAS422_3_1_9_415#0/1
ATACATTATAGCCCTGCTGTCTNGGTGGTTCTGACT
+
aaaaaaab`aaaaa`_a`YZ`_B`]Y__]___V^^a
@HWI-EAS422_3_1_9_303#0/1
TGAGACTCACTTGAACCTGGGANGCAGAGGCTGCAG
+
`]\]bbbbbbabaabba`]\]aBabb_aX^```Z_[
@HWI-EAS422_3_1_9_1554#0/1
ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG
+
aabaaaaab_aaaaabaa_\PaBaaaaaa`a`SZZY
@HWI-EAS422_3_1_9_1845#0/1
AGAAACCTCAGGAATCACAAACNTTAGTTTTTACAG
+
aaaa`aaaaabT``b_b_aa`YBa[b`P_aa`aa`_

When I run FASTQgroomer on it, I get the following output:

@HWI-EAS422_3_1_9_1306#0/1
GGAGAAACCCACATCCTTCTCANTAACCACATTGTG
+
BCACBABBAAABBBA>BBABBB#@ABAAB?AB@@>>
@HWI-EAS422_3_1_9_1544#0/1
ATCAAAGCAGACAAAGCACCTTNTATGTCTGCATTT
+
BBCCCB?BCAABCAA=#@@9@@B:AB>@9BB?>B>;78?@>BA;B#:?@A?A??A?:@@>@@@7??B
@HWI-EAS422_3_1_9_303#0/1
TGAGACTCACTTGAACCTGGGANGCAGAGGCTGCAG
+
A>=>CCCCCCBCBBCCBA>=>B#BCC@B9?AAA;@<
@HWI-EAS422_3_1_9_1554#0/1
ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG
+
BBCBBBBBC@BBBBBCBB@=1B#BBBBBBABA4;;:
@HWI-EAS422_3_1_9_1845#0/1
AGAAACCTCAGGAATCACAAACNTTAGTTTTTACAG
+
BBBBABBBBBC5AAC@C@BBA:#B<=BBBB@BCB023CA:?=4#=B>?B<759(:=5

So the quality scores seem to get longer after the conversion. Next, I selected the string "78?@>BA;B#:?@A?A??A?:@@>@@@7??B" and converted it back to the Solexa FASTQ format. The resulting string was the same as the quality scores of the read named "@HWI-EAS422_3_1_9_767" in the input file. So my guess is that the FASTQgroomer script messes up the quality scores. Am I right about this? And if so, is there a way to solve it?

I hope to hear from you soon.
Sincerely,
Freerk van Dijk
UMC Groningen Department of Genetics
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev