Hi,

I'm doing short-read analysis in Galaxy, and today I noticed strange output from the FASTQgroomer tool. The file I upload is in FASTQ format and looks like this:

@HWI-EAS422_3_1_9_1306#0/1
GGAGAAACCCACATCCTTCTCANTAACCACATTGTG
+
ab`ba`aa```aaa`]aa`aaaB_`a``a^`a__]]
@HWI-EAS422_3_1_9_1544#0/1
ATCAAAGCAGACAAAGCACCTTNTATGTCTGCATTT
+
aabbba^ab``ab``[aaaa]\B__X__aY`a[`^a
@HWI-EAS422_3_1_9_767#0/1
GGAAGTGGACTGTGGAACACATNGTCTATAAAGCCT
+
a`]]_Xaa^]a]ZVW^_]a`ZaBY^_`^`^^`^[^`
@HWI-EAS422_3_1_9_415#0/1
ATACATTATAGCCCTGCTGTCTNGGTGGTTCTGACT
+
aaaaaaab`aaaaa`_a`YZ`_B`]Y__]___V^^a
@HWI-EAS422_3_1_9_303#0/1
TGAGACTCACTTGAACCTGGGANGCAGAGGCTGCAG
+
`]\]bbbbbbabaabba`]\]aBabb_aX^```Z_[
@HWI-EAS422_3_1_9_1554#0/1
ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG
+
aabaaaaab_aaaaabaa_\PaBaaaaaa`a`SZZY
@HWI-EAS422_3_1_9_1845#0/1
AGAAACCTCAGGAATCACAAACNTTAGTTTTTACAG
+
aaaa`aaaaabT``b_b_aa`YBa[b`P_aa`aa`_



When I run FASTQgroomer on it, I get the following output:


@HWI-EAS422_3_1_9_1306#0/1
GGAGAAACCCACATCCTTCTCANTAACCACATTGTG
+
BCACBABBAAABBBA>BBABBB#@ABAAB?AB@@>>
@HWI-EAS422_3_1_9_1544#0/1
ATCAAAGCAGACAAAGCACCTTNTATGTCTGCATTT
+
BBCCCB?BCAABCAA=#@@9@@B:AB>@9BB?>B>;78?@>BA;B#:?@A?A??A?:@@>@@@7??B
@HWI-EAS422_3_1_9_303#0/1
TGAGACTCACTTGAACCTGGGANGCAGAGGCTGCAG
+
A>=>CCCCCCBCBBCCBA>=>B#BCC@B9?AAA;@<
@HWI-EAS422_3_1_9_1554#0/1
ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG
+
BBCBBBBBC@BBBBBCBB@=1B#BBBBBBABA4;;:
@HWI-EAS422_3_1_9_1845#0/1
AGAAACCTCAGGAATCACAAACNTTAGTTTTTACAG
+
BBBBABBBBBC5AAC@C@BBA:#B<=BBBB@BCB023CA:?=4#=B>?B<759(:=5

So the quality strings seem to get longer after the conversion, even though every read is still 36 bases. Next, I selected the fragment "78?@>BA;B#:?@A?A??A?:@@>@@@7??B" and converted it back to the Solexa FASTQ format. The result was the same as the quality string of the read named "@HWI-EAS422_3_1_9_767" in the input file. So my guess is that the FASTQgroomer script messes up the quality scores. Am I right? And if so, is there a way to solve this? I hope to hear from you soon.
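
In case it helps to reproduce the symptom, here is a minimal sketch of the check I have in mind: it lists every record whose quality string is not the same length as its sequence. It assumes plain four-line FASTQ records (no wrapped sequence or quality lines), and the function name find_length_mismatches is only illustrative, not part of Galaxy or FASTQgroomer.

```python
from typing import Iterable, List, Tuple


def find_length_mismatches(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (header, sequence_length, quality_length) for each bad record.

    Assumption: every record is exactly four lines
    (header, sequence, '+' separator, quality), with no line wrapping.
    """
    # Drop blank lines and trailing newlines so records align in groups of four.
    stripped = [line.rstrip("\r\n") for line in lines if line.strip()]

    mismatches = []
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        # A valid FASTQ record has one quality character per base.
        if len(seq) != len(qual):
            mismatches.append((header, len(seq), len(qual)))
    return mismatches
```

Passing the lines of the groomed file to this function (for example, an open file handle) should return an empty list if every record is intact; any entries it returns are the reads whose quality line has the wrong length.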

Sincerely,


Freerk van Dijk

UMC Groningen
Department of Genetics

