Hi,
I'm doing short-read analysis in Galaxy. Today I noticed strange output from the FASTQgroomer tool. I uploaded a file in FASTQ format that looks like this:
@HWI-EAS422_3_1_9_1306#0/1
GGAGAAACCCACATCCTTCTCANTAACCACATTGTG
+
ab`ba`aa```aaa`]aa`aaaB_`a``a^`a__]]
@HWI-EAS422_3_1_9_1544#0/1
ATCAAAGCAGACAAAGCACCTTNTATGTCTGCATTT
+
aabbba^ab``ab``[aaaa]\B__X__aY`a[`^a
@HWI-EAS422_3_1_9_767#0/1
GGAAGTGGACTGTGGAACACATNGTCTATAAAGCCT
+
a`]]_Xaa^]a]ZVW^_]a`ZaBY^_`^`^^`^[^`
@HWI-EAS422_3_1_9_415#0/1
ATACATTATAGCCCTGCTGTCTNGGTGGTTCTGACT
+
aaaaaaab`aaaaa`_a`YZ`_B`]Y__]___V^^a
@HWI-EAS422_3_1_9_303#0/1
TGAGACTCACTTGAACCTGGGANGCAGAGGCTGCAG
+
`]\]bbbbbbabaabba`]\]aBabb_aX^```Z_[
@HWI-EAS422_3_1_9_1554#0/1
ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG
+
aabaaaaab_aaaaabaa_\PaBaaaaaa`a`SZZY
@HWI-EAS422_3_1_9_1845#0/1
AGAAACCTCAGGAATCACAAACNTTAGTTTTTACAG
+
aaaa`aaaaabT``b_b_aa`YBa[b`P_aa`aa`_
When I run FASTQgroomer on it, I get the following output:
@HWI-EAS422_3_1_9_1306#0/1
GGAGAAACCCACATCCTTCTCANTAACCACATTGTG
+
BCACBABBAAABBBA>BBABBB#@ABAAB?AB@@>>
@HWI-EAS422_3_1_9_1544#0/1
ATCAAAGCAGACAAAGCACCTTNTATGTCTGCATTT
+
BBCCCB?BCAABCAA=#@@9@@B:AB>@9BB?>B>;78?@>BA;B#:?@A?A??A?:@@>@@@7??B
@HWI-EAS422_3_1_9_303#0/1
TGAGACTCACTTGAACCTGGGANGCAGAGGCTGCAG
+
A>=>CCCCCCBCBBCCBA>=>B#BCC@B9?AAA;@<
@HWI-EAS422_3_1_9_1554#0/1
ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG
+
BBCBBBBBC@BBBBBCBB@=1B#BBBBBBABA4;;:
@HWI-EAS422_3_1_9_1845#0/1
AGAAACCTCAGGAATCACAAACNTTAGTTTTTACAG
+
BBBBABBBBBC5AAC@C@BBA:#B<=BBBB@BCB023CA:?=4#=B>?B<759(:=5
So the quality strings seem to get longer after the conversion. Next, I selected the fragment "78?@>BA;B#:?@A?A??A?:@@>@@@7??B" from the output and converted it back to the Solexa FASTQ format. The result matched the quality string of the read named "@HWI-EAS422_3_1_9_767" in the input file. So my guess is that the FASTQgroomer script messes up the quality scores. Am I right about this? And if so, is there a way to solve it? I hope to hear from you soon.
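For reference, the manual check described above can be sketched in Python. This is only a minimal sketch and not the FASTQgroomer code: it assumes the conversion is a plain shift of 31 between an offset of 64 and an offset of 33, which is what the intact records above suggest. The function names are made up for illustration, and the fragment tested is the first 20 characters of the one quoted above.

```python
# Assumption: input qualities use an ASCII offset of 64 and the groomed
# output uses an offset of 33, so each character shifts by 31.
OFFSET_SHIFT = 64 - 33


def to_offset33(quality):
    """Shift an offset-64 quality string down to offset 33."""
    return "".join(chr(ord(c) - OFFSET_SHIFT) for c in quality)


def to_offset64(quality):
    """Shift an offset-33 quality string back up to offset 64."""
    return "".join(chr(ord(c) + OFFSET_SHIFT) for c in quality)


def lengths_match(sequence, quality):
    """A valid FASTQ record has one quality character per base."""
    return len(sequence) == len(quality)


# Read 1554 from the input file (raw string because of the backslash).
seq_1554 = "ACAGTATGCTTCACGAATTTGCNTTTCATCCCTGTG"
qual_1554 = r"aabaaaaab_aaaaabaa_\PaBaaaaaa`a`SZZY"
groomed_1554 = to_offset33(qual_1554)
print(groomed_1554)
print(lengths_match(seq_1554, groomed_1554))

# Quality string of read 767 from the input file, and the first 20
# characters of the fragment selected from the groomed output.
qual_767 = "a`]]_Xaa^]a]ZVW^_]a`ZaBY^_`^`^^`^[^`"
fragment = "78?@>BA;B#:?@A?A??A?"
print(to_offset64(fragment) in qual_767)
```

Running `lengths_match` on every record of the groomed file would flag the records whose quality line is longer than the read.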
Sincerely,
Freerk van Dijk
UMC Groningen
Department of Genetics