Hi All,
I downloaded some RNA-seq datasets from NCBI. The datasets were generated on an Illumina HiSeq 2000. I am not sure which "Input FASTQ quality scores type" I should choose when running FASTQ Groomer. Below are the sequences and quality scores of 2 reads from one dataset; I renamed them "read 1" and "read 2".
1) Sequence and quality scores as displayed in Galaxy
@read 1 length=51
NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
+read 1 length=51
#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ
@read 2 length=51
NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
+read 2 length=51
#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI
2) Sequence and one-channel quality scores as shown in the NCBI SRA, where I downloaded the dataset
>gnl|SRA|read 1
NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
One channel quality score
2 16 28 32 35 32 35 36 39 39 39 39 39 40 40 38 40 39 41 38 41 41 41 39 41 40 40 41 41 41 39
31 39 36 38 33 37 39 26 37 39 36 39 29 31 39 40 41 41 41 41
>gnl|SRA|read 2
NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
One channel quality score
2 16 28 35 35 35 36 35 39 39 37 39 39 41 41 41 41 41 40 41 41 39 40 40 40 41 41 41 40 41 41
41 41 41 41 41 40 41 40 41 41 41 41 41 41 40 41 41 41 41 40
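It looks like the dataset was generated with Illumina software version 1.8 or later, because some of the quality scores are as high as 41, and the first quality character "#" (ASCII 35) corresponds to the SRA score of 2 only with an offset of 33. Can I choose "sanger" as the "Input FASTQ quality scores type" when I run FASTQ Groomer?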
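One way to check this is to decode the Galaxy quality string for read 1 with the Sanger offset (Phred+33) and compare the result with the numeric scores shown in the SRA. Below is a minimal Python sketch of that check; the function name decode_phred is only illustrative, and the offset of 33 is the assumption being tested.

```python
def decode_phred(quality, offset=33):
    """Convert a FASTQ quality string to numeric Phred scores.

    offset=33 is the Sanger / Illumina 1.8+ convention (an assumption here).
    """
    return [ord(ch) - offset for ch in quality]


# Quality string for read 1 as displayed in Galaxy
galaxy_quality = "#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ"

# One-channel quality scores for read 1 as shown in the SRA
sra_scores = [
    2, 16, 28, 32, 35, 32, 35, 36, 39, 39, 39, 39, 39, 40, 40, 38, 40,
    39, 41, 38, 41, 41, 41, 39, 41, 40, 40, 41, 41, 41, 39,
    31, 39, 36, 38, 33, 37, 39, 26, 37, 39, 36, 39, 29, 31, 39, 40,
    41, 41, 41, 41,
]

decoded = decode_phred(galaxy_quality)

# True if the Phred+33 decoding reproduces the SRA scores exactly
print(decoded == sra_scores)
print(min(decoded), max(decoded))
```

With an offset of 64 (the Illumina 1.3 to 1.7 convention) the "#" character would decode to 35 - 64 = -29, which is not a valid score, so that encoding does not seem to fit these reads.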
Thanks.
Jianguang Du