Hi All,

I downloaded some RNA-seq datasets from NCBI. The datasets were generated on an Illumina HiSeq 2000. I am not sure which "Input FASTQ quality scores type" I should choose when running FASTQ Groomer. Below are the sequences and quality scores of two reads from one dataset; I renamed them "read 1" and "read 2".

 

1) Sequence and quality score displayed in Galaxy

@read 1 length=51

NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC

+read 1 length=51

#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ

@read 2 length=51

NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT

+read 2 length=51

#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI

 

2) Sequence and one-channel quality scores shown in the NCBI SRA when I downloaded the dataset

>gnl|SRA|read 1

NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC

One channel quality score

 2 16 28 32 35 32 35 36 39 39 39 39 39 40 40 38 40 39 41 38 41 41 41 39 41 40 40 41 41 41 39 31 39 36 38 33 37 39 26 37 39 36 39 29 31 39 40 41 41 41 41

 

>gnl|SRA|read 2

NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT

One channel quality score

 2 16 28 35 35 35 36 35 39 39 37 39 39 41 41 41 41 41 40 41 41 39 40 40 40 41 41 41 40 41 41 41 41 41 41 41 40 41 40 41 41 41 41 41 41 40 41 41 41 41 40

 

It looks like the dataset was generated with Illumina pipeline version 1.8 or later, because some bases have a quality score of 41. Can I choose "sanger" as the "Input FASTQ quality scores type" when I run FASTQ Groomer?
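As a sanity check on this reasoning, here is a minimal Python sketch that decodes the read 1 quality string from Galaxy and compares it with the SRA numbers. It assumes the "sanger" encoding means an ASCII offset of 33 (Phred+33); the function name decode_quality is only illustrative and is not part of Galaxy or FASTQ Groomer.

```python
def decode_quality(quality_string, offset=33):
    """Convert an ASCII-encoded FASTQ quality string to integer scores.

    offset=33 is the assumed Sanger / Illumina 1.8+ encoding (Phred+33).
    """
    return [ord(ch) - offset for ch in quality_string]


# Quality string of read 1 as displayed in Galaxy
galaxy_quality = "#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ"

# One-channel quality scores of read 1 as shown in the SRA
sra_scores = [int(x) for x in (
    "2 16 28 32 35 32 35 36 39 39 39 39 39 40 40 38 40 39 41 38 41 41 41 "
    "39 41 40 40 41 41 41 39 31 39 36 38 33 37 39 26 37 39 36 39 29 31 39 "
    "40 41 41 41 41"
).split()]

decoded = decode_quality(galaxy_quality)

# True if the Phred+33 decoding reproduces the SRA scores exactly
matches_sra = decoded == sra_scores

# Characters below ASCII 64 would give negative scores under Phred+64
has_chars_below_64 = min(ord(ch) for ch in galaxy_quality) < 64

print("Phred+33 decoding matches SRA scores:", matches_sra)
print("Contains characters below ASCII 64:", has_chars_below_64)
```

If the decoded list equals the SRA scores, the quality string is consistent with Phred+33. Characters such as "#" and "1" have ASCII codes below 64, so they would decode to negative scores under an offset of 64.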

 

Thanks.

 

Jianguang Du