Hi Jianguang,

I agree: these are already Sanger Phred+33 quality scores, which means you want the datatype fastqsanger (with near certainty). To double-check, run FastQC on a sample of the data, or on the entire dataset if you plan to do quality checks anyway (possible trimming, etc.).
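As an illustration of what "Phred+33" means, here is a minimal Python sketch (the function name decode_phred33 is hypothetical, not a Galaxy or FastQC API): the Phred score of each base is the ASCII code of its quality character minus 33. Applied to the "read 1" quality string from your message, it reproduces the SRA one-channel scores, including the 41s.

```python
from typing import List

# Sanger / Illumina 1.8+ FASTQ: Phred score = ASCII code - 33.
PHRED33_OFFSET = 33


def decode_phred33(quality: str) -> List[int]:
    """Return the Phred score for each character of a Phred+33 quality string."""
    return [ord(ch) - PHRED33_OFFSET for ch in quality]


# Quality string of "read 1" as displayed in Galaxy.
read1_quality = "#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ"
scores = decode_phred33(read1_quality)
print(scores[:5])   # [2, 16, 28, 32, 35]
print(max(scores))  # 41
```

The first character '#' (ASCII 35) decodes to 2, matching the first SRA score, and 'J' (ASCII 74) decodes to 41.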

You also don't need to run the Groomer; just assign the datatype by clicking the pencil icon. Help is here, and the "FASTQ Prep" screencast walks through how to do this (using SRA data as an example):
http://wiki.galaxyproject.org/Support#Dataset_special_cases

Hope this helps. You are already on the right track; I'm just agreeing!

Jen
Galaxy

On 8/29/13 12:53 PM, Du, Jianguang wrote:

Hi All,

I downloaded some RNA-seq datasets from NCBI. The datasets were generated on an Illumina HiSeq 2000. I am not sure which "Input FASTQ quality scores type" I should choose when running FASTQ Groomer. Below are the scores of two reads from one dataset; I renamed them "read 1" and "read 2".

1) Sequence and quality score displayed in Galaxy

@read 1 length=51
NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
+read 1 length=51
#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ
@read 2 length=51
NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
+read 2 length=51
#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI
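Each read in FASTQ is four lines: "@" title, sequence, "+" line, and a quality string with one character per base. A minimal Python sketch of reading such records (FastqRecord and parse_fastq are hypothetical names, not a Galaxy API; it assumes unwrapped four-line records and simply skips blank lines like the ones introduced by email):

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    header: str    # title line without the leading "@"
    sequence: str
    quality: str


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield records from simple four-line FASTQ (no line-wrapped sequences)."""
    # Blank lines are not part of the FASTQ format; drop them before grouping.
    body = [line.rstrip("\n") for line in lines if line.strip()]
    if len(body) % 4 != 0:
        raise ValueError("expected a multiple of four non-blank lines")
    for i in range(0, len(body), 4):
        header, sequence, plus, quality = body[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record number {i // 4 + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


example = """@read 1 length=51
NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
+read 1 length=51
#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ
@read 2 length=51
NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
+read 2 length=51
#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI
"""

records = list(parse_fastq(example.splitlines()))
for rec in records:
    print(rec.header, len(rec.sequence))
```

Grouping by four lines, rather than detecting records by a leading "@", matters because "@" is also a valid quality character (read 1's quality string contains one).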

 

2) Sequence and one-channel quality scores shown in NCBI SRA when I downloaded the dataset:

>gnl|SRA|read 1
NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
One-channel quality scores:
2 16 28 32 35 32 35 36 39 39 39 39 39 40 40 38 40 39 41 38 41 41 41 39 41 40 40 41 41 41 39 31 39 36 38 33 37 39 26 37 39 36 39 29 31 39 40 41 41 41 41

>gnl|SRA|read 2
NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
One-channel quality scores:
2 16 28 35 35 35 36 35 39 39 37 39 39 41 41 41 41 41 40 41 41 39 40 40 40 41 41 41 40 41 41 41 41 41 41 41 40 41 40 41 41 41 41 41 41 40 41 41 41 41 40
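Having both the Galaxy quality string and the SRA numeric scores for the same read makes it possible to read the offset off directly: for a fixed-offset encoding, ASCII code minus score is the same at every position. A minimal sketch (infer_offset is a hypothetical helper) using "read 2":

```python
from typing import List


def infer_offset(quality: str, scores: List[int]) -> int:
    """Infer the ASCII offset linking a quality string to its numeric Phred scores."""
    if len(quality) != len(scores):
        raise ValueError("quality string and score list differ in length")
    # For a fixed-offset encoding, ord(char) - score is identical at every position.
    offsets = {ord(ch) - score for ch, score in zip(quality, scores)}
    if len(offsets) != 1:
        raise ValueError(f"inconsistent offsets: {sorted(offsets)}")
    return offsets.pop()


# "read 2": quality string as shown in Galaxy, and the SRA one-channel scores.
read2_quality = "#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI"
read2_sra_scores = [int(s) for s in (
    "2 16 28 35 35 35 36 35 39 39 37 39 39 41 41 41 41 41 40 41 41 39 40 40 40 "
    "41 41 41 40 41 41 41 41 41 41 41 40 41 40 41 41 41 41 41 41 40 41 41 41 41 40"
).split()]

print(infer_offset(read2_quality, read2_sra_scores))  # 33
```

An offset of 33 at every position is exactly the Sanger (Phred+33) encoding.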

 

It looks like the dataset was generated by Illumina 1.8 or later, because some bases have a quality score of 41. Can I choose "sanger" as the "Input FASTQ quality scores type" when I run FASTQ Groomer?
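The reasoning here (a score of 41 points to Illumina 1.8+) can be turned into a rough check on the range of quality characters. This is a minimal heuristic sketch with a hypothetical function name, guess_encoding; it is not FastQC's or Galaxy's actual detection logic, and a small sample of uniformly high-quality reads can be ambiguous. It relies on the commonly quoted ranges: Phred+33 starts at '!' (ASCII 33), Solexa+64 at ';' (ASCII 59), and Phred+64 at '@' (ASCII 64).

```python
from typing import List


def guess_encoding(qualities: List[str]) -> str:
    """Roughly guess the FASTQ quality encoding from the observed ASCII range.

    Heuristic only: a small sample of high-quality reads can be ambiguous.
    """
    codes = [ord(ch) for q in qualities for ch in q]
    if not codes:
        raise ValueError("no quality characters to inspect")
    lo, hi = min(codes), max(codes)
    if lo < ord(";"):
        # Characters below ';' (ASCII 59) do not occur in Phred+64 or Solexa+64 data.
        if hi > ord("I"):
            # 'J' (ASCII 74) is Q41 under Phred+33, the top score of Illumina 1.8+.
            return "Phred+33, scores above Q40 (consistent with Illumina 1.8+)"
        return "Phred+33, no scores above Q40"
    if lo >= ord("@"):
        # Every character at or above '@' (ASCII 64): typical of Phred+64 data.
        return "probably Phred+64 (Illumina 1.3-1.7)"
    return "ambiguous (Phred+33 or Solexa+64)"


reads = [
    "#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ",  # read 1
    "#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI",  # read 2
]
print(guess_encoding(reads))
```

For these two reads, the lowest character is '#' (Q2 under Phred+33) and the highest is 'J', so the sketch reports Phred+33 with scores above Q40.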

 

Thanks.

Jianguang Du



___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

-- 
Jennifer Hillman-Jackson
http://galaxyproject.org