Arthur,

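When the data comes from CASAVA 1.8, I think it is already in the proper format (Sanger, Phred+33). Note that the Illumina 1.5 to 1.7 pipelines still wrote Phred+64 qualities, so the change starts at 1.8 rather than at 1.5.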

An excellent overview is here: http://en.wikipedia.org/wiki/FASTQ_format

Basically, the headers of the FASTQ reads are my indication at the moment. Since CASAVA 1.8 the header has changed to @SOMETHING<space>READINFO, for example @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG. Most sequencing labs currently deliver that format.
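
The header check described above can be sketched roughly as follows. This is only a minimal sketch: the regular expression is my assumption of the CASAVA 1.8 layout (@instrument:run:flowcell:lane:tile:x:y, a space, then read:filtered:control:index), and the name looks_like_casava18 is hypothetical, not part of Galaxy or the groomer.

```python
import re

# Assumed CASAVA 1.8+ header layout (not taken from any official parser):
#   @instrument:run:flowcell:lane:tile:x:y<space>read:filtered:control:index
CASAVA18_HEADER = re.compile(
    r"^@[^\s:]+:\d+:[^\s:]+:\d+:\d+:\d+:\d+ \d:[YN]:\d+:\S*$"
)


def looks_like_casava18(header):
    """Return True if a FASTQ header line matches the assumed 1.8+ layout."""
    return CASAVA18_HEADER.match(header.strip()) is not None


if __name__ == "__main__":
    # New-style header (1.8+) versus an older Illumina header.
    print(looks_like_casava18("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))
    print(looks_like_casava18("@HWUSI-EAS100R:6:73:941:1973#0/1"))
```

A matching header only suggests the file came from a 1.8+ pipeline; it says nothing definite about the quality encoding itself.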

Good luck!

Alex

PS: correct me if I am wrong about the Phred scoring. PeterC probably knows this best, since he wrote the Python groomer.
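
For the Phred scoring itself, a rough way to check a file is to look at the range of characters in the quality lines. The sketch below is not the groomer's method; guess_phred_offset is a hypothetical helper, and it assumes plain four-line FASTQ records (no wrapped sequence lines). The thresholds rest on the usual ranges: Phred+33 files use characters from '!' (33) up to about 'J' (74), while Phred+64 files start at ';' (59) or '@' (64).

```python
from itertools import islice


def guess_phred_offset(lines, max_reads=10000):
    """Guess the quality offset (33 or 64) from an iterable of FASTQ lines.

    Assumes four lines per record. Returns None when the observed range
    fits both encodings (or when no quality lines were seen).
    """
    lo, hi = 255, 0
    for i, line in enumerate(islice(lines, 4 * max_reads)):
        if i % 4 == 3:  # fourth line of each record holds the qualities
            quals = line.strip()
            if quals:
                codes = [ord(c) for c in quals]
                lo = min(lo, min(codes))
                hi = max(hi, max(codes))
    if lo < 59:
        return 33  # characters below ';' cannot occur in Phred+64 data
    if hi > 74:
        return 64  # characters above 'J' are not expected in Phred+33 data
    return None  # ambiguous: range is valid under both encodings
```

Usage would be something like `with open("reads.fastq") as fh: print(guess_phred_offset(fh))`, where reads.fastq is a placeholder name. A result of None means more reads need to be sampled or the header has to decide.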

 

From: Arthur Zheng [mailto:haoz021@gmail.com]
Sent: Tuesday, February 14, 2012 6:01
To: Bossers, Alex
CC: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Large local file of NGS for FASTAQ Groomer

 

Dear Alex,

Thank you for the reminder.
I noticed that I am using Illumina CASAVA 1.8.
How can I check whether the data is already in Sanger format?

Arthur

On Mon, Feb 13, 2012 at 4:53 AM, Bossers, Alex <Alex.Bossers@wur.nl> wrote:

Are you sure the FASTQ files are in the older format? If they are not, you won't need to groom them anymore (as far as I understand), since the newer format already uses Sanger-style quality scores. That saves huge resources!
Alex