Arthur,
When the data comes from CASAVA 1.8 or later, I think it's already in the proper (Sanger, Phred+33) format. As far as I know, pipeline versions 1.5 through 1.7 still used the older Phred+64 encoding.
An excellent overview is here: http://en.wikipedia.org/wiki/FASTQ_format
Basically, the headers of the FASTQ reads are what I go by at the moment. Since 1.8 the header has the form @SOMETHING<space>READINFO. Most sequencing labs currently deliver that format.
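To make the header check concrete, here is a minimal Python sketch. It assumes the CASAVA 1.8 header layout described on the Wikipedia page linked above (seven colon-separated fields, a space, then read:filtered:control:index). The pattern and the function name looks_like_casava18 are my own illustration, not part of Galaxy or the groomer.

```python
import re

# Assumed CASAVA 1.8 layout (per the Wikipedia FASTQ page):
# @instrument:run:flowcell:lane:tile:x:y<space>read:filtered:control:index
CASAVA18_HEADER = re.compile(
    r"^@[^\s:]+:\d+:[^\s:]+:\d+:\d+:\d+:\d+ [12]:[YN]:\d+:\S*$"
)


def looks_like_casava18(header):
    """Return True if a FASTQ header line matches the assumed 1.8 layout."""
    return CASAVA18_HEADER.match(header.rstrip("\r\n")) is not None


if __name__ == "__main__":
    new_style = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
    old_style = "@HWUSI-EAS100R:6:73:941:1973#0/1"
    print(looks_like_casava18(new_style))
    print(looks_like_casava18(old_style))
```

A matching header only suggests that the file came from CASAVA 1.8 or later. It does not prove the quality encoding, so it is worth looking at the quality lines as well.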
Good luck!
Alex
PS: Correct me if I am wrong about the Phred scoring. PeterC probably knows this best, since he wrote the Python groomer.
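For the Phred scoring itself, a rough way to check a file is to look at the lowest quality character in the first reads. The sketch below is only a heuristic under my own assumptions: four-line FASTQ records, and the usual rule of thumb that any character below ASCII 59 can only occur with the Sanger (Phred+33) offset. The function name guess_phred_offset is hypothetical and this is not how the groomer does it.

```python
import itertools


def guess_phred_offset(fastq_lines, max_reads=10000):
    """Guess the quality offset (33 or 64) from four-line FASTQ records.

    Returns None when no quality characters were seen. Heuristic only:
    a Phred+33 file that contains nothing but high scores is reported as 64.
    """
    lowest = None
    # Every 4th line, starting at the 4th, is a quality string.
    quality_lines = itertools.islice(fastq_lines, 3, None, 4)
    for qual in itertools.islice(quality_lines, max_reads):
        qual = qual.rstrip("\r\n")
        if not qual:
            continue
        line_min = min(map(ord, qual))
        if lowest is None or line_min < lowest:
            lowest = line_min
    if lowest is None:
        return None
    # Characters below ASCII 59 cannot occur in the +64 encodings.
    return 33 if lowest < 59 else 64


if __name__ == "__main__":
    record = ["@read1\n", "ACGT\n", "+\n", "!''*\n"]
    print(guess_phred_offset(record))
```

To use it on a real file, pass the open file handle, for example guess_phred_offset(open("reads.fastq")), where reads.fastq is a placeholder name.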
From: Arthur Zheng [mailto:haoz021@gmail.com]
Sent: Tuesday, 14 February 2012 6:01
To: Bossers, Alex
CC: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Large local file of NGS for FASTAQ Groomer
Dear Alex,
Thank you for the reminder.
I noticed that I am using Illumina CASAVA 1.8.
How can I check whether the data is already in Sanger format?
Arthur
On Mon, Feb 13, 2012 at 4:53 AM, Bossers, Alex <Alex.Bossers@wur.nl> wrote:
Are you sure the FASTQ files are in the older format? If they are not, you don't need to groom them anymore (as far as I understand), since the newer format already uses Sanger-compatible quality scores. That saves huge resources!
Alex