When the data comes from CASAVA 1.8, I think it is already in the proper (Sanger, Phred+33) format. Output from the older 1.3 to 1.7 pipelines uses Phred+64 quality scores and would still need grooming.

An excellent overview is here:

At the moment I mainly go by the headers of the FASTQ reads. Since 1.8 the header format is @SOMETHING<space>READINFO, and most sequencing labs now deliver that format.
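
A rough sketch of that header check in Python, in case it helps. The exact field layout in the regular expression (instrument:run:flowcell:lane:tile:x:y, a space, then read:filtered:control:index) is my assumption about the CASAVA 1.8 header, and the function name is made up for illustration:

```python
import re

# Assumed CASAVA 1.8+ header layout:
#   @instrument:run:flowcell:lane:tile:x:y<space>read:filtered:control:index
CASAVA18_HEADER = re.compile(
    r"^@[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+ \d:[YN]:\d+:\S*$"
)

def looks_like_casava18(header_line):
    """Return True if a FASTQ header line matches the assumed 1.8+ layout."""
    return CASAVA18_HEADER.match(header_line.rstrip("\r\n")) is not None
```

Running it on the first header line of a file (for example the output of `head -1 reads.fastq`) is usually enough, since all reads in one run share the same header style.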

Good luck!


PS: Correct me if I am wrong about the Phred scoring. PeterC probably knows this best, since he wrote the Python groomer.
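
On the Phred scoring, a heuristic check on the quality lines themselves could look like the sketch below. It assumes that characters below ';' (ASCII 59) only occur with the Sanger offset of 33, and that characters above 'J' (ASCII 74) only occur with the older offset of 64. The function name is hypothetical, and the result is a guess, not a guarantee:

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters this low cannot occur in the Phred+64 or Solexa+64 encodings.
        return 33
    if max(codes) > 74:
        # Characters this high are not expected with Phred+33 (Q41 is 'J').
        return 64
    return None  # ambiguous: only mid-range characters were seen
```

Feeding it the quality lines of the first few thousand reads should be plenty. A result of 33 means the file already looks Sanger-encoded, so grooming should not be needed.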


From: Arthur Zheng
Sent: Tuesday, 14 February 2012 6:01
To: Bossers, Alex
Subject: Re: [galaxy-user] Large local file of NGS for FASTAQ Groomer


Dear Alex,

Thank you for the reminder.
I noticed that I am using Illumina CASAVA 1.8.
How can I check whether the data is already in Sanger format?


On Mon, Feb 13, 2012 at 4:53 AM, Bossers, Alex wrote:

Are you sure the FASTQ files are in the older format? If not, you no longer need to groom them (as far as I understand), since the newer format already uses Sanger-style quality scores. That saves huge resources!