When the data comes from CASAVA 1.8 (or later), I think it's already in the proper format. Note that the Illumina 1.3 to 1.7 pipelines, including 1.5, still wrote Phred+64 qualities, so those older files do need grooming.
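For reference, "proper format" here means Sanger encoding: each quality character is the Phred score plus 33, and a Phred score Q stands for an error probability of 10^(-Q/10). A minimal sketch of decoding such a string (the function names are just illustrative):

```python
def sanger_to_phred(quality: str) -> list[int]:
    """Decode a Sanger (Phred+33) quality string into Phred scores."""
    return [ord(c) - 33 for c in quality.rstrip("\r\n")]


def phred_to_error_prob(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


scores = sanger_to_phred("II5+!")
print(scores)  # [40, 40, 20, 10, 0]
print([phred_to_error_prob(q) for q in scores])
```

So 'I' means Q40 (about 1 error in 10,000) and '!' means Q0.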

An excellent overview is here:

Basically, the headers of the FASTQ reads are my indicator at the moment. Since 1.8 they changed to @SOMETHING<space>READINFO. Most sequencing labs currently deliver that format.
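The header check described above can be sketched with two regular expressions. This assumes the usual Illumina layouts: CASAVA 1.8 style `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index` and the older `@machine:lane:tile:x:y#index/read`. The example headers are illustrative, and headers can be rewritten along the way (e.g. by a public archive), so treat this as a hint only. The quality characters themselves are the real test.

```python
import re

# CASAVA 1.8+: "@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<Y|N>:<control>:<index>"
CASAVA_18 = re.compile(r"^@[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+ [12]:[YN]:\d+:\S*$")
# Pre-1.8: "@<machine>:<lane>:<tile>:<x>:<y>#<index>/<read>"
CASAVA_OLD = re.compile(r"^@[^:\s]+:\d+:\d+:\d+:\d+#\S+/[12]$")


def header_style(header: str) -> str:
    """Classify a FASTQ header line by its Illumina naming convention."""
    header = header.rstrip("\r\n")
    if CASAVA_18.match(header):
        return "casava-1.8"
    if CASAVA_OLD.match(header):
        return "pre-1.8"
    return "unknown"


print(header_style("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))  # casava-1.8
print(header_style("@HWUSI-EAS100R:6:73:941:1973#0/1"))                       # pre-1.8
```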

Good luck!


PS: correct me if I'm wrong about the Phred scoring. PeterC probably knows this best, since he wrote the Python groomer.


I noticed that I am using Illumina CASAVA 1.8.
How can I check whether the data is already in Sanger format?
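One common way to check is to look at the lowest quality character in the file. Characters below ';' (ASCII 59) can only occur with Phred+33 (Sanger, CASAVA 1.8+), while old Illumina Phred+64 data starts at '@' (ASCII 64). A rough sketch, assuming plain four-line records with no wrapped sequence lines; the function name and the 10,000-read sample size are arbitrary choices:

```python
from itertools import islice
from typing import Iterable


def guess_encoding(lines: Iterable[str], max_reads: int = 10_000) -> str:
    """Guess the quality offset of a 4-line FASTQ from its lowest quality character."""
    lowest = None
    for i, line in enumerate(islice(lines, max_reads * 4)):
        if i % 4 != 3:  # only every 4th line holds qualities
            continue
        qual = line.rstrip("\r\n")
        if qual:
            m = min(map(ord, qual))
            lowest = m if lowest is None else min(lowest, m)
    if lowest is None:
        return "unknown"
    if lowest < 59:  # below ';': only possible with Phred+33
        return "phred+33"
    if lowest >= 64:  # '@' or above: Phred+64, or a Phred+33 file with only high qualities
        return "probably phred+64"
    return "ambiguous"  # ';' to '?': Solexa+64 or Phred+33


# Usage ("reads.fastq" is a placeholder name):
# with open("reads.fastq") as fh:
#     print(guess_encoding(fh))
```

The "probably" is deliberate: a small sample of very good Sanger reads can also have nothing below '@', so a larger sample gives a more reliable answer.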


Are you sure the FASTQ files are in the older format? If not, you won't need to groom them anymore (as far as I understand), since the newer format already uses Sanger quality scores. That saves a lot of resources!
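For files that really are in the older Illumina 1.3 to 1.7 (Phred+64) format, the quality part of grooming boils down to shifting every quality character down by 31. This is a minimal sketch of that idea, not the Galaxy groomer's actual code; it assumes four-line records and does not handle pre-1.3 Solexa+64 scores, which need a log-odds conversion rather than a plain shift.

```python
from typing import Iterable, Iterator


def illumina13_to_sanger(qual: str) -> str:
    """Convert a Phred+64 quality string to Phred+33 (shift each character down by 31)."""
    out = []
    for c in qual.rstrip("\r\n"):
        q = ord(c) - 64
        if q < 0:
            raise ValueError(f"{c!r} is below the Phred+64 range; is this really Illumina 1.3+ data?")
        out.append(chr(q + 33))
    return "".join(out)


def groom_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield 4-line FASTQ lines, converting every quality line to Phred+33."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield illumina13_to_sanger(line) + "\n"
        else:
            yield line


print(illumina13_to_sanger("hhB"))  # II#
```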