Arthur,

When the data comes from CASAVA 1.8 (actually, I believe from 1.5 and above), I think it is already in the proper format. An excellent overview is here: http://en.wikipedia.org/wiki/FASTQ_format

Basically, the headers of the FASTQ reads are my indication at the moment. Since 1.8 the header changed to @SOMETHING<space>READINFO. Most current sequencing labs deliver that format.

Good luck!
Alex

PS: Correct me if I am wrong about the Phred scoring. PeterC probably knows this best, since he wrote the Python groomer.
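A rough way to check this on the data itself, rather than relying on the pipeline version alone, is sketched below in Python. It is only a heuristic under stated assumptions: the function names are made up for illustration, the header pattern follows the CASAVA 1.8 layout described in the Wikipedia overview linked above (seven colon-separated fields, a space, then read:filtered:control:index), and the offset guess only looks at the lowest quality character it sees. That overview lists the 1.3 to 1.7 pipelines as using an offset of 64, so a look at the quality lines is worthwhile for anything older than 1.8.

```python
import re

# Assumed CASAVA 1.8+ header layout: seven colon-separated fields,
# one space, then read:filtered:control:index (index may be empty).
CASAVA18_HEADER = re.compile(r"^@[^:\s]+(?::[^:\s]+){6} [12]:[YN]:\d+:\S*$")


def looks_like_casava18(header):
    """Return True if a FASTQ header line matches the assumed 1.8+ layout."""
    return bool(CASAVA18_HEADER.match(header.strip()))


def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from the lowest quality character.

    Returns None when the data is empty or the result is ambiguous.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    lowest = min(codes)
    if lowest < 59:
        # Characters below ';' cannot occur in any offset-64 variant.
        return 33
    if lowest >= 64:
        # Consistent with offset 64, but a very high-quality offset-33
        # file could look the same, so treat this as a hint only.
        return 64
    # 59..63 fits the old Solexa scale or a small offset-33 sample.
    return None


def inspect_fastq(path, max_reads=10000):
    """Scan the first max_reads records of a four-line-per-record FASTQ file."""
    headers, qualities = [], []
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 0:
                headers.append(line)
            elif i % 4 == 3:
                qualities.append(line)
    return {
        "casava18_headers": bool(headers) and all(map(looks_like_casava18, headers)),
        "phred_offset": guess_phred_offset(qualities),
    }
```

If `inspect_fastq("reads.fastq")` (file name hypothetical) reports CASAVA 1.8 style headers and an offset of 33, the file is most likely already Sanger-encoded and grooming should not be needed. An offset of 64 or `None` means the file deserves a closer look first.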
From: Arthur Zheng [mailto:firstname.lastname@example.org]
Sent: Tuesday, 14 February 2012 6:01
To: Bossers, Alex
CC: email@example.com
Subject: Re: [galaxy-user] Large local file of NGS for FASTAQ Groomer
Thank you for the reminder. I noticed that I am using Illumina CASAVA 1.8. How can I check whether my data is already in Sanger format?
Arthur

On Mon, Feb 13, 2012 at 4:53 AM, Bossers, Alex <Alex.Bossers@wur.nl> wrote:
Are you sure the FASTQ files are in the older format? If not, you won't need to groom them anymore (as far as I understand), since the newer format already uses quality scores comparable to Sanger. That saves huge resources!
Alex