Question about FASTQ Groomer
Here is a link to the seqanswers.com post showing how to modify the MAQ script so that it converts Illumina qualities to Sanger:
http://seqanswers.com/forums/showthread.php?t=1453&highlight=conversion+solexa+illumina+sanger
Tim.
On Sat, Apr 17, 2010 at 7:42 AM, Timothy Hughes <tzhughes@gmail.com> wrote:
Hi Jianchao,
> I am asking this question because I used to use Maq's sol2sanger (I guess it is just similar to your "Solexa") to convert all data generated by Illumina 1.5.
The different FASTQ formats are broadly summarised as follows:
S - Sanger: Phred+33, 41 values (0 to 40)
I - Illumina 1.3: Phred+64, 41 values (0 to 40)
X - Solexa: Solexa+64, 68 values (-5 to 62)
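The practical consequence of this list is that the same quality character decodes to a different score depending on the encoding. Here is a minimal Python sketch using only the offsets and scales listed above; the names `ENCODINGS` and `decode_quality` are hypothetical and are not taken from Galaxy or MAQ:

```python
# Minimal sketch: decode one FASTQ quality character under the three
# encodings summarised above. ENCODINGS and decode_quality are
# hypothetical names used for illustration only.

# ASCII offset and score scale for each encoding
ENCODINGS = {
    "S": (33, "phred"),   # Sanger
    "I": (64, "phred"),   # Illumina 1.3
    "X": (64, "solexa"),  # Solexa
}


def decode_quality(char: str, encoding: str) -> tuple[int, str]:
    """Return (score, scale) for a single quality character."""
    offset, scale = ENCODINGS[encoding]
    return ord(char) - offset, scale


if __name__ == "__main__":
    # 'I' is ASCII 73: Phred 40 in Sanger, but only 9 in I or X
    for enc in ("S", "I", "X"):
        print(enc, decode_quality("I", enc))
```

Because I and X share the offset 64, the characters alone cannot distinguish them, except for those below '@' (ASCII 64), which only Solexa's negative scores use.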
However, at least in my version of the MAQ software (some months old), sol2sanger converts from X to S and NOT from I to S. So if you feed I-format data to the MAQ converter, you will get slightly incorrect Sanger qualities: it expects the input qualities to have been calculated with the Solexa formula, when they were in fact calculated with the Phred formula. The error is largest at low quality values, where the two scales diverge most. If you search seqanswers.com you will find a post detailing how to modify the MAQ conversion script so that it converts from I to S.
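This point can be checked numerically. The sketch below assumes the standard relation between the two scales, Q_phred = 10·log10(10^(Q_solexa/10) + 1), rounded to the nearest integer. It is not MAQ's sol2sanger source, and the function names are hypothetical. It converts one Illumina 1.3 quality string correctly (I to S, a pure offset shift from 64 to 33) and as a Solexa converter would (X to S):

```python
import math


def solexa_to_phred(q_solexa: int) -> int:
    """Map a Solexa-scaled score to the nearest Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def x_to_s(qual: str) -> str:
    """Solexa+64 -> Sanger Phred+33 (what a Solexa converter assumes)."""
    return "".join(chr(solexa_to_phred(ord(c) - 64) + 33) for c in qual)


def i_to_s(qual: str) -> str:
    """Illumina 1.3 Phred+64 -> Sanger Phred+33: only the offset changes."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)


if __name__ == "__main__":
    illumina13 = "hTJEB@"  # Phred+64 scores 40, 20, 10, 5, 2, 0
    print(i_to_s(illumina13))  # correct:   I5+&#!
    print(x_to_s(illumina13))  # as Solexa: I5+'%$
```

The two outputs agree for scores of about 10 and above, but the low Phred scores 5, 2 and 0 come out as 6, 4 and 3 when Phred+64 input is treated as Solexa+64.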
Could this explain the discrepancies you observe?
Tim.
------------------------------
Message: 4
Date: Fri, 16 Apr 2010 11:20:09 -0400
From: "Yao, Jianchao" <jyao@cshl.edu>
To: <galaxy-user@bx.psu.edu>
Subject: [galaxy-user] Question about FASTQ Groomer
Message-ID: <2A031782CDB83F44A26147B7A11C2C95013002CF@mailbox11.cshl.edu>
Content-Type: text/plain; charset="us-ascii"
To Whom It May Concern:
I am a new Galaxy user. In the "FASTQ Groomer" tool, I noticed there is an option for "Input FASTQ quality scores type". My question is: what different conversions do you perform when I choose "Solexa" versus "Illumina 1.3+"? I am asking this question because I used to use Maq's sol2sanger (I guess it is just similar to your "Solexa") to convert all data generated by Illumina 1.5. Based on your options, it seems I should have chosen a different conversion (e.g., your "Illumina 1.3+") for data generated by Illumina 1.5.
Also, it looks like "Solexa" and "Illumina 1.3+" differ only in how the quality scores are calculated. However, when I use BWA and SAMtools to map reads and call SNPs, the BAM and pileup files are very different in size between the two conversions, and even the coverage of some bases differs depending on which conversion I choose.
Can you tell me how the conversion can affect the final result in terms of coverage?
Any help would be greatly appreciated!
-Jianchao Yao