Hi Assaf,

Just a quick note that the standard bowtie tool in Galaxy was enhanced in changeset 5157:7a9476924daf to work on the 'fastqillumina' and 'fastqsolexa' variants in addition to the already supported 'fastqsanger'. In general, it is not a good idea to have a tool accept dataset.ext=='fastq' unless it doesn't care about quality scores, determines the correct offset/scale itself, or has the user declare the variant type in the tool interface. When files are added to Galaxy, the datatype can be set directly to any of the fastq variants (e.g. fastqillumina), which removes the requirement of grooming (but this should only be done when users know what they are doing).
The one time I tried to make the built-in Bowtie tool available, I got complaints along the lines of "why doesn't my FASTQ file appear in the input list?" - because it was "fastq" and not "fastqsanger" after grooming. This is a silly technical step that should not be a concern to users, so I'm taking it out of the equation here (not to mention that grooming two 14GB FASTQ files for every lane is a huge waste of space and time).
It should not be possible to have a data.ext=='fastq' after Grooming (unless manually changed by a user); please report the steps that led to this.

Thanks,
Dan

On Mar 29, 2011, at 10:25 AM, Assaf Gordon wrote:
Hi Peter,
Peter Cock wrote, On 03/29/2011 05:39 AM:
2. The tool accepts FASTA and FASTQ, the latter in both Sanger and Illumina formats (no more need for grooming). Illumina is the default for newly uploaded FASTQ files.
I think that's a bad idea - use Sanger FASTQ as the default to be consistent with the rest of Galaxy, and also because, with CASAVA 1.8, Illumina machines will produce that too. See: http://seqanswers.com/forums/showthread.php?t=8895
Thanks for the link - very interesting read, I wasn't aware of it.
However, for our local Galaxy server - I'm sticking with Illumina scale until I see real samples with phred-33 in the wild.
The default can easily be changed (in the XML file, simply assume a different scale when the extension is "fastq"). Alternatively, don't accept "fastq" at all and force the user to change the format to either "fastqillumina" or "fastqsanger".
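To make the two options above concrete, here is a minimal Python sketch of how a wrapper might turn the dataset extension into a quality option for bowtie. It is only an illustration under stated assumptions: the function name `quality_flag` and the `default_variant` parameter are hypothetical and are not part of Galaxy or of the actual tool XML. The three option strings are bowtie's own quality-encoding switches.

```python
# Hypothetical helper: map a Galaxy FASTQ datatype extension to the bowtie
# option that declares the quality encoding. Not part of Galaxy itself.
QUALITY_FLAGS = {
    "fastqsanger": "--phred33-quals",    # Phred scores, ASCII offset 33
    "fastqillumina": "--phred64-quals",  # Phred scores, ASCII offset 64
    "fastqsolexa": "--solexa-quals",     # Solexa scores, ASCII offset 64
}


def quality_flag(ext, default_variant="fastqillumina"):
    """Return the bowtie quality option for a dataset extension.

    A plain "fastq" extension is treated as `default_variant`. Passing
    default_variant=None rejects plain "fastq" instead, which forces the
    user to pick a specific variant.
    """
    if ext == "fastq":
        if default_variant is None:
            raise ValueError(
                "plain 'fastq' is not accepted; set the datatype to a specific variant"
            )
        ext = default_variant
    if ext not in QUALITY_FLAGS:
        raise ValueError(f"unsupported datatype: {ext!r}")
    return QUALITY_FLAGS[ext]
```

Changing the site-wide default is then a one-argument change, e.g. `quality_flag("fastq", default_variant="fastqsanger")`.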
I'll explain my reasoning: We (at our lab) deal mostly with Illumina FASTQ files, with the Illumina scale. I'm trying to make life as easy as possible for our users. When they upload a FASTQ file, it is by default an Illumina FASTQ file, and I want them to be able to use a workflow on it immediately. All of our internal tools assume Illumina scale.
The one time I tried to make the built-in Bowtie tool available, I got complaints along the lines of "why doesn't my FASTQ file appear in the input list?" - because it was "fastq" and not "fastqsanger" after grooming. This is a silly technical step that should not be a concern to users, so I'm taking it out of the equation here (not to mention that grooming two 14GB FASTQ files for every lane is a huge waste of space and time).
When CASAVA 1.8 is ready (that is, when it is actually running in our sequencing center), we'll have to deal with it. Ideally, Galaxy will have some metadata code that scans the first 1,000,000 lines and heuristically detects which scale is in use. I'm not leaving this choice to the users, because they will make the wrong choice and then come crying back.
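As a rough idea of what such a heuristic could look like, here is a minimal Python sketch. It is not existing Galaxy code: the function name `guess_fastq_variant` is hypothetical, and it assumes plain four-line FASTQ records with no wrapped sequence or quality lines. It relies only on the known ASCII ranges: Sanger qualities start at character 33, Solexa at 59, and Illumina 1.3+ at 64.

```python
from itertools import islice


def guess_fastq_variant(lines, max_records=250_000):
    """Guess the FASTQ variant from the lowest quality character seen.

    `lines` is any iterable of text lines; 250,000 four-line records
    correspond to the first 1,000,000 lines of the file.
    """
    lowest = None
    seen = 0
    it = iter(lines)
    while seen < max_records:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input, or a truncated final record
        qual = record[3].rstrip("\r\n")
        if not qual:
            continue  # skip records with an empty quality line
        record_min = min(ord(c) for c in qual)
        lowest = record_min if lowest is None else min(lowest, record_min)
        seen += 1

    if lowest is None:
        return "fastq"  # nothing to inspect, leave the variant undetermined
    if lowest < 59:
        return "fastqsanger"  # characters below ';' only occur with offset 33
    if lowest < 64:
        return "fastqsolexa"  # Solexa scores can go down to -5 (ASCII 59)
    # Ambiguous in principle: a Sanger file containing only very high
    # qualities would also land here, so this is a best guess.
    return "fastqillumina"
```

For example, `with open(path) as fh: variant = guess_fastq_variant(fh)` would scan a file without loading it into memory. The last branch is the weak point of any such heuristic, which is why an explicit user override would still be worth keeping.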
Just my two cents,
 -gordon