Re: [galaxy-dev] Alternative bowtie tools

29 Mar 2011

Dan and Peter,

Peter Cock wrote, On 03/29/2011 12:08 PM:
...
> Why not do the Illumina to Sanger conversion as part of your
> pipeline that gets the data into Galaxy (and mark the files as
> fastqsanger)? As Glen said, with a C tool that isn't really so slow.
> That future proofs you for the pending Illumina CASAVA 1.8 release,
> and means you don't need to maintain divergent Bowtie wrappers for
> Galaxy.
I refuse to groom on general principle.
The idea itself is unreasonable - all the tools support Illumina scale natively.
I'm not going to waste my disk space and users' time (and SGE time) by grooming.

When I see CASAVA 1.8 running, I'll switch (we are software people, so we know there's a gap between the planning document and the real software). Note that even the CASAVA 1.8 document mentions that the export files will still be in Illumina format, so it won't be completely gone.
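
For anyone following the thread, the conversion Peter refers to is small in principle. Below is a minimal Python sketch, assuming Illumina 1.3+ input (Phred+64 qualities) and strict four-line records. The function names are hypothetical, it does not handle the older Solexa-scaled scores, and it is not the C tool mentioned above:

```python
# Sketch only: assumes Illumina 1.3+ qualities (Phred+64) and 4-line records.
ILLUMINA_OFFSET = 64  # ASCII offset used by Illumina 1.3+ FASTQ
SANGER_OFFSET = 33    # ASCII offset used by Sanger FASTQ (fastqsanger)


def illumina_to_sanger(quality_line):
    """Re-encode one quality string from Phred+64 to Phred+33."""
    converted = []
    for ch in quality_line:
        score = ord(ch) - ILLUMINA_OFFSET
        if score < 0:
            # A character below '@' cannot be a Phred+64 score.
            raise ValueError("not a Phred+64 quality character: %r" % ch)
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)


def convert_records(lines):
    """Yield FASTQ lines, re-encoding every fourth line (the qualities)."""
    for index, line in enumerate(lines):
        text = line.rstrip("\r\n")
        if index % 4 == 3:
            text = illumina_to_sanger(text)
        yield text + "\n"
```

The per-character work is a constant shift of 31, so the cost is dominated by reading the input once and writing a second copy of it, which is the disk-space objection raised above.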

Daniel Blankenberg wrote, On 03/29/2011 12:41 PM:
...
> The Grooming step is currently very time consuming and can be quite
> wasteful in disk space if the source and target fastq files are the
> same.
It is wasteful in any case, not just if they are the same...
...
> but I have seen many occasions where Grooming has 'saved the
> day' by e.g. detecting truncated files that may have gone undetected
> by downstream tools or by indicating to the user that the variant
> they had selected as the source was incorrect.
I would humbly guess that most of those truncated files are due to problematic HTTP uploads - so it saves the day from another problem, which should be avoided altogether.
...
> However, I have been thinking about adding a 'check only' option to
> the Groomer that would use a naive parser (assume exactly 4 lines to
> a read, ascii scores, require input variant==output variant, etc.)
> and reuse the underlying original dataset file as the output
> (without writing over the file). This would be significantly faster
> and not waste disk space, but it would require enhancements to the
> framework.
I know you (the Galaxy team) try very hard to have everything in native Python (for easy deployment), but I still hold the opinion that these tools should not be written in Python. No matter how much you minimize the processing, it will not be as efficient as a good compiled program. Python (or Perl, I don't discriminate) can probably do this entire "check only" mode in just a few lines of regexes - but try it on twenty 14GB FASTQ files and you'll realize it's not practical.
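
To make the discussion concrete, here is roughly what such a naive "check only" pass could look like. It is a sketch under the assumptions Dan lists (exactly four lines per read, one ASCII character per score), with a hypothetical function name and parameters. It is not the Groomer's code and it says nothing about speed on large files:

```python
def check_fastq(lines, min_char="!", max_char="~"):
    """Return the number of reads, or raise ValueError on the first problem.

    Naive by design: assumes exactly four lines per read and one ASCII
    character per quality score. Nothing is rewritten or copied.
    """
    count = 0
    record = []
    for line_number, line in enumerate(lines, start=1):
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, sequence, separator, quality = record
        first = line_number - 3  # line number of this read's header
        if not header.startswith("@"):
            raise ValueError("line %d: header does not start with '@'" % first)
        if not separator.startswith("+"):
            raise ValueError("line %d: expected a '+' line" % (first + 2))
        if len(sequence) != len(quality):
            raise ValueError("line %d: sequence/quality length mismatch" % first)
        if any(ch < min_char or ch > max_char for ch in quality):
            raise ValueError("line %d: score outside expected range" % (first + 3))
        count += 1
        record = []
    if record:
        # Leftover lines mean the final read is incomplete.
        raise ValueError("truncated file: last read has %d of 4 lines" % len(record))
    return count
```

Called as `check_fastq(open(path))`, it streams the file once. Passing `min_char="@"` would, under the same assumptions, flag a file marked as Illumina 1.3+ that actually contains lower Sanger-range characters, which is the "wrong variant" case mentioned above.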

Bottom line - I wouldn't use a Python "checker" anyhow.

-gordon

Assaf Gordon