On Wed, Oct 30, 2013 at 4:03 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hello all,
This is just to announce that I am working on a wrapper for "CLC Assembly Cell", which is the CLCbio commercial command-line assembly tool suite: http://www.clcbio.com/products/clc-assembly-cell/
Our institute bought a licence primarily for use on plant genomes, where other assemblers at the time required too much RAM to complete. This assembler is both fast and light on memory, which can be very useful.
Wrapper development here: https://github.com/peterjc/pico_galaxy/tree/master/tools/clc_assembly_cell
Prototype releases will be on the Test Tool Shed (soon): http://testtoolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
Stable Tool Shed releases will be here (later): http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
I would be interested to hear from anyone else with access to a licensed copy of the tool interested in using it from Galaxy. e.g. Is it reasonable to assume the tools are on the $PATH, or is using a specific environment variable more helpful?
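The two options in that question can also be combined. The following is only a minimal sketch in Python, not the wrapper's actual behaviour: it assumes a hypothetical environment variable (called CLC_ASSEMBLY_CELL here purely for illustration) that points at the install directory, and falls back to searching the $PATH when that variable is unset or does not contain the tool.

```python
import os
import shutil


def find_tool(name, env_var="CLC_ASSEMBLY_CELL"):
    """Return the full path to the executable `name`, or None if not found.

    `env_var` is a hypothetical variable naming the install directory;
    the name is an assumption for this sketch, not a documented setting.
    """
    install_dir = os.environ.get(env_var)
    if install_dir:
        candidate = os.path.join(install_dir, name)
        # Only accept the candidate if it is an executable file.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to the normal $PATH search.
    return shutil.which(name)


if __name__ == "__main__":
    for tool in ("clc_assembler", "clc_mapper", "clc_cas_to_sam"):
        print(tool, "->", find_tool(tool))
```

With this arrangement an administrator who already has the binaries on the $PATH needs no extra configuration, while one with several installed versions can pin a specific directory through the variable.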
I've continued working on this and it seems to be working quite nicely on the Illumina examples I have tried.

I have written a Galaxy wrapper for the clc_assembler command line tool (FASTA/FASTQ reads to a FASTA assembly), plus a combined wrapper for clc_mapper and clc_cas_to_sam (FASTA/FASTQ reads plus FASTA assembly to a BAM mapping file). Combining the two steps avoids the issues with attempting to define a Galaxy datatype for the CLCbio CAS file format, which is not self-contained and therefore does not fit Galaxy's data model.

I would prefer to have someone else test this on another Galaxy instance before I post it to the main Tool Shed, but any feedback would be welcome.

Thanks,

Peter