On Tue, Sep 17, 2013 at 10:34 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hello all,
Bjoern and I were talking about doing some work on more datatype definitions, but this raises the question of how to handle dependencies.
First of all, complex datatypes in Galaxy are defined using Python code, so any Python dependency must be available to the main Galaxy process (unlike Python dependencies for jobs, which run in separate processes). For example, if I wrote GenBank/EMBL definitions using Biopython, could this dependency be handled via the Tool Shed?
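To make the first question concrete, here is a minimal sketch of how datatype code might guard an optional Biopython import, so that the main process still works when the library is missing. The function name count_genbank_records and its seqio parameter are hypothetical and are not part of Galaxy's datatype API. The fallback (counting LOCUS lines) is only a rough stand-in for real parsing.

```python
try:
    # Optional dependency: only importable if Biopython is installed
    # in the same environment as the calling process.
    from Bio import SeqIO
except ImportError:
    SeqIO = None


def count_genbank_records(path, seqio=SeqIO):
    """Count records in a GenBank file (hypothetical helper).

    Uses Biopython when it is available, otherwise falls back to
    counting lines that start with LOCUS.
    """
    if seqio is not None:
        with open(path) as handle:
            return sum(1 for _ in seqio.parse(handle, "genbank"))
    # Fallback: every GenBank record begins with a LOCUS line.
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith("LOCUS"))
```

This only shows how the code could degrade gracefully. It does not address how the Tool Shed would install the dependency for the main Galaxy process, which is the open question.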
Second, the Python code for some datatypes may call a binary command line tool. For example, I was thinking of using the blastdbcmd binary within the BLAST database file format definitions to provide more useful peek information. Again, how could this dependency be handled via the Tool Shed?
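As a rough illustration of the second question, the sketch below shells out to blastdbcmd with its -info option and falls back to a generic description when the binary is not on the PATH or the call fails. The function name blast_db_peek, the fallback text, and the five-line limit are assumptions made for this example and are not taken from Galaxy.

```python
import shutil
import subprocess


def blast_db_peek(db_path, binary="blastdbcmd", timeout=10):
    """Return a short summary of a BLAST database (hypothetical helper)."""
    fallback = "BLAST database (details unavailable)"
    # shutil.which returns None when the binary is not on the PATH.
    if shutil.which(binary) is None:
        return fallback
    try:
        result = subprocess.run(
            [binary, "-db", db_path, "-info"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers launch failures, non-zero exit codes, and timeouts.
        return fallback
    # Keep only the first few non-empty lines for a compact peek.
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return "\n".join(lines[:5]) if lines else fallback
```

Checking for the binary at call time keeps the datatype usable without BLAST+ installed, at the cost of a less informative peek.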
Another real example: Right now I am working on wrapping MIRA v4, and defining a custom datatype 'mira' for its own assembly output format. I would like to handle conversion to other formats like ACE, SAM, etc. (as part of the datatype definition), but this will mean a dependency on the miraconvert binary. I can instead provide a miraconvert wrapper as another tool, but that doesn't offer everything a full datatype definition could. Is this the only sensible option at the moment?

Regards,
Peter