Hi Lance,
I looked at this a little while ago and had similar concerns, particularly about the inputs and outputs not being well defined. As you mention, the output workflow (download the tarball locally, extract it, upload the pieces) is not great. Beyond that, the input datatypes, etc., could use some work. At the very least, we should definitely create a good biom datatype and provide some converters (import and export).
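A biom datatype would need some way to recognise BIOM files. As a rough idea, here is a standalone check for the JSON-based BIOM 1.0 format that looks for the required top-level keys. This is a sketch, not Galaxy's datatype API. The function name `looks_like_biom_json` is hypothetical, and BIOM 2.x files (HDF5) would need a separate check.

```python
import json

# Top-level keys required by the JSON-based BIOM 1.0 format.
# BIOM 2.x uses HDF5 and is NOT covered by this check.
REQUIRED_KEYS = {
    "id", "format", "format_url", "type", "generated_by", "date",
    "rows", "columns", "matrix_type", "matrix_element_type", "shape", "data",
}


def looks_like_biom_json(path):
    """Return True if the file at `path` parses as a BIOM 1.0 JSON table."""
    try:
        with open(path, encoding="utf-8") as handle:
            table = json.load(handle)
    except (OSError, ValueError):  # unreadable, not UTF-8, or not JSON
        return False
    if not isinstance(table, dict) or not REQUIRED_KEYS.issubset(table):
        return False
    rows, columns, shape = table["rows"], table["columns"], table["shape"]
    # shape is [n_rows, n_columns] and should agree with the row/column metadata.
    return (
        isinstance(rows, list)
        and isinstance(columns, list)
        and shape == [len(rows), len(columns)]
        and table["matrix_type"] in ("sparse", "dense")
    )
```

A check like this only inspects structure. A real datatype would probably also want to set metadata such as the number of observations and samples.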
It's definitely worth spending some extra time to make sure that data flows well between the different parts/tools. Even better would be doing it in a way that allows mixing and matching with other non-QIIME tools.
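Part of mixing and matching with non-QIIME tools is an export converter from BIOM to a plain tab-separated table that generic tools can read. The following is a minimal sketch for BIOM 1.0 JSON only. The function name is hypothetical, and the `#OTU ID` header is an assumption modelled on classic QIIME text OTU tables. In practice the biom-format package's own `biom convert` command may be the better thing to wrap.

```python
import json


def biom_json_to_tsv(biom_path, tsv_path):
    """Write a BIOM 1.0 JSON table as a tab-separated observation-by-sample table."""
    with open(biom_path, encoding="utf-8") as handle:
        table = json.load(handle)

    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "sparse":
        # Sparse entries are [row_index, column_index, value]; missing cells are 0.
        matrix = [[0] * n_cols for _ in range(n_rows)]
        for row, col, value in table["data"]:
            matrix[row][col] = value
    else:
        # Dense data is already a list of rows.
        matrix = table["data"]

    sample_ids = [column["id"] for column in table["columns"]]
    with open(tsv_path, "w", encoding="utf-8") as out:
        out.write("#OTU ID\t" + "\t".join(sample_ids) + "\n")
        for row_meta, values in zip(table["rows"], matrix):
            out.write(row_meta["id"] + "\t" + "\t".join(str(v) for v in values) + "\n")
```

The matching import converter would do the reverse: read the header for sample IDs and emit sparse `[row, column, value]` triples for the non-zero cells.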
One thing we want to avoid is a large amount of manual massaging of the automatically generated XML. Fixing things up once might not be too bad, but having to do it with each new tool version can be "frustrating". Then again, having a good starting point and only needing manual changes for updates might be good enough. I'm not familiar enough with the typical changes between QIIME versions to judge how much would need redoing.
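One way to keep hand edits from being lost on each regeneration would be to record them as data and re-apply them with a script after the generator runs. The sketch below sets the `format` attribute of named `<data>` outputs in a Galaxy tool XML using Python's ElementTree. The function name, the output name `otu_table`, and the datatype `biom1` are placeholders rather than anything the generator produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical fix-ups kept next to the generator and re-applied after every
# regeneration: output name -> Galaxy datatype that output should use.
OUTPUT_FORMAT_FIXES = {
    "otu_table": "biom1",
}


def apply_output_fixes(tool_xml, fixes):
    """Return `tool_xml` with the format of each named <outputs>/<data> replaced."""
    root = ET.fromstring(tool_xml)
    for data in root.findall("outputs/data"):
        name = data.get("name")
        if name in fixes:
            data.set("format", fixes[name])
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops XML comments and may reorder nothing but whitespace and attribute quoting. If the generated files carry comments worth keeping, a different XML library would be needed.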
Dan
(Resending, since I received a bounce message from the list.)
I was recently asked whether I could provide a QIIME analysis pipeline for 16S data in Galaxy, built from the tools in QIIME (http://qiime.org/).
I looked around a bit for existing Galaxy wrappers and found an application that generates Galaxy wrappers for the QIIME scripts (https://github.com/qiime/qiime-galaxy). It is a very well written application and does a great job of wrapping the QIIME scripts for Galaxy. However, a few things about it don't quite fit my needs:
1. The tools bundle all of their output files into a single tgz file. This means that to run a pipeline, the user has to download the tgz files, extract them, and then upload whichever file(s) are needed for the next step.
2. There is no toolshed repository to install the dependencies for these tools. This makes it tricky for administrators to automate installation and to maintain various versions of QIIME going forward.
3. There are no toolshed versions of the tools themselves. This also makes installation and integration a bit tricky, and makes it hard for me to create and manage updates, fixes, tweaks, etc. There are also no tests.
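The manual round trip in point 1 amounts to pulling the needed files out of each archive so they can be fed to the next step. Here is a sketch with Python's standard `tarfile` module. The helper name and the suffix filter are hypothetical. In a toolshed wrapper, the better goal is probably for each tool to declare these files as separate Galaxy outputs so that no archive is needed at all.

```python
import os
import tarfile


def extract_outputs(archive_path, dest_dir, suffixes):
    """Extract regular files whose names end with one of `suffixes`.

    Files are written flat into `dest_dir` under their base name only. This
    avoids path traversal from member names like "../x", but assumes base
    names are unique. Returns the list of extracted paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile() or not member.name.endswith(tuple(suffixes)):
                continue
            target = os.path.join(dest_dir, os.path.basename(member.name))
            source = archive.extractfile(member)
            with source, open(target, "wb") as out:
                out.write(source.read())
            extracted.append(target)
    return extracted
```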
For these reasons, I decided to investigate whether the generated wrappers could serve as the basis for a "toolshed" version of QIIME. If anyone is interested in helping, has suggestions, or is working on something related, I'd be very happy to collaborate.
The repository for the work in progress is at https://github.com/lparsons/galaxy_tools/tree/qiime/tools/qiime1.9.0.
There is also a package on the testtoolshed, as well as a first pass at package_qiime_1_9_1 (https://github.com/lparsons/galaxy_tools/tree/qiime/packages/package_qiime_1_9_1).