I agree that it would be very nice to get the data flowing between each of the tools and to be able to mix/match with other tools. That is an area of Galaxy tool-dev that I'm less familiar with, so any help would be greatly appreciated.

As for the manual massaging, I agree. However, at this point I'm really hoping that the changes between QIIME versions are manageable, esp. once we have the datatypes and basic framework of the tools down.

Along those lines, I would very much like to get something usable (at least for some specific workflows) done sooner rather than later, and then iterate, adding new tools, datatypes, etc. as we move forward. Does that sound reasonable? I love the idea of a Metagenomics toolfest (https://github.com/galaxyproject/tools-iuc/issues/299). One thing I would like to do soon, then, is define the functionality for the first round. It would be very helpful if you, Bjoern, etc. could help make sure the first round lays the proper groundwork, so that I/we can build on a solid foundation rather than redo things in later iterations.

Thanks so much for the input, help, etc. It is very much appreciated.

- Lance

Daniel Blankenberg
October 6, 2015 at 10:21 AM
Hi Lance,

I looked at this a while ago and had similar concerns, particularly with the outputs and inputs not being well-defined. In addition to the output round trip (tarball -> download locally, extract -> upload) not being great, as you mention, the input datatypes, etc. could use some work. At the very least, we should definitely create a nice biom datatype and have some converters available (import and export).
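As a rough illustration of what a biom check and an export converter might involve, here is a minimal Python sketch. It assumes the JSON-based BIOM 1.0 layout (top-level "rows", "columns", "data", "matrix_type" and "shape" keys); HDF5-based BIOM 2.x tables would need a different approach. The function names are hypothetical and this is not Galaxy's datatype API, just the underlying logic.

```python
import json


def looks_like_biom_v1(text):
    """Return True if text parses as JSON and has the core BIOM 1.0 keys."""
    try:
        table = json.loads(text)
    except ValueError:
        return False
    required = {"rows", "columns", "data", "matrix_type", "shape"}
    return isinstance(table, dict) and required.issubset(table)


def biom_v1_to_tsv(text):
    """Convert a BIOM 1.0 JSON table to a tab-separated OTU table."""
    table = json.loads(text)
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "sparse":
        # Sparse data is a list of [row_index, column_index, value] triples.
        matrix = [[0] * n_cols for _ in range(n_rows)]
        for r, c, value in table["data"]:
            matrix[r][c] = value
    else:
        # Dense data is already a list of rows.
        matrix = table["data"]
    header = ["#OTU ID"] + [col["id"] for col in table["columns"]]
    lines = ["\t".join(header)]
    for row, counts in zip(table["rows"], matrix):
        lines.append("\t".join([row["id"]] + [str(v) for v in counts]))
    return "\n".join(lines) + "\n"
```

A real datatype would wrap a check like this in Galaxy's sniffing mechanism, and the converter would be exposed as its own tool; both are left out here.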

Definitely worth spending some extra time to make sure that we have the data flowing well between each of the different parts/tools, and even better to make sure that it's done in a way that allows mixing and matching with other non-QIIME tools.


One thing that we want to avoid is large amounts of manual massaging of the automatically generated XML; fixing things up once might not be too bad, but having to do it with each new tool version can be "frustrating". Although perhaps having a good starting point and only needing to manually modify for any updates could be good enough (I'm not very familiar with the extent of typical changes between QIIME versions, so I can't make a call on how much changes).




Dan

(resending since I received a message bounce from list)



Lance Parsons
October 5, 2015 at 5:26 PM
I was recently asked if I could provide a QIIME analysis pipeline for 16S data in Galaxy using tools in the QIIME pipeline (http://qiime.org/).

I did a bit of looking around for existing Galaxy wrappers and found an application that generates the wrappers for QIIME scripts for Galaxy (https://github.com/qiime/qiime-galaxy). This is a very well written application that does a great job of wrapping the QIIME scripts for Galaxy. However, there are a few things about it that don't quite fit my needs.

1. The tools output tgz files of all of the output files. This means that to execute a pipeline, the user would have to download the tgz files, untar, and then upload whichever file(s) are needed for the next step.
2. There is no toolshed repository to install the dependencies for these tools, making it tricky for administrators to automate and also maintain various versions of QIIME going forward.
3. There are no toolshed versions of the tools themselves, which also makes installation and integration a bit tricky and makes it hard for me to create and manage updates, fixes, tweaks, etc. There are also no tests, etc.
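
To illustrate point 1, a wrapper could unpack the archive itself so that each file is available as its own output rather than leaving that to the user. The following is only a sketch using the Python standard library; the function name and arguments are hypothetical, and how the extracted files would be registered as Galaxy datasets is not shown.

```python
import os
import tarfile


def extract_outputs(archive_path, dest_dir):
    """Unpack a .tgz of tool outputs and return the extracted file paths."""
    extracted = []
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Only regular files; directories are created as needed.
            if not member.isfile():
                continue
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Skip entries that would land outside dest_dir (path traversal).
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            tar.extract(member, dest_root)
            extracted.append(target)
    return sorted(extracted)
```

The returned paths could then be mapped to individual tool outputs, so the next step in a workflow can consume, say, the OTU table directly.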

For these reasons I decided to investigate the feasibility of using the generated wrappers as a basis for a "toolshed" version of QIIME. If anyone is interested in helping, or has suggestions, or is working on something related, I'd be very happy to collaborate.

The repository for the WIP is at https://github.com/lparsons/galaxy_tools/tree/qiime/tools/qiime1.9.0. There is also a package on the testtoolshed as well as a first pass at package_qiime_1_9_1 (https://github.com/lparsons/galaxy_tools/tree/qiime/packages/package_qiime_1_9_1).


--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University