On Thu, Jun 2, 2011 at 6:39 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
On Jun 2, 2011, at 1:29 PM, Nate Coraor wrote:
Peter Cock wrote:
Hi all,
Something I've not needed to do until now is define a new file format in Galaxy. I understand the basic principle of defining a subclass in Python. However, how does this work with new tools on the Tool Shed? In particular, if an output format is likely to be used by more than one tool, can we get it added to the Galaxy core?
I think people have provided the new subclass as a patch with the tool, but probably many of them, if well written, could be added to the core.
As an example, the basic functionality of the Blast2GO for pipelines tool (b2g4pipe) takes a BLAST XML input file and gives a tab-separated annotation output file. Galaxy already has 'blastxml' and 'tabular' file formats defined, so I didn't need to do anything extra. However, the tool can also take (a directory of) InterProScan XML files as input, so here a new 'interproscanxml' format would be useful. Any wrapper using or producing InterProScan XML could then take advantage of this. For example, Konrad's InterProScan wrapper could offer the XML output as an option in addition to, or instead of, the tabular output.
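To make the 'interproscanxml' idea concrete, here is a rough, standalone sketch of the kind of content check ("sniffer") such a format could use. It deliberately does not import anything from Galaxy: the class name `InterProScanXmlSketch`, the `root_marker` string, and the `write_temp` helper are all hypothetical, and the assumption that the root element starts with `<interpro_matches` should be checked against real InterProScan output before being relied on.

```python
import os
import tempfile


class InterProScanXmlSketch(object):
    """Stand-in for a datatype subclass (hypothetical; not Galaxy's actual API)."""

    file_ext = "interproscanxml"
    # Assumed marker for the root element; verify against real InterProScan XML.
    root_marker = "<interpro_matches"

    def sniff(self, filename, max_lines=10):
        """Return True if the first few lines look like InterProScan XML."""
        with open(filename) as handle:
            first = handle.readline()
            # Require an XML declaration on the first line.
            if not first.lstrip().startswith("<?xml"):
                return False
            # Look for the assumed root element within the next few lines.
            for _ in range(max_lines):
                line = handle.readline()
                if not line:
                    break
                if self.root_marker in line:
                    return True
        return False


def write_temp(text):
    """Write text to a temporary .xml file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path
```

In this sketch, a file whose first line is an XML declaration and whose next few lines contain the assumed marker is accepted; a FASTA file or a BLAST XML file is rejected.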
We will certainly include support for new data formats in the Galaxy core. In case you haven't seen it, details for adding new formats are available in our wiki at https://bitbucket.org/galaxy/galaxy-central/wiki/AddingDatatypes. It's fairly straightforward. However, glancing at the wiki, it looks like there is no mention of functional tests for the new format. It would be great if we could get a patch that includes a functional test for uploading the format, as new method(s) in ~/test/functional/test_get_data.py.
Thanks for the link. I was aware my initial work on adding a generic XML filetype was missing some steps, but I ran out of time yesterday: https://bitbucket.org/galaxy/galaxy-central/issue/568

The test information is especially useful, but I don't see any use of doctest in test/functional/test_get_data.py. Could you clarify whether the docstring examples within the datatype classes are actually tested, and if so, how?

Thanks,
Peter
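For context on the doctest question, the sketch below shows how docstring examples can be executed with Python's standard `doctest` module. It says nothing about whether or how Galaxy itself runs them; the function `looks_like_xml` and the wrapper `run_docstring_examples` are hypothetical names used only for illustration.

```python
import doctest


def looks_like_xml(first_line):
    """Return True if the line starts with an XML declaration.

    >>> looks_like_xml('<?xml version="1.0"?>')
    True
    >>> looks_like_xml('>seq1')
    False
    """
    return first_line.lstrip().startswith("<?xml")


def run_docstring_examples():
    """Run the docstring examples above and return the doctest results."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Supply the globals explicitly so the examples can resolve the function name.
    tests = finder.find(
        looks_like_xml,
        "looks_like_xml",
        globs={"looks_like_xml": looks_like_xml},
    )
    for test in tests:
        runner.run(test)
    return runner.summarize(verbose=False)
```

Unless something like this (or a test runner's doctest plugin) is invoked, examples in docstrings are documentation only and are never executed.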