Defining new file formats in Galaxy (for new tool wrappers)
Hi all,
Something I've not needed to do until now is define a new file format in Galaxy. I understand the basic principle and defining a subclass in Python... however, how does this work with new tools on the Tool Shed? In particular, if an output format is likely to be used by more than one tool, can we get it added to the Galaxy core?
As an example, the basic functionality of the Blast2GO for pipelines tool (b2g4pipe) takes a BLAST XML input file, and gives a tab-separated annotation output file. Galaxy already has 'blastxml' and 'tabular' file formats defined, so I didn't need to do anything extra. However, the tool can also take (a directory of) InterProScan XML files as input, so here a new 'interproscanxml' format would be useful. Then any wrapper using or producing InterProScan XML could take advantage of this, e.g. Konrad's InterProScan wrapper could then offer the XML output as an option in addition to or instead of the tabular output.
Related to this example, why isn't there a generic base class for XML formats in general? https://bitbucket.org/galaxy/galaxy-central/issue/568/missing-xml-datatype-b...
Regards,
Peter
Peter Cock wrote:
Hi all,
Something I've not needed to do until now is define a new file format in Galaxy. I understand the basic principle and defining a subclass in Python... however, how does this work with new tools on the Tool Shed? In particular, if an output format is likely to be used by more than one tool, can we get it added to the Galaxy core?
I think people have provided the new subclass as a patch with the tool, but probably many of them, if well written, could be added to the core.
As an example, the basic functionality of the Blast2GO for pipelines tool (b2g4pipe) takes a BLAST XML input file, and gives a tab-separated annotation output file. Galaxy already has 'blastxml' and 'tabular' file formats defined, so I didn't need to do anything extra. However, the tool can also take (a directory of) InterProScan XML files as input, so here a new 'interproscanxml' format would be useful. Then any wrapper using or producing InterProScan XML could take advantage of this, e.g. Konrad's InterProScan wrapper could then offer the XML output as an option in addition to or instead of the tabular output.
Related to this example, why isn't there a generic base class for XML formats in general? https://bitbucket.org/galaxy/galaxy-central/issue/568/missing-xml-datatype-b...
It just hadn't been necessary in the past, and no one had the time to write it. I agree it could be helpful, since there are other, more specific XML types.
--nate
Regards,
Peter ___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
We will certainly include support for new data formats in the Galaxy core. In case you haven't seen it, details for adding new formats are available in our wiki at https://bitbucket.org/galaxy/galaxy-central/wiki/AddingDatatypes. It's fairly straightforward. However, glancing at the wiki, it looks like there is no mention of functional tests for the new format. If we could get a patch that includes a functional test for uploading the format as new method(s) in ~/test/functional/test_get_data.py, it would be great.
Related to this example, why isn't there a generic base class for XML formats in general? https://bitbucket.org/galaxy/galaxy-central/issue/568/missing-xml-datatype-b...
It just hadn't been necessary in the past, and no one had the time to write it. I agree it could be helpful, since there are other, more specific XML types.
Yes, XML formats have not yet been abstracted, and certainly can be. Just a matter of bandwidth...
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
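To make the subclassing idea in this exchange concrete, here is a minimal, self-contained sketch of a generic XML datatype with a more specific InterProScan subclass. It does not use Galaxy's real classes: `Text`, `GenericXml`, `InterProScanXml`, the `read_head` helper, the `root_marker` attribute and the `<interpro_matches` marker are all assumptions made for illustration. A real datatype would subclass Galaxy's own base classes as described on the AddingDatatypes wiki page.

```python
import os
import tempfile


class Text:
    """Hypothetical stand-in for a plain-text datatype base class."""

    file_ext = "txt"

    def sniff(self, filename):
        # A plain-text base class has no reliable signature to look for.
        return False


class GenericXml(Text):
    """Generic XML, recognised by the XML declaration at the start of the file."""

    file_ext = "xml"

    def read_head(self, filename, size=1024):
        # Read only the first few bytes so that sniffing stays cheap.
        with open(filename, "rb") as handle:
            return handle.read(size)

    def sniff(self, filename):
        return self.read_head(filename).lstrip().startswith(b"<?xml")


class InterProScanXml(GenericXml):
    """More specific XML type; the root marker is an assumed placeholder."""

    file_ext = "interproscanxml"
    root_marker = b"<interpro_matches"  # assumption, check against real files

    def sniff(self, filename):
        # Must be XML first, then contain the format-specific root element.
        return super().sniff(filename) and self.root_marker in self.read_head(filename)


def write_temp(content):
    """Write bytes to a temporary file and return its path."""
    handle = tempfile.NamedTemporaryFile(delete=False, suffix=".xml")
    handle.write(content)
    handle.close()
    return handle.name


if __name__ == "__main__":
    path = write_temp(b'<?xml version="1.0"?>\n<interpro_matches></interpro_matches>\n')
    print(GenericXml().sniff(path), InterProScanXml().sniff(path))
    os.remove(path)
```

With this layout, a tool that accepts any XML can declare the generic type, while a wrapper that needs InterProScan output can ask for the subclass.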
Thanks for the link - I was aware my initial work on adding a generic XML filetype was missing some steps, but I ran out of time yesterday: https://bitbucket.org/galaxy/galaxy-central/issue/568
The test information is especially useful - but I don't see any use of doctest in test/functional/test_get_data.py - could you clarify if the docstring examples within the datatype classes are actually tested, and if so how?
Thanks,
Peter
The unit tests within the datatype classes are executed via:
    %sh run_unit_tests.sh
in the Galaxy root directory, whereas the functional tests are executed via:
    %sh run_functional_tests.sh
in the Galaxy root directory.
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
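As a rough illustration of the docstring examples being discussed, the sketch below puts a doctest in a small sniffing helper and runs it with Python's standard `doctest` module. The functions `looks_like_xml` and `run_docstring_examples` are hypothetical and are not taken from Galaxy; how `run_unit_tests.sh` itself collects such docstrings is not shown here.

```python
import doctest


def looks_like_xml(first_line):
    """Return True if the first line of a file starts an XML document.

    >>> looks_like_xml('<?xml version="1.0"?>')
    True
    >>> looks_like_xml('   <?xml version="1.0" encoding="UTF-8"?>')
    True
    >>> looks_like_xml('chr1 100 200')
    False
    """
    return first_line.strip().startswith("<?xml")


def run_docstring_examples(func):
    """Run the docstring examples of one function; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Supply the function under its own name so the examples can call it.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.failures, runner.tries


if __name__ == "__main__":
    failed, attempted = run_docstring_examples(looks_like_xml)
    print("%d of %d docstring examples failed" % (failed, attempted))
```

Keeping the examples in the docstring means the documentation and the test cannot drift apart without a failure being reported.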
On Fri, Jun 3, 2011 at 10:03 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Thanks for the link - I was aware my initial work on adding a generic XML filetype was missing some steps, but I ran out of time yesterday: https://bitbucket.org/galaxy/galaxy-central/issue/568
Hi all,
Could someone please review this patch:
https://bitbucket.org/peterjc/galaxy-central/changeset/83c4366e0641
This is currently the one and only commit to my xml_filetype branch,
https://bitbucket.org/peterjc/galaxy-central/src/xml_filetype
This attempts to define a new basic data format 'xml' as per issue 568,
https://bitbucket.org/galaxy/galaxy-central/issue/568
This might help with a user error I was just presented with, where an Excel spreadsheet was uploaded and misidentified as text when it was clearly XML. Clicking on the 'eye' tried to display it and gave a cryptic error message in the central panel.
In fact, this example makes me wonder if the proposed base XML datatype class should NOT be a subclass of text (as it is now with the blastxml datatype).
Peter
Hi Peter, I've merged your branch with these changes in 5897:6165799c4e49. Thanks! --nate
On Mon, Aug 15, 2011 at 6:33 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Peter,
I've merged your branch with these changes in 5897:6165799c4e49. Thanks!
--nate
Thanks Nate :)
This might help with a user error I was just presented with, where an Excel spreadsheet was uploaded and misidentified as text when it was clearly XML. Clicking on the 'eye' tried to display it and gave a cryptic error message in the central panel.
In fact, this example makes me wonder if the proposed base XML datatype class should NOT be a subclass of text (as it is now with the blastxml datatype).
What are your thoughts on the above? I guess for some tasks it makes sense to regard XML as text, but for others not. Peter
On Thu, Jun 2, 2011 at 6:39 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
We will certainly include support for new data formats in the Galaxy core. In case you haven't seen it, details for adding new formats are available in our wiki at https://bitbucket.org/galaxy/galaxy-central/wiki/AddingDatatypes.
Hi Greg,
Should that page talk about lib/galaxy/datatypes/registry.py as well? That seems to be where MIME types are specified, and for some reason (a historical fallback?) there is another sniffer listing here too (as well as in datatypes_conf.xml).
Peter
Hello Peter, On Jun 6, 2011, at 6:41 AM, Peter Cock wrote:
Should that page talk about lib/galaxy/datatypes/registry.py as well?
Sure! If you have found weaknesses or missing details in the wiki that made it harder for you to add a new datatype, please feel free to improve the wiki content. Sometimes we miss things because we're so close to the implementation that things make sense to us, but not to others.
That seems to be where MIME types are specified, and for some reason (a historical fallback?) there is another sniffer listing here too (as well as in datatypes_conf.xml).
The sniffer listing and the mimetypes and datatypes by extension in registry.py are default values in case (for some reason) these datatypes are not listed in datatypes_conf.xml.
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
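The fallback behaviour described here can be pictured as configuration entries taking precedence over built-in defaults. The sketch below is only an illustration of that pattern under assumed names: `DEFAULT_MIMETYPES`, `DEFAULT_SNIFF_ORDER` and `build_registry` are hypothetical and do not mirror the actual contents of registry.py or datatypes_conf.xml. It also assumes the defaults are used only when the configuration supplies nothing at all, which the thread does not spell out.

```python
# Hypothetical built-in defaults, used only when the config supplies nothing.
DEFAULT_MIMETYPES = {"txt": "text/plain", "xml": "application/xml"}
DEFAULT_SNIFF_ORDER = ["xml", "txt"]


def build_registry(conf_mimetypes=None, conf_sniff_order=None):
    """Return (mimetypes, sniff_order), preferring values read from the config."""
    # Copy the chosen source so callers cannot mutate the shared defaults.
    mimetypes = dict(conf_mimetypes) if conf_mimetypes else dict(DEFAULT_MIMETYPES)
    sniff_order = list(conf_sniff_order) if conf_sniff_order else list(DEFAULT_SNIFF_ORDER)
    return mimetypes, sniff_order


if __name__ == "__main__":
    # No configuration available: the built-in defaults apply.
    print(build_registry())
    # Configuration present: its entries are used instead of the defaults.
    print(build_registry({"blastxml": "application/xml"}, ["blastxml"]))
```

Listing more specific sniffers before generic ones in the order matters, since the first sniffer that matches decides the format.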
On Thu, Jun 9, 2011 at 3:17 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
Sure! If you have found weaknesses or missing details in the wiki that made it harder for you to add a new datatype, please feel free to improve the wiki content. Sometimes we miss things because we're so close to the implementation that things make sense to us, but not to others.
Is there anything documented about the built-in format conversion within the Galaxy core, rather than in tools?
That seems to be where MIME types are specified, and for some reason (a historical fallback?) there is another sniffer listing here too (as well as in datatypes_conf.xml).
The sniffer listing and the mimetypes and datatypes by extension in registry.py are default values in case (for some reason) these datatypes are not listed in datatypes_conf.xml.
That's what I thought, Peter
Hi Peter, On Jun 9, 2011, at 12:02 PM, Peter Cock wrote:
Is there anything documented about the built-in format conversion within the Galaxy core, rather than in tools?
No, not currently - except for a few details in various slide presentations. Nothing formal, though. We're trying to keep up on some of this documentation, so hopefully we'll have something soon.
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
On Thu, Jun 2, 2011 at 6:29 PM, Nate Coraor <nate@bx.psu.edu> wrote:
I think people have provided the new subclass as a patch with the tool, but probably many of them, if well written, could be added to the core.
I think that should be encouraged, especially if you want to make tool installation as automatic as possible (patching Galaxy to add a new format is non-trivial). Peter
participants (3)
- Greg Von Kuster
- Nate Coraor
- Peter Cock