Problems with custom data types and sniffers in shed tools
Hi,

We're developing a suite of galaxy tools for doing proteomics. As part of that we're developing a display application ( https://bitbucket.org/Andrew_Brock/proteomics-visualise ) for viewing pepXML and protXML files and we rely very heavily on custom datatypes. We've got this working in a fork of galaxy-dist ( https://bitbucket.org/iracooke/galaxy-proteomics ) but would love to be able to get rid of this fork and integrate all our customisations into a shed tool.

I initially tried following the instructions here http://wiki.g2.bx.psu.edu/Tool%20Shed#Including_proprietary_data_types_that_... and was able to successfully get my custom datatypes to load. Unfortunately though I could not get my sniffers to work. I think there are two reasons for this:

The first problem is that it looks like the sniffers are never loaded. I get errors like this when the tool is loaded:

  galaxy.datatypes.registry DEBUG 2012-02-06 10:26:51,317 Loading datatypes from /Users/iracooke/Sources/shed_tools_galaxy_central/toolshed.g2.bx.psu.edu/repos/jjohnson/gmap/93911bac43da/gmap/tool-data/datatypes_conf.xml
  galaxy.datatypes.registry WARNING 2012-02-06 10:26:51,318 Error appending sniffer for datatype galaxy.datatypes.gmap:IntervalAnnotation to sniff_order: No module named gmap
  galaxy.datatypes.registry WARNING 2012-02-06 10:26:51,319 Error appending sniffer for datatype galaxy.datatypes.gmap:SpliceSiteAnnotation to sniff_order: No module named gmap
  galaxy.datatypes.registry WARNING 2012-02-06 10:26:51,319 Error appending sniffer for datatype galaxy.datatypes.gmap:IntronAnnotation to sniff_order: No module named gmap
  galaxy.datatypes.registry WARNING 2012-02-06 10:26:51,319 Error appending sniffer for datatype galaxy.datatypes.gmap:SNPAnnotation to sniff_order: No module named gmap

These errors aren't just restricted to my tool ... as you can see from the above they also occur in the gmap example installed from the main galaxy toolshed.

I tried hacking the code in registration.py to get this to work ... it looks like the section where sniffers are loaded does not use the "imported_modules" variable. I was able to get this error message to go away, but I still don't get proper sniffer behaviour (e.g. when I click "Auto detect" when editing a dataset the dataset is set to a generic type).

Another issue is that I would like the sniffers loaded from my shed_tool to be used by the "Upload" tool ... but if I look at the source for upload.py and upload.xml I see

  -- upload.xml
  <command interpreter="python">
    upload.py $GALAXY_ROOT_DIR $GALAXY_DATATYPES_CONF_FILE $paramfile

  -- upload.py
  def __main__():
      ....
      registry = Registry()
      registry.load_datatypes( root_dir=sys.argv[1], config=sys.argv[2] )

which looks like the upload tool is not configured to use the custom datatypes available in shed tools.

Would it be possible to make the following changes:

(a) Add custom sniffers to the sniff order when shed tools are loaded. Importantly, since custom datatypes are usually quite specific, I would suggest that these are loaded at the top of the sniff order? Or alternatively, if a sniffer is for a datatype that descends from a superclass it should have priority over the parent class (since by definition it is more specific).

(b) Change the upload tool so that it respects custom sniffers in shed tools.

I guess that our case is a bit unusual in that we are trying to co-opt galaxy (a genomics tool) to do proteomics ... so I understand that these changes might not be a priority. Nevertheless, if this could be done it would be fantastic for us as we could abandon our fork and have all our functionality included in a shed tool.

Regards
Ira
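The "No module named gmap" warnings in Ira's message are consistent with the sniffer's type string being resolved by an ordinary import while gmap.py sits in the repository directory rather than inside the galaxy.datatypes package. The following is a minimal sketch of that failure mode and of a lookup that consults already-loaded modules first, in the spirit of the "imported_modules" variable Ira mentions. It is not Galaxy's registry code: resolve_type, its imported_modules argument and the stand-in gmap module are all hypothetical.

```python
import importlib
import types


def resolve_type(type_string, imported_modules=()):
    """Resolve 'package.module:ClassName' to a class.

    Hypothetical helper: modules already loaded from a repository directory
    are checked first; a normal import is only the fallback.
    """
    module_path, class_name = type_string.split(":")
    short_name = module_path.split(".")[-1]
    for module in imported_modules:
        if module.__name__ == short_name and hasattr(module, class_name):
            return getattr(module, class_name)
    # Fallback: regular import; raises ImportError if the module cannot be found.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Stand-in for a datatype module loaded from a shed repository directory.
gmap = types.ModuleType("gmap")


class IntervalAnnotation(object):
    pass


gmap.IntervalAnnotation = IntervalAnnotation

type_string = "galaxy.datatypes.gmap:IntervalAnnotation"

# Without the preloaded module, the plain import is expected to fail
# (assuming no galaxy.datatypes.gmap module is installed on this machine).
try:
    resolve_type(type_string)
    plain_import_ok = True
except ImportError:
    plain_import_ok = False

# With the preloaded module, the class is found without touching sys.path.
resolved = resolve_type(type_string, imported_modules=[gmap])
```

If something like this is what happens inside the registry, getting rid of the warning would only require the sniffer section to look at the same preloaded modules that the datatype registration section already uses.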
Hello Ira,

I will fix the sniff problem with proprietary data types included in tool sheds - I should be able to have this fixed within the next day or two. With regard to the upload tool respecting these sniffers, I've already designed this tool as well as the metadata setting components to do so, so as soon as I get the sniffer loading problem fixed, all should work. I will make sure the proprietary tool shed sniffers are loaded before sniffers in the Galaxy distribution.

Thanks for reporting this,

Greg

(In reply to Ira Cooke's message of Feb 5, 2012, 7:04 PM.)
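Loading repository sniffers ahead of the distribution's sniffers, and Ira's alternative of letting a subclass outrank its parent class, can both be expressed as an ordering over sniffer classes. Below is a minimal sketch under stated assumptions: Data, Xml, Tabular, PepXml and ProtXml are made-up stand-ins rather than Galaxy's datatype classes, and build_sniff_order is a hypothetical helper, not an existing Galaxy function.

```python
class Data(object):
    pass


class Xml(Data):
    pass


class Tabular(Data):
    pass


class PepXml(Xml):
    pass


class ProtXml(Xml):
    pass


def build_sniff_order(builtin, custom):
    """Order sniffers so that subclasses come before their parents.

    Classes are sorted by inheritance depth (length of the MRO), deepest
    first, so a class always precedes its ancestors. sorted() is stable,
    so among classes of equal depth the custom sniffers stay ahead of the
    built-in ones.
    """
    combined = list(custom) + [cls for cls in builtin if cls not in custom]
    return sorted(combined, key=lambda cls: len(cls.__mro__), reverse=True)


order = build_sniff_order(builtin=[Xml, Tabular], custom=[PepXml, ProtXml])
print([cls.__name__ for cls in order])
# ['PepXml', 'ProtXml', 'Xml', 'Tabular']
```

With this ordering a pepXML file would be offered to the PepXml sniffer before the generic Xml sniffer gets a chance to claim it, which is the behaviour requested in point (a).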
Hello Ira,

I believe proprietary datatype sniffers included in tool shed repositories are loading as expected - at least I cannot reproduce the behavior you are seeing. The datatypes_conf.xml file included in the latest revision of the gmap repository on the main Galaxy tool shed looks like this:

  <?xml version="1.0"?>
  <datatypes>
    <datatype_files>
      <datatype_file name="gmap.py"/>
    </datatype_files>
    <registration>
      <datatype extension="gmapdb" type="galaxy.datatypes.gmap:GmapDB" display_in_upload="False"/>
      <datatype extension="gmapsnpindex" type="galaxy.datatypes.gmap:GmapSnpIndex" display_in_upload="False"/>
      <datatype extension="iit" type="galaxy.datatypes.gmap:IntervalIndexTree" display_in_upload="True"/>
      <datatype extension="splicesites.iit" type="galaxy.datatypes.gmap:SpliceSitesIntervalIndexTree" display_in_upload="True"/>
      <datatype extension="introns.iit" type="galaxy.datatypes.gmap:IntronsIntervalIndexTree" display_in_upload="True"/>
      <datatype extension="snps.iit" type="galaxy.datatypes.gmap:SNPsIntervalIndexTree" display_in_upload="True"/>
      <datatype extension="gmap_annotation" type="galaxy.datatypes.gmap:IntervalAnnotation" display_in_upload="False"/>
      <datatype extension="gmap_splicesites" type="galaxy.datatypes.gmap:SpliceSiteAnnotation" display_in_upload="True"/>
      <datatype extension="gmap_introns" type="galaxy.datatypes.gmap:IntronAnnotation" display_in_upload="True"/>
      <datatype extension="gmap_snps" type="galaxy.datatypes.gmap:SNPAnnotation" display_in_upload="True"/>
    </registration>
    <sniffers>
      <sniffer type="galaxy.datatypes.gmap:IntervalAnnotation"/>
      <sniffer type="galaxy.datatypes.gmap:SpliceSiteAnnotation"/>
      <sniffer type="galaxy.datatypes.gmap:IntronAnnotation"/>
      <sniffer type="galaxy.datatypes.gmap:SNPAnnotation"/>
    </sniffers>
  </datatypes>

Here is the snippet of my paster log when I install the gmap tool shed repository - notice that the tools, datatypes and sniffers are all loaded. I'm installing it from a local tool shed, but I downloaded the latest version from the main Galaxy tool shed and uploaded it with no changes to my local tool shed for testing.

  galaxy.web.controllers.admin_toolshed DEBUG 2012-02-06 10:50:33,686 Loading new tool panel section: gmap
  galaxy.util.shed_util DEBUG 2012-02-06 10:50:33,687 Installing repository 'gmap'
  galaxy.util.shed_util DEBUG 2012-02-06 10:50:33,687 Cloning http://test@gvk.bx.psu.edu:9009/repos/test/gmap
  destination directory: gmap
  requesting all changes
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 10 changes to 10 files
  updating to branch default
  10 files updated, 0 files merged, 0 files removed, 0 files unresolved
  galaxy.util.shed_util DEBUG 2012-02-06 10:50:34,062 Updating cloned repository to revision "dbcccd1e4dfd"
  0 files updated, 0 files merged, 0 files removed, 0 files unresolved
  docutils WARNING 2012-02-06 10:50:34,388 <string>:10: (WARNING/2) Explicit markup ends without a blank line; unexpected unindent.
  galaxy.util.shed_util DEBUG 2012-02-06 10:50:34,437 Adding new row (or updating an existing row) for repository 'gmap' in the tool_shed_repository table.
  docutils WARNING 2012-02-06 10:50:34,533 <string>:10: (WARNING/2) Explicit markup ends without a blank line; unexpected unindent.
  docutils WARNING 2012-02-06 10:50:34,657 <string>:10: (WARNING/2) Explicit markup ends without a blank line; unexpected unindent.
  galaxy.tools DEBUG 2012-02-06 10:50:34,718 Reloading section: gmap
  galaxy.tools DEBUG 2012-02-06 10:50:34,769 Loaded tool id: gvk.bx.psu.edu:9009/repos/test/gmap/gmap/2.0.1, version: 2.0.1.
  galaxy.tools DEBUG 2012-02-06 10:50:34,813 Loaded tool id: gvk.bx.psu.edu:9009/repos/test/gmap/gmap_build/2.0.0, version: 2.0.0.
  docutils WARNING 2012-02-06 10:50:34,846 <string>:10: (WARNING/2) Explicit markup ends without a blank line; unexpected unindent.
  galaxy.tools DEBUG 2012-02-06 10:50:34,877 Loaded tool id: gvk.bx.psu.edu:9009/repos/test/gmap/gsnap/2.0.1, version: 2.0.1.
  galaxy.tools DEBUG 2012-02-06 10:50:34,911 Loaded tool id: gvk.bx.psu.edu:9009/repos/test/gmap/gmap_iit_store/2.0.0, version: 2.0.0.
  galaxy.tools DEBUG 2012-02-06 10:50:34,962 Loaded tool id: gvk.bx.psu.edu:9009/repos/test/gmap/gmap_snpindex/2.0.0, version: 2.0.0.
  galaxy.datatypes.registry DEBUG 2012-02-06 10:50:35,147 Loading datatypes from /Users/gvk/workspaces_2008/shed_tools/gvk.bx.psu.edu/repos/test/gmap/dbcccd1e4dfd/gmap/gmap-93911bac43da/tool-data/datatypes_conf.xml
  galaxy.datatypes.registry DEBUG 2012-02-06 10:50:35,159 Loaded sniffer for datatype: galaxy.datatypes.gmap:IntervalAnnotation
  galaxy.datatypes.registry DEBUG 2012-02-06 10:50:35,160 Loaded sniffer for datatype: galaxy.datatypes.gmap:SpliceSiteAnnotation
  galaxy.datatypes.registry DEBUG 2012-02-06 10:50:35,161 Loaded sniffer for datatype: galaxy.datatypes.gmap:IntronAnnotation
  galaxy.datatypes.registry DEBUG 2012-02-06 10:50:35,161 Loaded sniffer for datatype: galaxy.datatypes.gmap:SNPAnnotation

If I stop and restart my Galaxy server after I've installed the gmap repository, everything loads correctly as well:

  galaxy.datatypes.registry DEBUG 2012-02-06 10:57:10,713 Loading datatypes from ../shed_tools/gvk.bx.psu.edu/repos/test/gmap/dbcccd1e4dfd/gmap/gmap-93911bac43da/tool-data/datatypes_conf.xml
  galaxy.datatypes.registry DEBUG 2012-02-06 10:57:10,715 Loaded sniffer for datatype: galaxy.datatypes.gmap:IntervalAnnotation
  galaxy.datatypes.registry DEBUG 2012-02-06 10:57:10,715 Loaded sniffer for datatype: galaxy.datatypes.gmap:SpliceSiteAnnotation
  galaxy.datatypes.registry DEBUG 2012-02-06 10:57:10,716 Loaded sniffer for datatype: galaxy.datatypes.gmap:IntronAnnotation
  galaxy.datatypes.registry DEBUG 2012-02-06 10:57:10,716 Loaded sniffer for datatype: galaxy.datatypes.gmap:SNPAnnotation

Have you made any changes to your local Galaxy instance that may have resulted in proprietary datatypes / sniffers not being loaded correctly? What version of the Galaxy distribution are you running? You should be at 6672:e38a9eb21336 from the central repository for the latest tool shed code. However, proprietary datatype support has not been touched in some time.
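A datatypes_conf.xml such as the one in Greg's reply can be sanity-checked before a repository is installed by confirming that every <sniffer> type also appears in a <datatype> registration. The sketch below uses only the Python standard library; the XML is an abbreviated copy of the gmap configuration, and unregistered_sniffers is an illustrative helper, not part of Galaxy. It only checks the file for internal consistency and says nothing about whether Galaxy can import the module.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the gmap datatypes_conf.xml (two of the ten datatypes).
CONF = """<?xml version="1.0"?>
<datatypes>
  <datatype_files>
    <datatype_file name="gmap.py"/>
  </datatype_files>
  <registration>
    <datatype extension="gmap_annotation" type="galaxy.datatypes.gmap:IntervalAnnotation" display_in_upload="False"/>
    <datatype extension="gmap_snps" type="galaxy.datatypes.gmap:SNPAnnotation" display_in_upload="True"/>
  </registration>
  <sniffers>
    <sniffer type="galaxy.datatypes.gmap:IntervalAnnotation"/>
    <sniffer type="galaxy.datatypes.gmap:SNPAnnotation"/>
  </sniffers>
</datatypes>
"""


def unregistered_sniffers(conf_xml):
    """Return sniffer type strings that have no matching <datatype> entry."""
    root = ET.fromstring(conf_xml)
    registered = {dt.get("type") for dt in root.findall("./registration/datatype")}
    sniffers = [s.get("type") for s in root.findall("./sniffers/sniffer")]
    return [t for t in sniffers if t not in registered]


missing = unregistered_sniffers(CONF)
print(missing)
# []
```

An empty list means the sniffer entries and the datatype registrations agree, so a remaining "No module named ..." warning would point at how the module is imported rather than at the configuration file.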
participants (2)
- Greg Von Kuster
- Ira Cooke