Hi Greg et al,

I've just been looking over your slides from last week about the new 'Galaxy Tool Shed', which are posted online here:

http://wiki.g2.bx.psu.edu/GCC2011
http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=GalaxyToolShed.pdf

They talk about how you will be tracking individual tools in hg repositories. I can see two ways this might work:

(1) Each of these tool specific repositories (or branches if you just make one repository for each tool owner) would be a full fork of the Galaxy code base. This allows in principle tools to include changes to core functionality (but that seems dangerous due to potential merge clashes), and any existing tool contributor's pre-existing hg forks on bitbucket might be reused.

(2) Each of these tool specific repositories would ONLY track the tool specific files you'd add to Galaxy to install the tool. So, typically there would be an XML file, perhaps a wrapper script, maybe a sample loc file, and a plain text readme file.

I'm guessing you've gone for something along the lines of idea (2), but I would love to hear more about how this will all work. e.g. Where would the tool shed repositories be hosted, and would tool authors use hg to work with them, or something like the current web based tool upload?

Regards,

Peter
Hi Peter, Greg will probably reply, but I'll throw in my $0.02 as well. Peter Cock wrote:
Hi Greg et al,
I've just been looking over your slides from last week about the new 'Galaxy Tool Shed', which are posted online here:
http://wiki.g2.bx.psu.edu/GCC2011
http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=GalaxyToolShed.pdf
They talk about how you will be tracking individual tools in hg repositories.
I can see two ways this might work:
(1) Each of these tool specific repositories (or branches if you just make one repository for each tool owner) would be a full fork of the Galaxy code base. This allows in principle tools to include changes to core functionality (but that seems dangerous due to potential merge clashes), and any existing tool contributor's pre-existing hg forks on bitbucket might be reused.
The tool shed isn't really intended for framework changes - I would suggest keeping these as bitbucket forks, although it would certainly be good if we had a way to locate the list of such forks centrally.
(2) Each of these tool specific repositories would ONLY track the tool specific files you'd add to Galaxy to install the tool. So, typically there would be an XML file, perhaps a wrapper script, maybe a sample loc file, and a plain text readme file.
I'm guessing you've gone for something along the lines of idea (2), but I
Yep.
would love to hear more about how this will all work. e.g. Where would the tool shed repositories be hosted, and would tool authors use hg to work with them, or something like the current web based tool upload?
They're hosted here, and you can check them out and work with them locally as you do the Galaxy source itself, or use the new web-based upload to upload individual files or tarballs. Have a look at the test instance of the next-gen toolshed here if you'd like to see how it works: http://testtoolshed.g2.bx.psu.edu/ Please feel free to use this as a sandbox and report any issues you find. --nate
Regards,
Peter ___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
On Wed, Jun 1, 2011 at 3:22 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Peter,
Greg will probably reply, but I'll throw in my $0.02 as well.
Great - but with your answers you've triggered more questions ;)
Peter Cock wrote:
Hi Greg et al,
I've just been looking over your slides from last week about the new 'Galaxy Tool Shed', which are posted online here:
http://wiki.g2.bx.psu.edu/GCC2011
http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=GalaxyToolShed.pdf
They talk about how you will be tracking individual tools in hg repositories.
I can see two ways this might work:
(1) Each of these tool specific repositories (or branches if you just make one repository for each tool owner) would be a full fork of the Galaxy code base. This allows in principle tools to include changes to core functionality (but that seems dangerous due to potential merge clashes), and any existing tool contributor's pre-existing hg forks on bitbucket might be reused.
The tool shed isn't really intended for framework changes - I would suggest keeping these as bitbucket forks, although it would certainly be good if we had a way to locate the list of such forks centrally.
Well, as long as the repository is created by forking on bitbucket, then the link exists in the bitbucket web interface: https://bitbucket.org/galaxy/galaxy-central/descendants
(2) Each of these tool specific repositories would ONLY track the tool specific files you'd add to Galaxy to install the tool. So, typically there would be an XML file, perhaps a wrapper script, maybe a sample loc file, and a plain text readme file.
I'm guessing you've gone for something along the lines of idea (2), but I
Yep.
It did seem the most likely route.
would love to hear more about how this will all work. e.g. Where would the tool shed repositories be hosted, and would tool authors use hg to work with them, or something like the current web based tool upload?
They're hosted here, and you can check them out and work with them locally as you do the Galaxy source itself, or use the new web-based upload to upload individual files or tarballs.
Have a look at the test instance of the next-gen toolshed here if you'd like to see how it works:
http://testtoolshed.g2.bx.psu.edu/
Please feel free to use this as a sandbox and report any issues you find.
I see the existing usernames and passwords from the old Tool Shed were transferred - that makes life easier. And it lists the hg information, e.g.

hg clone http://peterjc@testtoolshed.g2.bx.psu.edu/repos/peterjc/venn_list
hg clone http://peterjc@testtoolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp

What happens with branches? Would the Tool Shed just show the default branch? That seems best for a simple UI.

I have a query regarding the way the tools are shown in tables and the "version" column, which shows a changeset and revision number. According to Greg's slides (slide #10, titled "Simpler tool versioning" which seems ironic to me), the old numerical version is still there in the XML - and I'd prefer to see that. How about having both shown (two columns, perhaps call them "Public version" and "hg version" or "hg revision").

With regards to the planned installation functionality, what happens when a tool repository (aka Tool Suite in the old model) contains several XML wrappers - would you be able to choose which are wanted? The use case I have here is when several tools share some common dependency (which should be tracked in a single repository), and were therefore useful to bundle together as a suite, but where not all the tools will be of global interest (e.g. My TMHMM, SignalP, etc suite).

Peter
On Wed, Jun 1, 2011 at 4:00 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Peter Cock wrote:
would love to hear more about how this will all work. e.g. Where would the tool shed repositories be hosted, and would tool authors use hg to work with them, or something like the current web based tool upload?
They're hosted here, and you can check them out and work with them locally as you do the Galaxy source itself, or use the new web-based upload to upload individual files or tarballs.
Have a look at the test instance of the next-gen toolshed here if you'd like to see how it works:
http://testtoolshed.g2.bx.psu.edu/
Please feel free to use this as a sandbox ...
Does that mean it will be cleared at some point before taking over, so we can make deliberate test changes without the fear of them being applied by other Galaxy administrators? If so, please stick a big warning on the http://testtoolshed.g2.bx.psu.edu/ test server (e.g. replace the top left link "Galaxy Tool Shed" with "Galaxy TESTING Tool Shed"), and ideally some text telling people to continue to use http://community.g2.bx.psu.edu/ for production servers.
... and report any issues you find.
First bug report: https://bitbucket.org/galaxy/galaxy-central/issue/564/

It seems you're making a lot of work for yourselves by reimplementing a web GUI for an hg repository. Isn't there an existing web server thing you could have running on http://testtoolshed.g2.bx.psu.edu/ to take care of this side of things? Ideally something you could theme and embed within the frames of the current Tool Shed UI.

Peter
On Jun 1, 2011, at 11:19 AM, Peter Cock wrote:
On Wed, Jun 1, 2011 at 4:00 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Peter Cock wrote:
would love to hear more about how this will all work. e.g. Where would the tool shed repositories be hosted, and would tool authors use hg to work with them, or something like the current web based tool upload?
They're hosted here, and you can check them out and work with them locally as you do the Galaxy source itself, or use the new web-based upload to upload individual files or tarballs.
Have a look at the test instance of the next-gen toolshed here if you'd like to see how it works:
http://testtoolshed.g2.bx.psu.edu/
Please feel free to use this as a sandbox ...
Does that mean it will be cleared at some point before taking over, so we can make deliberate test changes without the fear of them being applied by other Galaxy administrators? If so, please stick a big warning on the http://testtoolshed.g2.bx.psu.edu/ test server (e.g. replace the top left link "Galaxy Tool Shed" with "Galaxy TESTING Tool Shed"), and ideally some text telling people to continue to use http://community.g2.bx.psu.edu/ for production servers.
Yes - we'll do this. This test tool shed should be used for testing (we'll keep it available indefinitely), much like the Galaxy test instance we host here at Penn State. Feel free to mess with anything you want. Please report bugs and I'll fix them as fast as possible. Very soon there will be a main production tool shed available at http://toolshed.g2.bx.psu.edu.
... and report any issues you find.
First bug report: https://bitbucket.org/galaxy/galaxy-central/issue/564/
It seems you're making a lot of work for yourselves by reimplementing a web GUI for an hg repository. Isn't there an existing web server thing you could have running on http://testtoolshed.g2.bx.psu.edu/ to take care of this side of things? Ideally something you could theme and embed within the frames of the current Tool Shed UI.
On my list - thanks for reporting it!
Peter
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
Hello Peter - I finally got a chance to jump in - see my inline comments... On Jun 1, 2011, at 11:00 AM, Peter Cock wrote:
On Wed, Jun 1, 2011 at 3:22 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Peter,
Greg will probably reply, but I'll throw in my $0.02 as well.
Great - but with your answers you've triggered more questions ;)
Peter Cock wrote:
Hi Greg et al,
I've just been looking over your slides from last week about the new 'Galaxy Tool Shed', which are posted online here:
http://wiki.g2.bx.psu.edu/GCC2011
http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=GalaxyToolShed.pdf
They talk about how you will be tracking individual tools in hg repositories.
I can see two ways this might work:
(1) Each of these tool specific repositories (or branches if you just make one repository for each tool owner) would be a full fork of the Galaxy code base. This allows in principle tools to include changes to core functionality (but that seems dangerous due to potential merge clashes), and any existing tool contributor's pre-existing hg forks on bitbucket might be reused.
The tool shed isn't really intended for framework changes - I would suggest keeping these as bitbucket forks, although it would certainly be good if we had a way to locate the list of such forks centrally.
Well, as long as the repository is created by forking on bitbucket, then the link exists in the bitbucket web interface: https://bitbucket.org/galaxy/galaxy-central/descendants
What's important here is that each tool or set of tools is its own separate entity - see the future "big picture" highlights below for reasons.
(2) Each of these tool specific repositories would ONLY track the tool specific files you'd add to Galaxy to install the tool. So, typically there would be an XML file, perhaps a wrapper script, maybe a sample loc file, and a plain text readme file.
I'm guessing you've gone for something along the lines of idea (2), but I
Yep.
It did seem the most likely route.
would love to hear more about how this will all work. e.g. Where would the tool shed repositories be hosted, and would tool authors use hg to work with them, or something like the current web based tool upload?
They're hosted here, and you can check them out and work with them locally as you do the Galaxy source itself, or use the new web-based upload to upload individual files or tarballs.
Have a look at the test instance of the next-gen toolshed here if you'd like to see how it works:
http://testtoolshed.g2.bx.psu.edu/
Please feel free to use this as a sandbox and report any issues you find.
I see the existing usernames and passwords from the old Tool Shed were transferred - that makes life easier. And it lists the hg information, e.g.
hg clone http://peterjc@testtoolshed.g2.bx.psu.edu/repos/peterjc/venn_list
hg clone http://peterjc@testtoolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp
What happens with branches? Would the Tool Shed just show the default branch? That seems best for a simple UI.
Some of the branching details are yet to be worked out, but forks are easy because repository urls include the unique username of the Galaxy user.
I have a query regarding the way the tools are shown in tables and the "version" column, which shows a changeset and revision number. According to Greg's slides (slide #10, titled "Simpler tool versioning" which seems ironic to me), the old numerical version is still there in the XML - and I'd prefer to see that. How about having both shown (two columns, perhaps call them "Public version" and "hg version" or "hg revision").
We can certainly do this, but what would you like to see for tool suites and other tool "types"? The old Galaxy tool shed strictly required a suite_config.xml file that included the overall version of the suite. To make tool development easier, we're no longer requiring the inclusion of a suite_config.xml file (we don't even differentiate types of tools since everything is a repository). The definition of a tool in the next gen tool shed is fairly loose. A tool could be data, it could be an exported workflow, it could be a suite of tools, a single tool, or just a set of files. So we'll need to define an easy way to provide a version of the tool if it will be different than the version of the repository tip.
With regards to the planned installation functionality, what happens when a tool repository (aka Tool Suite in the old model) contains several XML wrappers - would you be able to choose which are wanted?
Yes - see below...
The use case I have here is when several tools share some common dependency (which should be tracked in a single repository), and were therefore useful to bundle together as a suite, but where not all the tools will be of global interest (e.g. My TMHMM, SignalP, etc suite).
Here's the future "big picture" highlights. Many of the details are yet to be defined and fleshed out...

We're hoping that in the near future there will be many local tool sheds (just like Galaxy instances). I'm thinking that there will be a central tool shed "broker" of sorts that is hosted by the Galaxy team. This broker will provide 2 basic functions. It will enable local tool sheds (including the current tool shed hosted by the Galaxy team) to advertise their tools, and it will allow local Galaxy instances to use those advertisements to find tools that the local Galaxy instance's users are interested in. This specific point has not yet been discussed to any depth, so consider it fluid for now.

When a Galaxy instance's admin locates tools within a specific tool shed that they want to install, they will be able to install them via a Galaxy tool installation control panel. Think of a UI that provides a check-boxed list of tools that have been found in some tool shed or sheds. The Galaxy admin will check those tools he wants to install, and the tools, along with all dependencies, will automatically be installed in the local Galaxy instance. Dependencies could include 3rd party binaries, maybe some form of data, and other forms of dependencies. This is another good reason to keep tools separated in their own repositories.

The installation will be virtually automatic, requiring little or no manual intervention, via a "package manager" of sorts. This will be done using a combination of fabric scripts and other components. All of the underlying mercurial stuff will be handled beneath the UI layer.
Peter
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
On Wed, Jun 1, 2011 at 4:22 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
Hello Peter - I finally got a chance to jump in - see my inline comments...
Hi :)
What happens with branches? Would the Tool Shed just show the default branch? That seems best for a simple UI.
Some of the branching details are yet to be worked out, but forks are easy because repository urls include the unique username of the Galaxy user.
Well, yes and no - as long as there are competing versions of a Galaxy tool (e.g. from an original author and a fork by a second author), and they use the same ID in their XML, you have a clash. This will have to be considered in the (automated) install interface. i.e. In general, when installing or updating any tool, there may be existing versions of some components already present. In fact two completely unrelated tools could even have the same XML ID by accident.
I have a query regarding the way the tools are shown in tables and the "version" column, which shows a changeset and revision number. According to Greg's slides (slide #10, titled "Simpler tool versioning" which seems ironic to me), the old numerical version is still there in the XML - and I'd prefer to see that. How about having both shown (two columns, perhaps call them "Public version" and "hg version" or "hg revision").
We can certainly do this, but what would you like to see for tool suites and other tool "types"? The old Galaxy tool shed strictly required a suite_config.xml file that included the overall version of the suite. To make tool development easier, we're no longer requiring the inclusion of a suite_config.xml file (we don't even differentiate types of tools since everything is a repository). The definition of a tool in the next gen tool shed is fairly loose. A tool could be data, it could be an exported workflow, it could be a suite of tools, a single tool, or just a set of files. So we'll need to define an easy way to provide a version of the tool if it will be different than the version of the repository tip.
I see what you mean for the "suite" case. Maybe on the view details page each constituent tool could be shown with its "classical" version number from the XML file?
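The per-tool listing suggested above could be sketched roughly as follows. This is only an illustration, assuming a local checkout of a tool repository and assuming each wrapper is an XML file whose root element is `<tool>` carrying `id` and `version` attributes; the function name `list_tool_versions` is hypothetical and is not part of the Tool Shed code.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def list_tool_versions(repo_dir):
    """Return (relative path, tool id, version) for each tool wrapper found.

    Any XML file whose root element is not <tool> (for example a
    suite_config.xml) is ignored, as is anything that fails to parse.
    """
    found = []
    for xml_path in sorted(Path(repo_dir).rglob("*.xml")):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            continue  # not well-formed XML; skip it
        if root.tag != "tool":
            continue  # some other kind of XML file
        found.append((
            xml_path.relative_to(repo_dir).as_posix(),
            root.get("id"),
            root.get("version", "(none given)"),
        ))
    return found
```

A details page for a suite could then show one row per wrapper next to the single hg revision of the repository as a whole.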
Here's the future "big picture" highlights. Many of the details are yet to be defined and fleshed out...
We're hoping that in the near future there will be many local tool sheds ( just like Galaxy instances ). I'm thinking that there will be a central tool shed "broker" of sorts that is hosted by the Galaxy team. This broker will provide 2 basic functions. It will enable local tool sheds ( including the current tool shed hosted by the Galaxy team ) to advertise their tools, and it will allow local Galaxy instances to use those advertisements to find tools that the local Galaxy instance's users are interested in. This specific point has not yet been discussed to any depth, so consider it fluid for now.
I'm not immediately sold on this plan. To me one of the big plus points of having a single "Official" Tool Shed looked after by the Galaxy team is the convenience factor (a one stop shop), which requires critical mass, plus whatever QA happens as part of the current approval process. I would regard it as a step backwards if in order to hunt for a wrapper for a given tool, I had to resort to Google in order to find all the individual Galaxy Tool Sheds.
When a Galaxy instance's admin locates tools within a specific tool shed that they want to install, they will be able to install them via a Galaxy tool installation control panel. Think of a UI that provides a check-boxed list of tools that have been found in some tool shed or sheds. The Galaxy admin will check those tools he wants to install, and the tools, along with all dependencies will automatically be installed in the local Galaxy instance. Dependencies could include 3rd party binaries, maybe some form of data, and other forms of dependencies. This is another good reason to keep tools separated in their own repositories.
If you mean by "dependencies" the small task of installing the tool XML and associated scripts and data files currently bundled in the tar balls on the current Tool Shed, that seems fine. Anything beyond that seems difficult and likely to impose a significant extra load on tool wrapper authors.
The installation will be virtually automatic, requiring little or no manual intervention, via a "package manager" of sorts. This will be done using a combination of fabric scripts and other components. All of the underlying mercurial stuff will be handled beneath the UI layer.
This larger aim of installing the underlying dependencies is impossible in general - but that seems to be what you want to aim for. Consider the obvious use case of closed source (non-redistributable) 3rd party binaries. I can think of several examples from the current Tool Shed wrappers, including the Roche "Newbler" off instrument applications, TMHMM and SignalP.

Even if you just hope to cover open source tool dependencies, this is another big problem which seems like something Galaxy shouldn't be taking on. Frankly the only way I expect this grand plan to have any practical chance of success is if you limit yourselves to a single existing Linux package management platform like RPM or Deb files (although doing that would limit Galaxy's appeal). e.g. Work hand in hand with Debian-Med to ensure any missing tool is covered.

Are you biting off more than you can chew? I hope I am misinterpreting your plans.

(And for the umpteenth time, I am frustrated I couldn't make it to the Galaxy conference last week in person - more for this kind of discussion rather than the talks themselves. Will you be at BOSC or ISMB 2011 in Vienna? Maybe that could be another thread...)

Regards,

Peter
Peter Cock wrote:
Well, yes and no - as long as there are competing versions of a Galaxy tool (e.g. from an original author and a fork by a second author), and they use the same ID in their XML, you have a clash. This will have to be considered in the (automated) install interface. i.e. In general, when installing or updating any tool, there may be existing versions of some components already present. In fact two completely unrelated tools could even have the same XML ID by accident.
I agree there could be a problem with tool ID uniqueness. We've talked about suggesting that people namespace their tool IDs to prevent this, but nothing formal has materialized at this point.
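To make the clash concrete, here is a minimal sketch of the kind of check an install interface might run before adding tools. It assumes the caller has already collected (repository, tool ID) pairs; the function name, the fork's repository name and the tool IDs in the usage note are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_id_clashes(tools):
    """Given (repository, tool_id) pairs, report IDs claimed more than once.

    Returns a dict mapping each clashing tool ID to the sorted list of
    repositories that define it.
    """
    owners = defaultdict(set)
    for repository, tool_id in tools:
        owners[tool_id].add(repository)
    return {
        tool_id: sorted(repos)
        for tool_id, repos in owners.items()
        if len(repos) > 1
    }
```

For example, if both `peterjc/tmhmm_and_signalp` and a fork of it define a tool with ID `tmhmm2`, the result maps `tmhmm2` to both repository names, and the installer can warn or ask the admin which one to keep.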
I'm not immediately sold on this plan. To me one of the big plus points of having a single "Official" Tool Shed looked after by the Galaxy team is the convenience factor (a one stop shop), which requires critical mass, plus whatever QA happens as part of the current approval process. I would regard it as a step backwards if in order to hunt for a wrapper for a given tool, I had to resort to Google in order to find all the individual Galaxy Tool Sheds.
It'll be possible for people to run their own Tool Sheds if they'd like, for whatever purpose - and this may be necessary for sharing extremely large data which we can't possibly host at the main Shed, but there should be an aggregator somewhere which lists all of the available public Sheds and makes it easy to add them as new sources to your Galaxy install. Like a slightly more organized Debian APT system.
If you mean by "dependencies" the small task of installing the tool XML and associated scripts and data files currently bundled in the tar balls on the current Tool Shed, that seems fine. Anything beyond that seems difficult and likely to impose a significant extra load on tool wrapper authors.
It'll be up to the authors to decide what level of complexity they care to handle, but we want to move away from the situation where someone installs a "tool" but finds that it's unusable because the actual underlying dependency doesn't exist and is non-trivial to install.
This larger aim of installing the underlying dependencies is impossible in general - but that seems to be what you want to aim for. Consider the obvious use case of closed source (non-redistributable) 3rd party binaries. I can think of several examples from the current Tool Shed wrappers, including the Roche "Newbler" off instrument applications, TMHMM and SignalP.
Agreed. Thankfully, the current dependency system (tool_dependency_dir in the config file; not in the sample config, sorry, I'll remedy that shortly!) only requires that you have an environment file that configures whatever is necessary (generally just $PATH) to find a dependency. So the tools in the Tool Shed would provide the XML, wrapper script (if necessary), and then instructions or perhaps an interface to configure the env file.
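As a rough illustration of how small such an environment file can be, the sketch below generates one that only prepends a directory to $PATH. The function name and the example directory are assumptions, and the thread does not say what the file must be called or where under tool_dependency_dir it belongs, so those details are deliberately left out.

```python
import shlex


def env_file_text(bin_dir):
    """Return shell text that prepends bin_dir to PATH.

    The directory is shell-quoted so that paths containing spaces still work.
    """
    return "PATH={}:$PATH\nexport PATH\n".format(shlex.quote(bin_dir))
```

An "interface to configure the env file" could be little more than a form asking the admin where the closed source binary was installed, with the result written out by something like this.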
Even if you just hope to cover open source tool dependencies, this is another big problem which seems like something Galaxy shouldn't be taking on. Frankly the only way I expect this grand plan to have any practical chance of success is if you limit yourselves to a single existing Linux package management platform like RPM or Deb files (although doing that would limit Galaxy's appeal). e.g. Work hand in hand with Debian-Med to ensure any missing tool is covered.
Distributing binaries for the core platforms (Linux i686/x86_64 and Mac OS X) is probably not terribly difficult for us, but would be more work for 3rd party developers - but the choice to do this is up to them. I also haven't given too much thought to how this would work. dpkg and rpm have the upside of being deterministic, but the downside of being platform-specific, requiring root, and not having much ability to install to varying paths. A fallback to source if binaries are not available would also be nice, if it's possible to write some easy instructions on how to compile, but of course this won't always be the case.
Are you biting off more than you can chew? I hope I am misinterpreting your plans.
Hopefully not! We're trying to think this through pretty thoroughly before we get started, thanks for joining in the discussion. =)
(And for the umpteenth time, I am frustrated I couldn't make it to the Galaxy conference last week in person - more for this kind of discussion rather than the talks themselves. Will you be at BOSC or ISMB 2011 in Vienna? Maybe that could be another thread...)
Agreed! I do believe there are some people going to BOSC, Dave will hopefully chime in with the details (when he's awake, I think he was only flying back today). --nate
Regards,
Peter
On Wed, Jun 1, 2011 at 5:25 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Peter Cock wrote:
Well, yes and no - as long as there are competing versions of a Galaxy tool (e.g. from an original author and a fork by a second author), and they use the same ID in their XML, you have a clash. This will have to be considered in the (automated) install interface. i.e. In general, when installing or updating any tool, there may be existing versions of some components already present. In fact two completely unrelated tools could even have the same XML ID by accident.
I agree there could be a problem with tool ID uniqueness. We've talked about suggesting that people namespace their tool IDs to prevent this, but nothing formal has materialized at this point.
That sounds sensible, and the sooner the better.
I'm not immediately sold on this plan. To me one of the big plus points of having a single "Official" Tool Shed looked after by the Galaxy team is the convenience factor (a one stop shop), which requires critical mass, plus whatever QA happens as part of the current approval process. I would regard it as a step backwards if in order to hunt for a wrapper for a given tool, I had to resort to Google in order to find all the individual Galaxy Tool Sheds.
It'll be possible for people to run their own Tool Sheds if they'd like, for whatever purpose - and this may be necessary for sharing extremely large data which we can't possibly host at the main Shed, but there should be an aggregator somewhere which lists all of the available public Sheds and makes it easy to add them as new sources to your Galaxy install. Like a slightly more organized Debian APT system.
If there is an official "meta tool shed" aggregator, that would address my main concern about fragmenting things.
If you mean by "dependencies" the small task of installing the tool XML and associated scripts and data files currently bundled in the tar balls on the current Tool Shed, that seems fine. Anything beyond that seems difficult and likely to impose a significant extra load on tool wrapper authors.
It'll be up to the authors to decide what level of complexity they care to handle,
Good - that silences a lot of my worries.
... but we want to move away from the situation where someone installs a "tool" but finds that it's unusable because the actual underlying dependency doesn't exist and is non-trivial to install.
Improving the documentation shown on the tool shed could help here - make it easier for the tool wrapper author to tell the Tool Shed user what will be required. Currently we get a short plain text box as part of the upload (no markup), and can include a (plain text) readme file which is easily viewable from the tool shed. I've just filed an enhancement request on a related idea, "Show mockup of tool GUI in Galaxy Tool Shed": https://bitbucket.org/galaxy/galaxy-central/issue/565/
This larger aim of installing the underlying dependencies is impossible in general - but that seems to be what you want to aim for. Consider the obvious use case of closed source (non-redistributable) 3rd party binaries. I can think of several examples from the current Tool Shed wrappers, including the Roche "Newbler" off instrument applications, TMHMM and SignalP.
Agreed. Thankfully, the current dependency system (tool_dependency_dir in the config file; not in the sample config, sorry, I'll remedy that shortly!) only requires that you have an environment file that configures whatever is necessary (generally just $PATH) to find a dependency. So the tools in the Tool Shed would provide the XML, wrapper script (if necessary), and then instructions or perhaps an interface to configure the env file.
I'd hope the common case where all that is required is the tool binary to be on the path, would not require any extra configuration files. See also: https://bitbucket.org/galaxy/galaxy-central/issue/82
[cut]
Are you biting off more than you can chew? I hope I am misinterpreting your plans.
Hopefully not! We're trying to think this through pretty thoroughly before we get started, thanks for joining in the discussion. =)
I've been reassured :) Peter
(apologies in advance, limiting my response to the two questions below) On Jun 1, 2011, at 11:54 AM, Peter Cock wrote:
On Wed, Jun 1, 2011 at 5:25 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Peter Cock wrote:
Well, yes and no - as long as there are competing versions of a Galaxy tool (e.g. from an original author and a fork by a second author), and they use the same ID in their XML, you have a clash. This will have to be considered in the (automated) install interface. i.e. In general, when installing or updating any tool, there may be existing versions of some components already present. In fact two completely unrelated tools could even have the same XML ID by accident.
I agree there could be a problem with tool ID uniqueness. We've talked about suggesting that people namespace their tool IDs to prevent this, but nothing formal has materialized at this point.
That sounds sensible, and the sooner the better.
Agreed. I think simple namespace prefixes (maybe the hg account?) are the easiest option.
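To make the ID clash concrete, here is a minimal sketch of how an install interface might spot it before copying files into place. It relies only on the fact that a Galaxy tool config has a <tool> root element with an "id" attribute; the function name, the file paths and the sample XML are hypothetical and not part of any Tool Shed code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_id_clashes(tool_xml_by_path):
    """Return {tool_id: [paths]} for every tool ID declared by more than one file."""
    seen = defaultdict(list)
    for path, xml_text in tool_xml_by_path.items():
        root = ET.fromstring(xml_text)
        # Galaxy tool configs use a <tool> root element carrying an "id" attribute.
        if root.tag == "tool" and root.get("id"):
            seen[root.get("id")].append(path)
    return {tool_id: sorted(paths) for tool_id, paths in seen.items() if len(paths) > 1}


# Hypothetical input: an original wrapper, a fork reusing its ID, and an unrelated tool.
example = {
    "original/signalp.xml": '<tool id="signalp" name="SignalP"/>',
    "fork/signalp.xml": '<tool id="signalp" name="SignalP (fork)"/>',
    "other/tmhmm.xml": '<tool id="tmhmm" name="TMHMM"/>',
}
clashes = find_id_clashes(example)
print(clashes)
```

A real installer would also have to decide what to do on a clash (refuse, rename, or replace); this sketch only reports it.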
I'm not immediately sold on this plan. To me one of the big plus points of having a single "Official" Tool Shed looked after by the Galaxy team is the convenience factor (a one stop shop), which requires critical mass, plus whatever QA happens as part of the current approval process. I would regard it as a step backwards if in order to hunt for a wrapper for a given tool, I had to resort to Google in order to find all the individual Galaxy Tool Sheds.
It'll be possible for people to run their own Tool Sheds if they'd like, for whatever purpose - and this may be necessary for sharing extremely large data which we can't possibly host at the main Shed, but there should be an aggregator somewhere which lists all of the available public Sheds and makes it easy to add them as new sources to your Galaxy install. Like a slightly more organized Debian APT system.
If there is an official "meta tool shed" aggregator, that would address my main concern about fragmenting things.
Not sure how feasible this is, but could you use hg subrepositories for this purpose? For instance, have a 'blessed' set of galaxy tool sheds (as subrepos) listed in a main tool shed repository. One of the nice advantages of this is it could allow one to use git or svn, though I think sticking with hg-only repos is the simplest option for now. chris PS - wonderful conference, sorry that Peter couldn't make it!
On Wednesday, June 1, 2011, Chris Fields <cjfields@illinois.edu> wrote:
(apologies in advance, limiting my response to the two questions below)
On Jun 1, 2011, at 11:54 AM, Peter Cock wrote:
On Wed, Jun 1, 2011 at 5:25 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Peter Cock wrote:
Well, yes and no - as long as there are competing versions of a Galaxy tool (e.g. from an original author and a fork by a second author), and they use the same ID in their XML, you have a clash. This will have to be considered in the (automated) install interface. i.e. In general, when installing or updating any tool, there may be existing versions of some components already present. In fact two completely unrelated tools could even have the same XML ID by accident.
I agree there could be a problem with tool ID uniqueness. We've talked about suggesting that people namespace their tool IDs to prevent this, but nothing formal has materialized at this point.
That sounds sensible, and the sooner the better.
Agreed. I think simple namespace prefixes (maybe the hg account?) are the easiest option.
That sounds good - although I'd suggest the group's name might be a valid alternative - then an underscore or hyphen, and the tool specific ID which would typically be based on the name of the tool being wrapped. If it were up to me I'd go further and recommend a restricted set of characters (e.g. alphanumeric and one of hyphen and underscore), with the additional recommendation that the tool's XML filename follows suit, e.g. signalp.xml with ID peterjc-signalp.
Obviously we'd have to have a "grandfather clause" exemption for all tools to date because changing their ID would break saved workflows.
As an aside, I regret including the word "wrapper" in the NCBI BLAST+ wrappers since most Galaxy tools are just wrappers around existing tools, but it's done now.
Peter
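The naming scheme suggested here could be sketched roughly as follows. This is only an illustration of the proposal, not an agreed convention: the function names are hypothetical, and the choice to allow plain ASCII alphanumerics in each part, with a single hyphen or underscore as the separator, is one assumed reading of "alphanumeric and one of hyphen and underscore".

```python
import re

# Assumption: each part (owner or group name, tool name) is ASCII alphanumeric only.
_PART = re.compile(r"[A-Za-z0-9]+")


def namespaced_tool_id(owner, tool_name, sep="-"):
    """Build '<owner><sep><tool_name>', e.g. 'peterjc-signalp'."""
    if sep not in ("-", "_"):
        raise ValueError("separator must be a hyphen or an underscore")
    for part in (owner, tool_name):
        if not _PART.fullmatch(part):
            raise ValueError("only letters and digits are allowed in %r" % part)
    return owner + sep + tool_name


def xml_filename(tool_name):
    """The XML filename follows the tool-specific part of the ID."""
    return tool_name + ".xml"


tool_id = namespaced_tool_id("peterjc", "signalp")
print(xml_filename("signalp"), "with ID", tool_id)
```

Existing tool IDs would be exempt under the "grandfather clause", so a check like this would only apply to new uploads.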
Peter Cock wrote:
If there is an official "meta tool shed" aggregator, that would address my main concern about fragmenting things.
If nothing else, there can be a wiki page, although something programmatic would be ideal.
... but we want to move away from the situation where someone installs a "tool" but finds that it's unusable because the actual underlying dependency doesn't exist and is non-trivial to install.
Improving the documentation shown on the tool shed could help here - make it easier for the tool wrapper to tell the Tool Shed user what will be required.
Currently we get a short plain text box as part of the upload (no markup), and can include a (plain text) readme file which is easily viewable from the tool shed. I've just filed an enhancement request on a related idea:
https://bitbucket.org/galaxy/galaxy-central/issue/565/ Show mockup of tool GUI in Galaxy Tool Shed
Yeah, eventually we'll have to parse the tool configs in the repo, so functionality like this should show up as the Shed matures. Not sure about the difficulty of doing the tool form mockup, but I like the idea.
This larger aim of installing the underlying dependencies is impossible in general - but that seems to be what you want to aim for. Consider the obvious use case of closed source (non-redistributable) 3rd party binaries. I can think of several examples from the current Tool Shed wrappers, including the Roche "Newbler" off-instrument applications, TMHMM and SignalP.
Agreed. Thankfully, the current dependency system (tool_dependency_dir in the config file; it's not in the sample config, sorry, I'll remedy that shortly!) only requires that you have an environment file that configures whatever is necessary (generally just $PATH) to find a dependency. So the tools in the Tool Shed would provide the XML, wrapper script (if necessary), and then instructions or perhaps an interface to configure the env file.
I'd hope the common case, where all that is required is for the tool binary to be on the path, would not require any extra configuration files. See also: https://bitbucket.org/galaxy/galaxy-central/issue/82
Well, use of the dependency system isn't required, so just setting things up on the $PATH is always a possibility. I was going to suggest that your patch could be applied if it was conditional on the local runner and checked after any <requirement type="package"> dependencies were set up, but there's still the problem of people running jobs through the local runner which are actually sent to the cluster without Galaxy's knowledge. Perhaps this is something we shouldn't worry too much about, but I know there are people doing it. --nate
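As a rough illustration of the environment-file idea discussed above, the sketch below looks up an env file for a package requirement and builds a shell prefix that sources it before the tool command. The directory layout (<tool_dependency_dir>/<name>/<version>/env.sh), the function names, the version number and the install path are all assumptions made for this example, not a description of Galaxy's actual code.

```python
import os
import shlex
import tempfile


def find_env_file(tool_dependency_dir, name, version):
    """Return the env file for a package requirement, or None if there isn't one."""
    # Assumed layout: <tool_dependency_dir>/<name>/<version>/env.sh
    candidate = os.path.join(tool_dependency_dir, name, version, "env.sh")
    return candidate if os.path.isfile(candidate) else None


def command_prefix(env_file):
    """Shell fragment that sources the env file; empty when relying on $PATH alone."""
    if env_file is None:
        return ""
    return ". %s; " % shlex.quote(env_file)


# Demonstration in a throwaway directory standing in for tool_dependency_dir.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "signalp", "3.0"))
    with open(os.path.join(base, "signalp", "3.0", "env.sh"), "w") as handle:
        # Hypothetical install location for a locally licensed binary.
        handle.write('PATH="/opt/signalp-3.0:$PATH"; export PATH\n')
    found = find_env_file(base, "signalp", "3.0")
    missing = find_env_file(base, "tmhmm", "2.0")
    prefix = command_prefix(found)
    print(prefix + "signalp input.fasta")
```

When no env file exists the prefix is empty, which matches the common case where the binary is simply expected to be on the $PATH already.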
[cut]
Are you biting off more than you can chew? I hope I am misinterpreting your plans.
Hopefully not! We're trying to think this through pretty thoroughly before we get started, thanks for joining in the discussion. =)
I've been reassured :)
Peter
On Wednesday, June 1, 2011, Nate Coraor <nate@bx.psu.edu> wrote:
Peter Cock wrote:
... but we want to move away from the situation where someone installs a "tool" but finds that it's unusable because the actual underlying dependency doesn't exist and is non-trivial to install.
Improving the documentation shown on the tool shed could help here - make it easier for the tool wrapper to tell the Tool Shed user what will be required.
Currently we get a short plain text box as part of the upload (no markup), and can include a (plain text) readme file which is easily viewable from the tool shed. I've just filed an enhancement request on a related idea:
https://bitbucket.org/galaxy/galaxy-central/issue/565/ Show mockup of tool GUI in Galaxy Tool Shed
Yeah, eventually we'll have to parse the tool configs in the repo, so functionality like this should show up as the Shed matures. Not sure about the difficulty of doing the tool form mockup, but I like the idea.
That's a start :)
This larger aim of installing the underlying dependencies is impossible in general - but that seems to be what you want to aim for. Consider the obvious use case of closed source (non-redistributable) 3rd party binaries. I can think of several examples from the current Tool Shed wrappers, including the Roche "Newbler" off-instrument applications, TMHMM and SignalP.
Agreed. Thankfully, the current dependency system (tool_dependency_dir in the config file; it's not in the sample config, sorry, I'll remedy that shortly!) only requires that you have an environment file that configures whatever is necessary (generally just $PATH) to find a dependency. So the tools in the Tool Shed would provide the XML, wrapper script (if necessary), and then instructions or perhaps an interface to configure the env file.
I'd hope the common case, where all that is required is for the tool binary to be on the path, would not require any extra configuration files. See also: https://bitbucket.org/galaxy/galaxy-central/issue/82
Well, use of the dependency system isn't required, so just setting things up on the $PATH is always a possibility. I was going to suggest that your patch could be applied if it was conditional on the local runner and checked after any <requirement type="package"> dependencies were set up, ...
Is that a request for me to update the patch? I've not delved into the job runner code before, so it might take me a bit longer than it would take you. Hint hint ;) I'd help with testing though.
... but there's still the problem of people running jobs through the local runner which are actually sent to the cluster without Galaxy's knowledge. Perhaps this is something we shouldn't worry too much about, but I know there are people doing it.
You mean if Galaxy blindly calls a tool or script, and that script then submits the job to the cluster? I'd say checking the cluster dependencies there was the tool author's responsibility. Peter
Peter Cock wrote:
Well, use of the dependency system isn't required, so just setting things up on the $PATH is always a possibility. I was going to suggest that your patch could be applied if it was conditional on the local runner and checked after any <requirement type="package"> dependencies were set up, ...
Is that a request for me to update the patch? I've not delved into the job runner code before, so it might take me a bit longer than it would take you. Hint hint ;) I'd help with testing though.
It's not a completely trivial thing, which is why I didn't do it at the time. It's probably something that should be added to the DRM wrapper script so that a nice error message can be supplied. I can't think of a way to check at tool load that wouldn't be painfully slow.
... but there's still the problem of people running jobs through the local runner which are actually sent to the cluster without Galaxy's knowledge. Perhaps this is something we shouldn't worry too much about, but I know there are people doing it.
You mean if Galaxy blindly calls a tool or script, and that script then submits the job to the cluster? I'd say checking the cluster dependencies there was the tool author's responsibility.
Yeah, that's the idea. Unfortunately, if the binary isn't installed on the Galaxy server (which is irrelevant), the tool won't load, which is certainly not what we want. --nate
Peter
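A check at job run time, rather than at tool load, might look something like the sketch below: look for the required programs on the search path just before running, and stop with a readable message if any are missing. This is an assumption-laden illustration (the function names are hypothetical, and it is written in Python rather than as part of an actual DRM wrapper script); it uses only shutil.which from the standard library.

```python
import shutil
import sys


def missing_binaries(binaries, path=None):
    """Return the names that cannot be found on the search path (default: $PATH)."""
    return [name for name in binaries if shutil.which(name, path=path) is None]


def require_binaries(binaries, path=None):
    """Exit with a readable message if any required program is missing."""
    missing = missing_binaries(binaries, path=path)
    if missing:
        sys.exit("Cannot run job: required program(s) not found on PATH: "
                 + ", ".join(missing))


# Report-only usage; the program name here is made up for the example.
print(missing_binaries(["hypothetical_tool_that_is_not_installed"]))
```

Because the check runs where the job runs, it looks at the execution node's environment and not the Galaxy server's, which sidesteps the problem of the tool failing to load when the binary is only installed on the cluster.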
I apologize for jumping on to this thread a bit late. I read below that there is a plan to pull tools into a Galaxy installation automagically. I wonder if you plan on providing some kind of API to query the tool registry, discover the tools, and install them into an existing Galaxy installation. PS: The "How to upload, download and install tools" link under Help seems to be broken. On Jun 1, 2011, at 3:00 PM, Nate Coraor wrote:
Peter Cock wrote:
Well, use of the dependency system isn't required, so just setting things up on the $PATH is always a possibility. I was going to suggest that your patch could be applied if it was conditional on the local runner and checked after any <requirement type="package"> dependencies were set up, ...
Is that a request for me to update the patch? I've not delved into the job runner code before, so it might take me a bit longer than it would take you. Hint hint ;) I'd help with testing though.
It's not a completely trivial thing, which is why I didn't do it at the time. It's probably something that should be added to the DRM wrapper script so that a nice error message can be supplied. I can't think of a way to check at tool load that wouldn't be painfully slow.
... but there's still the problem of people running jobs through the local runner which are actually sent to the cluster without Galaxy's knowledge. Perhaps this is something we shouldn't worry too much about, but I know there are people doing it.
You mean if Galaxy blindly calls a tool or script, and that script then submits the job to the cluster? I'd say checking the cluster dependencies there was the tool author's responsibility.
Yeah, that's the idea. Unfortunately, if the binary isn't installed on the Galaxy server (which is irrelevant), the tool won't load, which is certainly not what we want.
--nate
Peter
-- Ravi K Madduri The Globus Alliance | Argonne National Laboratory | University of Chicago http://www.mcs.anl.gov/~madduri
On Thu, Jun 16, 2011 at 3:00 AM, Ravi Madduri <madduri@mcs.anl.gov> wrote:
I apologize for jumping on to this thread a bit late. I read below that there is a plan to pull tools into a galaxy installation automagically. I wonder if you plan on providing some kind of API to query the tool registry and discover the tools and install them into an existing galaxy installation.
Yes, have a look at Greg's slides from the Galaxy Community Conference http://wiki.g2.bx.psu.edu/GCC2011 Peter
participants (5)
- Chris Fields
- Greg Von Kuster
- Nate Coraor
- Peter Cock
- Ravi Madduri