On Oct 29, 2012, at 9:23 PM, Todd Oakley wrote:
I changed the name of this thread, to go in a related but new direction:
I wonder if the Galaxy developers and community have any opinions on what is the best way to organize tools into repositories. We've developed a large number of tools to allow my lab to conduct phylogenetic analyses in Galaxy. Inspired by the mothur package in Galaxy, which is all in one repo, I made the decision to add all our related tools to 1 repo on the tool shed. However, it seems that makes individual tools like raxml difficult for other users to find. Recently, we started putting these tools onto Bitbucket, and organizing them in different categories (alignment, phylogenetics, orthologies, etc.), which is a compromise between all-in-one-repo and each-its-own-repo.
The thing is that many of the tools do not stand alone, and really are designed to function with other tools in the package. Any philosophies or opinions are welcome, as I feel like I have not come to a good solution on this...
Todd
I've added the following tool shed wiki page to provide discussion points related to this question.

http://wiki.galaxyproject.org/AToolOrASuitePerRepository

Here are the current contents of the page:

A single tool or a suite of tools per repository

Many tool developers in the Galaxy community question the best way to organize tools in their tool shed repositories. Some groups have developed a large number of tools to allow their labs to perform analyses in Galaxy and took the approach of including all related tools in a single repository in the tool shed. Others have chosen to restrict each repository to include a single tool. What is the "best practice"? Both approaches are ok, but here are some points to consider when making this decision. Notice that these points are valid at the time this page was written, so this discussion will evolve as new tool shed features and Galaxy-related tool shed features are introduced and mature over time.

The benefits of a single tool per repository

Restricting a repository to include a single tool provides more flexibility to Galaxy administrators to install only those specific tools in which their users have interest. Sometimes installing a suite of tools in order to get only one or two of them is not optimal.

Some time in the future, Galaxy workflows may provide the ability to search for tools defined in the workflow that are not available in the Galaxy instance. Ideally the Galaxy administrator will be able to locate and install only the precise list of missing tools in order to enable the workflow to run. For example, assume a user imported a workflow into their local Galaxy instance that was developed by someone else in a different Galaxy instance. The Galaxy workflow UI may provide a feature that searches available tool sheds for the tools required by the imported workflow that are not available in the Galaxy instance.
Restricting repository contents to single tools would enable installation of only those missing tools required by the workflow.

Tool shed repositories that include tools have certain Mercurial change set revisions that are installable into a local Galaxy instance. These revisions are defined by the versions of the tools included in the repository. Repositories that are restricted to contain a single tool will ensure that a new revision installation will be required only when that tool version changes. Repositories that include multiple tools require a new installation revision when the version of any one of the tools changes, possibly resulting in multiple versions of the same tool installed into a single Galaxy instance. Of course, Galaxy will load only a single instance of a tool version into the tool panel, but the tool and related files will still be installed on disk multiple times.

The weaknesses of a single tool per repository

With current tool shed features, if multiple tools share required third-party dependencies and you design your repository to install them when the repository is installed into a Galaxy instance (by including a file named tool_dependencies.xml in your repository), restricting a repository to a single tool will force you to include the same tool_dependencies.xml file in each repository whose contained tool requires the same dependency. This will also install and compile the same dependencies separately for each repository when it is installed into a Galaxy instance.

In the near future, the tool shed will include a new feature that we are calling cross-repository dependencies which will eliminate this weakness. This feature will provide a means of defining a repository as a dependency for another repository. For example, the current emboss_datatypes repository in the main Galaxy tool shed will be defined as a dependency for the current emboss_5 repository in the same tool shed.
So when the emboss_5 repository is installed, the emboss_datatypes repository will be automatically installed along with it.

If tools are not intended to provide meaningful analyses on their own, but are designed to function with other tools, restricting a repository to a single tool will require a Galaxy administrator to install multiple repositories in order to provide all necessary tools to their users.

The benefits of a suite of tools per repository

With current tool shed features, if multiple tools share required third-party dependencies and you design your repository to install them when the repository is installed into a Galaxy instance (by including a file named tool_dependencies.xml in your repository), then all tools included in the repository can share the same third-party dependency, ensuring that the dependency only needs to be installed and compiled once for multiple tools. This benefit will be eliminated in the near future with the planned introduction of the cross-repository dependencies feature described above.

In some cases multiple tools are not intended to provide meaningful analyses on their own, but are designed to function with other tools in the suite. In these cases, it makes sense for all tools to be installed into a Galaxy instance, and thus, the tools may all be included in a single repository.

In the near future, the tool shed will include a new feature that we are calling cross-repository dependencies (see above). This feature will enable a repository to be defined as a "tool suite" of sorts, where the repository includes only a tool_dependencies.xml file that defines multiple separate repositories that should be installed. Each of these repositories could contain a single tool, allowing a Galaxy administrator to install each tool separately. If the administrator chooses to install the "tool suite" repository, each separate repository would be automatically installed, providing the entire suite of tools with the single installation.
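
As a rough illustration of how cross-repository dependencies might behave once they exist, here is a minimal Python sketch. It is not tool shed code: the function name install_order, the dictionary format, and the phylo_suite and muscle repository names are all assumptions made up for this example. Only the emboss_5 / emboss_datatypes pairing comes from the page text.

```python
from typing import Dict, List, Set

# Hypothetical data: repository name -> repositories it declares as dependencies.
# emboss_5 -> emboss_datatypes is the example from the wiki page; "phylo_suite"
# and "muscle" are invented here to show a "tool suite" style repository.
REPOSITORY_DEPENDENCIES: Dict[str, List[str]] = {
    "emboss_5": ["emboss_datatypes"],
    "emboss_datatypes": [],
    "phylo_suite": ["raxml", "muscle"],
    "raxml": [],
    "muscle": [],
}


def install_order(repository: str, dependencies: Dict[str, List[str]]) -> List[str]:
    """Return the repositories to install, dependencies first, each listed once."""
    ordered: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in ordered:
            return  # already scheduled for installation
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        visiting.add(name)
        for dependency in dependencies.get(name, []):
            visit(dependency)
        visiting.discard(name)
        ordered.append(name)

    visit(repository)
    return ordered


if __name__ == "__main__":
    print(install_order("emboss_5", REPOSITORY_DEPENDENCIES))
    print(install_order("phylo_suite", REPOSITORY_DEPENDENCIES))
    print(install_order("raxml", REPOSITORY_DEPENDENCIES))
```

In this sketch, installing the suite repository pulls in every single-tool repository it lists, while an administrator who only wants raxml can still install that repository alone.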
This new feature could ultimately eliminate the benefits of including a suite of tools in a single repository since, as discussed above, it will eliminate the issue of having to install and compile the same version of a tool dependency for each dependent tool in separate repositories.

The weaknesses of a suite of tools per repository

Sometimes installing a suite of tools in order to get only one or two of them is not optimal. Restricting a repository to include a single tool provides more flexibility to Galaxy administrators to install only those specific tools in which their users have interest.

Including multiple tools in a single repository may make individual tools more difficult to find with the current tool shed features. Although it is currently possible to search for specific tools by partial or complete tool names and descriptions, the ability to browse for tools in the tool shed directly in addition to browsing for repositories is planned for the near future.
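
To make the install-granularity trade-off above a bit more concrete, here is a small Python sketch comparing the two layouts for an imported workflow that is missing one tool. Everything in it is an assumption for illustration: the function repositories_to_install, the layout dictionaries, and the repository and tool names other than raxml are hypothetical and do not correspond to any Galaxy or tool shed API.

```python
from typing import Dict, Iterable, List

# Hypothetical layouts: repository name -> tools contained in that repository.
SUITE_LAYOUT: Dict[str, List[str]] = {
    "phylo_suite": ["raxml", "aligner", "ortholog_finder"],
}
SINGLE_TOOL_LAYOUT: Dict[str, List[str]] = {
    "raxml": ["raxml"],
    "aligner": ["aligner"],
    "ortholog_finder": ["ortholog_finder"],
}


def repositories_to_install(
    workflow_tools: Iterable[str],
    installed_tools: Iterable[str],
    layout: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map each repository that must be installed to all the tools it brings along."""
    missing = set(workflow_tools) - set(installed_tools)
    needed: Dict[str, List[str]] = {}
    for repository, tools in layout.items():
        # A repository is needed if it provides at least one missing tool,
        # but installing it brings in every tool it contains.
        if missing & set(tools):
            needed[repository] = sorted(tools)
    return needed


if __name__ == "__main__":
    workflow = ["raxml", "aligner"]   # tools the imported workflow uses
    installed = ["aligner"]           # tools already in the local instance
    print(repositories_to_install(workflow, installed, SUITE_LAYOUT))
    print(repositories_to_install(workflow, installed, SINGLE_TOOL_LAYOUT))
```

With the suite layout the one missing tool brings the whole repository along, including a second copy of a tool that is already installed; with the single-tool layout only the raxml repository is installed.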