Hi Peter and others,


On Oct 8, 2013, at 10:22 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:

Hi Greg, Jean-Frédéric,

I'm returning to this old thread rather than starting a new one,
since it is nicely aligned with something I wanted to raise.

On Tue, Feb 19, 2013 at 2:23 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
Hello Jean-Frédéric,

Sorry for the delay in this response.  Please see my inline comments.

On Feb 8, 2013, at 10:33 AM, Jean-Frédéric Berthelot wrote:

Hi list,

The tool I am currently wrapping has built-in data, which may be used by the
tool users (through a relevant <from_data_table> + .LOC file configuration).
They are .fasta databases which are rather small and are thus bundled in the
tool distribution package.

Thanks to the tool_dependencies.xml file, said distribution package is
downloaded at install time, code is compiled, and since they are here,
the data files are copied to $INSTALL_DIR too, ready to be used.

After that, the user still has to edit tool-data/my_fancy_data_files.loc ;
but the thing is, during the install I know where these data files are
(since I copied those there), so I would like to save the user the trouble
and set up this file automagically.

I would have two questions:

1/ Is it okay to have tool built-in data files in $INSTALL_DIR, or would
it be considered bad practice?


This is difficult to answer.  Generally, data files should be located in a
shared location so that other tools can access them as well.  However, there
are potentially exceptions to this that are acceptable.  The fact that the
fasta data files are small and you are using a tool_dependencies.xml file to
define a relationship to them for your tools is a good approach because it
allows the data files to be used by other tools in separate repositories via
a complex repository dependency definition in the remote repository.

If these fasta data files are available for download via a clone or a url,
then in the near future the new Galaxy Data Manager (which uses a new,
special category of Galaxy tools which are of type "data_manager") may be
useful in this scenario.  Data Manager tools can be associated with tools in
a repository like yours using repository dependency definitions, so they
will be installed along with the selected repository.  These data manager
tools allow for specified data to be installed into the Galaxy environment
for use by tools.  This new component is not yet released, but it is close.
In the meantime, your approach is the only way to make this work.

If your files are not downloadable, then we might plan to allow simplified
bootstrapping of .loc files in the tool-data directory with files included in
the repository.  This would take some planning, and its availability would
not be in the short term.

Any news Greg? I see there is an empty page on the wiki here:
http://wiki.galaxyproject.org/Admin/Tools/DataManagers

And some actual content here:
http://wiki.galaxyproject.org/Admin/Tools/DataManagers/HowTo/Define


Dan Blankenberg has completed the initial implementation of the Data Manager tools and will be creating the documentation at some point.




2/ Is there a way to set up the tool-data/my_fancy_data_files.loc during the
install? Here are the options I thought of:
*shipping a “real” my_fancy_data_files.loc.sample with the good paths
already set-up, which is going to be copied as the .loc file (a rather ugly
hack)


Assuming you use a file name that is not already in the Galaxy tool-data
subdirectory, the above approach is probably the only way you can do this in
a fully automated way right now.  Again, when the new Data Manager is released,
it will handle this kind of automated configuration.  But in the meantime,
manual intervention is generally required to add the information to
appropriate .loc files in the tool-data directory.


Is that still the case today?


Dan will be able to provide the ideal answer to this question.




*using more <action type="shell_command"> during install to create
my_fancy_data_files.loc (but deploying this file is not part of the tool
dependency install per se)


I advise against the above approach.  The "best practice" use of tool
dependency definitions is to restrict movement of files to locations within
the defined $INSTALL_DIR (the installation directory of the tool dependency
package) or $REPOSITORY_INSTALL_DIR (the installation directory of the
repository), which is set at installation time.  Hard-coding file paths in
<action> tags is fragile, and not recommended.


*variant of the previous: shipping my_fancy_data_files.loc as part of the
tool distribution package, and copying it through shell_command (same concern
as above).


The above approach is not recommended either - same issue as above.


I may not be following your recommendation - in a couple of tools
I provide a functional working *.loc.sample file which is installed
as the default *.loc file.

I do this in both the Blast2GO and EffectiveT3 wrappers, but in
both cases I've avoided the need for absolute paths (and the
worry about where to put the files) and used relative paths
(and put the files in $INSTALL_DIR). This works quite well:

http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go
http://toolshed.g2.bx.psu.edu/view/peterjc/effectiveT3

I believe your approach is correct and follows the "best practice" described in this tool shed wiki section:

http://wiki.galaxyproject.org/InstallingRepositoriesToGalaxy#Installing_Galaxy_tool_shed_repository_tools_into_a_local_Galaxy_instance

Specifically, the following paragraphs:

==================

Tool shed repositories that contain tools that include dynamically generated select list parameters that refer to an entry in the tool_data_table_conf.xml file must contain a tool_data_table_conf.xml.sample file that contains the required entry for each dynamic parameter. Similarly, any index files (i.e., ~/tool-data/xxx.loc files) to which the tool_data_table_conf.xml file entries refer must be defined in xxx.loc.sample files included in the tool shed repository along with the tools. If any of these tool_data_table_conf.xml entries or any of the required xxx.loc.sample files are missing from the tool shed repository, the tools will not properly load and metadata will not be generated for the repository. This means that the tools cannot be automatically installed into a Galaxy instance.

For those tools that include dynamically generated select list parameters that require a missing entry in the tool_data_table_conf.xml file, this file will be modified in real time by adding the entry from a tool_data_table_conf.xml.sample file contained in the tool shed repository. 

==================
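
To make the second paragraph concrete, here is a rough sketch of what "adding the entry from the sample file" could look like. This is not Galaxy's actual code; the function name merge_missing_tables and the my_fancy_data_files table are made up for illustration, and it only assumes the usual <tables>/<table name="..."> layout of tool_data_table_conf.xml.

```python
import xml.etree.ElementTree as ET


def merge_missing_tables(conf_xml, sample_xml):
    """Append <table> entries from the sample that the config lacks.

    Hypothetical helper for illustration only; tables are matched by
    their "name" attribute.
    """
    conf_root = ET.fromstring(conf_xml)
    sample_root = ET.fromstring(sample_xml)
    existing = {table.get("name") for table in conf_root.findall("table")}
    added = []
    for table in sample_root.findall("table"):
        name = table.get("name")
        if name not in existing:
            conf_root.append(table)
            existing.add(name)
            added.append(name)
    return ET.tostring(conf_root, encoding="unicode"), added


# Existing Galaxy config with one table already defined.
conf_xml = (
    '<tables><table name="twobit" comment_char="#">'
    "<columns>value, path</columns>"
    '<file path="tool-data/twobit.loc" /></table></tables>'
)
# Sample shipped in the repository (table name is an assumption).
sample_xml = (
    '<tables><table name="my_fancy_data_files" comment_char="#">'
    "<columns>value, name, path</columns>"
    '<file path="tool-data/my_fancy_data_files.loc" /></table></tables>'
)

merged_xml, added_names = merge_missing_tables(conf_xml, sample_xml)
print(added_names)
```

Running the merge a second time against the merged result adds nothing, since the entry is then already present.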



However, for something like NCBI BLAST setting up some test
databases via the <action> tags would be a bit more fiddly -
although it could let me increase the tools' test coverage.

This may be fine, although I'm not quite clear on what you would be doing here.


As an aside, I've asked before about why the functional tests look
at *.loc rather than *.loc.sample and have not had a clear answer.

The functional tests look at .loc files because they will have uncommented, functionally correct entries.  The .loc.sample files usually have commented "sample" entries that provide an idea to the Galaxy admin as to what should actually go into the associated .loc file.  For example, twobit.loc.sample has:

#droPer1 /depot/data2/galaxy/droPer1/droPer1.2bit
#apiMel2 /depot/data2/galaxy/apiMel2/apiMel2.2bit
#droAna1 /depot/data2/galaxy/droAna1/droAna1.2bit
#droAna2 /depot/data2/galaxy/droAna2/droAna2.2bit

while twobit.loc has:

droPer1 /depot/data2/galaxy/droPer1/droPer1.2bit
apiMel2 /depot/data2/galaxy/apiMel2/apiMel2.2bit
droAna1 /depot/data2/galaxy/droAna1/droAna1.2bit
droAna2 /depot/data2/galaxy/droAna2/droAna2.2bit
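
As a minimal sketch of why this matters (this is not Galaxy's loader, just an illustration; read_loc_entries is a made-up name, and I'm assuming whitespace-separated fields and "#" as the comment character), a reader that skips commented lines gets nothing from the sample file but real entries from the .loc file:

```python
def read_loc_entries(text, comment_char="#"):
    """Return one tuple of fields per uncommented, non-blank line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(comment_char):
            continue  # blank line or commented "sample" entry
        entries.append(tuple(line.split()))
    return entries


sample_text = (
    "#droPer1 /depot/data2/galaxy/droPer1/droPer1.2bit\n"
    "#apiMel2 /depot/data2/galaxy/apiMel2/apiMel2.2bit\n"
)
loc_text = (
    "droPer1 /depot/data2/galaxy/droPer1/droPer1.2bit\n"
    "apiMel2 /depot/data2/galaxy/apiMel2/apiMel2.2bit\n"
)

sample_entries = read_loc_entries(sample_text)  # no usable entries
loc_entries = read_loc_entries(loc_text)        # two (build, path) pairs
print(len(sample_entries), len(loc_entries))
```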


As soon as the local administrator edits the provided default *.loc
files, this could break functional tests using the *.loc.sample
values.


The intent is that the local administrator manually edits the .loc file to include the functionally correct entries based on entries in the .loc.sample file.


The simple fix is for the test framework to preferentially
load the *.loc.sample file if present:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014370.html
http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-August/016159.html
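
For concreteness, the proposed lookup order could be sketched roughly as follows. This is a hypothetical helper (loc_path_for_tests is not a function in the Galaxy test framework), shown only to pin down what "preferentially load the *.loc.sample file if present" would mean:

```python
import os
import tempfile


def loc_path_for_tests(loc_path):
    """Return the .loc.sample path if it exists, else the .loc path."""
    sample_path = loc_path + ".sample"
    return sample_path if os.path.exists(sample_path) else loc_path


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tool_data:
    loc = os.path.join(tool_data, "my_fancy_data_files.loc")
    without_sample = loc_path_for_tests(loc)  # no sample yet: falls back to .loc
    open(loc + ".sample", "w").close()        # create an empty sample file
    with_sample = loc_path_for_tests(loc)     # sample present: it is preferred
```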

I don't agree with this - the sample files should be used as guidance for the admin to create functionally correct .loc files.  This is the same approach used for all Galaxy .sample files (e.g., universe_wsgi.ini.sample <-> universe_wsgi.ini, etc.).


Regards,

Peter