Hi all,
In the discussion about adding an NCBI BLAST data manager (https://github.com/peterjc/galaxy_blast/issues/22), based on Dan's example, Michael Li has suggested using the new(ish) Data Table functionality of Galaxy for working with *.loc files: https://wiki.galaxyproject.org/Admin/Tools/Data%20Tables
Currently the BLAST+ wrappers access the blastdb.loc file for picking a system installed nucleotide BLAST database like this:
    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_file="blastdb.loc">
            <column name="value" index="0"/>
            <column name="name" index="1"/>
            <column name="path" index="2"/>
        </options>
    </param>
See https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/nc...
With the from_data_table feature this would be much shorter:
    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_data_table="blastdb" />
    </param>
For this to work, the column information must instead be defined centrally in ``tool_data_table_conf.xml`` (via a ``tool_data_table_conf.xml.sample`` file), e.g.
    <table name="blastdb" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/blastdb.loc" />
    </table>
For simple tools this seems quite neat, but within a single tool suite using XML macros seems equally effective for centrally defining the columns in the *.loc files (we do this currently).
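To illustrate the macro approach, here is a minimal sketch. The macro name ``nucleotide_database_param`` and the file name ``blast_macros.xml`` are made up for this example and are not necessarily what the wrappers actually use. The column layout is defined once in a shared macros file:

    <!-- blast_macros.xml (hypothetical file name) -->
    <macros>
        <!-- Hypothetical macro name; defines the *.loc columns in one place -->
        <xml name="nucleotide_database_param">
            <param name="database" type="select" label="Nucleotide BLAST database">
                <options from_file="blastdb.loc">
                    <column name="value" index="0"/>
                    <column name="name" index="1"/>
                    <column name="path" index="2"/>
                </options>
            </param>
        </xml>
    </macros>

Each tool XML file in the suite then imports the macros file and expands the macro where the parameter is needed:

    <macros>
        <import>blast_macros.xml</import>
    </macros>
    ...
    <expand macro="nucleotide_database_param" />

This keeps the column definitions in a single file, but only for tools within the same repository, since the macros file is not shared between ToolShed repositories.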
However, what worries me is that the data table XML configuration file adds new complexity to dependency management between different ToolShed repositories that use the same *.loc file (like the *.loc files for BLAST databases).
For the BLAST database *.loc files, the simplest solution seems to be to carry on not using the Data Tables feature (which is what we do now).
The next best solution seems to be to put the sample *.loc files and associated data table definition XML files into a shared ToolShed repository (called blast_data_tables, or blast_databases?) which would be declared as a dependency of anything using the BLAST database *.loc files (e.g. the BLAST+ wrappers and any data managers).
[This would be like the existing blast_datatypes ToolShed repository which is a declared dependency of many tools using BLAST]
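As a rough sketch of what I mean, each dependent repository would declare the shared repository in its ``repository_dependencies.xml`` file, something like the following. The repository name blast_data_tables is just the proposal above, and the toolshed URL, owner and changeset_revision values are placeholders, not real values:

    <?xml version="1.0"?>
    <repositories description="Requires the shared BLAST database *.loc and data table definitions.">
        <!-- All attribute values below are placeholders for illustration only -->
        <repository toolshed="http://toolshed.example.org"
                    name="blast_data_tables"
                    owner="OWNER"
                    changeset_revision="REVISION" />
    </repositories>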
Is that a good plan? What benefits does it have over simply not using the Data Table functionality?
Thanks,
Peter