New subject: Data Tables and *.loc files: Using named columns versus from_data_table

9 Apr 2014

Hi all,

In discussion about adding an NCBI BLAST data manager
https://github.com/peterjc/galaxy_blast/issues/22 based on
Dan's example, Michael Li has suggested using the new(ish)
Data Table functionality of Galaxy for using *.loc files:
https://wiki.galaxyproject.org/Admin/Tools/Data%20Tables

Currently the BLAST+ wrappers access the blastdb.loc file
for picking a system installed nucleotide BLAST database
like this:

    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_file="blastdb.loc">
            <column name="value" index="0"/>
            <column name="name" index="1"/>
            <column name="path" index="2"/>
        </options>
    </param>

See https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/nc...

With the from_data_table feature this would be much shorter:

    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_data_table="blastdb" />
    </param>

For this to work, the column information must instead be
defined centrally in ``tool_data_table_conf.xml`` (via a
``tool_data_table_conf.xml.sample`` file), e.g.

    <table name="blastdb" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/blastdb.loc" />
    </table>
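In the sample file itself, each such entry sits inside a
top-level <tables> element. A minimal sketch of a complete
``tool_data_table_conf.xml.sample`` holding just this one
table (the comment text is illustrative only):

    <tables>
        <!-- Locations of nucleotide BLAST databases -->
        <table name="blastdb" comment_char="#">
            <columns>value, name, path</columns>
            <file path="tool-data/blastdb.loc" />
        </table>
    </tables>
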

For simple tools this seems quite neat, but within a single tool
suite, using XML macros seems equally effective for centrally
defining the columns in the *.loc files (we do this currently).
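For comparison, a rough sketch of the macro approach, assuming
Galaxy's standard tool macro syntax (<macros>, <xml name="...">,
<import> and <expand macro="..."/>). The file name
blast_macros.xml and the macro name blastdb_columns are
hypothetical, not necessarily what the wrappers actually use:

    <!-- blast_macros.xml (hypothetical name): shared column layout -->
    <macros>
        <xml name="blastdb_columns">
            <column name="value" index="0"/>
            <column name="name" index="1"/>
            <column name="path" index="2"/>
        </xml>
    </macros>

    <!-- In each tool XML file: import the macros once... -->
    <macros>
        <import>blast_macros.xml</import>
    </macros>

    <!-- ...then expand them wherever the *.loc columns are needed -->
    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_file="blastdb.loc">
            <expand macro="blastdb_columns" />
        </options>
    </param>

Unlike a data table, the macro file lives inside the tool suite's
own repository, so no cross-repository dependency is involved.
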

However, what worries me is that the data table XML configuration
file adds a new layer of complexity to dependency management
between different ToolShed repositories that share a *.loc file
(like the *.loc files for BLAST databases).

For the BLAST database *.loc files, the simplest solution seems
to be not to use the Data Tables feature (as we do now).

The next best solution seems to be to put the sample *.loc files
and the associated data table definition XML files into a shared
ToolShed repository (called blast_data_tables, or
blast_databases?) which would be declared as a dependency
of anything using the BLAST database *.loc files (e.g. the
BLAST+ wrappers and any data managers).

[This would be like the existing blast_datatypes ToolShed
repository which is a declared dependency of many tools
using BLAST]
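Under that plan, each dependent repository (e.g. the BLAST+
wrappers) would declare the shared one in its
repository_dependencies.xml. A sketch, assuming the standard
ToolShed dependency format; the owner and changeset_revision
values are placeholders, and blast_data_tables is only the
tentative name floated above:

    <?xml version="1.0"?>
    <repositories description="Shared BLAST database *.loc files and data table definitions">
        <!-- owner and changeset_revision are placeholders -->
        <repository toolshed="https://toolshed.g2.bx.psu.edu"
                    name="blast_data_tables"
                    owner="OWNER"
                    changeset_revision="CHANGESET_REVISION" />
    </repositories>
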

Is that a good plan? What benefits does it have over simply
not using the Data Table functionality?

Thanks,

Peter


Peter Cock

Daniel Blankenberg

Peter Cock


participants (2)