Data Tables and *.loc files: Using named columns versus from_data_table
Hi all,

In discussion about adding an NCBI BLAST data manager https://github.com/peterjc/galaxy_blast/issues/22 based on Dan's example, Michael Li has suggested using the new(ish) Data Table functionality of Galaxy for using *.loc files: https://wiki.galaxyproject.org/Admin/Tools/Data%20Tables

Currently the BLAST+ wrappers access the blastdb.loc file for picking a system-installed nucleotide BLAST database like this:

    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_file="blastdb.loc">
            <column name="value" index="0"/>
            <column name="name" index="1"/>
            <column name="path" index="2"/>
        </options>
    </param>

See https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/nc...

With the from_data_table feature this would be much shorter:

    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_data_table="blastdb" />
    </param>

For this to work, the column information must instead be defined centrally in ``tool_data_table_conf.xml`` (via a ``tool_data_table_conf.xml.sample`` file), e.g.

    <table name="blastdb" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/blastdb.loc" />
    </table>

For simple tools this seems quite neat, but within a single tool suite, using XML macros seems equally effective for centrally defining the columns in the *.loc files (we do this currently).

However, what worries me is that the data table XML configuration file adds new complexity to dependency management between different ToolShed repositories using a *.loc file (like the *.loc files for BLAST databases).

For the BLAST database *.loc files, the simplest solution seems to be not to use the Data Tables feature (as we do now).

The next best solution seems to be to put the sample *.loc files and associated data table definition XML files into a shared ToolShed repository (called blast_data_tables, or blast_databases?), which would be declared as a dependency of anything using the BLAST database *.loc files (e.g. the BLAST+ wrappers and any data managers).

[This would be like the existing blast_datatypes ToolShed repository, which is a declared dependency of many tools using BLAST]

Is that a good plan? What benefits does it have over simply not using the Data Table functionality?

Thanks,

Peter
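For readers less familiar with *.loc files, the column mapping in the from_file example can be pictured with a small Python sketch. This is only an illustration and not Galaxy's own loader: the function name read_loc, the BLASTDB_COLUMNS dictionary, and the sample database entry are hypothetical, and it assumes the usual tab-separated *.loc layout with "#" comment lines.

```python
from typing import Dict, Iterable, List


def read_loc(
    lines: Iterable[str],
    columns: Dict[str, int],
    comment_char: str = "#",
) -> List[Dict[str, str]]:
    """Turn tab-separated *.loc lines into dicts keyed by column name.

    Hypothetical helper for illustration; Galaxy's real loader differs.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        # Skip lines that are too short to hold every requested column.
        if len(fields) <= max(columns.values()):
            continue
        rows.append({name: fields[index] for name, index in columns.items()})
    return rows


# Mirrors <column name="value" index="0"/> etc. from the tool XML.
BLASTDB_COLUMNS = {"value": 0, "name": 1, "path": 2}

# Made-up sample content standing in for blastdb.loc.
sample_lines = [
    "# value, name and path, separated by tabs",
    "nt_example\tExample nucleotide database\t/data/blastdb/nt_example",
    "",
]

entries = read_loc(sample_lines, BLASTDB_COLUMNS)
for entry in entries:
    print(entry["value"], entry["name"], entry["path"], sep=" | ")
```

The point of the sketch is that each tool repeating the index-to-name mapping (here BLASTDB_COLUMNS) is exactly the duplication that a central data table definition is meant to remove.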
Hi Peter,

Having a standalone repository that just contained the tool data table and .loc file that could be a dependency of other repositories would be a good way to go here. Unfortunately, this isn’t supported right now. I’ve opened a trello card for this: https://trello.com/c/VZxV08Qt

However, even though you currently need to include the tool data table definition and .loc sample in each repository in order for the tool to be valid, it is still a best practice to use tool data tables.

Thanks,

Dan

On Apr 9, 2014, at 7:04 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Is that a good plan? What benefits does it have over simply not using the Data Table functionality?
On Wed, Apr 9, 2014 at 4:14 PM, Daniel Blankenberg <dan@bx.psu.edu> wrote:
Hi Peter,
Having a standalone repository that just contained the tool data table and .loc file that could be a dependency of other repositories would be a good way to go here. Unfortunately, this isn’t supported right now. I’ve opened a trello card for this: https://trello.com/c/VZxV08Qt
However, even though you currently need to include the tool data table definition and .loc sample in each repository in order for the tool to be valid, it is still a best practice to use tool data tables.
OK, thanks Dan.

Peter
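As a rough illustration of what the central data table definition discussed in this thread carries, the following Python sketch reads the example table element and derives the same name-to-index mapping that the from_file version spells out per tool. It is not how Galaxy itself parses the file: parse_table_definition and the returned dictionary layout are hypothetical, and only the XML snippet comes from the thread.

```python
import xml.etree.ElementTree as ET

# The table definition quoted in the thread.
TABLE_XML = """<table name="blastdb" comment_char="#">
    <columns>value, name, path</columns>
    <file path="tool-data/blastdb.loc" />
</table>"""


def parse_table_definition(xml_text: str) -> dict:
    """Extract the table name, comment character, columns and file paths.

    Hypothetical helper for illustration; not Galaxy's parser.
    """
    table = ET.fromstring(xml_text)
    column_text = table.findtext("columns", default="")
    names = [part.strip() for part in column_text.split(",") if part.strip()]
    return {
        "name": table.get("name"),
        "comment_char": table.get("comment_char", "#"),
        # Column order in <columns> gives the index into each *.loc line.
        "columns": {name: index for index, name in enumerate(names)},
        "files": [f.get("path") for f in table.findall("file")],
    }


definition = parse_table_definition(TABLE_XML)
print(definition["name"], definition["columns"], definition["files"])
```

With the columns defined once like this, a tool only needs to name the table (from_data_table="blastdb") rather than repeat the column indices.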