From p.j.a.cock@googlemail.com Wed Apr 9 07:04:22 2014
From: Peter Cock
To: galaxy-dev@lists.galaxyproject.org
Subject: [galaxy-dev] Data Tables and *.loc files: Using named columns versus from_data_table
Date: Wed, 09 Apr 2014 12:04:11 +0100
Message-ID:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============9191318633861772883=="

--===============9191318633861772883==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hi all,

In discussion about adding an NCBI BLAST data manager
https://github.com/peterjc/galaxy_blast/issues/22 based on
Dan's example, Michael Li has suggested using the new(ish)
Data Table functionality of Galaxy for using *.loc files:
https://wiki.galaxyproject.org/Admin/Tools/Data%20Tables

Currently the BLAST+ wrappers access the blastdb.loc file
for picking a system installed nucleotide BLAST database
like this:

See https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/ncbi_macros.xml

With the from_data_table feature this would be much shorter:

For this to work, the column information must instead be
defined centrally in ``tool_data_table_conf.xml`` (via a
``tool_data_table_conf.xml.sample`` file), e.g.

value, name, path
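To make the named columns concrete, here is a minimal Python sketch of reading a *.loc file into records keyed by the value, name and path columns listed above. It is an illustration only, not Galaxy's own loader: the function name `parse_loc_lines`, the sample database entry and its path are hypothetical, and the tab-separated layout with `#` comment lines is an assumption about the file format.

```python
from typing import Dict, Iterable, List

# Column names as listed in the message: value, name, path.
COLUMNS = ["value", "name", "path"]


def parse_loc_lines(
    lines: Iterable[str],
    columns: List[str] = COLUMNS,
    comment_char: str = "#",
) -> List[Dict[str, str]]:
    """Turn tab-separated *.loc lines into dicts keyed by column name.

    Hypothetical helper for illustration; not part of Galaxy.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        # Skip rows that do not supply every named column.
        if len(fields) < len(columns):
            continue
        entries.append(dict(zip(columns, fields)))
    return entries


# Hypothetical sample content; a real blastdb.loc will differ.
SAMPLE_LOC = [
    "# value\tname\tpath\n",
    "nt_example\tExample nucleotide DB\t/data/blastdb/nt_example\n",
    "\n",
    "malformed line without tabs\n",
]

entries = parse_loc_lines(SAMPLE_LOC)
for entry in entries:
    print(entry["value"], entry["name"], entry["path"])
```

Whether the column names live in each tool's XML or in one central table definition, the lookup itself stays the same; only the place where `COLUMNS` is declared changes.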
For simple tools this seems quite neat, but within a single tool
suite using XML macros seems equally effective for centrally
defining the columns in the *.loc files (we do this currently).

However, what worries me is that the data table XML configuration
file adds a new complexity for dependency management between
different ToolShed repositories using a *.loc file (like the *.loc
files for BLAST databases).

For the BLAST database *.loc files, the simplest solution seems
to be not to use the Data Tables feature (as we do now).

The next best solution seems to be to put the sample *.loc files
and associated data table definition XML files into a shared
ToolShed repository (called blast_data_tables, or
blast_databases?) which would be declared as a dependency
of anything using the BLAST database *.loc files (e.g. the
BLAST+ wrappers and any data managers).

[This would be like the existing blast_datatypes ToolShed
repository which is a declared dependency of many tools
using BLAST]

Is that a good plan? What benefits does it have over simply
not using the Data Table functionality?

Thanks,

Peter

--===============9191318633861772883==--

From dan@bx.psu.edu Wed Apr 9 11:14:20 2014
From: Daniel Blankenberg
To: galaxy-dev@lists.galaxyproject.org
Subject: Re: [galaxy-dev] Data Tables and *.loc files: Using named columns versus from_data_table
Date: Wed, 09 Apr 2014 11:14:18 -0400
Message-ID: <4A1994A3-412D-478E-B9D2-36B71F1EE998@bx.psu.edu>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============7088091603408269563=="

--===============7088091603408269563==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hi Peter,

Having a standalone repository that just contained the tool data table
and .loc file that could be a dependency of other repositories would
be a good way to go here. Unfortunately, this isn’t supported right
now.
I’ve opened a trello card for this: https://trello.com/c/VZxV08Qt

However, even though you currently need to include the tool data table
definition and .loc sample in each repository in order for the tool to be
valid, it is still a best practice to use tool data tables.

Thanks,

Dan

On Apr 9, 2014, at 7:04 AM, Peter Cock wrote:

> Hi all,
>
> In discussion about adding an NCBI BLAST data manager
> https://github.com/peterjc/galaxy_blast/issues/22 based on
> Dan's example, Michael Li has suggested using the new(ish)
> Data Table functionality of Galaxy for using *.loc files:
> https://wiki.galaxyproject.org/Admin/Tools/Data%20Tables
>
> Currently the BLAST+ wrappers access the blastdb.loc file
> for picking a system installed nucleotide BLAST database
> like this:
>
> See https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/ncbi_macros.xml
>
> With the from_data_table feature this would be much shorter:
>
> For this to work, the column information must instead be
> defined centrally in ``tool_data_table_conf.xml`` (via a
> ``tool_data_table_conf.xml.sample`` file), e.g.
>
> value, name, path
>
> For simple tools this seems quite neat, but within a single tool
> suite using XML macros seems equally effective for centrally
> defining the columns in the *.loc files (we do this currently).
>
> However, what worries me is that the data table XML configuration
> file adds a new complexity for dependency management between
> different ToolShed repositories using a *.loc file (like the *.loc
> files for BLAST databases).
>
> For the BLAST database *.loc files, the simplest solution seems
> to be not to use the Data Tables feature (as we do now).
>
> The next best solution seems to be to put the sample *.loc files
> and associated data table definition XML files into a shared
> ToolShed repository (called blast_data_tables, or
> blast_databases?) which would be declared as a dependency
> of anything using the BLAST database *.loc files (e.g. the
> BLAST+ wrappers and any data managers).
>
> [This would be like the existing blast_datatypes ToolShed
> repository which is a declared dependency of many tools
> using BLAST]
>
> Is that a good plan? What benefits does it have over simply
> not using the Data Table functionality?
>
> Thanks,
>
> Peter
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
> http://galaxyproject.org/search/mailinglists/

--===============7088091603408269563==--

From p.j.a.cock@googlemail.com Wed Apr 9 11:28:36 2014
From: Peter Cock
To: galaxy-dev@lists.galaxyproject.org
Subject: Re: [galaxy-dev] Data Tables and *.loc files: Using named columns versus from_data_table
Date: Wed, 09 Apr 2014 16:28:25 +0100
Message-ID:
In-Reply-To: <4A1994A3-412D-478E-B9D2-36B71F1EE998@bx.psu.edu>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============6658388432725242824=="

--===============6658388432725242824==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

On Wed, Apr 9, 2014 at 4:14 PM, Daniel Blankenberg wrote:
> Hi Peter,
>
> Having a standalone repository that just contained the tool data table
> and .loc file that could be a dependency of other repositories would
> be a good way to go here. Unfortunately, this isn’t supported right
> now. I’ve opened a trello card for this: https://trello.com/c/VZxV08Qt
>
> However, even though you currently need to include the tool data table
> definition and .loc sample in each repository in order for the tool to be
> valid, it is still a best practice to use tool data tables.

OK, thanks Dan.

Peter

--===============6658388432725242824==--