Hi Dan,

On Tue, Oct 15, 2013 at 7:40 PM, Daniel Blankenberg <dan@bx.psu.edu> wrote:
Hi all,
I think what we have are two similar, but somewhat separate problems: 1.) We need a way via the UI for an admin to be able to add additional configuration entries to data tables / .loc files.
For 1.), we now have Data Managers. A Data Manager will do all the heavy lifting of adding additional data table entries. For example, for bwa, it can build the mapping indexes and add the properly delimited line to the .loc file. Data Managers are accessed through the admin interface, under Manage local data. They are installed from a ToolShed, or can be installed manually. In addition to direct interactive usage, Data Manager tools can be included in workflows or accessed via the tools API. Not only does the use of a Data Manager remove the technical burdens/concerns of adding new entries to a data table / .loc file, it also provides the same reproducibility and provenance tracking that is afforded to regular Galaxy tools.
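As a rough illustration of what "adding the properly delimited line" amounts to, here is a minimal Python sketch. It is not Galaxy's Data Manager code: the function name append_loc_entry and the example column values are hypothetical, and it assumes only that a .loc file holds one tab-separated entry per line.

```python
from pathlib import Path


def append_loc_entry(loc_path, fields):
    """Append one tab-delimited entry to a .loc file.

    Returns True if a line was written, False if an identical
    line was already present. (Hypothetical helper, not Galaxy API.)
    """
    path = Path(loc_path)
    # A tab or newline inside a field would corrupt the column layout.
    if any("\t" in f or "\n" in f for f in fields):
        raise ValueError("fields must not contain tabs or newlines")
    line = "\t".join(fields)
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    with path.open("a") as handle:
        # Make sure the new entry starts on its own line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

A call such as append_loc_entry("bwa_index.loc", ["hg19", "hg19", "Human (hg19)", "/data/bwa/hg19.fa"]) would add one entry; the column order shown is an assumption and should be checked against the sample .loc file shipped with the tool.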
You said that Data Managers can be used within a workflow. I don't quite follow - aren't Data Managers restricted to administrators only?

If you don't mind, I'll pick two specific examples of direct personal interest, which lead me to ask whether there is a default Data Manager which just offers a web GUI for editing any *.loc file as a table?

-- Blast2GO - http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go

This tool wrapper uses blast2go.loc, which should list one or more Blast2GO *.properties files. These can in principle be used for advanced things like changing evidence weighting codes etc. However, their primary purpose is to point to different Blast2GO databases. There have been a series of (date stamped) public (free) Blast2GO databases, and my tool installation script already sets up the *.properties files for the most recent databases (which it uses for a unit test), which is your point 2 (below).

The local Galaxy administrator may need to add extra entries to the blast2go.loc file, for instance when there is a new public database release, or if they set up a local database (recommended). This seems to be an easy case (since there is little that we can automate). A simple interface for adding lines to the *.loc files would be enough, assuming it includes a file select browser.

-- BLAST+ - http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/

This uses blastdb.loc (nucleotides), blastdb_p.loc (proteins) etc. A simple interface for adding lines to the *.loc files would be useful, although the oddities of BLAST database naming might need a little code on top of a plain file select browser (the database name is the file path stem without the *.nal, *.pal, etc. extension). Is there potential for offering to automatically create databases from the all_fasta data table you mention below?
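To make the BLAST naming point concrete, here is a minimal Python sketch of the "little code" a file browser would need. The function names, the set of extensions, and the assumed three-column blastdb.loc layout (id, caption, database path) are my illustration, not anything taken from the wrapper or from Galaxy itself.

```python
from pathlib import PurePosixPath

# Alias files (*.nal, *.pal) plus the usual single-volume index files.
# Assumed list for illustration; multi-volume files such as nt.00.nin
# are deliberately not handled here.
BLAST_EXTENSIONS = {".nal", ".pal", ".nhr", ".nin", ".nsq",
                    ".phr", ".pin", ".psq"}


def blast_db_name(selected_file):
    """Turn a file picked in a browser into the BLAST database name,
    i.e. the path with its final BLAST extension removed."""
    path = PurePosixPath(selected_file)
    if path.suffix not in BLAST_EXTENSIONS:
        raise ValueError("not a recognised BLAST database file: %s" % path)
    return str(path.with_suffix(""))


def blastdb_loc_line(unique_id, caption, selected_file):
    """Build one tab-separated *.loc entry (assumed column order)."""
    return "\t".join([unique_id, caption, blast_db_name(selected_file)])
```

For example, blast_db_name("/data/blast/nt.nal") gives "/data/blast/nt", which is the value BLAST expects as the database name rather than the path of any one file on disk.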
The documentation for Data Managers is currently limited to the tutorial-style doc here: http://wiki.galaxyproject.org/Admin/Tools/DataManagers/HowTo/Define; a more formal / config syntax type of page will also be made available, although the tutorial is a pretty inclusive description of the steps needed to define a Data Manager.
Could I suggest you add that information (paraphrase what you just said in this email) to the main page: http://wiki.galaxyproject.org/Admin/Tools/DataManagers

I think that would help.
2.) We need a way to bootstrap/initialize a Galaxy installation with data table / .loc file entries ('built-in data') during installation for
a.) a 'production' Galaxy instance - this would include local dev/testing/etc. instances
b.) the automated testing framework - tests should run fast, but meaningfully test a tool, e.g., the horse mitochondrial genome could be a fine built-in genome for running automated tool tests, but it should not be automatically installed into a production Galaxy instance
For 2.): bootstrapping data during an installation process is something that still needs to be more completely spec'd out and implemented. ...
OK, so the Data Manager work does not yet cover bootstrapping (installing data as part of tool installation from the tool shed etc).

Regarding 2(b), Greg and I talked about this earlier in the thread and I filed Trello Card 1165 on a related issue: https://trello.com/c/P90b5Pa0/1165-functional-tests-need-separate-loc-files-...

Thanks,

Peter