This changeset undoes part of the change to data tables in my commit, which was not an accident. The blastdb.loc.sample and blastdb_p.loc.sample files now do not match the columns expected by the ncbi_blast_plus tools. Megablast uses the blastdb.loc file and expects it to match the spec in tool_data_table_conf.xml.

Data tables are far more flexible than the raw loc approach, which is why we changed it. The unique ID is necessary for the data tables approach and allows the structure of the loc file and/or the data location to change without breaking things. The former approach was to store the path as the value for the parameter. This means that if the parameter was set in a workflow and that path later changed (e.g. the data directory was restructured), the workflow would no longer work.

However, if we use the unique ID, it's possible to maintain backwards compatibility. Instead of the path, the parameter stores the unique ID, which can be used to obtain the path so that it can be passed to the Python file. For items that were already in the loc file, you set the unique ID to be the same as the original path, so that the parameter values in existing workflows are still the same. New items can have nicer-looking IDs. And if extra columns ever need to be added, it's easy.

Ever since James' original data tables commit (in August), we have been wanting to change everything over to this style, so I am going to change these files back. If you don't want to reformat the loc files, just use tool_data_table_conf.xml.oldlocstyle instead of tool_data_table_conf.xml.sample as the source for tool_data_table_conf.xml. This is where the columns are defined, and it's just a matter of defining name and value. (In case you're not seeing it, it's not showing up in one of my clones for some reason, but it is definitely in the repository.)

I'm working on a wiki page that will explain data tables, since they're pretty much undocumented at this point.

Kelly

On Nov 16, 2010, at 11:43 AM, Peter wrote:
On Fri, Nov 12, 2010 at 2:05 AM, Kanwei Li <kanwei@gmail.com> wrote:
All changesets in the please_merge branch have been merged. Thanks for the contribution!
-Kanwei
Hi Kanwei & Kelly,
I've just updated my test installation of Galaxy and realised that there is a problem with the loc file handling for BLAST+ due to this commit from Kelly Vincent:
"Converted several tools to data table style of loc file handling (Bowtie, BWA, Lastz, Megablast, PerM, SRMA). Cleaned up several tool XML files, removing unnecessary None parameters."
http://bitbucket.org/galaxy/galaxy-central/changeset/535d276c92bc
When I wrote the BLAST+ wrappers, blastdb.loc (for nucleotides) used two columns only (caption and path). Likewise for the introduced blastdb_p.loc file.
The legacy megablast_wrapper.xml treated the first word of the caption as an ID and passed it to megablast_wrapper.py which used the loc file to look up the real path to use to call blastall. This seems convoluted to me.
For my BLAST+ wrappers I just need the caption (to show to the user) and the path (to use at the command line), which were column indices 0 and 1 (python counting), thus:
<options from_file="blastdb.loc">
  <column name="name" index="0"/>
  <column name="value" index="1"/>
</options>
Then came this patch, from Kelly Vincent: "Converted several tools to data table style of loc file handling (Bowtie, BWA, Lastz, Megablast, PerM, SRMA). Cleaned up several tool XML files, removing unnecessary None parameters." http://bitbucket.org/galaxy/galaxy-central/changeset/535d276c92bc
After this patch, the blastdb.loc and blastdb_p.loc files have three columns (id, caption, path), with the recommendation that if you were using the old megablast_wrapper.xml then pick the first word of the caption as the id (for backwards compatibility).
The XML for the BLAST+ wrappers now (wrongly) uses this:
<options from_file="blastdb.loc">
  <column name="name" index="2"/>
  <column name="value" index="0"/>
</options>
That means the name shown to the users is column 2 (in python speak, i.e. the third column) which is the path (!) and the value used to call the executable is column 0 (in python speak, i.e. the first column) which is the new identifier column.
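To make the mismatch concrete, here is a minimal Python sketch of how the two index mappings pick fields out of a three-column loc line. The sample line, its path, and the select_option helper are all made up for illustration; this is not how Galaxy itself parses loc files, just the same 0-based column lookup.

```python
# Hypothetical three-column loc line (tab-separated): id, caption, path.
# The values are invented for illustration only.
SAMPLE_LINE = "nt\tNCBI nt (example)\t/data/blastdb/nt"


def select_option(line, name_index, value_index):
    """Return (name, value) taken from a tab-separated line by 0-based index."""
    fields = line.rstrip("\n").split("\t")
    return fields[name_index], fields[value_index]


# Mapping currently in the BLAST+ wrappers: name=2, value=0.
# The user sees the path, and the tool receives the identifier.
current = select_option(SAMPLE_LINE, name_index=2, value_index=0)

# Mapping I am proposing: name=1, value=2.
# The user sees the caption, and the tool receives the path.
proposed = select_option(SAMPLE_LINE, name_index=1, value_index=2)

print("current :", current)
print("proposed:", proposed)
```

With the current mapping the path ends up as the label and the bare identifier is what gets handed to the command line.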
It is possible that this would run, but only if the identifier was actually the name of a valid BLAST database (e.g. nr) that was on the BLAST database path. Maybe that is the case on Kelly's machine?
What it should be using is column indexes 1 and 2 (for the caption and path, ignoring the new id column):
<options from_file="blastdb.loc">
  <column name="name" index="1"/>
  <column name="value" index="2"/>
</options>
This is done in the following changeset: http://bitbucket.org/peterjc/galaxy-central/changeset/6b499b39b804
Could one of you apply that please?
I'd also like to know why the extra ID column was added - I don't understand what it is for. Can we remove it again?
Regards,
Peter