Peter,
It's good news that the description field is available in tabular format. Getting the description has been the primary reason for using the XML format. The tabular format allows use of the task parallel feature.
I think you have good default options for output format. But I would think we should offer the option to get all available result information. The multi-select checkboxes could serve that purpose.
Is NCBI good about maintaining the column positions in their tabular output over successive versions?
Thanks,
JJ
On 11/28/13, 11:13 AM, Peter Cock wrote:
Hello all,
FAO: Administrators of local Galaxy instances using the NCBI BLAST+ wrappers.
Over on the galaxy_blast repository I have been updating the NCBI BLAST+ wrappers (including unit tests) to work with the current release, NCBI BLAST+ 2.2.28 (aka BLAST 2.2.28+): https://github.com/peterjc/galaxy_blast
The initial set of changes is now on the Test Tool Shed, http://testtoolshed.g2.bx.psu.edu/view/peterjc/ncbi_blast_plus
This includes a workaround for a known regression in the makeblastdb tool dealing with duplicated identifiers: https://github.com/peterjc/galaxy_blast/commit/349e31c6cec4429c5523fde5975e2...
In terms of end-user features, the big improvement in the BLAST+ 2.2.28 release was the ability to get the BLAST match descriptions in the tabular output, and other fields: http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions....
staxids means Subject Taxonomy ID(s), separated by a ';'
sscinames means Subject Scientific Name(s), separated by a ';'
scomnames means Subject Common Name(s), separated by a ';'
sblastnames means Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms means Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
sstrand means Subject Strand
qcovs means Query Coverage Per Subject
qcovhsp means Query Coverage Per HSP
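As a rough illustration of what the separators above mean for anyone parsing these columns downstream, here is a minimal Python sketch. The helper name `split_field` and the example values are hypothetical; only the field names and separators come from the descriptions above.

```python
# Separators taken from the field descriptions above.
MULTI_VALUE_SEPARATORS = {
    "staxids": ";",
    "sscinames": ";",
    "scomnames": ";",
    "sblastnames": ";",
    "sskingdoms": ";",
    "salltitles": "<>",
}


def split_field(name, value):
    """Split one tabular cell into its individual values.

    Fields not listed as multi-valued are returned as a one-item list.
    (Hypothetical helper, for illustration only.)
    """
    sep = MULTI_VALUE_SEPARATORS.get(name)
    if sep is None:
        return [value]
    return [part.strip() for part in value.split(sep)]


# Made-up example values:
print(split_field("staxids", "9606;10090"))
print(split_field("salltitles", "first title<>second title"))
```

Note that a single-valued field such as stitle is left untouched even if it happens to contain a ';'.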
On this branch I am including the new salltitles field as the 25th column in the extended BLAST tabular output offered within the Galaxy interface:
https://github.com/peterjc/galaxy_blast/tree/c25
However, I'm not so sure about the taxonomy fields. Since (thus far) they are not available via the XML, I am leaning towards introducing a third tabular mode, e.g.
* Standard 12 columns (can convert from XML)
* Extended 25 columns (can convert from XML)
* Extended also with taxonomy (cannot currently convert from XML)
Instead, we could offer a pick-your-own columns route (in all the primary BLAST tools, handled via macros):
* Standard 12 columns (can convert from XML)
* Extended 25 columns (can convert from XML)
* Pick your own columns from the full NCBI list (depending on columns, can convert from XML)
This is inspired by JJ's changes to the BLAST XML to tabular conversion tool for Galaxy-P, https://github.com/jmchilton/galaxy_blast/commit/d79afc03522768323494818a40a...
I would be much keener on the pick-your-own columns option if it were possible for the tool to record arbitrary column names for a tabular file in Galaxy's metadata (I can't find a Trello card, but I'm sure I've asked about this before).
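To make the pick-your-own columns idea concrete, here is a minimal Python sketch of how a chosen column list could be turned into a BLAST+ `-outfmt` value and then reused to label each output row. This is only an illustration, not what the wrappers do: the function names `build_outfmt` and `parse_row` are hypothetical, and the sketch assumes the column names are kept alongside the file by some means, which is the metadata question raised above.

```python
# The standard 12 columns of BLAST+ tabular output (-outfmt 6).
STANDARD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def build_outfmt(columns):
    """Return the value for -outfmt: '6' followed by the column names.

    (Hypothetical helper; naming the columns explicitly fixes their order.)
    """
    if not columns:
        raise ValueError("at least one column is required")
    return "6 " + " ".join(columns)


def parse_row(line, columns):
    """Map one tab-separated output line onto the requested column names."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(
            "expected %d columns, found %d" % (len(columns), len(values))
        )
    return dict(zip(columns, values))


# Standard columns plus two of the new fields:
chosen = STANDARD_COLUMNS + ["salltitles", "staxids"]
print(build_outfmt(chosen))
```

Because the column list is passed explicitly, the position of each field is determined by the request rather than by any default ordering.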
Any thoughts or comments? E.g. "Hurry up and just release this branch adding the hit descriptions as column 25 - we want that now" ;) [*]
Regards,
Peter
[*] For our local instance, the taxonomy stuff will be useful, but right now I would prioritise the description, which we currently get via the BLAST XML using this tool: https://github.com/peterjc/galaxy_blast/tree/master/tools/blastxml_to_top_de...
-- James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota