On Wed, Nov 24, 2010 at 11:02 PM, Bossers, Alex wrote:
Peter,
A nice extra feature that I would welcome is the optional inclusion of the hit defline (the Hit_def element of the XML) in the output table. In many workflows we would need to BLAST, get the ID from the table, use the ID to get a human-readable name and insert/use it... which is silly of course, since that data is available in the XML anyway.
I don't know Python or hg changesets, but I modified your Python and XML files to incorporate this (see attachment). By default it's the normal BLAST tabular output, but optionally it can include the defline. The hit_defline needed to be split (I hope I did it in a Pythonic way) to eliminate multiple descriptions separated by >gi (nt and nr) or plain semicolons for Swiss-Prot... maybe there are more, but I'm not sure.
Have a look and test it; maybe it will find its way in some form into your suite. Either way, it's very useful to us like this.
Cheers, Alex
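As an illustration of the defline splitting Alex describes (this is not his actual patch, which is not shown in the thread), a minimal Python sketch might look like the following. The function name `first_description` is hypothetical, and the two separators are only the ones mentioned above: ">gi" for nt/nr and a plain semicolon for Swiss-Prot.

```python
def first_description(defline):
    """Return only the first description from a BLAST hit defline.

    Assumes merged entries are joined by ">gi" (nt/nr style) or by a
    semicolon (Swiss-Prot style); other separators are not handled.
    """
    # Drop everything from the first ">gi" onwards (merged nt/nr entries).
    first = defline.split(">gi", 1)[0]
    # Then drop everything after the first semicolon.
    first = first.split(";", 1)[0]
    return first.strip()
```

Note that splitting on every semicolon would also truncate a description that legitimately contains one, so a real tool would probably want this to be optional.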
Hi Alex,
I'm glad to see the BLAST+ wrappers being used already, and to get positive feedback.
I had a quick look at your modifications - I think it could be made more beautiful, but it looks like it would work fine. I understand the aim behind your suggested change, but I have another solution in mind.
I was already planning to write another tool for splitting a column in a tabular file - e.g. splitting on the pipe character could be very useful to extract the GI number from a typical NCBI identifier string. Such a tool could also be used on the BLAST output to do what you are asking for (splitting the hit IDs), or to grab a particular word from formatted text (by splitting on spaces). I'm surprised this isn't in Galaxy already, to be honest - maybe it is and I haven't found it yet ;)
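The column-splitting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_column`, the 0-based column index and the example identifier are all made up here, and this is not the tool Peter went on to write.

```python
def split_column(line, column, sep="|"):
    """Split one column of a tab-separated line into several columns.

    `column` is a 0-based index; the pieces of that column replace it
    in the returned list of fields.
    """
    fields = line.rstrip("\n").split("\t")
    parts = fields[column].split(sep)
    return fields[:column] + parts + fields[column + 1:]


# Usage: the second piece of an NCBI-style identifier is the GI number.
row = "query1\tgi|12345|ref|NP_000001.1|\t98.50"
fields = split_column(row, 1)
gi_number = fields[2]
```

Because the identifier ends with a pipe, the split also yields one empty field; a real tool would need to decide whether to keep or drop it.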
I'd also like to explain that I deliberately kept the provided XML to tabular functionality simple to start with - all it tries to do is recreate the default tabular output, but even that turned out to be non-trivial. I have several ideas for extensions, which I will try to outline here.
The BLAST+ suite actually lets you ask for certain other predefined columns in the tabular output. I am wondering about offering a "full" tabular output option in the BLAST+ wrappers - this seems simpler than making the user pick and choose which columns they want. e.g. for blastp:
The supported format specifiers are:
    qseqid means Query Seq-id
    qgi means Query GI
    qacc means Query accession
    sseqid means Subject Seq-id
    sallseqid means All subject Seq-id(s), separated by a ';'
    sgi means Subject GI
    sallgi means All subject GIs
    sacc means Subject accession
    sallacc means All subject accessions
    qstart means Start of alignment in query
    qend means End of alignment in query
    sstart means Start of alignment in subject
    send means End of alignment in subject
    qseq means Aligned part of query sequence
    sseq means Aligned part of subject sequence
    evalue means Expect value
    bitscore means Bit score
    score means Raw score
    length means Alignment length
    pident means Percentage of identical matches
    nident means Number of identical matches
    mismatch means Number of mismatches
    positive means Number of positive-scoring matches
    gapopen means Number of gap openings
    gaps means Total number of gaps
    ppos means Percentage of positive-scoring matches
    frames means Query and subject frames separated by a '/'
    qframe means Query frame
    sframe means Subject frame
Note that calculating and recording the above will add computational cost and I/O load - so keeping the std set of columns as the default in the Galaxy wrapper makes sense to me.
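To make the column selection concrete, here is a hedged Python sketch that builds a blastp command line with either the standard twelve columns or a longer list. The `-query`, `-db`, `-out` and `-outfmt` options are real BLAST+ options, and format 6 is the tabular format; the function name, the file names in the usage line and the particular extra columns chosen are assumptions for illustration only, not the set used in the Galaxy wrapper.

```python
# The twelve columns of the default ("std") tabular output.
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
# Arbitrary extra specifiers picked for this example only.
EXTRA_COLUMNS = ["sallseqid", "score", "nident", "positive", "gaps", "ppos"]


def blastp_command(query, db, out, extended=False):
    """Return the blastp argument list for tabular output (format 6)."""
    columns = STD_COLUMNS + (EXTRA_COLUMNS if extended else [])
    # The format number and the column names form a single argument.
    outfmt = "6 " + " ".join(columns)
    return ["blastp", "-query", query, "-db", db, "-out", out, "-outfmt", outfmt]


# Usage: pass the list to subprocess.call() to actually run BLAST.
cmd = blastp_command("query.fasta", "nr", "hits.tabular", extended=True)
```

Building the command as a list rather than one string avoids shell quoting problems with the multi-word `-outfmt` value.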
Potentially the BLAST XML output can be converted into this full tabular output too - I expect so but it may not be so easy.
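As a rough idea of what the XML to tabular conversion involves, the sketch below derives the twelve standard columns from the HSP elements of a BLAST XML document using Python's standard library. It is not the converter shipped with the wrappers. It assumes the query ID is the first word of Iteration_query-def, counts a mismatch as any aligned pair of differing non-gap characters, and copies the e-value and bit score text unchanged, whereas BLAST's own tabular output formats those numbers differently (part of why an exact match is non-trivial).

```python
import xml.etree.ElementTree as ET


def count_gap_openings(aligned):
    """Count runs of '-' characters in one aligned sequence."""
    openings = 0
    previous = ""
    for char in aligned:
        if char == "-" and previous != "-":
            openings += 1
        previous = char
    return openings


def blastxml_to_rows(xml_text):
    """Return one list of 12 strings per HSP, in the std column order."""
    rows = []
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        # Assumption: the first word of the query definition is the ID.
        qseqid = iteration.findtext("Iteration_query-def").split(None, 1)[0]
        for hit in iteration.iter("Hit"):
            sseqid = hit.findtext("Hit_id")
            for hsp in hit.iter("Hsp"):
                qseq = hsp.findtext("Hsp_qseq")
                hseq = hsp.findtext("Hsp_hseq")
                length = int(hsp.findtext("Hsp_align-len"))
                nident = int(hsp.findtext("Hsp_identity"))
                # Differing residue pairs, ignoring gap columns.
                mismatch = sum(1 for q, h in zip(qseq, hseq)
                               if q != h and "-" not in (q, h))
                gapopen = count_gap_openings(qseq) + count_gap_openings(hseq)
                rows.append([
                    qseqid, sseqid,
                    "%.2f" % (100.0 * nident / length),
                    str(length), str(mismatch), str(gapopen),
                    hsp.findtext("Hsp_query-from"),
                    hsp.findtext("Hsp_query-to"),
                    hsp.findtext("Hsp_hit-from"),
                    hsp.findtext("Hsp_hit-to"),
                    hsp.findtext("Hsp_evalue"),
                    hsp.findtext("Hsp_bit-score"),
                ])
    return rows
```

For large result files an incremental parser such as `ET.iterparse` would be a better fit than loading the whole document at once.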
Another avenue by which to extend the BLAST+ suite is to teach Galaxy about the BLAST ASN.1 output format, and wrap the new blast_formatter application for turning ASN.1 into another BLAST output format.
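For reference, wrapping blast_formatter would amount to building a command line like the one in this Python sketch. The `-archive`, `-outfmt` and `-out` options belong to blast_formatter, and the archive itself is produced by running a BLAST+ search with `-outfmt 11`; the function name and file names are hypothetical, and nothing here reflects how a Galaxy wrapper would actually be written.

```python
def blast_formatter_command(archive, out, outfmt="6"):
    """Return the argument list to reformat a BLAST ASN.1 archive.

    `archive` is a file written by a BLAST+ search run with -outfmt 11;
    `outfmt` is any other BLAST output format, e.g. "5" for XML or
    "6" for tabular.
    """
    return ["blast_formatter", "-archive", archive,
            "-outfmt", outfmt, "-out", out]


# Usage: turn a saved archive into tabular output.
cmd = blast_formatter_command("search.asn", "hits.tabular")
```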
Regards,
Peter
On Wed, Nov 24, 2010 at 11:44 PM, Peter Cock p.j.a.cock@googlemail.com wrote:
I've started work on an extended 22 column tabular output option from the BLAST tools, covering what I consider to be all the important extra fields available, including the ability to convert from BLAST XML to this 22 column format.
The work in progress is here if anyone wants to look: http://bitbucket.org/peterjc/galaxy-central/changeset/blastplus_nov25
Peter
On Thu, Nov 25, 2010 at 7:05 PM, Peter Cock p.j.a.cock@googlemail.com wrote:
Hello all,
I spent the morning tracking down the root of a strange inconsistency in the XML to tabular conversion regarding the percentage and number of identities. My conclusion was that this was down to a bug in blastp (BLAST 2.2.24+), which I have just contacted the NCBI about.
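The thread does not say what exactly the blastp bug was, but the kind of cross-check that exposes such an inconsistency is easy to sketch: the percentage identity should equal 100 times the number of identities divided by the alignment length. In the Python sketch below the function name, the dictionary-based rows and the tolerance are all assumptions made for illustration.

```python
def inconsistent_identities(rows, tolerance=0.01):
    """Return the rows whose pident disagrees with nident and length.

    Each row is a dict with "pident", "nident" and "length" values as
    read from a tabular file (strings are fine). The tolerance allows
    for pident having been rounded to two decimal places.
    """
    bad = []
    for row in rows:
        expected = 100.0 * int(row["nident"]) / int(row["length"])
        if abs(float(row["pident"]) - expected) > tolerance:
            bad.append(row)
    return bad


# Usage: the second row claims 90% identity for 8 matches out of 10.
rows = [
    {"pident": "80.00", "nident": "8", "length": "10"},
    {"pident": "90.00", "nident": "8", "length": "10"},
]
suspect = inconsistent_identities(rows)
```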
I think this extension to the BLAST+ wrappers to offer an extended tabular output, and to convert from XML to this extended tabular output, is fit for testing now:
http://bitbucket.org/peterjc/galaxy-central/changeset/blastplus_nov25
Alex - are you willing to try this out (and comment on it) before I ask for it to be merged to the trunk?
Thanks,
Peter
On Fri, Nov 26, 2010 at 12:34 PM, Peter Cock p.j.a.cock@googlemail.com wrote:
Hi all,
Alex - were you able to find time to try this out? Or should I ask Kanwei to merge it as it is?
Thanks,
Peter
Peter, sorry, my schedule at the moment won't allow sufficient testing before the weekend at the earliest... I'll keep you posted.
Alex
________________________________________
From: Peter Cock [p.j.a.cock@googlemail.com]
Sent: Tuesday 14 December 2010 11:46
To: Bossers, Alex
CC: Galaxy Dev
Subject: Re: [galaxy-dev] BLAST+ enhancements, was: blastxml to tabular bug fix
On Wed, Dec 15, 2010 at 7:52 PM, Bossers, Alex Alex.Bossers@wur.nl wrote:
Peter, sorry, my schedule at the moment won't allow sufficient testing before the weekend at the earliest... I'll keep you posted. Alex
No problem - good testing is worth waiting for, and testing by other people tends to turn up things the author completely misses ;)
Thanks!
Peter
galaxy-dev@lists.galaxyproject.org