Merging BLAST database support into Galaxy?
Hi Edward,
We're now running BLAST+ searches on our local Galaxy via our cluster, and some of the cluster nodes have relatively small amounts of RAM. This means I've become more aware of limitations in the NCBI BLAST+ tools' support for using a subject FASTA file (instead of making a local BLAST database), which turns out to be surprisingly RAM hungry.
The logical step is to allow users to build a BLAST database as a new datatype in Galaxy - which is what you (Edward) did some time ago as a fork, later posted to the Galaxy Tool Shed.
Edward - are you happy for me to merge your work into the main wrappers? I mentioned this idea a couple of months ago: http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-February/008544.html
Thanks,
Peter
On Wed, Apr 18, 2012 at 10:53 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi Edward,
We're now running BLAST+ searches on our local Galaxy via our cluster, and some of the cluster nodes have relatively small amounts of RAM. This means I've become more aware of limitations in the NCBI BLAST+ tools' support for using a subject FASTA file (instead of making a local BLAST database), which turns out to be surprisingly RAM hungry.
The logical step is to allow users to build a BLAST database as a new datatype in Galaxy - which is what you (Edward) did some time ago as a fork, later posted to the Galaxy Tool Shed.
Edward - are you happy for me to merge your work into the main wrappers? I mentioned this idea a couple of months ago: http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-February/008544.html
Note this will take some extra work - we need to support protein BLAST databases as well, not just nucleotide databases.
Peter
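As a rough illustration of the two approaches discussed above (searching a subject FASTA file directly versus building a database first), here is a minimal Python sketch that only assembles the BLAST+ command lines. The function names and file names are hypothetical and are not taken from the Galaxy wrappers; the flags (`-in`, `-dbtype`, `-out`, `-query`, `-db`, `-subject`, `-outfmt`) are standard BLAST+ options.

```python
from typing import List


def makeblastdb_command(fasta_path: str, db_prefix: str, dbtype: str) -> List[str]:
    """Build a makeblastdb argument list for a nucleotide or protein database."""
    # makeblastdb accepts "nucl" or "prot" for -dbtype.
    if dbtype not in ("nucl", "prot"):
        raise ValueError(f"dbtype must be 'nucl' or 'prot', not {dbtype!r}")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_prefix]


def blastn_command(query_path: str, target: str, out_path: str,
                   use_database: bool) -> List[str]:
    """Build a blastn argument list for a database (-db) or a FASTA file (-subject)."""
    # -db and -subject are alternatives: a pre-built database or a raw FASTA file.
    target_flag = "-db" if use_database else "-subject"
    return ["blastn", "-query", query_path, target_flag, target,
            "-out", out_path, "-outfmt", "6"]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(makeblastdb_command("genome.fasta", "genome_db", "nucl")))
    print(" ".join(blastn_command("query.fasta", "genome_db", "hits.tsv", True)))
```

The lists could be passed to `subprocess.run` on a machine with BLAST+ installed; building the database once and reusing it via `-db` is the route proposed in this thread.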
sounds great, thanks peter. i granted you access to my toolshed repo, but perhaps we want only one tool in the toolshed when all done.
On Wed, Apr 18, 2012 at 3:20 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Wed, Apr 18, 2012 at 10:53 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi Edward,
We're now running BLAST+ searches on our local Galaxy via our cluster, and some of the cluster nodes have relatively small amounts of RAM. This means I've become more aware of limitations in the NCBI BLAST+ tools' support for using a subject FASTA file (instead of making a local BLAST database), which turns out to be surprisingly RAM hungry.
The logical step is to allow users to build a BLAST database as a new datatype in Galaxy - which is what you (Edward) did some time ago as a fork, later posted to the Galaxy Tool Shed.
Edward - are you happy for me to merge your work into the main wrappers? I mentioned this idea a couple of months ago: http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-February/008544.html
Note this will take some extra work - we need to support protein BLAST databases as well, not just nucleotide databases.
Peter
On Wed, Apr 18, 2012 at 6:10 PM, Edward Kirton <eskirton@lbl.gov> wrote:
sounds great, thanks peter. i granted you access to my toolshed repo, but perhaps we want only one tool in the toolshed when all done.
Thanks - but given the tools currently live in the main repository with Galaxy itself, I'll be focussing my efforts there for now - and adding bits of code from your work as appropriate (with a credit).
Peter
Hi Edward,
I've started work on this in earnest now. I see you only defined one new datatype, blastdb, which worked for nucleotide databases. I want to handle protein databases too, so I think two datatypes make sense - which I am currently calling blastdbn and blastdbp.
That won't be compatible with your existing tools & history, but other than that seems sensible to me. I suppose we could use blastdb and blastdb_p which would match the *.loc files?
What do you think?
Peter
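To make the two-datatype proposal concrete, here is a small Python sketch, not Galaxy code, that maps the proposed names blastdbn and blastdbp to the matching makeblastdb `-dbtype` value and to the core files of a database. It assumes a simple single-volume database; the dictionary layout and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup: proposed Galaxy datatype name -> (makeblastdb -dbtype,
# core file extensions of a single-volume database of that type).
BLAST_DB_TYPES: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "blastdbn": ("nucl", (".nhr", ".nin", ".nsq")),
    "blastdbp": ("prot", (".phr", ".pin", ".psq")),
}


def expected_files(datatype: str, db_prefix: str) -> List[str]:
    """List the core file names expected for a database with this prefix."""
    _dbtype, extensions = BLAST_DB_TYPES[datatype]
    return [db_prefix + ext for ext in extensions]


def missing_files(datatype: str, db_prefix: str, present: Iterable[str]) -> List[str]:
    """Return the expected core files that are absent from `present`."""
    have = set(present)
    return [name for name in expected_files(datatype, db_prefix) if name not in have]


if __name__ == "__main__":
    print(expected_files("blastdbp", "proteins"))
    print(missing_files("blastdbn", "genome", ["genome.nhr", "genome.nsq"]))
```

Keeping nucleotide and protein databases as separate datatypes lets a tool such as blastn accept only the first and blastp only the second, which a single blastdb type could not express.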
your suggestion for blastdbn and blastdbp sounds fine. it's okay if a few of our users need to edit the metadata of the dbs in their history. thanks for asking and doing this.
On Thu, Apr 26, 2012 at 5:37 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi Edward,
I've started work on this in earnest now. I see you only defined one new datatype, blastdb, which worked for nucleotide databases. I want to handle protein databases too, so I think two datatypes make sense - which I am currently calling blastdbn and blastdbp.
That won't be compatible with your existing tools & history, but other than that seems sensible to me. I suppose we could use blastdb and blastdb_p which would match the *.loc files?
What do you think?
Peter
On Thu, Apr 26, 2012 at 10:40 PM, Edward Kirton <eskirton@lbl.gov> wrote:
your suggestion for blastdbn and blastdbp sounds fine. it's okay if a few of our users need to edit the metadata of the dbs in their history. thanks for asking and doing this.
Great. Perhaps you can throw some light on the peek issue I raised here: http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-April/009525.html
Is looking at BLAST (nucleotide) databases in the history still working fine on your local install? What about from the latest galaxy-central?
Thanks,
Peter
P.S. My branch is here - not really finished yet: https://bitbucket.org/peterjc/galaxy-central/src/blastdb
Participants (2): Edward Kirton, Peter Cock