blast+ wrapper with remote searching
On Wed, Mar 19, 2014 at 6:42 AM, Nilaksha Neththikumara <nilakshafreezon@gmail.com> wrote:
Hello all,
The newly developed BLAST+ wrappers seem very useful, but they do not appear to support searching against remote servers as an alternative to installing the databases locally. Am I missing something? I need to run a remote search for a sequence from within a Galaxy instance so that I can automate the process. Help is much appreciated.
Thanks a lot, Nilaksha
On Wed, Mar 19, 2014 at 3:33 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi Nilaksha,
You are correct, the BLAST+ wrappers for Galaxy don't currently support the -remote option to connect to the NCBI over the internet to run the searches there - we've talked about this though, and I've filed an issue to track this in future: https://github.com/peterjc/galaxy_blast/issues/39
What do you mean by you need remote search for automation? You can download all the NCBI managed databases like NT/NR locally, so most tasks can be done without the -remote option.
Peter
P.S. The BLAST+ wrappers aren't 'newly developed', work started way back in 2011 ;)
On Thu, Mar 20, 2014 at 3:54 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> On Thu, Mar 20, 2014 at 5:46 AM, Nilaksha Neththikumara <nilakshafreezon@gmail.com> wrote:
> Thanks a lot for the information. :) I'm new to the field, so I get confused at times. I started downloading the NCBI databases locally, but I have two questions.
> 1) As far as I know, there is no proper update process for locally installed NCBI databases, so it seems I have to download a database again in full whenever I need to update it. And those databases are updated almost constantly. (sigh)
The NCBI provide a Perl script, update_blastdb.pl, to automate this; it is usually run via cron on a regular basis (e.g. once a week). But yes, basically when the NCBI makes an update, the new files are just downloaded again.
Often your institute's Linux administrators will have set up a central shared copy of the NCBI BLAST databases to avoid duplication between researchers all making their own copies.
See ftp://ftp.ncbi.nlm.nih.gov/blast/db/README
If you want a single, (nearly) always up-to-date copy of the NCBI BLAST databases, then your Galaxy blastdb.loc and blastdb_p.loc files just need to point there.
However, for full reproducibility the Galaxy approach would be to have multiple (date-stamped) copies of the database, each with a separate entry in the *.loc file. This is more work to set up and maintain, and needs more disk space, but it does ensure you can rerun old BLAST searches and get the same results.
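As a rough illustration of the weekly cron approach described above, a small wrapper script might look like the sketch below. It is not taken from the NCBI documentation: the variable names (BLASTDB_DIR, DATABASES, DRY_RUN), the directory, and the script path in the crontab comment are all assumptions made for the example. Only update_blastdb.pl and its --decompress option come from the BLAST+ distribution. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of a periodic BLAST database refresh, intended to be called from cron.
# BLASTDB_DIR, DATABASES and DRY_RUN are hypothetical names for this example.

BLASTDB_DIR="${BLASTDB_DIR:-/data/blastdb}"   # assumed location of the shared copy
DATABASES="${DATABASES:-nt nr}"               # databases to refresh
DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands

update_one() {
    # $1 = database name, e.g. nt
    if [ "$DRY_RUN" = "1" ]; then
        echo "cd $BLASTDB_DIR && update_blastdb.pl --decompress $1"
    else
        # update_blastdb.pl downloads into the current working directory
        (cd "$BLASTDB_DIR" && update_blastdb.pl --decompress "$1")
    fi
}

for db in $DATABASES; do
    update_one "$db"
done

# Example crontab line (Sundays at 03:00), with a hypothetical script path:
# 0 3 * * 0 DRY_RUN=0 /usr/local/bin/refresh_blastdb.sh
```

Running it with DRY_RUN=0 performs the real downloads, so it is worth checking free disk space first; nt and nr are large.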
> 2) After installing the databases, is there a particular way to tell Galaxy where they are located, so that they appear in the drop-down menu of the BLAST+ wrappers for me to select? :)
> Thanks a lot in advance
> Nilaksha Neththikumara.
Yes, you need to add each database to the relevant *.loc file (nucleotide or protein); see the README file, either on the ToolShed or here:
https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/RE...
Exactly where the *.loc files are on disk will depend on how you installed the BLAST+ wrappers.
Peter
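To make the *.loc step more concrete, here is a minimal sketch that prints date-stamped entries. It assumes the loc files use three tab-separated columns (a unique ID, a caption shown in the drop-down menu, and the database path without file extension); please confirm this against the README and the sample loc file shipped with the wrappers. The function name make_loc_entry, the date stamp, and the /data/blastdb paths are hypothetical.

```shell
#!/bin/sh
# Print one tab-separated loc-file line: <unique_id> <caption> <base_path>
# (column layout is an assumption; check the wrapper README before relying on it)
make_loc_entry() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

STAMP="2014_03_20"   # hypothetical date stamp of the downloaded copy

# Nucleotide database entry, intended for blastdb.loc
make_loc_entry "nt_${STAMP}" "NCBI nt (${STAMP})" "/data/blastdb/${STAMP}/nt"

# Protein database entry, intended for blastdb_p.loc
make_loc_entry "nr_${STAMP}" "NCBI nr (${STAMP})" "/data/blastdb/${STAMP}/nr"
```

The printed lines would be appended to the matching loc file by hand or with a shell redirect. Keeping the date stamp in both the ID and the path is one way to follow the reproducibility advice above, since older entries stay selectable after a new copy is added.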
On Fri, Mar 21, 2014 at 5:42 AM, Nilaksha Neththikumara <nilakshafreezon@gmail.com> wrote:
Thank you very much. I was able to download BLAST locally and configure the .loc file, so now it is up and running. :) But I ran into another problem when trying to align a 4 MB FASTA file; it gave this error:
blastn(708,0xa03ca1a8) malloc: *** mach_vm_map(size=1048576) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error: 10
I read around a bit, and the only explanation I could find was that this kind of error occurs when the memory of a single thread is overloaded. So I quit Galaxy, went to the terminal, and ran the same task with num_threads = 16 (my Mac Pro has two quad-core CPUs with two virtual cores each: 2*4*2 = 16). So far so good. When I examined the code in Galaxy, it was passing a value called ${GALAXY_SLOTS:-4} to the num_threads argument, yet I'm sure it only used a single core. Can I configure it to use all 16 cores? Any advice, please?
PS: Since my new questions have drifted from my first question (blast+ wrapper with remote searching), do I need to start a new thread? Sorry if I'm doing anything wrong here; I'm very new to this. (I was appointed at the beginning of March, right after my graduation, and nobody is familiar with bioinformatics here in Sri Lanka, so I'm struggling to make my way alone with the help of all of you around the world.)
Hi Nilaksha,
GALAXY_SLOTS is a feature for assigning a variable number of cores to specific tools from outside the wrappers. You can do that by editing the job_conf.xml file. An example is documented in the comments of the latest job_conf.xml:
https://bitbucket.org/galaxy/galaxy-central/src/dec159841d670560582aadc1327b...
Ciao, Bjoern
> On Fri, Mar 21, 2014 at 5:42 AM, Nilaksha Neththikumara <nilakshafreezon@gmail.com> wrote:
> Thank you very much. I was able to download BLAST locally and configure the .loc file, so now it is up and running. :) But I ran into another problem when trying to align a 4 MB FASTA file; it gave this error:
> blastn(708,0xa03ca1a8) malloc: *** mach_vm_map(size=1048576) failed (error code=3)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> Bus error: 10
> I read around a bit, and the only explanation I could find was that this kind of error occurs when the memory of a single thread is overloaded.
It does sound like a memory (RAM) problem, yes. However, in general using more threads will not save memory (it may even need more).
> So I quit Galaxy, went to the terminal, and ran the same task with num_threads = 16 (my Mac Pro has two quad-core CPUs with two virtual cores each: 2*4*2 = 16). So far so good.
Strange. Perhaps you had some other memory-hungry tasks running at the same time as the failed BLAST?
> When I examined the code in Galaxy, it was passing a value called ${GALAXY_SLOTS:-4} to the num_threads argument, yet I'm sure it only used a single core. Can I configure it to use all 16 cores? Any advice, please?
Bjoern answered that. This would have defaulted to 4 threads if the GALAXY_SLOTS setting wasn't set up.
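For anyone unsure how ${GALAXY_SLOTS:-4} behaves, the short sketch below shows the standard shell parameter expansion on its own, outside Galaxy. The function name threads_for_job is made up for the example; in a real job the variable would be set by Galaxy according to the job configuration, not by hand.

```shell
#!/bin/sh
# Print the value that would be passed to -num_threads.
# ${VAR:-default} yields the default when VAR is unset or empty.
threads_for_job() {
    echo "${GALAXY_SLOTS:-4}"
}

unset GALAXY_SLOTS
echo "GALAXY_SLOTS unset:  $(threads_for_job)"

GALAXY_SLOTS=16
echo "GALAXY_SLOTS=16:     $(threads_for_job)"

GALAXY_SLOTS=""
echo "GALAXY_SLOTS empty:  $(threads_for_job)"
```

So the fallback of 4 only applies when nothing has set GALAXY_SLOTS for the job; it does not by itself say how many cores the process ends up using.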
> PS: Since my new questions have drifted from my first question (blast+ wrapper with remote searching), do I need to start a new thread? Sorry if I'm doing anything wrong here; I'm very new to this. (I was appointed at the beginning of March, right after my graduation, and nobody is familiar with bioinformatics here in Sri Lanka, so I'm struggling to make my way alone with the help of all of you around the world.)
Yes, next time you have a completely new question, start a new thread. Also try reading the forums at seqanswers.com and the Q&A at biostars.org.
Regards, Peter
participants (3): Björn Grüning, Nilaksha Neththikumara, Peter Cock