Thank you very much. I was able to download BLAST locally and configure the .loc file, so now it is up and running. :)

But I hit another problem when trying to align a 4 MB FASTA file. It gives this error:

blastn(708,0xa03ca1a8) malloc: *** mach_vm_map(size=1048576) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error: 10

From what I could find, the only explanation was that this is the kind of error you get when the memory of a single thread is overloaded. So I quit Galaxy, jumped into the terminal, and ran the same task with num_threads=16 (my Mac Pro has two quad cores with hyper-threading: 2*4*2 = 16). So far so good.

When I examined the wrapper code in Galaxy, it passes the value ${GALAXY_SLOTS:-4} to the num_threads argument, yet I'm sure it only utilised a single core. Can I configure it to use all 16 cores? Any advice please?

PS: Since my new questions are off-track from my first question (BLAST+ wrapper with remote searching), do I need to start a new thread? Sorry if I'm doing anything wrong here; I'm just very new and a novice. (I got my appointment at the beginning of March, right after my graduation. Nobody is familiar with bioinformatics here in Sri Lanka, so I'm struggling to make my move alone, with the help of you all over the world.)

On Thu, Mar 20, 2014 at 3:54 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
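For context, `${GALAXY_SLOTS:-4}` is shell syntax meaning "use the GALAXY_SLOTS environment variable if set, otherwise 4", and Galaxy sets GALAXY_SLOTS from its job configuration. A sketch of a job_conf.xml that would allocate 16 slots to local jobs, so the wrapper's `-num_threads ${GALAXY_SLOTS:-4}` becomes 16 (the `local_slots` parameter name follows Galaxy's sample job_conf.xml; check the documentation for the Galaxy version you are running):

```xml
<job_conf>
    <plugins>
        <plugin id="local" type="runner"
                load="galaxy.jobs.runners.local:LocalJobRunner"/>
    </plugins>
    <destinations default="multicore_local">
        <destination id="multicore_local" runner="local">
            <!-- Sets GALAXY_SLOTS for jobs sent to this destination -->
            <param id="local_slots">16</param>
        </destination>
    </destinations>
</job_conf>
```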
On Thu, Mar 20, 2014 at 5:46 AM, Nilaksha Neththikumara <nilakshafreezon@gmail.com> wrote:
Thanks a lot for the information. :) I'm new to the field, so I get confused at times. I started downloading the NCBI databases locally, but I have two questions.
1) There is no proper updating process for locally installed NCBI databases (as far as I know), so it seems I have to re-download a database entirely whenever I need to get it updated. And those databases are almost always being updated. (sigh)
The NCBI provide a Perl script, update_blastdb.pl, to automate this, usually run via cron on a regular basis (e.g. once a week). But yes, basically when the NCBI make an update, the new files are just downloaded again.
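As a concrete sketch, a crontab entry along these lines would refresh two databases weekly (the /data/blastdb path, log file, schedule, and database names are example choices; the `--decompress` option to update_blastdb.pl unpacks the downloaded archives, but check `update_blastdb.pl --help` for the options available in your BLAST+ version):

```
# Example crontab entry: refresh nt and nr at 02:00 every Sunday.
# Paths and database names below are placeholders -- adjust to your setup.
0 2 * * 0  cd /data/blastdb && update_blastdb.pl --decompress nt nr >> /data/blastdb/update.log 2>&1
```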
Often your institute's Linux administrators will have set up a central shared copy of the NCBI BLAST databases, to avoid duplication from researchers all making their own copies.
See ftp://ftp.ncbi.nlm.nih.gov/blast/db/README
If you want a single, always (nearly) up-to-date copy of the NCBI BLAST databases, then your Galaxy blastdb.loc and blastdb_p.loc files just need to point there.
However, for full reproducibility the Galaxy approach would be to keep multiple (date-stamped) copies of the database, each with a separate entry in the *.loc file. This is more work to set up and maintain, and needs more disk space - but it does ensure you can rerun old BLAST searches and get the same results.
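The date-stamped approach above might look like this in blastdb.loc (a sketch: the IDs, captions, and /data paths are made-up examples; per the wrappers' README, each line has three TAB-separated columns - unique ID, display name shown in the drop-down, and the database path prefix without the .nin/.nsq extensions):

```
# blastdb.loc -- <unique_id><TAB><display name><TAB><path prefix>
nt_latest      NCBI nt (latest)        /data/blastdb/latest/nt
nt_2014_03_01  NCBI nt (01 Mar 2014)   /data/blastdb/2014-03-01/nt
nt_2014_02_01  NCBI nt (01 Feb 2014)   /data/blastdb/2014-02-01/nt
```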
2) After installing the databases, is there a particular way to let Galaxy know where my databases are located, so that they can be included in the drop-down menu of the BLAST+ wrappers for me to select? :)
Thanks a lot in advance
Nilaksha Neththikumara.
Yes, you need to add each database to the relevant *.loc file (nucleotide or protein); see the README file - either on the Tool Shed or here:
https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/RE...
Exactly where the *.loc files are on disk will depend on how you installed the BLAST+ wrappers.
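Once you have found the right *.loc file, adding an entry can be done like this (a sketch: the ID, caption, and database path are example values, and the three columns must be separated by real tab characters, which is why printf is used rather than pasting spaces):

```shell
# Append one nucleotide database entry to blastdb.loc.
# Columns are TAB-separated: <unique_id> <display caption> <path prefix>.
# The ID, caption and /data path below are example values only.
LOC_FILE=blastdb.loc   # adjust to the real location in your Galaxy install
printf 'nt_2014_03_20\tNCBI nt (20 Mar 2014)\t/data/blastdb/2014-03-20/nt\n' >> "$LOC_FILE"
```

After editing, restart Galaxy (or reload the tool) so the new entry shows up in the drop-down menu.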
Peter