Thank you very much. I was able to download BLAST locally and configure the .loc file, so now it is up and running. :) But I ran into another problem when trying to align a 4 MB FASTA file. It fails with this error:

blastn(708,0xa03ca1a8) malloc: *** mach_vm_map(size=1048576) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

Bus error: 10

I read around a bit, and the only explanation I could find was that this kind of error occurs when the memory of a single thread is overloaded. So I quit Galaxy, went to the terminal, and ran the same task with -num_threads 16 (my Mac Pro has two quad-core processors with two virtual cores each: 2*4*2 = 16). So far so good. When I examined the wrapper code in Galaxy, it passes a value called ${GALAXY_SLOTS:-4} to the -num_threads argument, yet I'm sure it only used a single core. Can I configure it to use all 16 cores? Any advice, please?
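
For reference, ${GALAXY_SLOTS:-4} is ordinary shell parameter expansion: it yields the value of GALAXY_SLOTS when that variable is set and non-empty, and 4 otherwise. A minimal sketch of that behaviour, where threads_for is a hypothetical helper used only for illustration and the value 16 is set by hand rather than by Galaxy:

```shell
# Hypothetical helper: prints the thread count the expansion would produce.
threads_for() {
    # Use $GALAXY_SLOTS if it is set and non-empty, otherwise fall back to 4.
    echo "${GALAXY_SLOTS:-4}"
}

# Case 1: variable not set, so the default applies.
unset GALAXY_SLOTS
default_threads=$(threads_for)

# Case 2: variable set (by hand here), so its value is used.
GALAXY_SLOTS=16
configured_threads=$(threads_for)

echo "default=$default_threads configured=$configured_threads"
```

So the thread count handed to -num_threads depends on whether GALAXY_SLOTS is set in the job's environment. How Galaxy sets it depends on the job configuration, which is not shown here.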

PS: Since my new questions have drifted away from my first question (the BLAST+ wrapper with remote searching), do I need to start a new thread? Sorry if I'm doing anything wrong here; I'm very new to this. (I got my appointment at the beginning of March, right after my graduation. Nobody here in Sri Lanka is familiar with bioinformatics, so I'm struggling to make my way alone, with help from all of you around the world.)



On Thu, Mar 20, 2014 at 3:54 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Thu, Mar 20, 2014 at 5:46 AM, Nilaksha Neththikumara
<nilakshafreezon@gmail.com> wrote:
> Thanks a lot for the information. :) I'm new to the field, so I get confused
> at times. I started downloading the NCBI databases locally, but I have two
> questions.
>
> 1) As far as I know, there is no proper update process for locally
> installed NCBI databases, so it seems I have to re-download a database
> completely whenever I need it updated. And those databases are being
> updated almost constantly. (sigh)

The NCBI provide a Perl script, update_blastdb.pl, to automate this,
usually run via cron on a regular basis (e.g. once a week). But
yes, basically when the NCBI makes an update, the new files
are just downloaded again.
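
As a rough sketch of that cron setup, assuming a hypothetical
database directory /data/blastdb and the nt database (both are
placeholders), the snippet below only builds and prints a crontab
line; it does not download anything. The --decompress option asks
update_blastdb.pl to unpack the archives it downloads.

```shell
# Hypothetical locations; adjust to your own system.
BLASTDB_DIR="/data/blastdb"
DB_NAME="nt"

# Command the cron job would run (update_blastdb.pl must be on cron's PATH).
update_cmd="cd $BLASTDB_DIR && update_blastdb.pl --decompress $DB_NAME"

# Crontab fields: minute hour day-of-month month day-of-week.
# "0 2 * * 0" means 02:00 every Sunday, i.e. once a week.
cron_line="0 2 * * 0 $update_cmd"

# Print the line so it can be pasted into "crontab -e".
echo "$cron_line"
```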

Often your institute's Linux administrators will have set up a
central shared copy of the NCBI BLAST databases to avoid
duplication between researchers all making their own copies.

See ftp://ftp.ncbi.nlm.nih.gov/blast/db/README

If you want to have a single always (nearly) up to date copy
of the NCBI BLAST databases, then your Galaxy blastdb.loc
and blastdb_p.loc files just need to point there.

However, for full reproducibility the Galaxy approach would
be to have multiple (date stamped) copies of the database,
each with a separate entry in the *.loc file. This is more work
to set up and maintain, and needs more disk space - but it
does ensure you can rerun old BLAST searches and get
the same results.
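
To illustrate, date-stamped entries in a nucleotide blastdb.loc
might look like the output of the sketch below. It assumes the
three tab-separated columns described in the wrapper README
(unique ID, caption shown in the menu, path to the database base
name). The IDs, dates, and /data/blastdb paths are hypothetical.

```shell
# Print one blastdb.loc line: <unique_id><TAB><caption><TAB><base_name_path>
loc_entry() {
    # $1 = database name, $2 = date stamp, $3 = root directory (all hypothetical)
    printf '%s_%s\t%s %s\t%s/%s/%s\n' "$1" "$2" "$1" "$2" "$3" "$2" "$1"
}

# Two dated copies of the same database, each with its own entry.
loc_lines=$(
    loc_entry nt 2014_03_01 /data/blastdb
    loc_entry nt 2014_03_20 /data/blastdb
)

# Review the lines before appending them to the real *.loc file.
echo "$loc_lines"
```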

> 2) After installing the databases, is there a particular way to let Galaxy
> know where my databases are located, so that they appear in the drop-down
> menu of the BLAST+ wrappers for me to select? :)
>
> Thanks a lot in advance
>
> Nilaksha Neththikumara.

Yes, you need to add each database to the relevant *.loc file
(nucleotide or protein), see the README file - either on
the ToolShed or here:

https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/README.rst

Exactly where the *.loc files are on disk will depend on
how you installed the BLAST+ wrappers.
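
One way to find them is simply to search the Galaxy directory
tree. A minimal sketch, where find_blast_loc is a hypothetical
helper and the directory passed to it is whatever your Galaxy
installation root happens to be:

```shell
# List BLAST *.loc files under a directory, sorted.
# The pattern must end in ".loc", so "*.loc.sample" files are skipped.
find_blast_loc() {
    find "$1" -type f -name 'blastdb*.loc' | LC_ALL=C sort
}

# Usage: pass your Galaxy root as the first argument;
# with no argument the current directory is searched.
find_blast_loc "${1:-.}"
```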

Peter