On Thu, Mar 20, 2014 at 5:46 AM, Nilaksha Neththikumara <nilakshafreezon@gmail.com> wrote:
Thanks a lot for the information. :) I'm new to the field, so I get confused at times. I started downloading the NCBI databases locally, but I have two questions.
1) As far as I know, there is no proper update process for locally installed NCBI databases, so it seems I have to re-download a database completely whenever I need to update it. And those databases are almost always being updated. (sigh)
The NCBI provides a Perl script, update_blastdb.pl, to automate this. It is usually run via cron on a regular basis (e.g. once a week). But yes, basically when the NCBI makes an update, the new files are just downloaded again. Often your institute's Linux administrators will have set up a central shared copy of the NCBI BLAST databases to avoid duplication between researchers all making their own copies. See ftp://ftp.ncbi.nlm.nih.gov/blast/db/README

If you want to have a single always (nearly) up-to-date copy of the NCBI BLAST databases, then your Galaxy blastdb.loc and blastdb_p.loc files just need to point there. However, for full reproducibility the Galaxy approach would be to have multiple (date-stamped) copies of the database, each with a separate entry in the *.loc file. This is more work to set up and maintain, and needs more disk space - but it does ensure you can rerun old BLAST searches and get the same results.
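As a rough illustration of the date-stamped approach, here is a minimal Python sketch that a weekly cron job could call. It assumes update_blastdb.pl is on the PATH, that it accepts --decompress, and that it downloads into the current working directory (please check the script's own help output). The function names and the /data/blastdb root used below are hypothetical, not part of Galaxy or BLAST+.

```python
import datetime
import subprocess
from pathlib import Path


def build_update_command(db_names):
    """Return the argument list for one update_blastdb.pl run."""
    # Assumption: --decompress asks the script to unpack the archives
    # it downloads; confirm against your installed BLAST+ version.
    return ["update_blastdb.pl", "--decompress", *db_names]


def dated_directory(root, today=None):
    """Return a date-stamped sub-directory such as <root>/2014-03-20."""
    today = today or datetime.date.today()
    return Path(root) / today.strftime("%Y-%m-%d")


def update_into_dated_copy(root, db_names, today=None):
    """Download the named databases into a fresh date-stamped directory."""
    target = dated_directory(root, today)
    target.mkdir(parents=True, exist_ok=True)
    # Assumption: the script writes into the current working directory,
    # so it is run with cwd set to the date-stamped directory.
    subprocess.run(build_update_command(db_names), cwd=target, check=True)
    return target
```

For example, update_into_dated_copy("/data/blastdb", ["nt", "nr"]) would leave older dated directories untouched, so existing *.loc entries pointing at them keep working.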
2) After installing the databases, is there a particular way to let Galaxy know where they are located, so that they appear in the drop-down menu of the BLAST+ wrappers for me to select? :)
Thanks a lot in advance
Nilaksha Neththikumara.
Yes, you need to add each database to the relevant *.loc file (nucleotide or protein), see the README file - either on the ToolShed or here: https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/RE... Exactly where the *.loc files are on disk will depend on how you installed the BLAST+ wrappers.

Peter
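To make the *.loc step a little more concrete, here is a small Python sketch that formats one entry. It assumes the three tab-separated columns I recall from the wrappers' README (a unique ID, the caption shown in the drop-down menu, and the database base path without file extension), so please check the README for the authoritative layout. The helper name, ID, caption and path are made up for illustration.

```python
def loc_line(unique_id, caption, base_path):
    """Format one entry for blastdb.loc or blastdb_p.loc.

    Assumed layout: unique ID, menu caption and database base path,
    separated by single tab characters.
    """
    fields = [unique_id, caption, base_path]
    for field in fields:
        # A stray tab or newline inside a field would shift the columns.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical date-stamped nucleotide database entry.
    print(loc_line("nt_2014_03_20",
                   "NCBI nt (20 Mar 2014)",
                   "/data/blastdb/2014-03-20/nt"))
```

Each date-stamped copy would get its own line with its own unique ID, appended to blastdb.loc for nucleotide databases or blastdb_p.loc for protein databases.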