Hi Mike,

On Tue, Apr 30, 2013 at 12:54 PM, Mike Dyall-Smith
<mike.dyallsmith@gmail.com> wrote:
> Dear Peter, thanks for the advice. I think I can now run blastp from the
> commandline, both in the host and in the linux virtual machine. I say
> 'think' because it runs in both cases, but then either never completes (I
> cancelled after 30 min on OS X) or freezes (VBox/ubuntu). However, the
> blastp search in galaxy gives the same error as before. This, I don't
> understand.

What query sequences are you using? I'd try just one or two protein
sequences in a small FASTA file, and rerun blastp against nr.
Meanwhile, monitor the system with top or similar to see what
the CPU usage, RAM usage, and disk IO look like.
That should help determine whether this is as simple as not enough RAM
leading to lots of paging to disk, and therefore a very slow search.
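
To cut a larger protein FASTA file down to a couple of test queries, a minimal sketch, assuming a POSIX shell with awk. The helper name take_fasta_records and the file names used with it are made up for illustration:

```shell
# Hypothetical helper: copy the first N records of a FASTA file ($1)
# into a smaller query file ($3), where N is given as $2.
take_fasta_records() {
    # Count header lines (starting with ">") and stop reading as soon as
    # header number N+1 appears, so only the first N records are printed.
    awk -v n="$2" '/^>/ { count++ } count > n { exit } { print }' "$1" > "$3"
}
```

Then time the search while watching top in a second terminal:

$ take_fasta_records proteins.faa 2 small.faa
$ time blastp -query small.faa -db nr -outfmt 6 -out small_vs_nr.tsv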

> I simplified the directory structure (so the path) to the database, and
> altered the appropriate configuration files (blastdb_p.loc,
> blast_environment.sh). With 'env', I see: BLASTDB=/media/sf_mikeds_bioinf/db
> I also checked that "ls /media/sf_mikeds_bioinf/db/nr*" listed all the
> database files.
>
> The relevant lines are now:
> ----------------from blastdb_p.loc file-------------------------
> #Your blastdb_p.loc file should include an entry per line for each "base name"
> #you have stored. For example:
> #
> #nr_05Jun2010 NCBI NR (non redundant) 05 Jun 2010 /data/blastdb/05Jun2010/nr
> #nr_15Aug2010 NCBI NR (non redundant) 15 Aug 2010 /data/blastdb/15Aug2010/nr
> nr_08_Apr2013 NCBI_nrprot_08Apr2013 /media/sf_mikeds_bioinf/db/nr
> ...
> ----------------------------------------------------------------------

That looks OK.
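
Since email can turn tabs into spaces, it may be worth double-checking that the real file uses TAB characters between the three columns (value, display name, database path), as in the commented examples. A rough sketch, assuming that tab-separated three-column layout; check_loc_file is a made-up name:

```shell
# Hypothetical checker: report non-comment lines of a .loc file ($1) that do
# not split into exactly three TAB-separated fields. Returns 1 if any do.
check_loc_file() {
    awk -F '\t' '
        /^#/ || /^[ \t]*$/ { next }    # skip comment and blank lines
        NF != 3 { printf "line %d: %d tab-separated field(s)\n", NR, NF; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

For example: $ check_loc_file tool-data/blastdb_p.loc (adjust the path to wherever your copy lives).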

> The blast+ blastp error from galaxy is:
> -----------------------------------------------------------------------
> An error occurred running this job: blastp: 2.2.26+
> Package: blast 2.2.26, build Aug 15 2012 17:48:54
> BLAST Database error: No alias or index file found for protein database
> [/media/sf_mikeds_bioinf/db/nr] in search path
> [/var/lib/galaxy-server/database/job_working_directory/000/18::]
> -----------------------------------------------------------------------
>

Very strange, clearly something is not right with the Galaxy config.
On the bright side, Galaxy is finding the binaries. Could you try this
on the Galaxy log file?

$ grep blastp paster.log
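
It may also help to confirm that the account Galaxy runs jobs under can actually read the alias (.pal) or index (.pin) file BLAST+ is complaining about. One possibility, not confirmed here: VirtualBox auto-mounted shared folders under /media/sf_* are normally readable only by root and the vboxsf group, so ls working for you doesn't guarantee the Galaxy user can see them. A minimal sketch; check_protein_db is a made-up name:

```shell
# Hypothetical check: succeed if BASE.pal (alias) or BASE.pin (index)
# exists and is readable by the user running this function.
check_protein_db() {
    for f in "$1.pal" "$1.pin"; do
        if [ -r "$f" ]; then
            echo "found readable $f"
            return 0
        fi
    done
    echo "no readable $1.pal or $1.pin" >&2
    return 1
}
```

To test as the Galaxy user (here assumed to be called "galaxy"):

$ sudo -u galaxy ls -l /media/sf_mikeds_bioinf/db/nr.pal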

> However, while this might point to a real issue for the use of galaxy within
> VirtualBox/ubuntu (and with the database on an external USB3 drive), I
> suspect my idea of running a local instance of galaxy on my macbook is not
> possible. I have 8 Gb of RAM, and set 4Gb for use in the linux guest system.
> The Genbank nr protein db directory has files totaling 24 Gb.

The oldest cluster nodes we're still using have 8GB of RAM, and are
fine to run BLASTP against NR with. The previous nodes only have
2GB and could not cope, so the threshold is somewhere in between.
I suspect your guest VM with only 4GB of RAM is struggling.
Could you try running BLAST from the host Mac OS X instead, with
access to the full 8GB of RAM?
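
To put the database size and available RAM side by side on either machine, a rough sketch. The function names are made up, and comparing the two numbers is only a heuristic; it does not pin down where the 2GB to 8GB threshold actually lies:

```shell
# Hypothetical helper: total size in bytes of every file named BASE.*
# (field 5 of "ls -ln" is the file size on both Linux and Mac OS X).
db_bytes() {
    ls -ln "$1".* | awk '{ s += $5 } END { printf "%.0f\n", s }'
}

# Hypothetical helper: physical RAM in bytes.
ram_bytes() {
    if [ -r /proc/meminfo ]; then
        # Linux (e.g. the Ubuntu guest): MemTotal is reported in kB
        awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 }' /proc/meminfo
    else
        # Mac OS X host: hw.memsize is already in bytes
        sysctl -n hw.memsize
    fi
}
```

For example: $ db_bytes /media/sf_mikeds_bioinf/db/nr ; ram_bytes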
...
> Running blastp
> from the commandline does not give a result in any reasonable length of
> time, even when a single query sequence is used (both on the host and guest
> systems). Even if I were able to handle raw sequence datasets of moderate
> size (less than 1 Gb? as yet untested) in Galaxy, it would be of little use
> to me if I can't blastp or blastx search the resulting contigs. Do I set up
> galaxy to send blast search queries to NCBI (i.e. many thousands of
> queries??) or is there some more elegant solution?
Regards,
Peter