Hi Mike,

On Tue, Apr 30, 2013 at 11:17 PM, Mike Dyall-Smith <mike.dyallsmith@gmail.com> wrote:
Thanks Peter. My answers are below:
What query sequences are you using?
I have just been using one fasta protein sequence.
OK - this and the fact it works on the host machine is good to know.
Meanwhile monitor the system with top
Top says only a max of 27% CPU usage, but the Linux screen eventually freezes, and I have to restart. I am not sure how to read out RAM and disk IO from top.
Linux 'top' does list memory usage, both per process and for the whole system in the summary text at the top. There are other tools for monitoring IO, like iotop, but you could probably also just watch Activity Monitor on the host Mac OS X system for this. This does sound like the VM hasn't got enough RAM to run BLASTP against NR efficiently.
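For a quick reading of RAM and swap without picking the numbers out of top, something along these lines might do. It is only a sketch: the function name mem_summary is made up for illustration, and it assumes a Linux guest exposing /proc/meminfo. The MemAvailable field is only present on newer kernels, so the sketch falls back to MemFree when it is missing.

```shell
# Sketch: summarise RAM and swap (in MiB) from a meminfo-style file.
# mem_summary is a hypothetical helper name, not part of any package.
mem_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2; have_avail = 1 }
        /^MemFree:/      { memfree = $2 }
        /^SwapTotal:/    { swap_total = $2 }
        /^SwapFree:/     { swap_free = $2 }
        END {
            # Older kernels have no MemAvailable line; use MemFree instead.
            if (!have_avail) avail = memfree
            printf "total=%dMiB available=%dMiB swap_used=%dMiB\n",
                   total / 1024, avail / 1024, (swap_total - swap_free) / 1024
        }
    ' "$meminfo"
}

# Only run against the live system when /proc/meminfo exists (i.e. on Linux).
if [ -r /proc/meminfo ]; then
    mem_summary
fi
```

Running it every few seconds while the BLAST job is going (for example from a second terminal) should show whether available memory drops towards zero and swap use climbs before the screen freezes.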
grep blastp paster.log
Tried that, and it says there is no such file or directory
In the Galaxy folder? Maybe the default log filename has changed since I set up my machine... are you running Galaxy as a daemon, or running run.sh at the terminal directly? If the latter, try something like this:

$ sh run.sh | grep -i blast

(If you don't get much output, adjust the logging level in the universe_wsgi.ini configuration file.)
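If the filtered output scrolls past too quickly, one option is to save everything to a file and search it afterwards. The sketch below assumes the output was captured to a file of your choosing (galaxy_run.log is just a placeholder name), and blast_log_lines is a hypothetical helper rather than anything shipped with Galaxy.

```shell
# Capture step (run by hand in the Galaxy folder; the file name is a placeholder):
#   sh run.sh 2>&1 | tee galaxy_run.log
# The 2>&1 is there in case the log messages are written to stderr.

# Sketch: print every line mentioning BLAST (any case), with its line number.
# Usage: blast_log_lines LOGFILE
blast_log_lines() {
    grep -i -n 'blast' "$1"
}
```

Calling blast_log_lines galaxy_run.log after a failed job should then show any logged BLAST lines, with line numbers to help find the surrounding context in the full log.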
Could you try running BLAST from the host Mac OS X?
Yes. And it works fine! I get a good match in a relatively short time.
That's progress - the hardware itself is capable :)
I then made a very small protein database, checked it by command line blastp in both host OS X and in guest Linux, and it worked fine. Added it to the blastdb_p.loc file, restarted and saw it listed in Galaxy. Tried to use it for a blastp, and got the same error as before with the huge NCBI nr database. So, it is not a matter of size....
There are clearly two separate issues: (1) getting BLAST to run nicely on your VM, which I think is running out of RAM, and (2) sorting out your Galaxy BLAST database configuration. Something to check is the read permissions on the BLAST database files (which Linux user are you running Galaxy as, and can that user read the database files and their folder?). I'm keen to see the Galaxy log output to find out exactly what command line was used to run BLAST, which would help with debugging where the problem is.
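The permissions check could be done in one go with a sketch like the following, run as the same Linux user that runs Galaxy. It assumes the blastdb_p.loc file has three tab separated columns (unique id, caption, database base path), and that a protein database shows up either as BASE.pin (single volume) or as a BASE.pal alias file (multi-volume, like nr). The function name check_blast_loc is made up for illustration.

```shell
# Sketch: for each entry in a blastdb_p.loc-style file, check that the
# protein database index (or alias) file is readable by the current user.
# Assumed columns (tab separated): unique id, caption, database base path.
check_blast_loc() {
    loc_file="$1"
    status=0
    tab="$(printf '\t')"
    while IFS="$tab" read -r db_id caption base_path || [ -n "$db_id" ]; do
        # Skip blank lines and comment lines.
        case "$db_id" in ''|'#'*) continue ;; esac
        found=0
        # Single-volume databases have BASE.pin; multi-volume ones have BASE.pal.
        for f in "$base_path.pin" "$base_path.pal"; do
            if [ -r "$f" ]; then found=1; fi
        done
        if [ "$found" -eq 1 ]; then
            echo "OK       $db_id  $base_path"
        else
            echo "MISSING  $db_id  $base_path"
            status=1
        fi
    done < "$loc_file"
    return "$status"
}
```

A MISSING line means the file is absent, the path in the .loc file is wrong, or the current user cannot read it, so it narrows things down rather than giving a full diagnosis.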
Thanks for your comments about RAM and BLAST searches. It gives me hope that I can get Galaxy running usefully. I only chose Bio-Linux because of the suite of programs and the apparent ease of use. The other reason was that I could not install Galaxy on OS X (10.6). I get errors that others have noted on the discussion lists but no-one seems to have a solution for.
I used to do my Galaxy tool development on Mac OS X, but it didn't work 100% right, and in any case many of the tools I wanted to wrap and run within Galaxy were Linux only - so now I just ssh into a Linux server to do Galaxy work. Given that the main Galaxy development and the Penn State Galaxy server both run on Linux, you'll have a much easier time under Linux than Mac OS X.

Regards,

Peter