Question about installing NCBI BLAST+ onto Galaxy
Hey everyone,

Hope all is well! I was wondering if someone could help me with another error I ran into. I recently downloaded the NCBI BLAST+ toolkit and it automatically installed itself into Galaxy. I'm just wondering if anyone knows where I am supposed to put the directory with the database files it needs to run correctly, or if it makes a difference.

I have configured the blastdb.loc file as shown below, and the database now appears in the drop-down menu for the NCBI tools, but when I try executing any of the BLASTs Galaxy returns the following error, regardless of the path permutation I try:

When I tried putting the databases within the galaxy installation (within galaxy_test) and I used the whole path:

    BLAST Database error: No alias or index file found for nucleotide database [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna] in search path [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/job_working_directory/28::]
    Return error code 2 from command:
    tblastx -query /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_27.dat -db "/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna" -evalue 0.001 -out /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_32.dat -outfmt 6 -num_threads 8

When I tried putting the databases within the galaxy installation (within galaxy_test) and I used the path from the galaxy root:

    BLAST Database error: No alias or index file found for nucleotide database [/blastdb/refseq_rna] in search path [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/job_working_directory/24::]
    Return error code 2 from command:
    blastn -query /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_27.dat -db /blastdb/refseq_rna -task megablast -evalue 0.001 -out /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_28.dat -outfmt 6 -num_threads 8

When I tried putting the databases outside of the galaxy installation:

    BLAST Database error: No alias or index file found for nucleotide database [/Users/burtonigenomics/Rosa_Files/data/BLAST_databases/refseq_rna] in search path [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/job_working_directory/29::]
    Return error code 2 from command:
    tblastx -query /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_27.dat -db "/Users/burtonigenomics/Rosa_Files/data/BLAST_databases/refseq_rna " -evalue 0.001 -out /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_33.dat -outfmt 6 -num_threads 8

________________________________________________________________________

The index file is in the folder specified in the nucleotide database [/blastdb/refseq_rna], but I don't know why/if the search path is actually going to the right place. job_working_directory doesn't seem to have any files, is it something that gets changed when galaxy starts running? Do you know why it's looking there?

________________________________________________________________________

My current blastdb.loc file (which is supposed to point out the path) currently reads:

    #This is a sample file distributed with Galaxy that is used to define a
    #list of nucleotide BLAST databases, using three columns tab separated
    #(longer whitespace are TAB characters):
    #
    #<unique_id> <database_caption> <base_name_path>
    #
    #The captions typically contain spaces and might end with the build date.
    #It is important that the actual database name does not have a space in it,
    #and that the first tab that appears in the line is right before the path.
    #
    #So, for example, if your database is nt and the path to your base name
    #is /depot/data2/galaxy/blastdb/nt/nt.chunk, then the blastdb.loc entry
    #would look like this:
    #
    #nt_02_Dec_2009 nt 02 Dec 2009 /depot/data2/galaxy/blastdb/nt/nt.chunk
    #
    #and your /depot/data2/galaxy/blastdb/nt directory would contain all of
    #your "base names" (e.g.):
    #
    #-rw-r--r-- 1 wychung galaxy 23437408 2008-04-09 11:26 nt.chunk.00.nhr
    #-rw-r--r-- 1 wychung galaxy 3689920 2008-04-09 11:26 nt.chunk.00.nin
    #-rw-r--r-- 1 wychung galaxy 251215198 2008-04-09 11:26 nt.chunk.00.nsq
    #...etc...
    #
    #Your blastdb.loc file should include an entry per line for each "base name"
    #you have stored. For example:
    #
    #nt_02_Dec_2009 nt 02 Dec 2009 /depot/data2/galaxy/blastdb/nt/nt.chunk
    #wgs_30_Nov_2009 wgs 30 Nov 2009 /depot/data2/galaxy/blastdb/wgs/wgs.chunk
    #test_20_Sep_2008 test 20 Sep 2008 /depot/data2/galaxy/blastdb/test/test
    #...etc...
    #
    #See also blastdb_p.loc which is for any protein BLAST database.
    #
    #Note that for backwards compatibility with workflows, the unique ID of
    #an entry must be the path that was in the original loc file, because that
    #is the value stored in the workflow for that parameter.
    #
    refseq_rna Reference RNA Sequence /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna
    # I've also tried:
    # refseq_rna Reference RNA Sequence /tool-data/blastdb/refseq_rna
    # and refseq_rna Reference RNA Sequence /blastdb/refseq_rna
    # but it still seems to sending to a default thats not working or something?

________________________________________________________________________

I would really appreciate help with this, if there's anyone who is more knowledgeable about the BLAST tools and Galaxy at the NCBI that I should talk to let me know, but I feel like this might be something you could help me with. Let me know if there's any other information you would need, and thanks for your time!

Best,
George Michopoulos
Fernald Lab
Stanford University
On Tue, Jun 28, 2011 at 9:52 PM, George Michopoulos <giorgos@stanford.edu> wrote:
> Hey everyone, Hope all is well! I was wondering if someone could help me with another error I ran into. I recently downloaded the NCBI BLAST+ toolkit and it automatically installed itself into Galaxy. I'm just wondering if anyone knows where I am supposed to put the directory with the database files it needs to run correctly, or if it makes a difference.
It shouldn't matter as long as you use an appropriate path in the blastdb.loc and blastdb_p.loc files. We use /data/blastdb/ for ours.
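As a rough illustration of what an "appropriate" entry looks like, here is a minimal Python sketch that checks blastdb.loc lines against the format described in the sample file's own comments (three tab-separated columns: unique ID, caption, base name path). The function names and the /data/blastdb/refseq_rna path in the demo are made up for illustration, and this is not how Galaxy itself parses the file; it only flags entries that are not three tab-separated columns or whose path carries stray whitespace.

```python
def parse_loc_line(line):
    """Return (unique_id, caption, base_path), or None for comments/blank lines.

    Raises ValueError if the line is not three tab-separated columns
    or if the path column has leading/trailing whitespace.
    """
    stripped = line.rstrip("\n")
    if not stripped.strip() or stripped.lstrip().startswith("#"):
        return None
    fields = stripped.split("\t")
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 tab-separated columns, got {len(fields)}: {stripped!r}"
        )
    unique_id, caption, base_path = fields
    if base_path != base_path.strip():
        raise ValueError(f"path has leading/trailing whitespace: {base_path!r}")
    return unique_id, caption, base_path


def check_loc_text(text):
    """Return a list of (line_number, problem) for every malformed entry."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        try:
            parse_loc_line(line)
        except ValueError as exc:
            problems.append((number, str(exc)))
    return problems


if __name__ == "__main__":
    # Hypothetical entries: the second uses tabs, the third uses spaces only.
    sample = (
        "# comment line\n"
        "refseq_rna\tReference RNA Sequence\t/data/blastdb/refseq_rna\n"
        "refseq_rna Reference RNA Sequence /data/blastdb/refseq_rna\n"
    )
    for number, problem in check_loc_text(sample):
        print(f"line {number}: {problem}")
```

Running the check over the real file (for example by passing the file's contents to check_loc_text) would show whether any entry was typed with spaces where the sample file expects TAB characters.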
> I have configured the blastdb.loc file as shown below, and the database now appears in the drop-down menu for the NCBI tools, but when I try executing any of the BLASTs Galaxy returns the following error, regardless of the path permutation I try:
At least Galaxy is finding the loc file :)
> When I tried putting the databases within the galaxy installation (within galaxy_test) and I used the whole path:
>
>     BLAST Database error: No alias or index file found for nucleotide database [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna] in search path [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/job_working_directory/28::]
>     Return error code 2 from command:
>     tblastx -query /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_27.dat -db "/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna" -evalue 0.001 -out /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_32.dat -outfmt 6 -num_threads 8
Does BLAST+ work at the command line?

Does BLAST+ work within Galaxy for FASTA vs FASTA (rather than FASTA vs database)?

Also what does this give:

    ls /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna.*

My guess is you have tried a valid path, but that the Galaxy user does not have read permission so BLAST+ can't open the database.

Peter
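The ls check and the read-permission guess above can be combined in a small script. This is only a sketch under stated assumptions: the helper names are invented here, it must be run as the same user that runs Galaxy for the permission result to mean anything, and os.access reports what the current user may read (a root user will see everything as readable).

```python
import glob
import os
import sys


def inspect_blast_db(base_path):
    """List files named base_path.* and report whether each is readable.

    Equivalent in spirit to `ls base_path.*` plus a read-permission check
    for the user running this script.
    """
    # glob.escape protects any glob metacharacters in the directory names.
    matches = sorted(glob.glob(glob.escape(base_path) + ".*"))
    return [(path, os.access(path, os.R_OK)) for path in matches]


def describe(base_path):
    """Summarise the check as a one-line message."""
    results = inspect_blast_db(base_path)
    if not results:
        return f"no files match {base_path}.*"
    unreadable = [path for path, readable in results if not readable]
    if unreadable:
        return "not readable by this user: " + ", ".join(unreadable)
    return f"{len(results)} files found, all readable"


if __name__ == "__main__":
    # Pass one or more database base names (the third column of blastdb.loc).
    for base in sys.argv[1:]:
        print(describe(base))
```

If the script reports no matching files, the base name in blastdb.loc does not point at the database files (such as the .nhr, .nin and .nsq files listed in the sample loc file); if it reports unreadable files, the permission guess is worth following up.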
Participants (2): George Michopoulos, Peter Cock