Hi, I've tried using the Data Manager (Admin > Data > Manage local data (beta)) to install builds for BWA and Samtools on my local Galaxy instance. Before using the Data Manager, I would add the build to tool-data/shared/ucsc/builds.txt, create the .fai indexes (for samtools) from the command line, add them to tool-data/sam_fa_indices.loc and restart Galaxy (doing a similar thing for BWA and adding the build to bwa_index.loc).
With the Data Manager, the BWA builds work fine (I can map to the build), but when I try to use SAM-to-BAM I get the error "Sequences are not currently available for the specified build."
Using the Data Manager creates the directory tool-data/n_sylvestris/, which contains the sub-directories 'seq', 'bwa_index' and 'sam_index'. 'seq' contains a symlink to the n_sylvestris.fa sequence. 'sam_index' and 'bwa_index' each contain the sub-directory 'n_sylvestris', which holds a symlink to the n_sylvestris.fa symlink in 'seq', along with the respective n_sylvestris.fa.xxx index files.
OK - all good...
In tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/ there are three subdirectories: data_manager_bwa_index_builder, data_manager_sam_fa_index_builder and data_manager_fetch_genome_all_fasta. All three directories contain all_fasta.loc, tool_data_table_conf.xml, tool_data_table_conf.xml.sample and (for the sam and bwa dirs) their pertinent index.loc file.
The data_manager_fetch_genome_all_fasta/all_fasta.loc file contains the path to the fasta symlinks.
The all_fasta.loc files in the sam and bwa data_manager_index_builder directories don't contain any uncommented lines.
The index.loc files in the sam and bwa data_manager_index_builder directories point to:
tool-data/n_sylvestris/bwa_index/n_sylvestris/n_sylvestris.fa
tool-data/n_sylvestris/sam_index/n_sylvestris/n_sylvestris.fa
As BWA runs fine, it's obviously reading the bwa_index.loc file from the directory:
tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/data_manager_bwa_index_builder/fe6508204acc/bwa_index.loc
...but it's not reading the samtools indexes at:
tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/data_manager_sam_fa_index_builder/926e50397b83/sam_fa_indices.loc
For Galaxy to find the sam indexes, I have to go to the tool-data/sam_fa_indices.loc file and manually insert into it the contents of:
tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/data_manager_sam_fa_index_builder/926e50397b83/sam_fa_indices.loc
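The manual workaround described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper name merge_loc_entries is hypothetical (it is not part of Galaxy), and it assumes a .loc file is plain text in which lines beginning with '#' are comments. It copies the uncommented lines from the Data Manager's .loc file into the main one, skipping any that are already present.

```python
from pathlib import Path


def merge_loc_entries(source_loc, target_loc):
    """Append uncommented lines of source_loc to target_loc if missing.

    Hypothetical helper, not a Galaxy API. Returns the appended lines.
    """
    source = Path(source_loc)
    target = Path(target_loc)

    def entries(path):
        # Keep non-blank lines that are not '#' comments.
        if not path.exists():
            return []
        return [
            line
            for line in path.read_text().splitlines()
            if line.strip() and not line.lstrip().startswith("#")
        ]

    existing = set(entries(target))
    new_lines = []
    for line in entries(source):
        if line not in existing:
            new_lines.append(line)
            existing.add(line)

    if new_lines:
        text = target.read_text() if target.exists() else ""
        with target.open("a") as handle:
            # Make sure the appended entries start on their own line.
            if text and not text.endswith("\n"):
                handle.write("\n")
            for line in new_lines:
                handle.write(line + "\n")
    return new_lines
```

It would be called with the tool shed copy of sam_fa_indices.loc as the source and tool-data/sam_fa_indices.loc as the target. As with the older manual workflow, Galaxy would presumably need a restart afterwards to pick up the change.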
So, I guess my question is: other than inserting the genome builds into builds.txt, should I be doing any other configuration to get the Data Manager to write its newly created builds and Galaxy to read them? I find it strange that the BWA builds work OK, but the Samtools ones don't.
I've done a few greps for mentions of .loc files in Galaxy, and the only difference between the bwa and sam .loc files is that there is a file tool-data/tool_data_table_conf.xml (plus a .sample version) which contains:

    <!-- Use the file tool_data_table_conf.xml.oldlocstyle if you don't want to update your loc files as changed in revision 4550:535d276c92bc-->
    <tables>
        <!-- Location of SAMTools indexes and other files -->
        <table name="sam_fa_indexes" comment_char="#">
            <columns>line_type, value, path</columns>
            <file path="tool-data/sam_fa_indices.loc" />
        </table>
    </tables>

Could Galaxy be reading this file and ignoring the one in tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/?
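One way to check this would be to compare which table names and .loc paths each tool_data_table_conf.xml declares. Below is a minimal sketch using only the Python standard library; the function name list_data_tables is hypothetical, and the script only reports what the XML says. It makes no claim about how Galaxy itself resolves these tables.

```python
import xml.etree.ElementTree as ET


def list_data_tables(conf_xml_text):
    """Return {table name: [file paths]} from a tool_data_table_conf.xml string.

    Hypothetical helper for inspection only, not a Galaxy API.
    """
    root = ET.fromstring(conf_xml_text)
    tables = {}
    for table in root.iter("table"):
        name = table.get("name")
        # Each <file path="..."/> child names a .loc file for this table.
        paths = [f.get("path") for f in table.findall("file")]
        tables.setdefault(name, []).extend(paths)
    return tables


if __name__ == "__main__":
    import sys

    # Usage: python list_tables.py path/to/tool_data_table_conf.xml [...]
    for conf_path in sys.argv[1:]:
        with open(conf_path) as handle:
            print(conf_path, list_data_tables(handle.read()))
```

Running it over tool-data/tool_data_table_conf.xml and the copies under the tool shed directories would show whether the same table name is pointed at different .loc files.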
Best wishes, Graham
Dr. Graham Etherington Bioinformatics Support Officer, The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH. UK Tel: +44 (0)1603 450601
galaxy-dev@lists.galaxyproject.org