Hi,
This message is long. If you wish to see how I resolved the issue, please scroll down to “SOLUTION”.
I have been trying to set up SAMTools in my local Galaxy instance, and I got stuck on the SAM-to-BAM tool.
The tool's XML file shows that it requires the “sam_fa_indices.loc” file, so I copied sam_fa_indices.loc.sample to create it and then added an entry for my index.
However, I found the instructions slightly confusing.
---Instruction at the top---
#This is a sample file distributed with Galaxy that enables tools
#to use a directory of Samtools indexed sequences data files. You will need
#to create these data files and then create a sam_fa_indices.loc file
#similar to this one (store it in this directory ) that points to
#the directories in which those files are stored. The sam_fa_indices.loc
#file has this format (white space characters are TAB characters):
#
#<index> <seq> <location>
This implies three columns: one defining the index, one defining the sequence (seq), and one giving the location of the index file.
However, the example entry beneath it does not follow this format.
---Example given---
#So, for example, if you had hg18 indexed stored in
#/depot/data2/galaxy/sam/,
#then the sam_fa_indices.loc entry would look like this:
#
#hg18 /depot/data2/galaxy/sam/hg18.fa
This entry has only two columns. It was not clear whether hg18 is the <index> value, and whether <seq> refers to the sequence file that was used to generate the index or to the path (including the prefix) used to find the index file.
When I used the two-column entry as in the example, the tool failed, reporting that the sequence was not found. The path is correct, and the file was generated independently by SAMTools. On looking at the XML file, I noticed the following validator:
<validator type="dataset_metadata_in_file" filename="sam_fa_indices.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="index" />
Two things stood out:
- metadata_column="1": since columns are 0-based, the dataset's dbkey must match column 2 of the sam_fa_indices.loc file.
- line_startswith="index": this led me to believe that every line must start with the word “index”.
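To check my reading of the validator, here is a minimal Python sketch of the check I believe these attributes describe. It is my own approximation, not Galaxy's actual validator code; the function name dbkey_available is hypothetical, and I am assuming TAB-separated columns and that lines starting with # are comments.

```python
def dbkey_available(loc_lines, dbkey, metadata_column=1, line_startswith="index"):
    """Approximate the dataset_metadata_in_file check (hypothetical sketch,
    not Galaxy's code): is `dbkey` listed in column `metadata_column` of any
    line that starts with `line_startswith`?"""
    for raw in loc_lines:
        line = raw.rstrip("\n")
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        # Only lines beginning with the line_startswith prefix are considered
        if line_startswith and not line.startswith(line_startswith):
            continue
        fields = line.split("\t")
        # metadata_column is 0-based, so 1 means the second column
        if len(fields) > metadata_column and fields[metadata_column].strip() == dbkey:
            return True
    return False


old_entry = ["hg18\t/depot/data2/galaxy/sam/hg18.fa\n"]
new_entry = ["index\thg18\t/depot/data2/galaxy/sam/hg18.fa\n"]
print(dbkey_available(old_entry, "hg18"))  # False: line does not start with "index"
print(dbkey_available(new_entry, "hg18"))  # True: column 1 is "hg18"
```

Under these assumptions, the two-column entry from the sample is skipped entirely, which would explain the "not found" failure.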
>>>>>SOLUTION<<<<<
In sam_fa_indices.loc.sample:
#This is a sample file distributed with Galaxy that enables tools
#to use a directory of Samtools indexed sequences data files. You will need
#to create these data files and then create a sam_fa_indices.loc file
#similar to this one (store it in this directory ) that points to
#the directories in which those files are stored. The sam_fa_indices.loc
#file has this format (white space characters are TAB characters):
#
#index <seq> <location> [Change <index> to the word “index”]
#So, for example, if you had hg18 indexed stored in
#/depot/data2/galaxy/sam/,
#then the sam_fa_indices.loc entry would look like this:
#
#index hg18 /depot/data2/galaxy/sam/hg18.fa [Add the word “index” to the beginning of line]
#
#and your /depot/data2/galaxy/sam/ directory
#would contain hg18.fa and hg18.fa.fai files:
#
#-rw-r--r-- 1 james universe 830134 2005-09-13 10:12 hg18.fa
#-rw-r--r-- 1 james universe 527388 2005-09-13 10:12 hg18.fa.fai
#
#Your sam_fa_indices.loc file should include an entry per line for
#each index set you have stored. The file in the path does actually
#exist, but it should never be directly used. Instead, the name serves
#as a prefix for the index file. For example:
#
#index hg18 /depot/data2/galaxy/sam/hg18.fa [Add the word “index” to the beginning of line]
With these changes in place, the tool now works.
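If you already have a sam_fa_indices.loc with two-column entries, a small Python sketch like the following could add the leading “index” column. The helper names upgrade_loc_line and fai_present are hypothetical, not part of Galaxy; it assumes TAB-separated fields and that the .fai file sits next to the FASTA, as in the directory listing above.

```python
import os


def upgrade_loc_line(line):
    """Prepend 'index' to a two-column <seq>\t<location> line (hypothetical helper).

    Comments, blank lines and already-converted lines are returned unchanged.
    """
    stripped = line.rstrip("\n")
    newline = "\n" if line.endswith("\n") else ""
    if not stripped or stripped.startswith("#"):
        return line
    fields = stripped.split("\t")
    if fields[0] == "index":
        return line
    if len(fields) != 2:
        raise ValueError(f"expected <seq>\\t<location>, got: {stripped!r}")
    return "index\t" + stripped + newline


def fai_present(location):
    """Check for the <fasta>.fai file alongside the FASTA path in column 3."""
    return os.path.exists(location + ".fai")


lines = ["# comment\n", "hg18\t/depot/data2/galaxy/sam/hg18.fa\n"]
print("".join(upgrade_loc_line(l) for l in lines), end="")
```

Leaving already-converted lines alone means the script can be run more than once on the same file without doubling the “index” prefix.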
>>>>>END SOLUTION<<<<<
Thanks for your time and patience.
Cheers,
Oliver