On Mon, Nov 18, 2013 at 2:24 PM, Dave Bouvier <dave@bx.psu.edu> wrote:
Peter,
It turns out there were two problems. First, the test environment was not
resolving the upload tool's dependency on samtools, which I've now
corrected.
Excellent.
On a closely related point, I understand Galaxy likes to store all
BAM files co-ordinate sorted and indexed - when a tool produces
a BAM file, where does this happen? Is it the individual tool's
responsibility, or the framework's (e.g. while setting metadata)?
I assume the latter, in which case is there still an implicit
samtools dependency there?
This is (unfortunately) performed in multiple methods of the Bam class in ~/galaxy/datatypes/binary.py. The comments pasted here, from the Bam class's dataset_content_needs_grooming() method, include an old "TODO" and clarify some of the reasons for this:
# Samtools version 0.1.13 or newer produces an error condition when attempting to index an
# unsorted bam file.
# So when using a newer version of samtools, we'll first check if the input BAM file is sorted
# from the header information. If the header is present and sorted, we do nothing by returning False.
# If it's present and unsorted or if it's missing, we'll index the bam file to see if it produces the
# error. If it does, sorting is needed so we return True (otherwise False).
#
# TODO: we're creating an index file here and throwing it away. We then create it again when
# the set_meta() method below is called later in the job process. We need to enhance this overall
# process so we don't create an index twice. In order to make it worth the time to implement the
# upload tool / framework to allow setting metadata from directly within the tool itself, it should be
# done generically so that all tools will have the ability. In testing, a 6.6 gb BAM file took 128
# seconds to index with samtools, and 45 minutes to sort, so indexing is relatively inexpensive.
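To illustrate the header check those comments describe, here is a rough sketch of how the sort order could be read from a BAM header in plain Python. This is not the code in binary.py; the function names (read_bam_header_text, header_sort_order, may_need_sorting) are made up for this example. It assumes the file follows the standard BAM layout (gzip-compatible BGZF blocks, a "BAM\1" magic string, then the length-prefixed SAM header text), and it only covers the header step, not the fallback of running samtools index.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the plain-text SAM header stored at the start of a BAM file."""
    # BGZF is a series of concatenated gzip members, so the standard
    # gzip module can decompress it without samtools.
    with gzip.open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"BAM\x01":
            raise ValueError("not a BAM file: %r" % (path,))
        # Little-endian int32 giving the length of the header text.
        (l_text,) = struct.unpack("<i", handle.read(4))
        text = handle.read(l_text)
    # The header text may be NUL padded.
    return text.rstrip(b"\x00").decode("ascii", "replace")


def header_sort_order(header_text):
    """Return the SO value from the @HD line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def may_need_sorting(path):
    """Hypothetical helper: False only if the header says coordinate sorted.

    A True result means the header does not settle the question, which is
    the case where the comments above fall back to trying to index the file.
    """
    return header_sort_order(read_bam_header_text(path)) != "coordinate"
```

A header claiming SO:coordinate is only a claim by whatever wrote the file, so a real implementation may still want to verify it by indexing.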
Second, the BAM file detection on upload was broken due to the
bug in Python 2.7.4's gzip module, which I've also corrected.
You mean http://bugs.python.org/issue17666 fixed in 2.7.5?
Yes