On Nov 18, 2013, at 10:33 AM, Peter Cock <p.j.a.cock(a)googlemail.com> wrote:
On Mon, Nov 18, 2013 at 2:24 PM, Dave Bouvier <dave(a)bx.psu.edu> wrote:
> It turns out there were two problems. First, the test environment was not
> resolving the upload tool's dependency on samtools, which I've now
On a closely related point, I understand Galaxy likes to store all
BAM files co-ordinate sorted and indexed - when a tool produces
a BAM file where does this happen? i.e. Is it the individual tool's
responsibility, or the framework's (e.g. while setting metadata)?
I assume the latter, in which case is there still an implicit
samtools dependency there?
This is (unfortunately) performed in multiple methods of the Bam class in
~/galaxy/datatypes/binary.py. The Bam class's dataset_content_needs_grooming()
method has some comments (pasted here), including an old "TODO", that
clarify some of the reasons for this:
# Samtools version 0.1.13 or newer produces an error condition when attempting to index an
# unsorted bam file - see
# So when using a newer version of samtools, we'll first check if the input BAM file is sorted
# from the header information. If the header is present and sorted, we do nothing by returning False.
# If it's present and unsorted or if it's missing, we'll index the bam file to see if it produces the
# error. If it does, sorting is needed so we return True (otherwise False).
# TODO: we're creating an index file here and throwing it away. We then create it again when
# the set_meta() method below is called later in the job process. We need to enhance this overall
# process so we don't create an index twice. In order to make it worth the time to implement the
# upload tool / framework to allow setting metadata from directly within the tool itself, it should be
# done generically so that all tools will have the ability. In testing, a 6.6 gb BAM file took 128
# seconds to index with samtools, and 45 minutes to sort, so indexing is
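For anyone following along, the decision those comments describe can be sketched roughly as below. This is only an illustration, not the code in binary.py: the function names and the try_index callable are made up for the example, and it assumes the BAM header has already been extracted as SAM-style text (an @HD line carrying an SO: tag).

```python
def header_sort_order(header_text):
    """Return the SO value from the @HD header line, or None if absent.

    header_text is assumed to be the SAM-style text header of the BAM file.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


def needs_grooming(header_text, try_index):
    """Decide whether a BAM file needs sorting, following the quoted comments.

    try_index is a hypothetical callable that attempts to index the file and
    returns True on success, False if indexing reported an error.
    """
    # Header present and coordinate sorted: nothing to do.
    if header_sort_order(header_text) == "coordinate":
        return False
    # Header unsorted or missing: fall back to a trial index.
    # If the trial index fails, sorting is needed.
    return not try_index()
```

The trial index is passed in as a callable so the header check can be exercised without samtools installed; in practice it would wrap whatever indexing call the framework uses.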
> Second, the bam file detection on upload was broken due to the
> bug in python 2.7.4's gzip module, which I've also corrected.
You mean http://bugs.python.org/issue17666
fixed in 2.7.5?
I reported that when Biopython's BGZF support broke (BGZF
being the gzip flavour used for BAM and tabix style indexed files).
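As a rough illustration of what "gzip flavour" means here: a BGZF block is a gzip member whose header sets the FEXTRA flag and carries a 'BC' extra subfield of length 2, as described in the SAM/BAM specification. The sketch below checks the first bytes of a file for that layout; the function name is made up for this example, and it is not how Galaxy or Biopython detect BAM files.

```python
import struct


def looks_like_bgzf(first_bytes):
    """Return True if first_bytes starts like a BGZF block header."""
    # gzip magic (1f 8b), deflate method (08), FLG with only FEXTRA set (04)
    if len(first_bytes) < 18 or first_bytes[:4] != b"\x1f\x8b\x08\x04":
        return False
    # XLEN (length of the extra field) is a little-endian uint16 at offset 10
    xlen = struct.unpack("<H", first_bytes[10:12])[0]
    extra = first_bytes[12:12 + xlen]
    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if si1 == 66 and si2 == 67 and slen == 2:
            return True
        pos += 4 + slen
    return False
```

Reading the first 18 bytes of a file in binary mode and passing them in is enough for this check; a plain gzip file without the extra subfield returns False.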
> I have re-run the test framework on samtools_idxstats, and it has
> now passed its test.
> --Dave B.
Thanks Dave :)