On Wed, Mar 23, 2011 at 4:24 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
Hello Peter,
On Mar 18, 2011, at 11:26 AM, Peter Cock wrote:
Having checked my BAM files with samtools, I can confirm they don't have the SO header.
samtools view -H myfile.bam | grep "SO:"
They were generated with BWA in a split+merge pipeline to use multiple cores. I suppose I could run samtools reheader on them... but it would be nice to avoid that.
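For reference, the header check above can be made a little stricter than a plain grep, which would also match "SO:" inside a @CO or @PG line. This is only a rough sketch, not what Galaxy does: the function names are made up for illustration, and it assumes samtools is on the PATH. In the SAM format the sort order is the SO tag on the @HD line.

```python
import subprocess
from typing import Optional


def read_header(bam_path: str) -> str:
    """Return the header text printed by `samtools view -H` (assumes samtools is on PATH)."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def sort_order_from_header(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD line, or None if there is no SO tag."""
    for line in header_text.splitlines():
        if not line.startswith("@HD"):
            continue  # SO is only meaningful on the @HD line
        for field in line.split("\t")[1:]:
            if field.startswith("SO:"):
                return field[3:]
    return None
```

A file like the ones described here would give None from sort_order_from_header(read_header("myfile.bam")), which says nothing either way about whether the records are actually sorted.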
Change set 5256:4acde9321b63 now includes more robust checking of whether a BAM file is sorted.
Thanks!
With samtools 0.1.13 or newer, attempting to index an unsorted BAM file raises an error. We take advantage of this in our checks.
Yes, that is what Heng Li recently recommended on the samtools mailing list.
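The idea of using the indexer as the sortedness check could be sketched roughly as follows. This is not the code in the Galaxy change set: the function name and the injectable `run` parameter are hypothetical, and it assumes that samtools 0.1.13+ reports the error through a non-zero exit status.

```python
import subprocess
from typing import Callable, Sequence


def is_sorted_by_indexing(
    bam_path: str,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Try to index the BAM file and treat success as 'sorted'.

    Assumption: `samtools index` exits non-zero when the file is unsorted.
    Any other failure (missing file, truncated BAM) also returns False here,
    so a caller may want to inspect stderr before deciding to sort.
    """
    return run(["samtools", "index", bam_path]) == 0
```

The `run` argument is only there so the logic can be exercised without samtools installed; by default it calls the real binary, and a successful run leaves the index file in place for later use.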
Did you see Pierre's little C tool using the samtools API to do this? http://plindenbaum.blogspot.com/2011/02/testing-if-bam-file-is-sorted-using....
Yes, however in testing, a 6.6GB BAM file took 138 seconds to check with the posted 'bamsorted' code that uses the SAMtools API and 128 seconds to index with SAMtools, so we're using samtools for the check.
Given Heng Li's recommendation to use this as a check for being sorted, I would do the same.
The only disadvantage is that you need a new samtools for it to work in 100% of cases, but that seems like a good choice moving forward.
Yes, since Galaxy will typically build the index anyway, it makes sense to try the indexing immediately, and thus find out whether a sort is required.
I see that is still pending, but since indexing is quite fast, doing it twice isn't the end of the world. Do you think it is worth trying to reuse a provided BAI file when linking to a BAM file rather than copying it into Galaxy?

Peter