Hello Peter, On Mar 18, 2011, at 11:26 AM, Peter Cock wrote:
I've just updated my test Galaxy instance to get the 5221:b5ecb8f4839d fix, and I now get a different behaviour - still an error state.
Data type: auto
Build: ?
Miscellaneous information: The uploaded files need grooming, so change your "Copy data into Galaxy?" selection to be "Copy files into Galaxy" instead of "Link to files without copying into Galaxy" so grooming can be performed.
error
Presumably Galaxy uses 'Grooming' in several settings (e.g. FASTQ) to mean 'data sanitising', and what that message is trying to tell me is Galaxy doesn't think my BAM file is sorted (and therefore needs 'grooming'). Right?
This is correct.
Having checked my BAM files with samtools, I can confirm they don't have the SO header.
samtools view -H myfile.bam | grep "SO:"
They were generated with BWA in a split+merge pipeline to use multiple cores. I suppose I could run samtools reheader on them... but it would be nice to avoid that.
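For anyone wanting to script the header check above, here is a minimal Python sketch. It assumes the header text has already been obtained (for example from `samtools view -H myfile.bam`); the function name `sort_order_from_header` is made up for illustration and is not part of Galaxy or samtools. Unlike a plain grep for "SO:", it only looks at the @HD line, which is where the SAM format puts the sort-order tag.

```python
from typing import Optional


def sort_order_from_header(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD header line, or None if absent.

    header_text is the plain-text SAM header, one record per line,
    with tab-separated fields.
    """
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@HD":
            continue  # only the @HD record carries the SO tag
        for field in fields[1:]:
            if field.startswith("SO:"):
                return field[3:]
    return None
```

A result of None corresponds to the situation described above: the file carries no SO tag, so the header says nothing about whether the reads are actually sorted.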
Change set 5256:4acde9321b63 now includes a more robust check for whether a BAM file is sorted. With samtools 0.1.13 or newer, attempting to index an unsorted BAM file produces an error condition, and we take advantage of this in our checks.
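The idea of using the indexing step as the sortedness test can be sketched roughly as follows. This is not the code from the change set; it is an illustration that assumes the error condition shows up as a nonzero exit status from `samtools index`. The function name `bam_is_sorted` and the `index_cmd` parameter are hypothetical, added so the command can be swapped out.

```python
import subprocess
from typing import Sequence


def bam_is_sorted(bam_path: str,
                  index_cmd: Sequence[str] = ("samtools", "index")) -> bool:
    """Guess whether a BAM file is sorted by trying to index it.

    Assumption: the indexing command exits with a nonzero status when
    the file is unsorted (per this thread, samtools 0.1.13 or newer).
    """
    result = subprocess.run(
        [*index_cmd, bam_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Note that a successful run leaves an index file behind as a side effect, and that a nonzero status could also come from other failures (a missing or truncated file, say), so a real check would want to tell those cases apart.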
Did you see Pierre's little C tool using the samtools API to do this? http://plindenbaum.blogspot.com/2011/02/testing-if-bam-file-is-sorted-using....
Yes, however in testing, a 6.6GB BAM file took 138 seconds to check with the posted 'bamsorted' code that uses the SAMtools API and 128 seconds to index with SAMtools, so we're using samtools for the check.
The only disadvantage is that you need a new samtools for it to work in 100% of cases, but that seems like a good choice moving forward.
Yes, since Galaxy will typically index the file anyway, it makes sense to try the indexing immediately, and thus find out whether a sort is required or not.
Meanwhile, the following trivial patch resolves my problem with getting pre-existing BAM files loaded into Galaxy:
https://bitbucket.org/peterjc/galaxy-central/changeset/7f17701740b2
As a follow-up, Galaxy doesn't need to re-index the file if there is already a BAI index. However, making it skip the re-indexing seems to mean knowing a bit more about how Galaxy deals with its metadata.
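Outside of Galaxy's metadata handling, the "is there already a BAI?" test could look something like the sketch below. It only checks the two common on-disk naming conventions (`file.bam.bai` and `file.bai`); the function name `existing_bai` is hypothetical, and how Galaxy would record or reuse such an index is exactly the part this does not address.

```python
import os
from typing import Optional


def existing_bai(bam_path: str) -> Optional[str]:
    """Return the path of a BAI index sitting next to bam_path, or None.

    Checks 'reads.bam.bai' first, then 'reads.bai'.
    """
    candidates = [
        bam_path + ".bai",
        os.path.splitext(bam_path)[0] + ".bai",
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

A more careful version would probably also compare modification times, so that an index older than its BAM file is not trusted.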
Peter
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu