Hi all,
 
I appreciate all of the discussion related to this issue. I still don't understand why I see this issue only when I choose the hg_g1k_v37 format and not when I choose the hg_19 format. I realize that I would need to ensure that the BAM files are sorted correctly before I enter the GATK pipeline, but all of this happens before that step.
 
When my read files are processed through to .bam files using the hg_19 format, I can view them in IGV without a problem. It is only when I use the hg_g1k_v37 format that I receive an error from IGV. It seems to me that the process that I am using in Galaxy should be identical except for the reference genome format (i.e. hg_19 or hg_g1k_v37).
 
I am at a loss as to how to proceed. Does anyone have ideas?
 
Thanks,
Mike



--- On Thu, 10/27/11, Jim Robinson <jrobinso@broadinstitute.org> wrote:

From: Jim Robinson <jrobinso@broadinstitute.org>
Subject: Re: [galaxy-user] Problem with bam and/or bai files
To: "Peter Cock" <p.j.a.cock@googlemail.com>
Cc: "Galaxy Dev" <galaxy-dev@bx.psu.edu>, "Mike Dufault" <dufaultm@yahoo.com>, "galaxy-user" <galaxy-user@lists.bx.psu.edu>
Date: Thursday, October 27, 2011, 9:58 AM

It's possible the sorting problem was specific to one version of
samtools, and that it now gives an error.  The incorrect index caused by
bad sequence lengths is a recurrent problem, but I do not know what tool
produces such headers.  Perhaps someone who has experienced this can
chime in.

I'm not a samtools expert, just sharing my experience of what has caused
this error in the past.  It does seem that, as a general rule, these
index problems result in errors from Picard (which the GATK uses),
while samtools can fail silently and sometimes give you a region
unrelated to your query.

Jim

> Sending to galaxy-dev ...
>
> On Thu, Oct 27, 2011 at 5:51 AM, Jim Robinson
> <jrobinso@broadinstitute.org>  wrote:
>> Hi Mike,
>>
>> Someone from the Galaxy team can perhaps give some insight on
>> what went wrong; I can comment on the error message from IGV.
>> That error is thrown from Picard, and in every case I've investigated
>> so far it was traced to a problem with the index.
> Useful background re: "Error reading bam file. This usually indicates
> a problem with the index (bai) file. ArrayIndexOutofBoundsException:
> 4682 (4682)."
>
>> The most common causes are (1) a problem with the sequence
>> dictionary in the BAM header itself, specifically incorrect sequence
>> lengths,
> Any idea what tools produce that kind of thing?
>
>> and (2) indexing an un-sorted BAM.  Apparently samtools will
>> make invalid indexes from such files without any complaints in
>> both cases.  You can even use samtools tview on such files;
>> it will happily show you some random region when you query.
> That is news to me - I recall "samtools index" being recommended
> as a way to determine if a BAM file was sorted or not (error on
> unsorted, you get an index if it was sorted) and again from
> memory this is what Galaxy uses internally as part of preparing
> BAM files on upload.
>
> Might this be tied to a specific version of samtools? e.g. a
> possible regression?
>

>> I don't see a "Sort" step in your workflow, maybe that's the problem?
>>
>> Please CC me on any reply; I might miss it on the list.
>>
>> Jim
> Thanks,
>
> Peter