Hi Greg,
Even though you are not copying the data into Galaxy's default data store, Galaxy still determines and stores certain metadata for each of the data files you are linking to. One of the metadata elements defined for the BAM datatype is its index, which is created by a call to samtools.
Unfortunately, there is really no way around this, because Galaxy requires the index file to be in a correct state, and I believe the test to determine correctness is at least as intensive as generating the index in the first place. It's been a while since I was involved in this (specifically, setting metadata for BAM files using samtools), so perhaps samtools has recently been improved in this regard. If so, I'll look to others to let me know that my understanding is "outdated". If we need to update the version of samtools used by the Galaxy code to take advantage of newer features, we can certainly do so.
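For illustration only, here is a minimal sketch of the kind of samtools call involved in generating BAM index metadata, assuming samtools is on the PATH. This is not Galaxy's own code, and the helper names bam_index_command and index_bam are hypothetical. The point it shows is that "samtools index" has to read through the entire BAM file, so for a 160 GB file this step takes a long time even when the data itself is only linked.

```python
import shutil
import subprocess


def bam_index_command(bam_path, index_path=None, samtools="samtools"):
    """Build a 'samtools index <in.bam> <out.bai>' command line.

    Hypothetical helper for illustration; not Galaxy's metadata code.
    Defaults the index path to '<bam_path>.bai'.
    """
    if index_path is None:
        index_path = bam_path + ".bai"
    return [samtools, "index", bam_path, index_path]


def index_bam(bam_path, index_path=None, samtools="samtools"):
    """Run samtools index and return the path of the index it wrote.

    samtools reads every record in the BAM, so the runtime grows with
    the size of the file, regardless of whether it was copied or linked.
    """
    if shutil.which(samtools) is None:
        raise FileNotFoundError(f"{samtools} not found on PATH")
    cmd = bam_index_command(bam_path, index_path, samtools)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[-1]


if __name__ == "__main__":
    # Only prints the command; does not run samtools.
    print(bam_index_command("/data/sample.bam"))
```

Separating command construction from execution keeps the sketch easy to inspect without samtools installed.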
Greg Von Kuster
On Apr 23, 2012, at 2:51 PM, Gregory Miles wrote:
Thank you very much for your help with this; we got that settled. One other question: we are importing sorted, indexed BAM files into a Galaxy data library, and rather than having Galaxy copy the files over (they are large), we are just setting Galaxy up to point to the relevant directory. We noticed that one file (160 GB in size) is taking a long time to import, considering that all Galaxy should be doing is creating a link. When we examined the running processes, we saw that samtools was running. From searching around a bit, it seems that Galaxy does this to groom the BAM file (sort/index) and ensure that it is in the format Galaxy needs to interpret it. Is there any way around this? We did the sorting and indexing prior to import, and it's taking quite a while to perform an unnecessary step. Thanks.
Greg
Dr. Gregory Miles
Bioinformatics Specialist
Cancer Institute of New Jersey @ UMDNJ
Office: (732) 235 8817