Hi,
when I 'produce' a BAM file as output, I can download the BAM file and a corresponding BAI file.
I am wondering where the BAI file comes from. When is it generated, and what code is used for that? When I run several tools on the same BAM file, is the index generated every time (overhead)? Is the BAI saved on disk, and if so, where?
I found that in the IUC GATK wrappers, a new BAI is always generated when a GATK tool starts. This creates considerable overhead. Isn't there a better solution?
@ref https://github.com/galaxyproject/tools-iuc/issues/194
Best, Alexander
The response speed is awesome! On GitHub, Eric answered: https://github.com/galaxyproject/tools-iuc/blob/0887009a23d176b21536c9fd8a18...
See L12 of that file for where the BAI file is stored.
That explains a lot, and it is a very precise example. Thank you! What I'm still wondering is when the index is generated and what code is used for that. More specifically: when I use Picard SortSam in Galaxy, the output is a BAM file with an accompanying BAI file. But when I run Picard on the command line, the output is just a BAM file, and the Galaxy Picard SortSam wrapper does not create an index either (unless I missed something). This leads me to my other question: when I do multiple actions with my BAM file, is the index generated every time? (overhead)
GATK reads each BAM file together with its BAI file. It would be wasteful to pass in only the BAM files and regenerate BAI files that already exist. How could the existing index be reused instead? Thanks, Alexander
(Follow-up to my original message, Alexander Vowinkel <vowinkel.alexander@gmail.com>, 2015-06-18 18:44 GMT-05:00.)
Hi Alexander,
On 19.06.2015 at 01:44, Alexander Vowinkel wrote:
Hi,
when I 'produce' a BAM file as output, I can download the BAM file and a corresponding BAI file.
I am now wondering where the BAI file comes from.
From Galaxy! :) Galaxy has a concept of datatypes and metadata. For BAM files, Galaxy produces an index file for every coordinate-sorted BAM file.
When is it generated, and what code is used for that?
As soon as the BAM file is created: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/binary...
When I do multiple actions with my BAM file, is the index generated every time? (overhead)
No, why should it? The BAM file stays the same, right? So if you use the BAM file multiple times, the same index is reused each time.
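To make the "index once, reuse many times" idea concrete, here is a toy Python model. It is not Galaxy's code: the Dataset and IndexingDatatype classes and the build_index callable are hypothetical stand-ins (set_meta is only named after Galaxy's datatype metadata hook), and a real implementation would call samtools or pysam to write the .bai. It only shows that the index is built once when the dataset's metadata is set, and every later tool run reads the stored path.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Dataset:
    """Toy stand-in for a Galaxy dataset: a BAM path plus a metadata dict."""
    path: str
    metadata: Dict[str, str] = field(default_factory=dict)


class IndexingDatatype:
    """Hypothetical BAM datatype that builds the index once, at metadata time."""

    def __init__(self, build_index: Callable[[str], str]):
        self.build_index = build_index  # e.g. a wrapper around `samtools index`

    def set_meta(self, dataset: Dataset) -> None:
        # Runs once, right after the tool that produced the BAM finishes.
        dataset.metadata["bam_index"] = self.build_index(dataset.path)


def run_tool(dataset: Dataset) -> str:
    # Downstream tools only read the stored index path; they never rebuild it.
    return dataset.metadata["bam_index"]


calls: List[str] = []


def fake_index(bam_path: str) -> str:
    calls.append(bam_path)  # record every (expensive) indexing call
    return bam_path + ".bai"


bam_type = IndexingDatatype(fake_index)
ds = Dataset("sorted.bam")
bam_type.set_meta(ds)                    # index built here, once
used = [run_tool(ds) for _ in range(3)]  # three tools reuse the same index
print(len(calls), used[0])               # prints: 1 sorted.bam.bai
```

Running three tools triggers a single indexing call, which is the behaviour Bjoern describes for an unchanged BAM dataset.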
Where is the BAI saved on disk (if it is)?
The index is stored as metadata of the BAM dataset. In a tool wrapper you can access its path with this snippet: $your_bam_file.metadata.bam_index
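One way a wrapper could avoid the GATK re-indexing overhead discussed in this thread is to put the BAM and the index Galaxy already computed next to each other under names that samtools and GATK look for (input.bam and input.bam.bai); Galaxy typically stores datasets under generic names such as dataset_1.dat, so the files have to be staged first. This is a minimal Python sketch assuming a POSIX filesystem; stage_bam_with_index and all file names are hypothetical. In an actual tool XML the same idea is commonly written as two ln -s commands in the command block, using $your_bam_file and $your_bam_file.metadata.bam_index.

```python
import os
import tempfile


def stage_bam_with_index(bam_path: str, bai_path: str, workdir: str,
                         name: str = "input") -> str:
    """Symlink a BAM file and its existing index side by side in workdir.

    samtools and GATK look for '<name>.bam.bai' next to '<name>.bam', so the
    index does not have to be rebuilt. Returns the path of the staged BAM.
    """
    staged_bam = os.path.join(workdir, name + ".bam")
    os.symlink(os.path.abspath(bam_path), staged_bam)
    os.symlink(os.path.abspath(bai_path), staged_bam + ".bai")
    return staged_bam


with tempfile.TemporaryDirectory() as tmp:
    # Empty stand-ins for the dataset file and its bam_index metadata file
    # (hypothetical names; real files would be a BAM and its BAI).
    bam = os.path.join(tmp, "dataset_1.dat")
    bai = os.path.join(tmp, "metadata_1.dat")
    for path in (bam, bai):
        open(path, "wb").close()
    job_dir = os.path.join(tmp, "job")
    os.mkdir(job_dir)
    staged = stage_bam_with_index(bam, bai, job_dir)
    staged_name = os.path.basename(staged)
    index_reused = os.path.realpath(staged + ".bai") == os.path.realpath(bai)

print(staged_name, index_reused)  # prints: input.bam True
```

Symlinks cost nothing compared to re-running an indexer over a large BAM, and the tool sees a normal BAM/BAI pair.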
I found that in the IUC GATK wrappers, a new BAI is always generated when a GATK tool starts. This creates considerable overhead. Isn't there a better solution?
Do you have an idea? Without Galaxy you would also create an index once and reuse it, and this is what Galaxy is doing. The same concept is used for other datatypes and improves usability dramatically. If this was helpful, please consider asking this question again on Biostar; I think it might be useful for others as well. Cheers, Bjoern
@ref https://github.com/galaxyproject/tools-iuc/issues/194
Best, Alexander
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi, thanks! Let me clarify my question about generating the index multiple times. On the command line, I do this:
Bam,Bai -> Prog1 -> Bam,Bai(v2) -> Prog2 -> Bam,Bai(v3)
What I understand Galaxy is doing ("always overwrite" [1]):
Bam,Bai -> Prog1 -> Bam,Bai(v2) -> Bam(v2) -> Bam,Bai(v2) -> Prog2 -> Bam,Bai(v3) -> Bam(v3) -> Bam,Bai(v3)
That is, after each program the BAI it wrote is dropped and rebuilt from the BAM.
Best, Alexander
[1] https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/binary...
(Replying to Björn Grüning <bjoern.gruening@gmail.com>, 2015-06-19 1:26 GMT-05:00.)
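The two chains in Alexander's message differ only in whether an index is rebuilt after each program. A toy Python trace makes the extra work explicit; index_trace is a hypothetical helper, not Galaxy code, and it assumes, as in the message, that each program already writes its own BAI.

```python
from typing import List


def index_trace(n_programs: int, reindex_after_tool: bool) -> List[str]:
    """List every BAI build along a chain of BAM-producing programs.

    Toy model: each ProgN is assumed to write its own Bai(vN); with
    reindex_after_tool=True a metadata step then rebuilds it from Bam(vN),
    which is Alexander's reading of Galaxy's "always overwrite".
    """
    events: List[str] = []
    for n in range(1, n_programs + 1):
        version = n + 1  # Prog1 produces v2, Prog2 produces v3, ...
        events.append(f"Prog{n} writes Bai(v{version})")
        if reindex_after_tool:
            events.append(
                f"metadata step rebuilds Bai(v{version}) from Bam(v{version})")
    return events


command_line = index_trace(2, reindex_after_tool=False)
galaxy = index_trace(2, reindex_after_tool=True)
print(len(command_line), len(galaxy))  # prints: 2 4
```

For two programs this gives two index builds on the command line versus four under the "always overwrite" reading; whether Galaxy actually discards a tool-written BAI is exactly what is being asked here.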
Participants (2): Alexander Vowinkel, Björn Grüning