Ahh - sorry. I finally found the BGZF format description in the SAM format specification, and it seems BGZF is 100% GZIP-compatible. There is still the issue of needing an external index file, since all BGZF seems to give you is the size of each compressed block, not anything format-specific such as the number of sequences in the block.
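For anyone following along, here is a rough Python sketch of what "the size of the compressed block" looks like in practice. It assumes the layout as I read it in the SAM specification: each BGZF block is a gzip member whose extra field carries a 'BC' subfield holding BSIZE, the total block length minus one. The function names make_bgzf_block and bgzf_block_size are made up for illustration, and the builder only handles payloads small enough for the block to stay under 64 KiB.

```python
import gzip
import struct
import zlib


def make_bgzf_block(payload: bytes) -> bytes:
    """Build one BGZF-style block (hypothetical helper, small payloads only)."""
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw DEFLATE stream
    cdata = deflater.compress(payload) + deflater.flush()
    # 12-byte gzip header + 6-byte extra subfield + data + 8-byte trailer, minus 1
    bsize = 12 + 6 + len(cdata) + 8 - 1
    header = struct.pack(
        "<4BI2BH2BHH",
        31, 139, 8, 4,  # ID1, ID2, CM=deflate, FLG=FEXTRA
        0,              # MTIME
        0, 255,         # XFL, OS=unknown
        6,              # XLEN: length of the extra field
        66, 67,         # SI1='B', SI2='C'
        2,              # SLEN: length of the subfield payload
        bsize,          # BSIZE: total block size minus 1
    )
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF, len(payload))
    return header + cdata + trailer


def bgzf_block_size(block_start: bytes) -> int:
    """Return the total compressed size of the block that starts here."""
    id1, id2, cm, flg = block_start[0:4]
    if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
        raise ValueError("not a gzip member with an extra field")
    (xlen,) = struct.unpack_from("<H", block_start, 10)
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:  # walk the extra subfields looking for 'BC'
        si1, si2, slen = struct.unpack_from("<BBH", block_start, pos)
        if (si1, si2, slen) == (66, 67, 2):
            (bsize,) = struct.unpack_from("<H", block_start, pos + 4)
            return bsize + 1
        pos += 4 + slen
    raise ValueError("no 'BC' subfield found")
```

Because every block is an ordinary gzip member, gzip.decompress() should accept a block built this way, and a reader can hop from block to block using bgzf_block_size() alone. Nothing in the header says how many sequences a block holds, which is the gap an external index has to fill.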
In any case, whether it's GZIP or BGZF, it seems the solutions are very similar, and porting my work should be pretty simple - I just used larger blocks and put all the data in the index file and none in the headers.
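To make the "everything in the index file, nothing in the headers" idea concrete, here is a minimal Python sketch. It is not the actual implementation: the function names, the in-memory index layout (compressed offset, compressed size, record count) and the tiny block size are assumptions chosen only for illustration.

```python
import gzip
import io


def write_indexed_gzip(records, out, records_per_block=2):
    """Write records as concatenated gzip members and return an external index."""
    index = []  # one (compressed offset, compressed size, record count) per block
    for start in range(0, len(records), records_per_block):
        chunk = records[start:start + records_per_block]
        member = gzip.compress(b"".join(chunk))  # each block is a full gzip member
        index.append((out.tell(), len(member), len(chunk)))
        out.write(member)
    return index


def read_block(src, index, block_no):
    """Decompress a single block without touching the rest of the file."""
    offset, size, _count = index[block_no]
    src.seek(offset)
    return gzip.decompress(src.read(size))


def block_for_record(index, record_no):
    """Locate a record using only the per-block counts kept in the index."""
    first = 0
    for block_no, (_offset, _size, count) in enumerate(index):
        if record_no < first + count:
            return block_no, record_no - first
        first += count
    raise IndexError(record_no)
```

The output is still a plain multi-member GZIP stream, so tools that read gzipped FASTQ should be able to consume it unchanged, while the index supplies the format-specific part (here, the number of sequences per block) that the block headers do not carry.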
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: firstname.lastname@example.org
-----Original Message-----
From: Peter Cock [mailto:email@example.com]
Sent: Tuesday, November 08, 2011 4:04 PM
To: Duddy, John
Cc: Greg Von Kuster; firstname.lastname@example.org; Nate Coraor
Subject: Re: [galaxy-dev] Tool shed and datatypes
On Tue, Nov 8, 2011 at 11:45 PM, Duddy, John <email@example.com> wrote:
> It's not public yet, and it involves a little conundrum - we want it so we can support large amounts of data efficiently on a variety of aligners, including our ELAND from CASAVA. However, ELAND does not support unaligned BAM inputs yet, and apparently it would be a lot of work to make it so (and another team's area of responsibility as well).
OK, so using (unaligned) BAM isn't about to happen.
> So in the near term, BGZF would not meet our needs.
I don't follow you there, BAM != BGZF.
We can use BGZF to compress FASTQ, FASTA, GenBank, basically anything. You get compression approaching that of plain GZIP (depending on the characteristics of the data) plus efficient random access.
> However, work is quite far along on a GZIP-based one that works with ELAND and BWA, since they both read GZIP FASTQ files, and works/will work with a converter to fastq_sanger for other tools.
> I can put you in touch with the engineer doing the work if you are interested.
That might be a good idea, or ask them to post here?