On Tue, Nov 8, 2011 at 11:45 PM, Duddy, John <jduddy@illumina.com> wrote:
It's not public yet, and it involves a little conundrum: we want it so that we can support large amounts of data efficiently on a variety of aligners, including our ELAND from CASAVA. However, ELAND does not yet support unaligned BAM input, and apparently making it do so would be a lot of work (and it is another team's area of responsibility as well).
OK, so using (unaligned) BAM isn't about to happen.
So in the near term, BGZF would not meet our needs.
I don't follow you there: BAM != BGZF. We can use BGZF to compress FASTQ, FASTA, GenBank, basically anything. You get compression approaching that of plain GZIP (depending on the characteristics of the data), plus efficient random access.
However, work is quite far along on a GZIP-based alternative that works with ELAND and BWA (since both read GZIP-compressed FASTQ files) and that works, or will work, with a converter to fastq_sanger for other tools.
I can put you in touch with the engineer doing the work if you are interested.
That might be a good idea, or could you ask them to post here?
Peter