On Tue, Nov 8, 2011 at 11:45 PM, Duddy, John <jduddy(a)illumina.com> wrote:
It's not public yet, and it involves a little conundrum - we want
it so we can support large amounts of data efficiently on a variety
of aligners, including our ELAND from CASAVA. However, ELAND
does not support unaligned BAM inputs yet, and apparently it
would be a lot of work to make it so (and another team's area
of responsibility as well).
OK, so using (unaligned) BAM isn't about to happen.
So in the near term, BGZF would not meet our needs.
I don't follow you there, BAM != BGZF.
We can use BGZF to compress FASTQ, FASTA, GenBank,
basically anything. You get compression approaching that
of plain GZIP (depending on the characteristics of the data)
plus efficient random access.
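
To illustrate the idea, here is a rough sketch using only the Python standard library. It is not real BGZF: real BGZF also records each block's compressed size in a gzip extra field and caps blocks at 64 KiB, both of which are left out here. The function names and the in-memory offset list are made up for this example. The point is only that a file written as many small gzip members is still readable by plain gzip, while any single block can be decompressed on its own.

```python
import gzip
import io


def write_blocked_gzip(data: bytes, block_size: int = 4096):
    """Compress `data` as concatenated gzip members (hypothetical helper).

    Returns the compressed bytes plus a list of
    (uncompressed_offset, compressed_offset) pairs, one per block.
    """
    out = io.BytesIO()
    offsets = []
    for start in range(0, len(data), block_size):
        offsets.append((start, out.tell()))
        # Each block is a complete, independent gzip member.
        out.write(gzip.compress(data[start:start + block_size]))
    return out.getvalue(), offsets


def read_block(compressed: bytes, offsets, index: int) -> bytes:
    """Decompress only block `index`, without touching earlier blocks."""
    comp_start = offsets[index][1]
    if index + 1 < len(offsets):
        comp_end = offsets[index + 1][1]
    else:
        comp_end = len(compressed)
    return gzip.decompress(compressed[comp_start:comp_end])


# Toy FASTQ-like data; the same approach works for FASTA, GenBank, etc.
records = b"".join(b"@read%d\nACGT\n+\nIIII\n" % i for i in range(1000))
compressed, offsets = write_blocked_gzip(records)

# Plain gzip tools still see one ordinary stream.
whole = gzip.decompress(compressed)

# Random access: fetch the third block directly.
third_block = read_block(compressed, offsets, 2)
```

The cost is a slightly larger file than a single gzip stream, since each block restarts the compressor; how much larger depends on the data and the block size.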
However, work is quite far along on a GZIP-based one
that works with ELAND and BWA, since they both read
GZIP FASTQ files, and works/will work with a converter
to fastq_sanger for other tools.
I can put you in touch with the engineer doing the work if
you are interested.
That might be a good idea, or ask them to post here?
Peter