On Saturday, September 3, 2011, Edward Kirton <eskirton@lbl.gov> wrote:
> of course there is a computational cost to compressing/uncompressing
> files but that's probably better than storing unnecessarily huge
> files.  it's a trade-off.

It may still be faster overall due to reduced I/O; that probably depends on your hardware.

> since i'm rapidly running out of storage, i think the best immediate
> solution for me is to deprecate all the fastq datatypes in favor of a
> new fastqsangergz and to bundle the read qc tools to eliminate
> intermediate files.  sure, users won't be able to play around with
> their data as much, but my disk is 88% full and my cluster has been
> 100% occupied for 2-months straight, so less choice is probably
> better.

In your position I agree that is a pragmatic choice. You might be able to modify the file upload code to gzip any FASTQ files... that would prevent uncompressed FASTQ from getting into new histories.
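Something along these lines, perhaps - just a rough sketch of the idea, not Galaxy's actual upload code (`compress_fastq` is a made-up name here):

```python
import gzip
import shutil


def compress_fastq(path):
    """Gzip an uploaded FASTQ file, writing path + '.gz'.

    Hypothetical helper to illustrate compressing on ingest;
    the real upload machinery would look quite different.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Stream in chunks rather than reading the whole file into memory
        shutil.copyfileobj(src, dst)
    return gz_path
```

You would then delete the original and register the .gz file as the dataset's file.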

I wonder if Galaxy would benefit from a new fastqsanger-gzip (etc.) datatype? However, this seems generally useful (not just for FASTQ), so perhaps a more general mechanism would be better, where tool XML files could declare which file types they accept and which of those can/must be compressed (possibly not just in gzip format?).
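The tool-facing side of such a mechanism could be as simple as sniffing the gzip magic bytes and opening the file accordingly - again just a sketch of the idea, with a made-up function name:

```python
import gzip


def open_maybe_gzipped(path, mode="rt"):
    """Open a file transparently, whether or not it is gzip-compressed.

    Detects compression by the two gzip magic bytes (0x1f 0x8b)
    rather than trusting the filename extension.
    """
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, mode)
    return open(path, mode)
```

That way tools written against this helper wouldn't care which form the datatype arrived in.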

Peter