Hello all,
What is the current status in Galaxy for supporting compressed files?
We've talked about this before. For example, in addition to FASTQ, many of us have expressed a wish to work with gzipped FASTQ. I understand that some have customized their local Galaxy installations to use gzipped FASTQ as a specific data type, but I'm more interested in a general, file-format-neutral solution.
Also, I'd like to be able to use BGZF (not just GZIP) because it is better for random access - see for example http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html - and makes it much easier to break up large data files for sharing over a cluster (i.e. it could be exploited in the current Galaxy code for splitting large sequence files).
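To illustrate why BGZF helps with random access and splitting, here is a minimal Python sketch using only the standard library. It assumes the usual BGZF layout: each block is a small, self-contained gzip member whose extra field carries a "BC" subfield recording the block size. The helper names (make_bgzf_block, is_bgzf, read_block_at) are made up for this example and are not part of Galaxy or any existing library. The check in is_bgzf also assumes the "BC" subfield comes first in the extra field, which the gzip format does not strictly require.

```python
import gzip
import struct
import zlib


def make_bgzf_block(data: bytes) -> bytes:
    """Pack a small chunk of data (well under 64 KB) into one BGZF block."""
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    payload = deflater.compress(data) + deflater.flush()
    # BSIZE field = total block size - 1 (18-byte header + payload + 8-byte trailer)
    block_size_minus_1 = len(payload) + 25
    header = struct.pack(
        "<4BI2BH2BHH",
        0x1F, 0x8B, 8, 4,    # gzip magic, deflate method, FEXTRA flag
        0,                   # modification time (unset)
        0, 0xFF,             # extra flags, unknown OS
        6,                   # length of the extra field
        66, 67, 2,           # subfield id 'B', 'C' and its payload length
        block_size_minus_1,
    )
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + payload + trailer


def is_bgzf(first_bytes: bytes) -> bool:
    """Guess whether a file is BGZF from its first 18 bytes (see assumptions above)."""
    return (
        len(first_bytes) >= 18
        and first_bytes[:4] == b"\x1f\x8b\x08\x04"
        and first_bytes[12:14] == b"BC"
    )


def read_block_at(data: bytes, offset: int) -> bytes:
    """Decompress only the block starting at offset, ignoring everything before it."""
    block_size = struct.unpack_from("<H", data, offset + 16)[0] + 1
    return gzip.decompress(data[offset:offset + block_size])


first = make_bgzf_block(b">seq1\nACGT\n")
second = make_bgzf_block(b">seq2\nTTTT\n")
bgzf_data = first + second

whole = gzip.decompress(bgzf_data)                   # ordinary gzip tools still work
only_second = read_block_at(bgzf_data, len(first))   # jump straight to block two
```

Since each block decompresses on its own, a large file could be split at block boundaries and the pieces handed to separate cluster jobs without decompressing from the start.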
The 11 May 2012 Galaxy Development News Brief http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-May/009757.html mentions tabix indexing, which uses bgzip. So is there something general in place yet to allow tool wrappers to say they accept not just given file formats, but also compressed versions of those formats?
Ideally I'd like to be able to write an XML tool description saying a tool produces BGZF compressed tabular data, or GZIP compressed Sanger FASTQ etc. Similarly, I'd like to specify that my tool accepts FASTA or gzipped FASTA (including BGZF FASTA). For older tools that say they accept only uncompressed FASTA, Galaxy could automatically decompress any compressed FASTA entries in my history on demand.
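As a rough sketch of the "decompress on demand" idea (this is not existing Galaxy code, and open_maybe_compressed is a hypothetical helper name), the framework could sniff the two-byte gzip magic number and hand older tools a plain text stream either way. Since BGZF is a valid multi-member gzip file, the same check would cover it too.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (and hence BGZF) file


def open_maybe_compressed(path, mode="rt"):
    """Open a file for reading, decompressing transparently if it looks gzipped."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, mode)
    return open(path, mode)


# Small demonstration with one FASTA record stored both ways.
workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "example.fasta")
gz_path = os.path.join(workdir, "example.fasta.gz")
record = ">seq1\nACGT\n"

with open(plain_path, "w") as out:
    out.write(record)
with gzip.open(gz_path, "wt") as out:
    out.write(record)

with open_maybe_compressed(plain_path) as handle:
    plain_text = handle.read()
with open_maybe_compressed(gz_path) as handle:
    gz_text = handle.read()
```

Sniffing the content rather than trusting the file extension seems safer here, since datasets in a history do not necessarily keep a .gz suffix.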
Peter