An alternative approach is to use a binary file format specifically
designed to store sequence data compactly. One example is the
Sequence Read Format (SRF). SRF has been incorporated into the Illumina and
Helicos pipelines and will be available for the AB platform shortly. SRF
includes support for compression using several schemes including ZLIB.
This thread has been captured on the genographia website and I've commented
there. There is also a link to more information on SRF. Note: in terms of
implementation, there is a C version (most complete), a C++ prototype (with
a complete C++ implementation coming soon) and an early Java implementation.
------ Forwarded Message
From: Assaf Gordon <gordon(a)cshl.edu>
Date: Wed, 4 Jun 2008 16:26:30 -0700
Subject: [galaxy-user] Storing compressed data files
As users add more and more data files to our galaxy server, disk space
becomes a problem...
What I'd ultimately like is to store the data files in some compressed
manner (at least some of the textual files). How would you suggest doing this?
A common scenario is:
1. User uploads a big FASTQ/Solexa file (=> 1.2 GB)
2. FASTQ file converted to FASTA file (=> 0.6 GB)
3. FASTA file trimmed, clipped, stripped, etc. (=> 100 MB)
4. BLAT, Histograms and other reports (=> ~50 MB)
The first three data sets take about 1.9 GB of disk space - and aren't
really needed by the user (as he/she is mostly interested in the
resulting report files). Since these are textual files, they compress
Currently, I store the FASTQ gzip'ed in galaxy, and my tools know how to
read gzip'ed data.
There are two shortcomings with this method:
1. datasets (green squares) of gzip'ed files don't display any data in
the peek window
2. Other galaxy tools which require a FASTQ file as input can't read my file.
Perl has an I/O module (PerlIO::gzip) which makes reading gzipped files
transparent to the rest of the program. I think python has something
very similar (http://www.python.org/doc/lib/module-gzip.html).
If it's not too much to ask, would it be possible to add support for
reading gzip'ed files? At least in the peek/preview window?
Comments are welcomed,
------ End of Forwarded Message
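
For what it's worth, the transparent gzip reading asked about in the
forwarded message could look roughly like the sketch below, using Python's
standard gzip module. This is only an illustration, not how Galaxy actually
does it: the helper names open_maybe_gzip and peek are made up for this
example, and it assumes the files are plain text and that a gzip'ed file can
be recognised by its two leading magic bytes.

```python
import gzip
from itertools import islice

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open a text file for reading, decompressing on the fly if gzip'ed.

    Hypothetical helper: detection is by magic bytes, not file extension.
    """
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    if is_gzip:
        # "rt" makes gzip.open return decoded text lines.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def peek(path, max_lines=5):
    """Return up to max_lines leading lines, as a preview window might."""
    with open_maybe_gzip(path) as handle:
        return [line.rstrip("\n") for line in islice(handle, max_lines)]
```

With something like this, a tool would call open_maybe_gzip(path) wherever
it currently calls open(path), and would not need to know whether the
dataset on disk is compressed. Only the first few lines are decompressed
for the preview, so peeking into a large file stays cheap.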