On Thu, Jul 4, 2013 at 9:49 PM, Robert Baertsch <robert.baertsch@gmail.com> wrote:
Dan, Do these readers support gzip files?
reader = fastqVerboseErrorReader
reader = fastqReader
Presumably you are writing a Python script using this library? The answer is a qualified yes. Instead of passing the readers a normal file handle from open("example.fastq"), import gzip and pass them the handle returned by gzip.open("example.fastq").
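To illustrate the gzip.open approach, here is a minimal sketch using only the Python standard library. The helper name open_fastq is hypothetical and not part of Galaxy or its FASTQ library. It assumes Python 3, where gzip.open needs mode "rt" to return text rather than bytes, and it checks the two-byte gzip magic number so the same call works for plain and gzipped files.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_fastq(path):
    """Return a text-mode handle for a plain or gzipped FASTQ file.

    Hypothetical helper for illustration only; not a Galaxy API.
    """
    # Peek at the first two bytes to decide how to open the file.
    with open(path, "rb") as probe:
        is_gzipped = probe.read(2) == GZIP_MAGIC
    if is_gzipped:
        # "rt" gives decoded text lines, like a normal open() handle.
        return gzip.open(path, "rt")
    return open(path, "r")
```

The returned handle can then be passed wherever a handle from open("example.fastq") would otherwise be used, on the assumption that the reader only iterates over or reads lines from it.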
Do I have to define a special type in galaxy for gzipped files or will the fastq type be ok?
This needs a special file format - but you are not the first person to look at this; some groups have defined custom gzipped variants of the FASTQ formats within their own Galaxy instances. I've not done this but there should be some useful emails in the archive. Note you'd also need to modify any tool definitions so that they can accept a gzipped FASTQ file.
Ideally, I would like to keep my files zipped and not have galaxy unzip them, since they triple in size when unzipped.
I'm happy to do a pull request if you don't support this but I want to make sure I'm in line with your roadmap.
Personally I would like a more general system in Galaxy for potentially any file type to be held compressed in a range of formats (e.g. using gzip, bgzf, xz, bz2, etc), with exclusions for things like BAM which are already compressed. This way naive tools would get the compressed file uncompressed to a temporary folder before use (i.e. no change for the tool wrapper), but if a tool accepts a gzipped file it will get that (less disk IO and CPU usage, but requires updating tool wrappers). That idea is quite ambitious though ;)
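As a rough sketch of that idea (purely hypothetical, not how Galaxy works today), the dispatch could look something like the following. The function name path_for_tool and the tool_accepts_gzip flag are made up for illustration, and only gzip is handled.

```python
import gzip
import shutil
import tempfile


def path_for_tool(dataset_path, tool_accepts_gzip):
    """Return a path the tool can read directly.

    Hypothetical illustration: gzip-aware tools get the compressed
    file as-is; naive tools get a decompressed temporary copy.
    """
    if tool_accepts_gzip:
        # No extra disk IO or CPU: hand over the compressed file.
        return dataset_path
    # delete=False so the file survives after it is closed;
    # the caller is responsible for removing it afterwards.
    tmp = tempfile.NamedTemporaryFile(suffix=".fastq", delete=False)
    with gzip.open(dataset_path, "rb") as compressed, tmp:
        shutil.copyfileobj(compressed, tmp)
    return tmp.name
```

A real implementation would also need to clean up the temporary copies and cover the other compression formats.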
I have written a simple tool to convert Illumina fastq to mapsplice fastq. Does that already exist somewhere?
I don't know.

Peter