On Mon, Jul 8, 2013 at 11:21 PM, Robert Baertsch <rbaertsc(a)ucsc.edu> wrote:
I respectfully disagree. If you want an extensible system, you
always wrap primitive system-level calls.
Any tool that opens a file that could be compressed would be affected.
That is a huge number of tools. Do you really want a cottage industry of
tools that have different methods of dealing with compression?
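As an illustration of "wrapping" open, here is a minimal sketch. It assumes detection by the file's leading magic bytes (gzip/BGZF start with 0x1f 0x8b, bzip2 with "BZh"). The helper name open_maybe_compressed is hypothetical and is not part of Galaxy:

```python
import bz2
import gzip

# Leading "magic" bytes identify the compression scheme.
GZIP_MAGIC = b"\x1f\x8b"   # also matches BGZF, which is a valid gzip stream
BZIP2_MAGIC = b"BZh"


def open_maybe_compressed(path, mode="rt"):
    """Hypothetical wrapper: open `path` for reading, transparently
    decompressing gzip/BGZF or bzip2 data; plain files are opened as-is."""
    if "r" not in mode:
        raise ValueError("this sketch only supports read modes")
    # Peek at the first few bytes without consuming them for the caller.
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic.startswith(GZIP_MAGIC):
        return gzip.open(path, mode)
    if magic.startswith(BZIP2_MAGIC):
        return bz2.open(path, mode)
    return open(path, mode)
```

This only helps Python code that actually calls such a helper. Tools written in other languages would still need their own handling.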
But defining a Python helper function within the Galaxy Python
libraries doesn't achieve that.
Are you talking about patching the OS level POSIX open functions
or something? The tools available in Galaxy are written in a range
of languages including C, Perl, R, etc. Yes, some are in Python,
but of those most are independent of Galaxy and can be used
separately from Galaxy.
Encoding the gzip status in the datatype will create an explosion of
datatypes. Compression is not actually a datatype; it tells you nothing
about the content data that is stored in the file.
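To put a number on the "explosion": folding compression into the datatype name needs one variant per format per compression scheme, so the count grows multiplicatively. A small sketch with an illustrative (not Galaxy's real) list of formats and a hypothetical "format.compression" naming scheme:

```python
from itertools import product

# Illustrative lists only; a real datatype registry has far more formats.
FORMATS = ["fasta", "fastqsanger", "sam", "gff3"]
COMPRESSIONS = ["gz", "bgzf", "bz2"]


def combined_datatypes(formats, compressions):
    """Uncompressed originals plus one datatype per (format, compression) pair."""
    return list(formats) + [
        f"{fmt}.{comp}" for fmt, comp in product(formats, compressions)
    ]


combined = combined_datatypes(FORMATS, COMPRESSIONS)
print(len(combined))                     # 16 = 4 * (1 + 3)
print(len(FORMATS) + len(COMPRESSIONS))  # 7 values across two independent fields
```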
What we'd previously discussed was a dual system, holding
the file type as now (e.g. FASTA, SAM, GFF3, etc.) and any
compression (e.g. none, plain GZIP, BGZF which is a
GZIP variant, BZIP2, etc.).
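A rough sketch of that dual system as a data structure, holding the file type and the compression as two independent fields. The class and field names are hypothetical, not Galaxy's actual model:

```python
from dataclasses import dataclass
from enum import Enum


class Compression(Enum):
    NONE = "none"
    GZIP = "gzip"
    BGZF = "bgzf"    # blocked GZIP variant, as used by BAM and tabix
    BZIP2 = "bzip2"


@dataclass(frozen=True)
class DatasetType:
    """Hypothetical pairing of content format and container compression."""
    file_type: str                          # e.g. "fasta", "sam", "gff3"
    compression: Compression = Compression.NONE

    def is_compressed(self):
        return self.compression is not Compression.NONE
```

For example, DatasetType("fasta", Compression.GZIP) still says the content is FASTA, while a plain DatasetType("fasta") defaults to uncompressed.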
Galaxy tool wrappers currently define input files with a list
of file types - they'd also have to give a list of supported
compression types (defaulting to none). Likewise for any
output files - if they are already compressed the XML for
the tool wrapper would have to tell Galaxy this.
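A minimal sketch of how a tool input could declare both lists, with compression defaulting to none. This is a hypothetical model, not Galaxy's tool XML schema. The matching check simply requires both the file type and the compression to be in the declared lists:

```python
from dataclasses import dataclass, field


@dataclass
class ToolInput:
    """Hypothetical input declaration: accepted file types plus accepted
    compression types (defaulting to uncompressed only)."""
    formats: list
    compressions: list = field(default_factory=lambda: ["none"])

    def accepts(self, file_type, compression="none"):
        # Both the content format and the compression must be supported.
        return file_type in self.formats and compression in self.compressions
```

Under this scheme, an existing wrapper that declares only formats keeps working unchanged on uncompressed data and simply rejects compressed inputs.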
It is up to the Galaxy team to provide a standard way to interact
with compressed files.
That is my preference too - although this could be driven by
the Galaxy community rather than the core team? I see
defining new datatypes like 'gzippedfastq' as a stop gap
special case (but a very practical route for now).
My proposed solution is a very small change that could
be phased in over time. Any tool that uses open would not support
compressed files, but it would not break on uncompressed files.
Do others have an opinion?
Either I don't understand your plan, or it would only help in
a tiny minority of cases.