On Mon, Jul 8, 2013 at 11:21 PM, Robert Baertsch <rbaertsc@ucsc.edu> wrote:
I respectfully disagree. If you want an extensible system, you should always wrap primitive system-level calls.
Any tool that opens a file that could be compressed would be affected. That is a huge number of tools. Do you really want a cottage industry of tools that have different methods of dealing with compression?
But defining a Python helper function within the Galaxy Python libraries doesn't achieve that. Are you talking about patching the OS level POSIX open functions or something? The tools available in Galaxy are written in a range of languages including C, Perl, R, etc. Yes, some are in Python, but of those most are independent of Galaxy and can be used separately from Galaxy.
Encoding the gzip status in the datatype will create an explosion of datatypes. Compression is not actually a datatype; it tells you nothing about the content stored in the file.
What we'd previously discussed was a dual system, holding the file type as now (e.g. FASTA, SAM, GFF3, etc.) and any compression (e.g. none, normal GZIP, BGZF which is a GZIP variant, BZIP2, etc.). Galaxy tool wrappers currently define input files with a list of file types - they'd also have to give a list of supported compression types (defaulting to none). Likewise for any output files - if they are already compressed, the XML for the tool wrapper would have to tell Galaxy this.
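As a rough illustration of that dual system, here is a minimal Python sketch that keeps the file type and the compression as two separate attributes. The class and field names (`Dataset`, `ToolInput`, `accepts`) are hypothetical and are not part of Galaxy; the only point is that a tool input lists its supported compressions alongside its file types, defaulting to none.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset record: content type and compression kept apart."""
    file_type: str             # e.g. "fasta", "sam", "gff3"
    compression: str = "none"  # e.g. "none", "gzip", "bgzf", "bzip2"


@dataclass(frozen=True)
class ToolInput:
    """Hypothetical tool input declaration, as a wrapper might state it."""
    file_types: Tuple[str, ...]
    compressions: Tuple[str, ...] = ("none",)  # default: uncompressed only

    def accepts(self, dataset: Dataset) -> bool:
        # Both the content type and the compression must be supported.
        return (dataset.file_type in self.file_types
                and dataset.compression in self.compressions)


# A tool that only declares file types accepts uncompressed input only.
plain_tool = ToolInput(file_types=("fasta",))
# A tool that also declares compressions accepts those as well.
gzip_aware_tool = ToolInput(file_types=("fasta", "sam"),
                            compressions=("none", "gzip", "bgzf"))
```

With this shape, adding a new compression scheme does not multiply the number of datatypes: it adds one value to the compression list.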
It is up to the Galaxy team to provide a standard way to interact with compressed files.
That is my preference too - although this could be driven by the Galaxy community rather than the core team? I see defining new datatypes like 'gzippedfastq' as a stopgap special case (but a very practical route for now).
My proposed solution is a very small change that could be phased in over time. Any tool that uses open would not support compressed files, but it would not break on uncompressed files.
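For concreteness, a wrapped open of the kind being proposed might look something like the sketch below. This is an assumption about the idea, not code from Galaxy: the name `open_maybe_compressed` is hypothetical, it handles reading only, and it detects compression from the leading magic bytes (`1f 8b` for GZIP, which also covers BGZF, and `BZh` for BZIP2). Note that a helper like this can only serve tools written in Python that choose to call it.

```python
import bz2
import gzip


def open_maybe_compressed(path, mode="rt"):
    """Open a file for reading, decompressing transparently if needed.

    Hypothetical helper: detection relies on magic bytes, not on the
    file extension or on any Galaxy metadata.
    """
    # Peek at the first bytes to identify the compression format.
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":
        # GZIP; BGZF files are valid GZIP streams, so they land here too.
        return gzip.open(path, mode)
    if magic == b"BZh":
        return bz2.open(path, mode)
    # Anything else is treated as uncompressed.
    return open(path, mode)
```

A Python tool would then call `open_maybe_compressed(filename)` wherever it previously called `open(filename)`, while tools that keep calling plain `open` continue to work on uncompressed files only.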
Do others have an opinion?
Either I don't understand your plan, or it would only help in a tiny minority of cases.

Regards,

Peter