I respectfully disagree. If you want an extensible system, you should always wrap primitive system-level calls.
Any tool that opens a file that could be compressed would be affected, and that is a huge number of tools. Do you really want a cottage industry of tools that each deal with compression differently?
Encoding the gzip status in the datatype will create an explosion of datatypes. Compression is not actually a datatype; it tells you nothing about the content stored in the file.
It is up to the Galaxy team to provide a standard way to interact with compressed files. My proposed solution is a very small change that could be phased in over time. Any tool that still uses plain open() would not support compressed files, but it would not break on uncompressed files.
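To make the proposal concrete, here is a rough sketch of what such a wrapper might look like in Python. The name galaxy_open() comes from the quoted message below; the signature, the restriction to read modes, and the choice to sniff only the two-byte gzip magic number (0x1f 0x8b) are my assumptions for illustration, not an existing Galaxy API.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of a gzip stream


def galaxy_open(path, mode="r"):
    """Hypothetical wrapper around open() that reads gzip transparently.

    Only plain read modes are sniffed; every other mode is passed
    straight through to the built-in open().
    """
    if mode in ("r", "rt", "rb"):
        # Peek at the start of the file without keeping it open.
        with open(path, "rb") as probe:
            is_gzip = probe.read(2) == GZIP_MAGIC
        if is_gzip:
            # gzip.open() treats "r" as binary, so ask for text explicitly
            # to match what the built-in open() would return.
            return gzip.open(path, "rb" if mode == "rb" else "rt")
    return open(path, mode)
```

A tool would then change open(path) to galaxy_open(path) and nothing else. Uncompressed inputs behave exactly as before, which is what makes the change easy to phase in.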
Do others have an opinion?
On Jul 8, 2013, at 2:58 PM, Peter Cock wrote:
On Mon, Jul 8, 2013 at 10:24 PM, Robert Baertsch firstname.lastname@example.org wrote:
Peter and Dan, I like the idea of replacing all open() with galaxy_open() in all tools. You can tell the format by looking at the first 4 bytes (see C code below from the UCSC browser team). Is there some pythonic way of overriding open?
There is monkey patching (replace the current 'open' function with your modified version), but that is not a good idea in general.
In any case, this would only affect the small number of Python tools which happen to use the Galaxy parsing libraries - which is a very small fraction of the tools in Galaxy. Most of the tools in Galaxy are compiled programs and are entirely separate.