I respectfully disagree. If you want an extensible system, you should always wrap
primitive system-level calls.
Any tool that opens a file that could be compressed would be affected. That is a huge
number of tools. Do you really want a cottage industry of tools that have different
methods of dealing with compression?
Encoding the gzip status in the datatype will create an explosion of datatypes.
Compression is not actually a datatype; it tells you nothing about the content of the
data stored in the file.
It is up to the Galaxy team to provide a standard way to interact with compressed files.
My proposed solution is a very small change that could be phased in over time. Any tool
that still uses plain open() would not support compressed files, but it would not break
on uncompressed files.
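To make the proposal concrete, here is a rough sketch of what such a wrapper might look
like in Python. The body of galaxy_open() is my assumption for illustration, not existing
Galaxy code. It sniffs the leading bytes of the file for the gzip (1f 8b) and bzip2
("BZh") magic numbers, handles reading only, and falls back to the built-in open() for
anything else.

```python
import bz2
import gzip

# Magic numbers at the start of the file: gzip is 1f 8b, bzip2 is "BZh".
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def galaxy_open(path, mode="rt"):
    """Hypothetical wrapper: open `path` for reading, decompressing if needed.

    Only read modes are handled in this sketch, because the file has to
    exist already for its leading bytes to be inspected.
    """
    if "r" not in mode:
        raise ValueError("this sketch of galaxy_open() only supports read modes")

    # Look at the first 4 bytes to decide how the file is stored.
    with open(path, "rb") as handle:
        head = handle.read(4)

    if head.startswith(GZIP_MAGIC):
        return gzip.open(path, mode)
    if head.startswith(BZIP2_MAGIC):
        return bz2.open(path, mode)
    # Not a recognised compressed format: behave like the built-in open().
    return open(path, mode)
```

A tool would then call galaxy_open(path) wherever it calls open(path) today, and would
read the same text whether or not the input is compressed.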
Do others have an opinion?
On Jul 8, 2013, at 2:58 PM, Peter Cock wrote:
On Mon, Jul 8, 2013 at 10:24 PM, Robert Baertsch
<robert.baertsch(a)gmail.com> wrote:
> Peter and Dan,
> I like the idea of replacing all open() with galaxy_open() in all tools. You
> can tell the format by looking at the first 4 bytes (see C code below from
> the UCSC browser team). Is there some pythonic way of overriding open?
There is monkey patching (replace the current 'open' function with
your modified version), but that is not a good idea in general.
In any case, this would only affect the small number of Python
tools which happen to use the Galaxy parsing libraries - which
is a very small fraction of the tools in Galaxy. Most of the tools
in Galaxy are compiled programs and are entirely separate.
Peter