Hello,
I'd like to request/suggest a feature:
Semi-Transparent support for compressed files.
The feature requires four (tiny) patches (detailed below).
With this feature, dataset files (/database/files/NNN/dataset_NNNN.dat)
can be stored compressed, and their content will be automatically
'peeked' in the preview window.
Additionally, when a user clicks 'save' or the 'eye icon', the files are
decompressed on the fly, so the user doesn't need to know or care that
they are compressed.
Of course, there's the whole issue of making the different tools read
and write compressed files - but that's another story.
It's actually not too complicated a story:
In Python, just call gzip.open instead of open.
In shell scripts, pipe the input file through "zcat -f FILE | program".
In Perl, use the PerlIO::gzip module.
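
For the Python case, here is a minimal sketch of what "call gzip.open
instead of open" looks like on the tool side. It is written for current
Python 3 rather than the Python 2 used by the patches below, and the
function names, file name and sample rows are made up for illustration.

```python
import gzip
import os
import tempfile


def write_lines(path, lines):
    # "wt" opens the gzip stream in text mode, so str lines can be
    # written exactly as with a plain open(path, "w").
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


def read_lines(path):
    # "rt" decompresses and decodes while iterating line by line.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Round trip through a temporary file (hypothetical data).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "dataset_demo.dat")
write_lines(demo_path, ["chr1\t100\t200", "chr2\t300\t400"])
roundtrip = read_lines(demo_path)
```

The tool's own loop over lines stays the same; only the call that opens
the file changes.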
Comments are welcome,
Regards,
Gordon.
First Patch -
Adds a function to the "util" module which returns a Gzip/Bzip2/Zip File
object (or a plain File object) based on the file type.
File type detection is done using the 'magic' module - I think it is
quite standard (on Ubuntu I got it with "apt-get install python-magic").
However, to get Galaxy to find this module I had to remove the "-ES"
from "run.sh" - I'm sure there's a better way to do it.
====================================================================
--- ./lib/galaxy/util/__init__.orig.py 2008-12-26 23:48:40.000000000 -0500
+++ ./lib/galaxy/util/__init__.py 2008-12-27 00:31:44.000000000 -0500
@@ -14,11 +14,41 @@ from galaxy.util.docutils_ext.htmlfrag i
pkg_resources.require( 'elementtree' )
from elementtree import ElementTree
+import magic # file detection
+import gzip # allow peeking into compressed files
+import bz2
+import zipfile
+
log = logging.getLogger(__name__)
_lock = threading.RLock()
gzip_magic = '\037\213'
+# Magic file detection
+magic_file = magic.open(magic.MAGIC_MIME)
+try:
+ magic_file.load()
+except:
+ magic_file = None
+
+def open_file_wrapper(filename):
+ file_mime = ""
+ if magic_file is not None:
+ try:
+ file_mime = magic_file.file(filename)
+ except:
+ file_mime = ""
+ if file_mime == "application/x-gzip":
+ return gzip.open(filename)
+ if file_mime == "application/x-bzip2":
+ return bz2.BZ2File(filename)
+ if file_mime == "application/x-zip":
+ return zipfile.ZipFile(filename)
+
+ #for all other mime types, return the raw file
+ return file(filename)
+
+
def synchronized(func):
"""This wrapper will serialize access to 'func' to a single
thread. Use it as a decorator."""
def caller(*params, **kparams):
====================================================================
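
If depending on the 'magic' module turns out to be a problem, the same
decision can be made from the first few bytes of the file. The sketch
below is not the patch above: it is a Python 3 illustration, and
sniff_compression and open_by_signature are hypothetical names. It also
assumes that a zipped dataset is the first member of its archive, since
a zip file is an archive rather than a single stream.

```python
import bz2
import gzip
import zipfile

# Leading bytes that identify each format.
GZIP_MAGIC = b"\x1f\x8b"      # same value as gzip_magic in galaxy.util
BZIP2_MAGIC = b"BZh"
ZIP_MAGIC = b"PK\x03\x04"     # local file header of a non-empty archive


def sniff_compression(filename):
    """Return 'gzip', 'bzip2', 'zip' or None, based on the first bytes."""
    with open(filename, "rb") as handle:
        head = handle.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return None


def open_by_signature(filename):
    """Hypothetical stand-in for open_file_wrapper.

    Always returns a binary file object that can be iterated by line.
    """
    kind = sniff_compression(filename)
    if kind == "gzip":
        return gzip.open(filename, "rb")
    if kind == "bzip2":
        return bz2.open(filename, "rb")
    if kind == "zip":
        archive = zipfile.ZipFile(filename)
        # Assumption: the dataset is the first member of the archive.
        return archive.open(archive.namelist()[0])
    # Any other content is returned as a plain file.
    return open(filename, "rb")
```

Returning a member stream for the zip case, rather than the ZipFile
object itself, keeps the result iterable by line like the other three.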
Second Patch -
In the 'display' action of the root web controller, return the file
through the appropriate wrapper.
====================================================================
--- ./lib/galaxy/web/controllers/root_orig.py 2008-12-26 23:56:01.000000000 -0500
+++ ./lib/galaxy/web/controllers/root.py 2008-12-27 00:35:43.000000000 -0500
@@ -153,7 +153,7 @@ class RootController( BaseController ):
m1 = trans.app.memory_usage.memory( m0, pretty=True )
log.info( "End of root/display, memory used increased by %s" % m1 )
try:
- return open( data.file_name )
+ return util.open_file_wrapper( data.file_name )
except:
return "This dataset contains no content"
else:
====================================================================
Third patch -
In the BaseController object, allow streaming of compressed file objects
(not just types.FileType):
====================================================================
--- ./lib/galaxy/web/framework/base_orig.py 2008-12-27 00:41:38.000000000 -0500
+++ ./lib/galaxy/web/framework/base.py 2008-12-27 00:41:37.000000000 -0500
@@ -25,6 +25,11 @@ from paste.response import HeaderDict
# For FieldStorage
import cgi
+# For auto-decompressing files
+import gzip
+import bz2
+import zipfile
+
log = logging.getLogger( __name__ )
class WebApplication( object ):
@@ -133,7 +138,7 @@ class WebApplication( object ):
if callable( body ):
# Assume the callable is another WSGI application to run
return body( environ, start_response )
- elif isinstance( body, types.FileType ):
+ elif isinstance( body, (types.FileType, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) ):
# Stream the file back to the browser
return send_file( start_response, trans, body )
else:
====================================================================
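
An alternative to growing the isinstance tuple for every new wrapper
type is to check for a read() method instead. The sketch below is
Python 3 and only illustrates that idea; is_streamable and iter_chunks
are hypothetical names, and it assumes the code that sends the response
needs nothing from the body beyond read() and close(). Note that a
zipfile.ZipFile would pass this check but cannot be streamed this way,
because its read() expects a member name; it has to be unwrapped to a
member stream first.

```python
def is_streamable(body):
    """True if body looks like a file object (has a callable read)."""
    return callable(getattr(body, "read", None))


def iter_chunks(body, chunk_size=65536):
    """Yield successive chunks from any object with a read(size) method.

    The body is closed once it is exhausted or the generator is closed.
    """
    try:
        while True:
            chunk = body.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        close = getattr(body, "close", None)
        if callable(close):
            close()
```

With this check, plain files, gzip.GzipFile and bz2.BZ2File objects are
all handled by the same branch without being listed by name.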
Fourth Patch -
In the generic Data datatype object, open the file through
util.open_file_wrapper in the peek function, so that compressed datasets
are decompressed before being peeked:
====================================================================
--- ./lib/galaxy/datatypes/data_orig.py 2008-12-26 23:21:41.000000000 -0500
+++ ./lib/galaxy/datatypes/data.py 2008-12-26 23:34:15.000000000 -0500
@@ -332,7 +332,7 @@ def get_file_peek( file_name, WIDTH=256,
count = 0
file_type = ''
data_checked = False
- for line in file( file_name ):
+ for line in util.open_file_wrapper( file_name ):
line = line[ :WIDTH ]
if not data_checked and line:
data_checked = True
====================================================================
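
To show why the peek loop needs no other change, here is a small
Python 3 sketch of a peek over any line-iterable object. It is not
Galaxy's get_file_peek (whose full signature is cut off in the hunk
header above); peek_lines and its line_count parameter are hypothetical,
and only the 256-character width is taken from the patch.

```python
import itertools


def peek_lines(lines, width=256, line_count=5):
    """Return up to line_count lines, each truncated to width characters.

    'lines' can be any iterable of lines: a plain file, a gzip.GzipFile,
    a bz2.BZ2File, or a member stream opened from a zip archive.
    """
    peeked = []
    # islice stops after line_count lines, so the rest is never read.
    for line in itertools.islice(lines, line_count):
        if isinstance(line, bytes):
            # Compressed file objects yield bytes in binary mode.
            line = line.decode("utf-8", errors="replace")
        peeked.append(line.rstrip("\r\n")[:width])
    return peeked
```

For example, `with gzip.open(path, "rb") as handle: peek_lines(handle)`
decompresses only as much of the file as the first few lines require.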