How to retain files compressed

28 Feb 2011

Hi all!

I want to load an R workspace (.rdat file, R Project) within a Galaxy
module, so I built a binary Galaxy datatype for .rdat.
.rdat files are gzipped, and R only recognizes them if they are still
compressed.
However, the corresponding .dat file that Galaxy stores is an
uncompressed version of the original .rdat file, as I found out using a
hex editor.
I couldn't find any documentation on how to change this behaviour, nor
answers to similar questions on this list.
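
The hex-editor check can also be scripted. A minimal sketch, assuming only that gzip streams begin with the two magic bytes 0x1f 0x8b; the function name is made up for illustration and is not part of Galaxy:

```python
# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_compressed(path):
    """Return True if the file at `path` starts with the gzip magic bytes.

    Hypothetical helper for illustration only, not a Galaxy function.
    """
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

Running this on the uploaded .rdat file and on the .dat file in Galaxy's file store should show whether the upload step decompressed the data.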

I would be happy for any answer that points me in the right direction.

Details
#####

datatypes_conf.xml:
-----------------------------------

<?xml version="1.0"?>
<datatypes>
    <registration converters_path="lib/galaxy/datatypes/converters" display_path="display_applications">
[...]
        <datatype extension="rdat" type="galaxy.datatypes.binary:Rdat" mimetype="application/octet-stream" display_in_upload="true"/>
[...]
    </registration>
    <sniffers>
[...]
        <sniffer type="galaxy.datatypes.binary:Rdat"/>
[...]
    </sniffers>
</datatypes>

binary.py:
------------------

[...]
class Rdat( Binary ):
    """Class describing an rdat binary file (R-workspace)"""
    file_ext = "rdat"
    #MetadataElement( name="Rdat", desc="R-workspace", param=metadata.FileParameter, readonly=True, no_value=None, visible=False, optional=True )

    """
    def __init__( self, **kwd ):
        Binary.__init__( self, **kwd )       
        self._name = "Rdat"
    """

    def set_peek( self, dataset, is_multi_byte=False ):
        if not dataset.dataset.purged:
            dataset.peek  = "Binary rdat file (R-workspace)"
            dataset.blurb = data.nice_size( dataset.get_size() )
        else:
            dataset.peek = 'file does not exist'
            dataset.blurb = 'file purged from disk'

    def display_peek( self, dataset ):
        try:
            return dataset.peek
        except:
            return "Binary rdat file (%s)" % ( data.nice_size( dataset.get_size() ) )

    def get_mime( self ):
        """Returns the mime type of the datatype"""
        return 'application/octet-stream'

    def sniff( self, filename ):
        # rdat is compressed in the gzip format, and must not be uncompressed in Galaxy.
        # The first 4 bytes of any rdat file are RDX2.
        # Needs "import gzip" and "import binascii" at the top of binary.py.
        signature = binascii.hexlify( 'RDX2' )
        try:
            # Normal case: the file is still gzipped
            header = gzip.open( filename ).read(4) # first 4 bytes
            return binascii.hexlify( header ) == signature
        except:
            # Not a gzip file: check the raw bytes for the RDX2 signature
            try:
                header = open( filename, 'rb' ).read(4) # first 4 bytes
                return binascii.hexlify( header ) == signature
            except:
                return False
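
To try the sniffing logic outside Galaxy, here is a minimal standalone sketch in Python 3. It assumes only what is stated above (the first 4 bytes of the decompressed content are RDX2); the function name is hypothetical and the code does not use any Galaxy API:

```python
import gzip
import zlib

RDX2_SIGNATURE = b"RDX2"


def looks_like_rdat(path):
    """Return True if `path` holds RDX2 data, gzipped or already uncompressed.

    Hypothetical helper for illustration only, not part of Galaxy.
    """
    # First try to read the file as gzip, the form R expects.
    try:
        with gzip.open(path, "rb") as handle:
            if handle.read(4) == RDX2_SIGNATURE:
                return True
    except (OSError, EOFError, zlib.error):
        # Not a valid gzip stream; fall through to the raw check.
        pass
    # Fall back to the raw bytes, for files that were already decompressed.
    with open(path, "rb") as handle:
        return handle.read(4) == RDX2_SIGNATURE
```

Accepting both forms mirrors the fallback in the sniffer above; dropping the raw check would make the function match only files that are still compressed.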

-- 
Dr. Christian Hundsrucker
Institute for Functional Genomics
Computational Diagnostics Group
University of Regensburg
Josef Engertstr. 9 
93053 Regensburg, Germany
