Hello!
For our future Galaxy users at the Netherlands Bioinformatics Centre we'd like to support Excel uploads. We have our own Excel-to-tabular CLI tool and I have been trying to get it working with Galaxy. The tool itself is not the problem, but getting Excel files uploaded is…
We're running a local instance (hg identify: 949e4f5fa03a+ tip) on Mac for development purposes.
The issues with Excel are two-fold: XLSX files get unzipped automatically and a useless XML file remains, while uploading an XLS file results in: "The uploaded binary file contains inappropriate content". I've tried adding
    <datatype extension="xls" type="galaxy.datatypes.binary:Binary" display_in_upload="true"/>
    <datatype extension="xlsx" type="galaxy.datatypes.binary:Binary" display_in_upload="true"/>
to datatypes_conf.xml and selecting XLS and XLSX as the datatype during upload, but to no avail (the errors don't change).
A temporary workaround we thought of was to zip the files before uploading; that way Galaxy would unzip them and we'd be left with the raw Excel files. At first this seemed to work, but the conversion did not. Furthermore, downloading the files and trying to open them failed. A quick 'diff' between the original and mangled files shows differences practically throughout the whole file!
Now, my questions are as follows. Has work been underway to support Excel natively? Is there a way to have Galaxy simply accept uploaded files without any interpretation? What happens inside Galaxy that corrupts the Excel files during unpacking?
Thank you very much, and I apologize if I somehow missed something obvious or if these questions have been asked before.
Kind regards,
Siemen Sikkema
Hello Siemen,
You'll have to treat Excel files as a specific data type, and add support for that data type to Galaxy - see our wiki at http://wiki.g2.bx.psu.edu/Admin/Datatypes for details on how to add support for a new data type. You can implement the Excel class in such a way that nothing gets munged; munging is undoubtedly what happens now when you upload the file, because Galaxy sniffs it as some other data type.
I'm not sure, but perhaps others in the community have done work in this area.
Greg Von Kuster
On Dec 8, 2011, at 4:11 AM, Siemen Sikkema wrote:
Now, my questions are as follows. Has work been underway to support excel natively? Is there a way to have Galaxy simply accept uploaded files without any interpretation? What happens inside Galaxy that corrupts the excel files during unpacking?
Kind regards, Siemen Sikkema ___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
Hey Siemen,
A workaround I'm currently using consists of creating an Xls subclass of the Binary datatype class.
In galaxy-dist/lib/galaxy/datatypes/binary.py, add xls to unsniffable_binary_formats like this:
    unsniffable_binary_formats = [ 'ab1', 'scf', 'xls' ]
Also add the following class at the very end of the binary.py file:
    class Xls(Binary):
        '''Class describing an Excel (xls) file'''
        file_ext = 'xls'
Then in your 'datatypes_conf.xml' add the following line (don't forget to comment out or remove other xls datatype lines):
    <datatype extension="xls" type="galaxy.datatypes.binary:Xls" display_in_upload="true"/>
Then restart Galaxy, and when you upload an Excel file you must explicitly select 'xls' for the 'File Format' (I personally don't know how to get 'Auto-detect' to work for added datatypes such as xls).
If you have any questions about this implementation let me know. Good luck.
Joe Cruz
On Thu, Dec 8, 2011 at 3:11 AM, Siemen Sikkema <s.h.sikkema@gmail.com> wrote:
The issues with excel are two-fold: XLSX files get unzipped automatically and a useless XML file remains, while uploading an XLS file results in: "The uploaded binary file contains inappropriate content". I've tried adding
<datatype extension="xls" type="galaxy.datatypes.binary:Binary" display_in_upload="true"/> <datatype extension="xlsx" type="galaxy.datatypes.binary:Binary" display_in_upload="true"/>
to datatypes_conf.xml and selecting XLS and XLSX as datatype during upload but to no avail (the errors don't change).
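Joe's recipe can be collected into one runnable sketch. The `Binary` class below is only a stand-in for `galaxy.datatypes.binary.Binary` so that the snippet runs outside Galaxy, and the `sniff` method is an assumption about how auto-detection might be approached, not something the thread confirms: it checks for the 8-byte OLE2 signature that legacy .xls files start with. Whether Galaxy's upload code would call it for an extension listed in `unsniffable_binary_formats` is not settled here.

```python
# Leading bytes of an OLE2 compound file, the container used by legacy .xls
# (and also by .doc and .ppt, so a match does not prove the file is Excel).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


class Binary(object):
    """Stand-in for galaxy.datatypes.binary.Binary (assumption, so this runs alone)."""
    file_ext = "data"


# Joe's first edit: 'xls' appended to the existing list in binary.py.
unsniffable_binary_formats = ["ab1", "scf", "xls"]


class Xls(Binary):
    """Class describing an Excel (xls) file"""
    file_ext = "xls"

    def sniff(self, filename):
        """Assumed hook: True if the file starts with the OLE2 signature."""
        with open(filename, "rb") as handle:
            return handle.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

Because the signature is shared with other Office formats, a check like this can only say "OLE2 container", so explicitly selecting 'xls' as the 'File Format' on upload remains the safer route.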
On Thu, Dec 8, 2011 at 4:29 PM, Joe Cruz <jcruz7@gmail.com> wrote:
Hey Siemen,
A workaround i'm currently using consists of creating an Xls subclass of the Binary datatype class:
That's not a workaround ;) - that's the correct approach, as per Greg's email.
Not that I want to encourage people to use Excel files in Bioinformatics, but perhaps this should be applied to the main Galaxy repository - given at least two groups are using Excel files in Galaxy?
Peter
P.S. Why do you call your class XIs, surely Excel would be clearer?
+1 I'd agree that calling it excel is suitably wry - because it doesn't IMHO.
Please name the new binary datatype something other than xls because many (of my) tools create outputs with that extension and it's a subclass of tabular here - the .xls extension fools excel into loading them automagically.
On Fri, Dec 9, 2011 at 3:35 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Not that I want to encourage people to use Excel files in Bioinformatics, but perhaps this should be applied to the main Galaxy repository - given at least two groups are using Excel files in Galaxy?
Peter
P.S. Why do you call your class XIs, surely Excel would be clearer?
-- Ross Lazarus MBBS MPH; Associate Professor, Harvard Medical School; Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444;
On Fri, Dec 9, 2011 at 3:35 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
P.S. Why do you call your class XIs, surely Excel would be clearer?
I've only just realised Xls is title case xls (as in the extension); I read it as the title case of xis (which made no sense). I still think using Excel as the class name would be clearer.
On Thu, Dec 8, 2011 at 9:01 PM, Ross <ross.lazarus@gmail.com> wrote:
+1 I'd agree that calling it excel is suitably wry - because it doesn't IMHO. Please name the new binary datatype something other than xls because many (of my) tools create outputs with that extension and it's a subclass of tabular here - the .xls extension fools excel into loading them automagically.
I don't entirely agree with your logic, but I would agree that as the format name it would be consistent with most of the other Galaxy formats to go for a longer name (e.g. "excel") over a short 3 letter extension (here "xls").
However, does this matter for the extension that Galaxy gives on downloading the dataset, or is that all done via the mime type? We want it to be easy for people to download an Excel file from Galaxy and open it, which means getting the extension right under Windows.
Peter
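To make the download question concrete, here is a minimal sketch of the two pieces a web server can send: the MIME type and a suggested filename. The helper name `download_headers` and the idea of building the filename from a dataset name plus extension are hypothetical and say nothing about how Galaxy itself serves datasets; only the two MIME type strings are standard values.

```python
# Registered MIME types for the two Excel formats.
EXCEL_MIME_TYPES = {
    "xls": "application/vnd.ms-excel",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def download_headers(dataset_name, ext):
    """Hypothetical helper: HTTP headers that keep the extension on download."""
    # Fall back to a generic binary type for extensions we do not know.
    mime = EXCEL_MIME_TYPES.get(ext, "application/octet-stream")
    filename = "%s.%s" % (dataset_name, ext)
    return {
        "Content-Type": mime,
        "Content-Disposition": 'attachment; filename="%s"' % filename,
    }
```

Under this sketch the extension in the suggested filename and the format name used internally are independent, so a datatype called "excel" could presumably still be downloaded as a .xls file.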
Hi All,
Thanks for the responses! I just got around to trying the solution by Joe Cruz. This did the trick for xls files, so that's the good news! The slightly less good news is that the same does not hold for xlsx files. Those files are pure zip files containing lots of internal xml stuff. Even when following the suggested procedure, Galaxy insists that it knows how to handle those files: by unpacking them and returning the first file from the zip (in this case a pretty useless xml file).
On a more general note, having to do these tricks can restrict sharing of new tools a bit in my opinion. It would be great if there were an easy way to tell Galaxy: "leave files of this type as is, I can handle it" :P
Siemen
On Thu, Dec 8, 2011 at 10:18 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
However, does this matter for the extension that Galaxy gives on downloading the dataset, or is that all done via the mime type? We want it to be easy for people to download an Excel file from Galaxy and open it, which means getting the extension right under Windows.
Peter
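The xlsx behaviour Siemen describes can be reproduced with Python's standard `zipfile` module. This is a sketch under assumptions: the archive is built in memory with member names typical of an .xlsx package and placeholder contents (it is not a valid workbook), and "take the first member" only imitates the behaviour reported in the thread rather than quoting Galaxy's upload code.

```python
import io
import zipfile

# Build a tiny zip in memory with member names typical of an .xlsx package.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")

# A generic zip check cannot tell a workbook from any other archive.
buffer.seek(0)
looks_like_zip = zipfile.is_zipfile(buffer)

# Keeping only the first member discards the rest of the workbook.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()
    first_member = members[0]
    first_payload = archive.read(first_member)
```

Here the first member is a small XML manifest, which matches the "pretty useless xml file" left behind after upload; the worksheet data lives in other members that are thrown away.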
On Thu, Dec 15, 2011 at 10:13 AM, Siemen Henk Sikkema <s.h.sikkema@gmail.com> wrote:
On a more general note, having to do these tricks can restrict sharing of new tools a bit in my opinion. It would be great if there would be an easy way to tell Galaxy: "leave files of this type as is, I can handle it" :P
Siemen
The easiest way is to contribute your new file type to the main Galaxy repository. They've said if it is reasonably general they're OK with including more file types.
Peter
I'm fairly far down the implementation path for supporting inclusion of new data types in tool shed repositories. Unless I get steered into another direction, I should have this documented in early January.
On Dec 15, 2011, at 5:26 AM, Peter Cock wrote:
The easiest way is to contribute your new file type to the main Galaxy repository. They've said if it is reasonably general they're OK with including more file types.
Peter
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
On Thu, Dec 15, 2011 at 12:12 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
I'm fairly far down the implementation path for supporting inclusion of new data types in tool shed repositories. Unless I get steered into another direction, I should have this documented in early January.
Right, and great for niche formats, but I thought we'd agreed that "common" datatypes should go into the core to avoid multiple conflicting/independent definitions?
Peter
Yes, I was just providing the status of support for those niche formats.
On Dec 15, 2011, at 8:41 AM, Peter Cock wrote:
Right, and great for niche formats, but I thought we'd agreed that "common" datatypes should go into the core to avoid multiple conflicting/independent definitions?
Peter
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
participants (6)
- Greg Von Kuster
- Joe Cruz
- Peter Cock
- Ross
- Siemen Henk Sikkema
- Siemen Sikkema