Upload of images - jpg/png - "binary file contains inappropriate content"
Hi all,

I would like to be able to upload images to my Galaxy instance, in particular jpg/png files to data libraries. However, I can't find out how to get past the "inappropriate content in binary file" error. How should I go about this?

When I create a sniffer class in galaxy.datatypes.images.py that always returns True (for testing purposes), the jpg upload still fails with the error (the sniffer is included in datatypes_conf.xml):

    class Jpg( Image ):
        file_ext = "jpg"

        def sniff( self, filename ):
            """Determine if the file is in jpg format."""
            return True

It seems I'm missing something obvious and could use some help.

Cheers,
Jelle
Hello Jelle,

There are a few things you need to do to add support for a new data type. The steps are described here:

http://wiki.g2.bx.psu.edu/Admin/Datatypes/Adding%20Datatypes

Greg Von Kuster
Galaxy Development Team
greg@bx.psu.edu

On Jul 22, 2011, at 8:49 AM, Jelle Scholtalbers wrote:
> I would like to be able to upload images to my Galaxy instance - in particular jpg/png to data libraries. [...]
Hi Greg,

I used that link when trying to create the support. The jpg datatype was already present in datatypes_conf.xml:

    <datatype extension="jpg" type="galaxy.datatypes.images:Image" mimetype="image/jpeg"/>

so I first made it available at upload by setting display_in_upload=True. As far as I understood, if Galaxy doesn't have to guess what the format is, this would be sufficient? From step 2:

    "Galaxy tools are configured to automatically set the data type of an output dataset. However, in some scenarios, Galaxy will attempt to determine the data type of a file using a sniffer"

However, manually setting the format on upload still gave the mentioned error. I therefore followed the rest of the guide (adding a sniffer), but it still fails.

Cheers,
Jelle

On Jul 22, 2011 3:26 PM, "Greg Von Kuster" <greg@bx.psu.edu> wrote:
> There are a few things you need to do to add support for a new data type. [...]
Based on the error message you gave, I assume the following code in ~/lib/galaxy/datatypes/sniff.py is causing the problem:

    if check_binary( filename ):
        if ext not in unsniffable_binary_formats and not datatypes_registry.get_datatype_by_extension( ext ).sniff( filename ):
            raise InappropriateDatasetContentError, 'The binary uploaded file contains inappropriate content.'

Have you tried adding your 'jpg' extension to the following list in ~/lib/galaxy/datatypes/binary.py?

    # Currently these supported binary data types must be manually set on upload
    unsniffable_binary_formats = [ 'ab1', 'scf' ]

Greg Von Kuster

On Jul 22, 2011, at 11:52 AM, Jelle Scholtalbers wrote:
> I used that link when trying to create the support. [...]
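For readers following along, the check quoted from sniff.py can be paraphrased as a small standalone sketch. This is not Galaxy's code: it works on bytes instead of a filename, the binary test is a crude stand-in, and `validate_upload` and the `sniffers` mapping are hypothetical names used only for illustration. It shows the two ways past the error: the extension is on the exempt list, or the sniffer registered for that extension accepts the content.

```python
class InappropriateDatasetContentError(Exception):
    """Raised when an uploaded binary file is rejected."""


# Extensions that skip sniffing; the user has to select them by hand on upload.
UNSNIFFABLE_BINARY_FORMATS = ["ab1", "scf"]


def check_binary(data: bytes) -> bool:
    """Crude stand-in for a binary test: a NUL byte in the first 1 KiB."""
    return b"\x00" in data[:1024]


def validate_upload(data: bytes, ext: str, sniffers: dict) -> None:
    """Reject binary content unless the extension is exempt or its sniffer accepts it.

    `sniffers` (hypothetical) maps an extension to a callable taking the
    uploaded bytes and returning True or False.
    """
    if not check_binary(data):
        return  # text uploads are not subject to this check
    if ext in UNSNIFFABLE_BINARY_FORMATS:
        return  # exempt because the user set the type manually
    sniff = sniffers.get(ext)
    if sniff is None or not sniff(data):
        raise InappropriateDatasetContentError(
            "The binary uploaded file contains inappropriate content."
        )
```

In this sketch, a jpg upload passes only once a sniffer for "jpg" is registered and returns True, or once "jpg" is added to the exempt list.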
Hi Greg,

This was indeed causing the problem. I added some code which now allows image files to be sniffed. It is practically untested (it does work for me on bmp, jpg, png and tiff). I didn't try it with PIL, although the code is there. Attached are diffs against changeset 058a5d7a4f84 (a bit outdated; I can provide newer ones if desired).

The formats that can be sniffed are listed at http://infohost.nmt.edu/tcc/help/pubs/pil/formats.html or http://docs.python.org/library/imghdr.html, depending on whether PIL is available or not.

Cheers,
Jelle

On Fri, Jul 22, 2011 at 6:46 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
> Based on the error message you gave, I assume the following code in ~/lib/galaxy/datatypes/sniff.py is presenting the problem. [...]
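The attached diffs are not reproduced in this archive, so the following is only a rough illustration of what content-based image sniffing involves, not Jelle's patch. The thread mentions PIL and imghdr; since imghdr has been removed from recent Python releases, this sketch compares leading "magic" bytes by hand instead. The names `IMAGE_SIGNATURES`, `guess_image_ext` and `sniff_image` are hypothetical.

```python
# Leading "magic" bytes for the formats mentioned in the thread.
# Note that the two-byte BMP signature is a weak test on its own.
IMAGE_SIGNATURES = {
    "jpg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "bmp": (b"BM",),
    "tiff": (b"II*\x00", b"MM\x00*"),
}


def guess_image_ext(header: bytes):
    """Return the extension whose signature matches the header, else None."""
    for ext, signatures in IMAGE_SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes
        if header.startswith(signatures):
            return ext
    return None


def sniff_image(filename: str, expected_ext: str) -> bool:
    """Read the first bytes of a file and compare them with the expected format."""
    with open(filename, "rb") as handle:
        return guess_image_ext(handle.read(16)) == expected_ext
```

A per-datatype `sniff` method could delegate to something like `sniff_image(filename, "jpg")` rather than returning True unconditionally.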
Hello Jelle,

I've taken a look at your patches and the code looks good. However, I'm wondering why you find it necessary to upload image files to Galaxy. Do tools exist that take image files as input? If the Galaxy community finds this feature a good addition to Galaxy, we can probably incorporate it into the distribution.

However, we'll need an additional feature before we can do that. We'll need to implement a type of config setting that enables / disables the uploading of image files via the Galaxy api. This can either be an enhancement to the existing "display_in_upload" config setting, or a separate "allow_api_upload" config setting (probably the latter for clarity). I haven't looked too far into what it will take to do this, but if your reasons for needing this feature are beneficial to the community, I can spend some time on it - or if you want to take a pass at it, feel free!

Thanks very much for your contributions!

Greg Von Kuster

On Jul 25, 2011, at 11:23 AM, Jelle Scholtalbers wrote:
> this was indeed causing the problem. [...]
> <datatypes_conf.xml.diff><images.py.diff><upload.py.diff>
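To make the proposed setting concrete: the thread names an "allow_api_upload" option but never says where it would live or what it would look like. The sketch below assumes, purely for illustration, that it would be a per-datatype attribute next to display_in_upload in datatypes_conf.xml. The attribute, the simplified XML fragment and the `api_upload_allowed` helper are all hypothetical and are not part of Galaxy.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment. allow_api_upload is NOT a real Galaxy
# attribute; it only mirrors the setting name proposed in the thread.
DATATYPES_XML = """
<datatypes>
  <datatype extension="jpg" type="galaxy.datatypes.images:Image"
            mimetype="image/jpeg" display_in_upload="true"
            allow_api_upload="false"/>
  <datatype extension="png" type="galaxy.datatypes.images:Image"
            mimetype="image/png" display_in_upload="true"/>
</datatypes>
""".strip()


def api_upload_allowed(xml_text: str, ext: str, default: bool = True) -> bool:
    """Return whether API uploads are allowed for the given extension.

    Registered datatypes without the attribute fall back to `default`;
    extensions that are not registered at all are refused.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter("datatype"):
        if elem.get("extension") == ext:
            value = elem.get("allow_api_upload")
            if value is None:
                return default
            return value.strip().lower() == "true"
    return False
```

Keeping the flag separate from display_in_upload matches the preference stated above: a format can stay registered and usable as tool output while uploads of it are blocked.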
On Wed, Jul 27, 2011 at 2:28 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
> Hello Jelle, I've taken a look at your patches and the code looks good. However, I'm wondering why you find it necessary to upload image files to Galaxy. Do tools exist that take image files as input? If the Galaxy community finds this feature as a good addition to Galaxy, we can probably incorporate it into the distribution.

I think it is a nice feature - and can come up with some use cases too. How about future tools for image analysis? E.g. microscope photos of cultures to automatically do cell/organism counting, or plant leaves for pathogenicity assays.

> However, we'll need an additional feature before we can do that. We'll need to implement a type of config setting that enables / disables the uploading of image files via the Galaxy api. ...

Why not just allow it? A general configuration allowing/blocking any file type from being uploaded makes more sense to me than just wanting to block images.

Peter
Hi Peter,

On Jul 27, 2011, at 9:34 AM, Peter Cock wrote:
> I think it is a nice feature - and can come up with some use cases too. How about future tools for image analysis? e.g. microscope photos of cultures to automatically do cell/organism counting, or plant leaves for pathogenicity assays.

Ok, I'm convinced.

> Why not just allow it? A general configuration allowing/blocking any file type from being uploaded makes more sense to me than just wanting to block images.

The enhancement would be to add a config setting to allow / disallow uploading any data type (not just images) via the api. The problem with just allowing uploading of images is that we'll end up hosting a bunch of porn on our main public Galaxy instance. This has happened to us in the past...

Greg Von Kuster
Hi Greg,

I do see the concern about uploading data through the api, but I don't think filtering on file types is the right solution. It seems to be an issue with allowing everyone to register. Wouldn't it be better to at least validate the e-mail addresses of new users on the main instance? This would probably resolve most issues related to unwanted content on the server?

Furthermore, I agree with Peter that if you want to block certain file types on your own instance, they should be removed from datatypes_conf.xml, thereby disabling them.

Btw, I needed the images for the sample tracking part - e.g. uploading gel photos.

Cheers,
Jelle

On Wed, Jul 27, 2011 at 3:55 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
> The enhancement would be to add a config setting to allow / disallow uploading any data type ( not just images ) via the api. [...]
On Wed, Jul 27, 2011 at 3:26 PM, Jelle Scholtalbers <j.scholtalbers@gmail.com> wrote:
> Hi Greg, I do see the concern of uploading data through the api but I don't think filtering on file types is the right solution. It seems to be an issue with allowing everyone to register. Wouldn't it be better to at least validate e-mail addresses of new users on the main instance? This would probably resolve most issues related to unwanted content on the server?

That might be another good idea.

> Furthermore I agree with Peter, that if you want to block certain file types on your own instance, they should be removed from the datatypes_conf.xml and thereby disabling them.

But we don't want to disable the formats - many tools may produce JPEG etc. as output. Greg just wants to be able to block uploads by file type.

> Btw I needed the images for the sample tracking part - e.g. uploading gel photos.

Another good use case - do you analyse them in Galaxy too?

Peter
On Jul 27, 2011, at 10:26 AM, Jelle Scholtalbers wrote:
> Wouldn't it be better to at least validate e-mail addresses of new users on the main instance? This would probably resolve most issues related to unwanted content on the server?

Not in our experience, valid email addresses are trivial to fake.

> Furthermore I agree with Peter, that if you want to block certain file types on your own instance, they should be removed from the datatypes_conf.xml and thereby disabling them.

We still support these formats on main for everything other than upload (some tools generate PNG plots, for example). A config option that disables the upload restrictions for valid users seems like a good idea, especially for sites with more stringent account policies. However, I think we would still have to keep the restrictions in place on main.
> On Jul 27, 2011, at 10:26 AM, Jelle Scholtalbers wrote:
>> Wouldn't it be better to at least validate e-mail addresses of new users on the main instance? This would probably resolve most issues related to unwanted content on the server?
> Not in our experience, valid email addresses are trivial to fake.

Slightly creepy to require, but more of a pain to fake, is a Facebook or LinkedIn account. I've done this (OAuth 2.0) in Java servlets, but not Python yet...

On the other hand, isn't user data supposed to require a login to view? I.e., was the inappropriate content being served or just stored on the public server?
No, data does not require a login to view if the dataset permissions are set permissively. This allows users to share direct links to datasets (which is meant to be a good feature ;).

On Jul 27, 2011, at 11:45 AM, Paul Gordon wrote:
> On the other hand, isn't user data supposed to require a login to view? i.e., was the inappropriate content being served or just stored on the public server?
Hi,

On Jul 27, 2011, at 3:34 PM, Peter Cock wrote:
> On Wed, Jul 27, 2011 at 2:28 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
>> Hello Jelle, I've taken a look at your patches and the code looks good. However, I'm wondering why you find it necessary to upload image files to Galaxy. Do tools exist that take image files as input? If the Galaxy community finds this feature as a good addition to Galaxy, we can probably incorporate it into the distribution.
> I think it is a nice feature - and can come up with some use cases too.

+1

> How about future tools for image analysis? e.g. microscope photos of cultures to automatically do cell/organism counting, or plant leaves for pathogenicity assays.
>> However, we'll need an additional feature before we can do that. We'll need to implement a type of config setting that enables / disables the uploading of image files via the Galaxy api. ...
> Why not just allow it?

Indeed!

> A general configuration allowing/blocking any file type from being uploaded makes more sense to me than just wanting to block images.

Why block file types? If you really want to, then how about automagically *allowing* all file types specified in datatypes_conf.xml?

Cheers,
Pi

-------------------------------------------------------------
mobile: +31 6 143 66 783
e-mail: pieter.neerincx@gmail.com
skype: pieter.online
-------------------------------------------------------------
Jelle,

I've added support for uploading image datatypes based on your patches in change set 5833:e7214c69ed7d, but with some changes. There is a new image_util.py file in ~/lib/galaxy/datatypes/util which allows for detecting image data types without having to create a new Image() class. I've also added a couple of fixes to the code.

There is currently no support in the Galaxy api for uploading files outside of a data library, so I did not have to add the new config setting discussed in the previous thread. We'll look into that further when the api is enhanced to enable uploads.

Thanks very much for your contribution!

Greg Von Kuster
participants (6)
- Greg Von Kuster
- James Taylor
- Jelle Scholtalbers
- Paul Gordon
- Peter Cock
- Pieter Neerincx