Galaxy strips CSS from HTML files

Cory Spencer

30 Jan 2012

10:33 p.m.

Hello all -

One of the Galaxy tools I've been developing generates HTML output which I'd styled using a <style>...</style> tag in the HTML header. After updating to the latest Galaxy release earlier today, the <html>, <head>...</head>, <style> and <body> tags started to get stripped from the output, rendering previously CSS-styled output rather unstylish.

Delving into things, I noticed a change committed in December that sanitizes the output for HTML files via a call to "sanitize_html":

https://bitbucket.org/galaxy/galaxy-central/changeset/35fee32991ce#chg-lib/g...

The added lines 381 -> 383 in the new file appear to be causing this new behaviour.

Is there any way to make this optional? What was the rationale behind stripping out these tags on outputted HTML files?

Thanks for any help!

Cory Spencer

Dannon Baker

1 Feb

3:01 p.m.

Hi Cory,

The new call to sanitize_html was introduced to more effectively prevent malicious content and possible XSS attacks, though I can't think off the top of my head why we couldn't allow style content. I'll see what I can do about relaxing the filter a little.

Thanks!

-Dannon

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/

Cory Spencer

3:41 p.m.

Hi Dannon, and thanks for the response!

I can see the need to sanitize incoming HTTP request parameters that may have malicious content. However, I'm unclear as to why this also needs to happen for HTML pages output by the Galaxy tools. If they have been generated with sanitized HTTP request parameters, is there still a risk of an XSS attack?

If anything, would it be possible to make this sort of sanitization controllable via a configuration file option?

Thanks!

Cory

Dannon Baker

4:33 p.m.

On 02/01/2012 03:41 PM, Cory Spencer wrote:

> I can see the need to sanitize incoming HTTP request parameters that may have malicious content. However, I'm unclear as to why this also needs to happen for HTML pages outputted by the Galaxy tools? If they have been generated with sanitized HTTP request parameters, is there still a risk of an XSS attack?

With Galaxy's toolbox at hand you could generate invalid HTML from plain text components. A simple example, but consider the following:

Upload one plain text file with the content:

    <script

Another one with:

    alert('oh no!'); </

And finally one with:

    script>

Taken individually they're meaningless, but run the concatenate datasets tool to reassemble that into:

    <script
    alert('oh no!'); </
    script>

Change the type of this dataset to html and there's your attack. If you tried to upload this, we'd interpret it as malicious HTML and discard it. As separate datasets, it's impossible to tell. Given Galaxy's powerful text manipulation tools you could write just about whatever you wanted using Galaxy itself and get it in the system as a (seemingly) valid tool-generated dataset.

Now, with the outbound sanitization on any dataset served as "text/html", it doesn't matter, and it gets handled prior to serving.

Another option we discussed would be to trust all tool-generated HTML, disallow changing the datatype of anything *to* html, and so on, but that approach comes with its own problems.

> If anything, would it be possible to make this sort of sanitization controllable via a configuration file option?

I'm rather hesitant to put in a disable option for a security feature, though you're more than welcome to pop those two lines out of your instance. I think the best path forward is probably relaxing the filter a bit; the initial pass was somewhat draconian. Would relaxing the filter to allow style content to pass through work for your needs?

-Dannon
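The three-fragment example above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not Galaxy's code: `looks_malicious` and its regular expression are hypothetical stand-ins for whatever upload-time check an instance performs, and joining with newlines is an assumption about how a concatenation tool would combine the datasets.

```python
import re

# Hypothetical upload-time check (not Galaxy's implementation):
# flag text that contains a complete opening <script ...> tag.
SCRIPT_TAG = re.compile(r"<\s*script\b[^>]*>", re.IGNORECASE)


def looks_malicious(text):
    """Return True if the text contains something shaped like a script tag."""
    return SCRIPT_TAG.search(text) is not None


# The three plain-text uploads from the example.
fragments = ["<script", "alert('oh no!'); </", "script>"]

# Checked one at a time, none of the uploads is flagged.
flags_per_fragment = [looks_malicious(fragment) for fragment in fragments]

# A concatenation tool joins the datasets (assumed here: line by line).
combined = "\n".join(fragments)

# Only a check on the combined content, made when it is served, sees the tag.
flag_combined = looks_malicious(combined)

print(flags_per_fragment, flag_combined)
```

The point of the sketch is the ordering: a check applied to each input cannot know what the tools will assemble later, so the check that matters is the one applied to the final content on its way out.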

Cory Spencer

4:58 p.m.

On 2012-02-01, at 1:33 PM, Dannon Baker wrote:

> With Galaxy's toolbox at hand you could generate invalid HTML from plain text components. A simple example, but consider the following:
>
> Upload one plain text file with the content: <script
>
> ....
>
> Change the type of this dataset to html and there's your attack. If you tried to upload this, we'd interpret it as malicious HTML and discard it. As separate datasets, it's impossible to tell. Given Galaxy's powerful text manipulation tools you could write just about whatever you wanted using Galaxy itself and get it in the system as a (seemingly) valid tool-generated dataset. Now, with the outbound sanitization on any dataset served as "text/html", it doesn't matter, and it gets handled prior to serving.

Okay, I follow you there. That's a good example, thank you!

> Another option we discussed would be to trust all tool-generated HTML, disallow changing the datatype of anything *to* html, and so on, but that approach comes with its own problems.

In the case of the tool we're working on, this option is probably what would have worked best.

> > If anything, would it be possible to make this sort of sanitization controllable via a configuration file option?
>
> I'm rather hesitant to put in a disable option for a security feature, though you're more than welcome to pop those two lines out of your instance. I think the best path forward is probably relaxing the filter a bit; the initial pass was somewhat draconian. Would relaxing the filter to allow style content to pass through work for your needs?

Yes, we've already commented it out for the time being. :) Relaxing the filter would be a good improvement so far as we're concerned. I'd be happy to keep in contact with you during the process so that we can find the happy middle ground between security and usability.

Thanks again!

Cory
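For readers wondering what "relaxing the filter to allow style content" might look like, here is a minimal sketch built on Python's standard `html.parser` module. It is not Galaxy's `sanitize_html`; the class name, the allowlist, and the choice to drop every attribute are assumptions made for illustration, and CSS can itself carry risks that this sketch does not address.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist for illustration only; a real filter needs a reviewed list.
ALLOWED_TAGS = {"html", "head", "body", "style", "p", "b", "i", "table", "tr", "td"}


class RelaxedSanitizer(HTMLParser):
    """Keep allowlisted tags (without attributes), drop scripts and their content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_script = False
        self.in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif tag in ALLOWED_TAGS and not self.in_script:
            self.in_style = tag == "style"
            self.out.append("<%s>" % tag)  # attributes such as onclick are dropped

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag in ALLOWED_TAGS and not self.in_script:
            if tag == "style":
                self.in_style = False
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_script:
            return  # script bodies are discarded entirely
        # CSS inside <style> is passed through as-is; other text is escaped.
        self.out.append(data if self.in_style else escape(data, quote=False))


def sanitize(markup):
    parser = RelaxedSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


page = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><p onclick=\"evil()\">Hi</p>"
    "<script>alert('oh no!');</script></body></html>"
)
cleaned = sanitize(page)
print(cleaned)
```

An allowlist is used rather than a blocklist so that anything not explicitly named, including comments and unknown tags, is dropped by default.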

Ayton Meintjes

2 Mar

4:05 a.m.

This breaks some of the tools we're developing, although in our case it's harder to fix because it's JavaScript we're inserting.

I understand the security concerns though. Any advice on a more secure way to allow particular content? Perhaps a whitelist of allowed scripts?

--
Computational Biology Group
University of Cape Town
South Africa

Dannon Baker

1:57 p.m.

A tool whitelist is an interesting idea that deserves more thought, but for now I'm going to add an option to the Galaxy configuration to easily enable or disable the extended HTML filtering entirely. It'll be enabled by default, but this should make things easier for administrators of local Galaxy instances who have custom tools that need to do fancy things and who have other security and access controls in place.

-Dannon
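A switch of the kind described here, on by default and overridable by an administrator, could be read roughly as follows. This is a sketch under stated assumptions: the option name `sanitize_all_html`, the `[app:main]` section, and the helper names are not taken from this thread, and `html.escape` is only a crude stand-in for a real sanitizer.

```python
import configparser
from html import escape


def sanitize_enabled(ini_text):
    """Read the (assumed) sanitize_all_html option; default to True if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getboolean("app:main", "sanitize_all_html", fallback=True)


def serve_html(content, ini_text):
    """Return the content to send for a dataset served as text/html."""
    if sanitize_enabled(ini_text):
        return escape(content)  # stand-in for the real outbound filter
    return content  # administrator has explicitly opted out


default_config = ""  # nothing configured: filtering stays on
opt_out_config = "[app:main]\nsanitize_all_html = False\n"

print(serve_html("<b>report</b>", default_config))
print(serve_html("<b>report</b>", opt_out_config))
```

Defaulting to the safe behaviour means an instance that never touches its configuration keeps the filtering, and turning it off is a deliberate, visible choice.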
