Re: [galaxy-dev] Validating dynamic inputs stream of consciousness
Dr. Taylor,
Thanks for the e-mail. I think that is a solid instinct; it makes me nervous too, though most things do. I am responding to the list because I think others are going to have the same concern (hopefully!).
This whole concept puts a lot of onus on the tool developer. A biologist who has taken a two-week course on Perl could probably write a Galaxy tool, but they probably couldn't write a secure tool for a public LWR. I think some experience in thinking about how to secure web-accessible applications and prevent injection-style attacks is needed. I will update the documentation urging additional caution with respect to this.
That said, there have been in the recent past multiple tools on public Galaxy servers (main included) that were developed by serious programmers and that allowed arbitrary code execution. This is something the whole community (or at least that subset hosting public servers) needs to address and take more seriously.
I guess the other big question is whether there is something inherently insecure about the framework. To me the most obvious attack vector there is the regular expression that is generated. I worry whether it is too porous somehow, whether there is some XML that a developer would write that would reasonably make the tool look secure when in fact there is some malicious command that could get through. I cannot guarantee that it is secure, but I would be eager for counterexamples or specific issues I can address. The one thing I will say is that, over and over, things are whitelisted, not blacklisted, which is generally more secure.
After that level, I think the LWR does still have issues, but they are all the same issues Galaxy itself has. Can you do a denial-of-service type attack? Yes, pretty easily, though I think I could hit any public Galaxy the same way. Also, are the tools and wrappers themselves secure? Is there some fastq file you could give to bowtie to cause it to run an arbitrary command? My guess is probably, but there is not much the framework could do to prevent that. If we are honest and accept that there are going to be security problems with the tools we wrap, one idea that might be worth pursuing for both the LWR and Galaxy itself is running tools in chrooted environments, or at least as a different user than the webapp.
Thanks again,
-John
On Wed, Mar 6, 2013 at 10:01 AM, James Taylor <james@jamestaylor.org> wrote:
John, something about this makes me really nervous security wise. Need to think hard on it.
(Not a complaint, or criticism, or copied to the list, just letting you know it caught my attention).
-- James Taylor, Assistant Professor, Biology/CS, Emory University
On Tue, Mar 5, 2013 at 10:19 PM, John Chilton <chilton@msi.umn.edu> wrote:
I have been extending the LWR so that one can now stand up public LWR servers (https://lwr.readthedocs.org/en/latest/#setting-up-a-public-lwr-server). The idea is that you can publicly share data and computation with any Galaxy instance in the world easily and seamlessly via vanilla Galaxy tools. This could serve as the foundation of a distributed service architecture where tools are the contracts, Galaxy instances the clients, and LWRs the server containers.
One thing that did need to change with the LWR is that inputs need to be validated. It would obviously not be a good idea to allow arbitrary command-line or script executions to unauthenticated clients.
This was accomplished in part by allowing you to configure a Galaxy toolbox or tool_conf file for the LWR. The LWR reads validation logic for command-line and configfiles from the XML and verifies them before execution.
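To make the idea concrete, here is a minimal sketch of whitelist-style validation of a command line before execution. It is not the LWR's actual validator: the pattern, the flag names, and the helper `is_valid_command` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical whitelist for one tool. Every part of the command line must
# match an explicitly allowed pattern; anything else is rejected.
ALLOWED_COMMAND = re.compile(
    r"python tophat_wrapper\.py"          # fixed interpreter and wrapper
    r"( --num-threads [0-9]{1,2})?"       # optional small integer
    r" --own_file [A-Za-z0-9_./-]+\.fa"   # path built from safe characters only
)


def is_valid_command(command_line: str) -> bool:
    """Accept the command only if the whole string matches the whitelist."""
    # fullmatch (rather than search) ensures nothing can be appended
    # after an otherwise valid command.
    return ALLOWED_COMMAND.fullmatch(command_line) is not None
```

With this pattern, `python tophat_wrapper.py --own_file /data/ref.fa` is accepted, while the same command with `; rm -rf /` appended is rejected because `;` and spaces are not in the allowed character set. Note that this sketch would still accept paths containing `..`, so a real validator would need further checks.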
The test cases I used to build up this validation extension to the tool XML can be found here:
https://bitbucket.org/jmchilton/lwr/src/tip/test/validator_test.py
Question 1: Would Galaxy benefit from implementing this validation stuff in the core framework as well? I think of myself as fairly security conscious and my sense is no, but I could see the argument. There are potentially complex interactions between user inputs and Cheetah templates, and one might sleep better knowing everything is being validated before execution (i.e., that the user didn't cause the Cheetah template to somehow render "; rm -rf /").
Question 2: Looking back on the validation stuff, it seems I am just rewriting in XML what Cheetah is doing. The duplication makes me think the validation stuff might serve as the foundation for a better (or at least more secure) way to build up commands.
<command interpreter="python">
  <tool_wrapper>tophat_wrapper.py</tool_wrapper>
  <if>
    <cond>
      <equals left="$refGenomeSource.genomeSource" right="history">
    <then>
      <parameter name="--own_file">$refGenomeSource.ownFile</parameter>
      ....
The client could easily compile that into a Cheetah template to build the command; the server could compile it into a regular expression to validate the input.
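A rough sketch of that "one declaration, two compilations" idea follows. It uses a plain Python dictionary rather than the XML above, and it builds the command with string joining rather than a real Cheetah template; `SPEC`, `build_command`, and `compile_validator` are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical declarative spec: a fixed prefix plus (flag, value pattern) pairs.
SPEC = {
    "prefix": "python tophat_wrapper.py",
    "parameters": [
        ("--own_file", r"[A-Za-z0-9_./-]+"),
        ("--threads", r"[0-9]+"),
    ],
}


def build_command(spec, values):
    """Client side: render the command line from the spec and the user's values."""
    parts = [spec["prefix"]]
    for flag, _pattern in spec["parameters"]:
        if flag in values:
            parts.append(f"{flag} {values[flag]}")
    return " ".join(parts)


def compile_validator(spec):
    """Server side: derive a validating regular expression from the same spec."""
    regex = re.escape(spec["prefix"])
    for flag, pattern in spec["parameters"]:
        # Each parameter is optional and must appear in the declared order.
        regex += f"(?: {re.escape(flag)} (?:{pattern}))?"
    return re.compile(regex)
```

Because both functions read the same spec, the validator cannot drift out of sync with the command builder, which is the duplication concern raised above. The server would call `compile_validator(SPEC).fullmatch(command)` and refuse anything that does not match.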
These aren't things that pressingly need to be added to the core Galaxy framework, but things to think about longer term. If there is any interest from the core Galaxy team I would be happy to implement any of this with any desired changes.
Thanks all, -John
------------------------------------------------
John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
https://twitter.com/jmchilton
https://bitbucket.org/jmchilton
https://github.com/jmchilton
On Wed, Mar 6, 2013 at 12:22 PM, John Chilton <chilton@msi.umn.edu> wrote:
This whole concept puts a lot of onus on the tool developer. A biologist who has taken a two-week course on Perl could probably write a Galaxy tool, but they probably couldn't write a secure tool for a public LWR. I think some experience in thinking about how to secure web-accessible applications and prevent injection-style attacks is needed. I will update the documentation urging additional caution with respect to this.
What I'm trying to understand is whether this model for a public LWR makes sense. It appears that your LWR will take a command line, and then apply a series of validations to it. This set of validations would need to be very comprehensive -- perhaps impossibly comprehensive -- to be secure. To me it would make more sense that the LWR takes an input values dict and constructs the command line itself after validating everything (it already has the toolbox, so this should be possible).
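A minimal sketch of this alternative, in which the server receives a dictionary of input values, validates each one, and only then constructs the command itself. The parameter names, patterns, and the helper `build_argv` are hypothetical and do not come from Galaxy or the LWR.

```python
import re

# Hypothetical per-parameter whitelist held on the server.
# The file pattern may not start with "-" so a value cannot pose as an option.
PARAMETER_PATTERNS = {
    "own_file": re.compile(r"[A-Za-z0-9_./][A-Za-z0-9_./-]*"),
    "threads": re.compile(r"[0-9]{1,3}"),
}


def build_argv(values):
    """Validate every value, then build an argument list (never a shell string)."""
    argv = ["python", "tophat_wrapper.py"]
    for name in sorted(values):
        pattern = PARAMETER_PATTERNS.get(name)
        if pattern is None:
            raise ValueError(f"unknown parameter: {name}")
        value = str(values[name])
        if not pattern.fullmatch(value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        argv += [f"--{name}", value]
    return argv
```

The resulting list could be handed to `subprocess.run(argv)` without `shell=True`, so each value stays a single argument and is never interpreted by a shell. The client never sends a command line at all, only values.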
That said, there have been in the recent past multiple tools on public Galaxy servers (main included) that were developed by serious programmers that allowed arbitrary code execution. This is something the whole community (or at least that subset hosting public servers) needs to address and take more seriously.
I could not agree more, though I see this as a somewhat different issue. There are a number of things that would be really helpful here:
- Some automatic validation of command line construction to look for common exploits (again, impossible to do comprehensively)
- Some kind of sandboxing, through support for chroot, zones, jails, or (dare I dream) running under native client.
there is some malicious command that could get through. I cannot guarantee that it is secure, but I would be eager for counter examples or specific issues I can address.
I think this is what concerned me as well. I'm always worried about security through comprehensive screening; someone almost always finds a way around it. This is why the original Python sandbox failed. Constructing the command line from validated inputs seems safer (as long as you trust the template that builds the command line).
If we are honest and accept that there are going to be security problems with the tools we wrap, one idea that might be worth pursuing for both the LWR and Galaxy itself is running tools in chrooted environments, or at least as a different user than the webapp.
On this we completely agree.
On Wed, Mar 6, 2013 at 12:39 PM, James Taylor <james@jamestaylor.org> wrote:
On Wed, Mar 6, 2013 at 12:22 PM, John Chilton <chilton@msi.umn.edu> wrote:
This whole concept puts a lot of onus on the tool developer. A biologist who has taken a two-week course on Perl could probably write a Galaxy tool, but they probably couldn't write a secure tool for a public LWR. I think some experience in thinking about how to secure web-accessible applications and prevent injection-style attacks is needed. I will update the documentation urging additional caution with respect to this.
What I'm trying to understand is whether this model for a public LWR makes sense. It appears that your LWR will take a command line, and then apply a series of validations to it. This set of validations would need to be very comprehensive -- perhaps impossibly comprehensive -- to be secure.
It would have to be impossibly comprehensive for many existing tool XML files. The LWR documentation, however, has some advice on reorganizing tools that makes it quite easy. The idea is to move all of the option/argument handling logic into a config file, pass the config file into your wrapper as the only argument, and then use optparse/argparse on the contents of the config file. optparse/argparse then handles all of the validation logic, so there is no chance of an injection causing a new process to be spawned, etc. I believe this model is pretty simple to implement and quite secure. I know some members of the Galaxy team would like to get away from tool wrappers. I am a tool wrapper fan, however. In this context especially, it seems a small price to pay for security.
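A small sketch of the wrapper pattern described here, assuming a config file whose contents look like ordinary command-line options. The option names and the helpers `parse_config` and `load_options` are hypothetical; the LWR documentation may organize this differently.

```python
import argparse
import shlex


def parse_config(config_text):
    """Parse options from the contents of a config file instead of from argv."""
    parser = argparse.ArgumentParser(prog="tool_wrapper")
    parser.add_argument("--own_file", required=True)
    parser.add_argument("--threads", type=int, default=1)
    # shlex.split only tokenizes the text; nothing here is handed to a shell.
    return parser.parse_args(shlex.split(config_text))


def load_options(config_path):
    """Read the config file (the wrapper's only argument) and parse it."""
    with open(config_path) as handle:
        return parse_config(handle.read())
```

With this layout, the command line the server has to validate is just the wrapper plus one config file path, which is easy to whitelist. Bad values such as a non-integer `--threads` make argparse exit with an error, and a value like `ref.fa; rm -rf /` remains an ordinary string as long as the wrapper passes it on as a list argument rather than through a shell.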
To me it would make more sense that the LWR takes an input values dict and constructs the command line itself after validating everything (it already has the toolbox, so this should be possible).
This has a conceptual appeal and was my first thought, but it too would reduce the expressiveness of Galaxy tools. The tool templates have access to a lot of things, not least among them app. That is not something you can send over the wire. It would also mean, implementation-wise, that public LWRs are even more different from traditional LWRs and would need a rewritten client and server. Nonetheless, I am not opposed to this as an option. If the Galaxy team wants to refactor all of the Cheetah templating and wrapper stuff out into an easy-to-use library you pass a dictionary into, I would be happy to integrate it into the LWR :). (In this fictitious world where you are modularizing Galaxy just for me, a library to install toolshed repositories with dependencies and env files would be really awesome to augment the LWR with as well :).)
Thanks a bunch for the feedback, I really appreciate it. -John
participants (2): James Taylor, John Chilton