Re: [galaxy-dev] Validating dynamic inputs stream of consciousness

6 Mar 2013

Dr. Taylor,

Thanks for the e-mail. I think that is a solid instinct; it makes me
nervous too, though most things do. I am responding to the list
because I think others are going to have the same concern
(hopefully!).

This whole concept puts a lot of onus on the tool developer. A
biologist who has taken a two-week course on Perl could probably write
a Galaxy tool, but probably couldn't write a secure tool for a public
LWR. I think some experience in thinking about how to secure
web-accessible applications and prevent injection-style attacks is
needed. I will update the documentation to urge additional caution in
this respect.

That said, in the recent past there have been multiple tools on public
Galaxy servers (main included), developed by serious programmers, that
allowed arbitrary code execution. This is something the whole community
(or at least the subset hosting public servers) needs to address and
take more seriously.

I guess the other big question is whether there is something inherently
insecure about the framework. To me the most obvious attack vector
there is the generated regular expression. I worry that it is too
porous somehow: is there some XML that a developer would reasonably
write that makes the tool look secure, while in fact some malicious
command could get through? I cannot guarantee that it is secure, but I
would be eager for counterexamples or specific issues I can address.
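To make the "too porous" worry concrete, here is a minimal Python sketch of one way a generated validation regex can leak. The pattern and command are hypothetical, not what the LWR actually generates: an unanchored check with `re.search` accepts any command that merely contains an allowed prefix, and even a `$` anchor is weaker than `re.fullmatch`.

```python
import re

# Hypothetical pattern a validator might generate for a tool taking one
# input path and one integer option (illustrative only).
PATTERN = r"python tophat_wrapper\.py --input [\w./-]+ --threads \d+"

def porous_check(command: str) -> bool:
    # re.search succeeds if the pattern matches anywhere in the command,
    # so trailing shell syntax is silently ignored.
    return re.search(PATTERN, command) is not None

def dollar_check(command: str) -> bool:
    # "$" also matches just before a trailing newline, so this is weaker
    # than it looks.
    return re.match(PATTERN + "$", command) is not None

def strict_check(command: str) -> bool:
    # re.fullmatch requires the entire command to match the pattern.
    return re.fullmatch(PATTERN, command) is not None

benign = "python tophat_wrapper.py --input reads.fq --threads 4"
malicious = benign + "; rm -rf /"

print(porous_check(benign), porous_check(malicious))  # True True
print(strict_check(benign), strict_check(malicious))  # True False
print(dollar_check(benign + "\n"), strict_check(benign + "\n"))  # True False
```

Requiring a full match of the entire command string closes the most obvious gap, though it says nothing about whether the per-parameter patterns themselves are tight enough.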

The one thing I will say is that, over and over, things are whitelisted,
not blacklisted, which is generally more secure.
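A small Python sketch of why whitelisting tends to be safer, assuming a hypothetical parameter that should only ever name a reference genome. The blacklist has to anticipate every dangerous construct, while the whitelist only accepts values it explicitly knows about.

```python
# Hypothetical blacklist: the characters someone remembered were dangerous.
BLACKLIST = set(";&|")

# Hypothetical whitelist: the only values this parameter may take.
ALLOWED_GENOMES = {"hg19", "mm9", "dm3"}

def blacklist_ok(value: str) -> bool:
    # Accepts anything that avoids the listed characters.
    return not any(ch in BLACKLIST for ch in value)

def whitelist_ok(value: str) -> bool:
    # Accepts only values that were explicitly enumerated.
    return value in ALLOWED_GENOMES

# Shell command substitution uses none of the blacklisted characters.
attack = "$(rm -rf ~)"

print(blacklist_ok(attack))  # True
print(whitelist_ok(attack))  # False
print(whitelist_ok("hg19"))  # True
```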

Beyond that, I think the LWR does still have issues, but they are all
the same issues Galaxy itself has. Can you mount a denial-of-service
attack? Yes, pretty easily, though I think I could hit any public
Galaxy server the same way. Also, are the tools and wrappers themselves
secure? Is there some FASTQ file you could give to bowtie to cause it
to run an arbitrary command? My guess is probably, but there is not
much the framework could do to prevent that.

If we are honest and accept that there are going to be security
problems with the tools we wrap, one idea that might be worth pursuing
for both the LWR and Galaxy itself is running tools in chrooted
environments, or at least as a different user than the webapp.
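As a rough sketch of the "different user" idea, the Python below builds an argument list that runs a rendered command through `sudo` as a separate account and without a shell. The account name `galaxy_jobs`, the helper names, and the sudoers setup are all hypothetical assumptions, not anything the LWR or Galaxy provides.

```python
import shlex
import subprocess
from typing import List

def build_job_argv(command_line: str, run_as: str) -> List[str]:
    # Split the rendered command into argv so no shell interprets it:
    # "; rm -rf /" becomes plain arguments to the tool, not a second command.
    # sudo "-n" fails instead of prompting for a password, and "--" ends
    # sudo's own option parsing.
    return ["sudo", "-n", "-u", run_as, "--"] + shlex.split(command_line)

def run_job(command_line: str, run_as: str = "galaxy_jobs") -> int:
    # Hypothetical account; assumes a sudoers rule lets the webapp user
    # run jobs as this account without a password.
    argv = build_job_argv(command_line, run_as)
    return subprocess.run(argv, check=False).returncode

print(build_job_argv("bowtie2 -x hg19 -U 'my reads.fq'", "galaxy_jobs"))
# ['sudo', '-n', '-u', 'galaxy_jobs', '--', 'bowtie2', '-x', 'hg19', '-U', 'my reads.fq']
```

One trade-off: dropping the shell also breaks wrappers that rely on pipes or redirection. On Python 3.9+ a process already running as root could instead pass `user=` to `subprocess.run`; chroot confinement would be a separate, additional layer.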

Thanks again,
-John

On Wed, Mar 6, 2013 at 10:01 AM, James Taylor <james@jamestaylor.org> wrote:
...
John, something about this makes me really nervous security-wise. Need
to think hard on it.
(Not a complaint, or criticism, or copied to the list, just letting
you know it caught my attention).
--
James Taylor, Assistant Professor, Biology/CS, Emory University
On Tue, Mar 5, 2013 at 10:19 PM, John Chilton <chilton@msi.umn.edu> wrote:
...
I have been extending the LWR so that one can now stand up public LWR
servers (https://lwr.readthedocs.org/en/latest/#setting-up-a-public-lwr-server).
The idea is that you can publicly share data and computation with any
Galaxy instance in the world, easily and seamlessly, via vanilla Galaxy
tools. This could serve as the foundation of a distributed service
architecture where tools are the contracts, Galaxy instances the
clients, and LWRs the server containers.
One thing that did need to change in the LWR is that inputs need to
be validated. It would obviously not be a good idea to allow
unauthenticated clients to execute arbitrary command lines or scripts.
This was accomplished in part by allowing you to configure a Galaxy
toolbox or tool_conf file for the LWR. The LWR reads validation logic
for command lines and configfiles from the tool XML and verifies them
before execution.
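A minimal Python sketch of the toolbox side of this, assuming the standard tool_conf layout (`<toolbox>`, `<section>`, `<tool file="..."/>`). Keying the allowlist on the tool file path is a simplification for illustration, since how the LWR actually identifies tools is not described here; the tool paths and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative tool_conf contents; the paths are examples only.
TOOL_CONF = """
<toolbox>
  <section name="NGS: Mapping" id="ngs_mapping">
    <tool file="sr_mapping/bowtie_wrapper.xml" />
    <tool file="ngs_rna/tophat_wrapper.xml" />
  </section>
</toolbox>
"""

def load_allowed_tools(tool_conf_xml: str) -> set:
    # Collect every <tool file="..."> entry, wherever it sits in the toolbox.
    root = ET.fromstring(tool_conf_xml)
    return {tool.get("file") for tool in root.iter("tool")}

def check_request(tool_file: str, allowed: set) -> None:
    # Reject any job that does not target a configured tool, before any
    # command-line validation even runs.
    if tool_file not in allowed:
        raise PermissionError(f"tool {tool_file!r} is not in this server's toolbox")

allowed = load_allowed_tools(TOOL_CONF)
check_request("ngs_rna/tophat_wrapper.xml", allowed)  # passes silently
```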
The test cases I used to build up this validation extension to the
tool XML can be found here:
https://bitbucket.org/jmchilton/lwr/src/tip/test/validator_test.py
Question 1: Would Galaxy benefit from implementing this validation
stuff in the core framework as well? I think of myself as fairly
security-conscious, and my sense is no, but I could see the argument.
There are potentially complex interactions between user inputs and
Cheetah templates, and one might sleep better knowing everything is
validated before execution (i.e., that the user didn't somehow cause
the Cheetah template to render "; rm -rf /").
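A minimal Python sketch of the kind of interaction this question worries about. It uses `string.Template` as a stand-in for Cheetah (an assumption to keep the example dependency-free) and `shlex.quote` as one possible mitigation; it is not a description of how Galaxy actually sanitizes parameter values.

```python
import shlex
from string import Template

# string.Template stands in for Cheetah; both substitute $name placeholders.
COMMAND = Template("tophat_wrapper.py --own_file $own_file")

user_value = "reads.fq; rm -rf /"

# Naive rendering: the user's value is spliced straight into shell syntax.
naive = COMMAND.substitute(own_file=user_value)

# Quoted rendering: shlex.quote turns the value into a single shell word.
quoted = COMMAND.substitute(own_file=shlex.quote(user_value))

print(naive)   # tophat_wrapper.py --own_file reads.fq; rm -rf /
print(quoted)  # tophat_wrapper.py --own_file 'reads.fq; rm -rf /'
print(shlex.split(quoted))  # ['tophat_wrapper.py', '--own_file', 'reads.fq; rm -rf /']
```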
Question 2: Looking back on the validation stuff, it seems I am just
rewriting in XML what Cheetah is doing. The duplication makes me think
the validation stuff might serve as the foundation for a better (or at
least more secure) way to build up commands, for example:
<command interpreter="python">
  <tool_wrapper>tophat_wrapper.py</tool_wrapper>
  <if>
    <cond>
      <equals left="$refGenomeSource.genomeSource" right="history">
      <then>
         <parameter name="--own_file">$refGenomeSource.ownFile</parameter>
   ....
The client could easily compile that into a Cheetah template to build
the command, and the server could compile it into a regular expression
to validate the input.
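A rough Python sketch of that dual compilation, assuming a hypothetical in-memory form of the command description: a list of literal tokens and named parameters, each parameter carrying a whitelist regex. It ignores conditionals like the `<if>` above, which would need `#if` blocks on the template side and alternation on the regex side, and it renders with `string.Template` in place of Cheetah.

```python
import re
from string import Template
from typing import Dict, List, Tuple

# Hypothetical structured description, loosely mirroring the XML above.
SPEC: List[Tuple[str, str]] = [
    ("literal", "python"),
    ("literal", "tophat_wrapper.py"),
    ("literal", "--own_file"),
    ("param", "own_file"),
]
PARAM_PATTERNS: Dict[str, str] = {"own_file": r"[\w./-]+"}

def compile_template(spec: List[Tuple[str, str]]) -> str:
    # Client side: emit a template string with $name placeholders.
    return " ".join(v if kind == "literal" else "$" + v for kind, v in spec)

def compile_regex(spec: List[Tuple[str, str]], patterns: Dict[str, str]):
    # Server side: escape literals; each parameter uses its whitelist pattern.
    parts = [re.escape(v) if kind == "literal" else "(?:%s)" % patterns[v]
             for kind, v in spec]
    return re.compile(" ".join(parts))

def validate(command: str, regex) -> bool:
    # The whole command must match; no trailing extras allowed.
    return regex.fullmatch(command) is not None

template = compile_template(SPEC)
regex = compile_regex(SPEC, PARAM_PATTERNS)
ok = Template(template).substitute(own_file="genome.fa")
bad = Template(template).substitute(own_file="genome.fa; rm -rf /")
print(template)  # python tophat_wrapper.py --own_file $own_file
print(validate(ok, regex), validate(bad, regex))  # True False
```

Because both artifacts come from one description, the command the client builds and the pattern the server checks cannot drift apart the way a hand-written Cheetah template and a separate validator could.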
These aren't things that urgently need to be added to the core Galaxy
framework, but things to think about longer term. If there is any
interest from the core Galaxy team, I would be happy to implement any
of this, with any desired changes.
Thanks all,
-John
------------------------------------------------
John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
https://twitter.com/jmchilton
https://bitbucket.org/jmchilton
https://github.com/jmchilton
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/