Andrey Tovchigrechko wrote:
We have decided to use a local Galaxy install as a front-end to our metagenomic binning tool MGTAXA ( http://andreyto.github.com/mgtaxa/ ). I need some guidance from the Galaxy developers on the best way to proceed:
- The server will be on a DMZ, with no direct access to the internal network, where the computes will be running on a local SGE cluster. The best that our IT allowed is for a script on the internal cluster to monitor a directory on the web server, pull inputs/tasks from there when they appear, and put the results back. My current idea is to have the Galaxy "local runner" start "proxy jobs": each proxy job is a local process that does "put the input into the watched dir; until results appear in the watched dir: sleep(30); loop; finish". In other words, Galaxy thinks it is running jobs locally, but in fact those jobs are just waiting for the remote results to come back. Does that look like a sane solution? How will it scale on the Galaxy side? E.g., how many such simultaneous tasks can the local runner support? Any anticipated gotchas?
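Just to make the idea concrete, a minimal sketch of such a proxy job could look like the following (the drop-box path, the marker files and the file names are purely illustrative, not anything Galaxy or MGTAXA defines):

    # proxy_job.py - sketch of a "proxy job" run by the Galaxy local runner.
    # It copies the input into the watched directory, waits for the
    # cluster-side script to drop the results back, then exits.
    import os
    import shutil
    import sys
    import time

    WATCHED_DIR = "/export/galaxy_dropbox"   # polled by the internal cluster script
    POLL_INTERVAL = 30                        # seconds between checks

    def main(task_id, input_path, output_path):
        task_dir = os.path.join(WATCHED_DIR, task_id)
        os.makedirs(task_dir)
        # 1. Put the input where the cluster-side script will pick it up;
        #    a ".ready" marker signals that it has been fully written.
        shutil.copy(input_path, os.path.join(task_dir, "input.fasta"))
        open(os.path.join(task_dir, "input.ready"), "w").close()
        # 2. Wait until the cluster-side script drops a completion marker back.
        done_marker = os.path.join(task_dir, "output.done")
        while not os.path.exists(done_marker):
            time.sleep(POLL_INTERVAL)
        # 3. Copy the result to wherever Galaxy expects the tool's output.
        shutil.copy(os.path.join(task_dir, "output.txt"), output_path)

    if __name__ == "__main__":
        main(*sys.argv[1:4])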
Hi Andrey,
This will work, but one of the problems you'll run into is that all those jobs will be considered "running" even when they're only queued in SGE, which will tie up the local job runner while giving a false status to your users. To prevent a backup, though, you could increase the number of available local runner workers, since a bunch of sleeping scripts probably won't impact performance much.
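(For example, the size of the local runner's worker pool is set in universe_wsgi.ini; the exact option name below is from memory, so check it against your own config file:)

    # universe_wsgi.ini - more workers, so that a pile of sleeping
    # proxy jobs does not starve real local jobs
    local_job_queue_workers = 20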
Additionally, we will also be trying to run computes on our TeraGrid account. I was thinking that the solution above could be applied to that scenario as well, except that now the proxy job would be polling qsub on TeraGrid through ssh, or calling the Globus API. One problem here is that a job often has to wait in a TeraGrid queue for 24 hours or so. Will my proxy jobs on Galaxy time out or get killed by any chance?
No, jobs can be queued indefinitely.
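(For what it's worth, a remote-polling proxy of that sort might look roughly like this; the login node name, the script name and the PBS-style qsub/qstat behaviour are assumptions, and Globus calls could replace the ssh commands:)

    # remote_proxy.py - sketch of a proxy job that submits to TeraGrid over
    # ssh and waits for completion.  Host name, script name and the PBS-style
    # qsub/qstat behaviour are assumptions, not TeraGrid specifics.
    import subprocess
    import time

    LOGIN_NODE = "login.teragrid.example.org"
    POLL_INTERVAL = 300   # long queue waits make frequent polling pointless

    def ssh_output(args):
        """Run a command on the remote login node and return its stdout."""
        return subprocess.Popen(["ssh", LOGIN_NODE] + args,
                                stdout=subprocess.PIPE).communicate()[0]

    def job_in_queue(job_id):
        """PBS-style qstat exits non-zero once the job has left the queue."""
        return subprocess.call(["ssh", LOGIN_NODE, "qstat", job_id]) == 0

    def main():
        # PBS-style qsub prints the job id on stdout.
        job_id = ssh_output(["qsub", "run_mgtaxa.sh"]).strip()
        while job_in_queue(job_id):
            time.sleep(POLL_INTERVAL)
        # Pull the results back; from Galaxy's point of view this local
        # process has simply taken a long time to finish.
        subprocess.check_call(["scp", "-r", LOGIN_NODE + ":results", "."])

    if __name__ == "__main__":
        main()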
The alternatives are: 1) write another runner (in addition to local, sge, torque) - how much work would that be?
This would actually be the cleanest route, and you could probably just take the existing sge module and strip out all of the DRMAA code. Have it generate the submission script, write it to the cluster_files_directory, and collect the outputs from the same directory as usual. Instead of submitting the job directly, it doesn't need to do anything further, since your backend process will do that. The loop that monitors job status can simply check for the existence of the output files (assuming their appearance is atomic, i.e. once they exist they have been fully written).
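A rough sketch of what the runner's job handling reduces to once the DRMAA calls are stripped out (this is not the actual Galaxy runner interface, just the drop-and-poll pieces; the ".submit" marker convention is invented for illustration):

    # Core logic of a "file drop" runner: write the submission script into the
    # shared cluster_files_directory, then poll for the outputs instead of
    # talking to DRMAA.  Names and the ".submit" marker are illustrative.
    import os
    import time

    def drop_submission_script(cluster_files_dir, job_id, script_text):
        """Write the job script where the backend process will find it."""
        script = os.path.join(cluster_files_dir, "galaxy_%s.sh" % job_id)
        open(script, "w").write(script_text)
        # A marker file tells the backend that the script is complete.
        open(script + ".submit", "w").close()
        return script

    def outputs_ready(output_files):
        """The status-monitoring loop only needs to check that every output
        exists (assuming files appear atomically, e.g. written then renamed)."""
        return all(os.path.exists(f) for f in output_files)

    def monitor(output_files, poll_interval=30):
        while not outputs_ready(output_files):
            time.sleep(poll_interval)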
2) write a fake SGE python interface and make Galaxy think it is using local SGE
This is probably more work than it'd be worth.
- What repo is best to clone, given the scope of our activity described above? We will likely need to mess a bit with the Galaxy internals, not just the tool definition. Should we clone galaxy-central or galaxy-dist? What workflow would you recommend for updating, submitting patches, etc.?
galaxy-dist would be advisable here. Ry4an Brase gave a lightning talk on Mercurial for Galaxy Admins at our recent developer conference that explains how to update Galaxy; his slides are on our wiki here:
http://bitbucket.org/galaxy/galaxy-central/wiki/DevConf2010
For patches, either email them to us on the dev list (if they're not too big), or set up a patch queue repository on Bitbucket and send us a link to those patches.
--nate
I will be very grateful for answers to the above, and also for any alternative recommendations. Andrey
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev