Andrey Tovchigrechko wrote:
We have decided to use a local Galaxy install as a front-end to our metagenomic binning tool MGTAXA ( http://andreyto.github.com/mgtaxa/ ). I need some guidance from the Galaxy developers on the best way to proceed:
- The server will be on a DMZ, with no direct access to the internal network, where the computes will be running on a local SGE cluster. The best that our IT allowed is for a script on the internal cluster to monitor a directory on the web server, pull inputs/tasks from there when they appear, and put the results back. My current idea is to have the Galaxy "local runner" start "proxy jobs": each proxy job is a local process that does "put the input into the watched dir; until results appear in the watched dir: sleep(30); loop; finish". In other words, Galaxy thinks it is running jobs locally, but in fact those jobs are just waiting for the remote results to come back. Does that look like a sane solution? How will it scale on the Galaxy side? E.g., how many such simultaneous tasks can the local runner support? Any anticipated gotchas?
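Just to make the idea concrete, a minimal sketch of such a proxy job could look like the following (the drop-box path, the marker files and the file names are purely illustrative, not anything Galaxy or MGTAXA defines):

    # proxy_job.py - sketch of a "proxy job" run by the Galaxy local runner.
    # It copies the input into the watched directory, waits for the
    # cluster-side script to drop the results back, then exits.
    import os
    import shutil
    import sys
    import time

    WATCHED_DIR = "/export/galaxy_dropbox"   # polled by the internal cluster script
    POLL_INTERVAL = 30                        # seconds between checks

    def main(task_id, input_path, output_path):
        task_dir = os.path.join(WATCHED_DIR, task_id)
        os.makedirs(task_dir)
        # 1. Put the input where the cluster-side script will pick it up;
        #    a ".ready" marker signals that it has been fully written.
        shutil.copy(input_path, os.path.join(task_dir, "input.fasta"))
        open(os.path.join(task_dir, "input.ready"), "w").close()
        # 2. Wait until the cluster-side script drops a completion marker back.
        done_marker = os.path.join(task_dir, "output.done")
        while not os.path.exists(done_marker):
            time.sleep(POLL_INTERVAL)
        # 3. Copy the result to wherever Galaxy expects the tool's output.
        shutil.copy(os.path.join(task_dir, "output.txt"), output_path)

    if __name__ == "__main__":
        main(*sys.argv[1:4])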
Hi Andrey,
This will work, but one of the problems you'll run into is that all those jobs will be considered "running" even when they're only queued in SGE, which will tie up the local job runner while giving a false status to your users. To prevent a backup, though, you could increase the number of available local runner workers, since a bunch of sleeping scripts probably won't impact performance much.
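(For example, the size of the local runner's worker pool is set in universe_wsgi.ini; the exact option name below is from memory, so check it against your own config file:)

    # universe_wsgi.ini - more workers, so that a pile of sleeping
    # proxy jobs does not starve real local jobs
    local_job_queue_workers = 20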
Additionally, we will also be trying to run computes on our TeraGrid account. I was thinking that the solution above could be applied to that scenario as well, except that now the proxy job would be polling qsub on TeraGrid through ssh, or calling the Globus API. One problem here is that a job often has to wait in a TeraGrid queue for 24 hours or so. Will my proxy jobs on Galaxy time out or get killed by any chance?
No, jobs can be queued indefinitely.
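(For what it's worth, a remote-polling proxy of that sort might look roughly like this; the login node name, the script name and the PBS-style qsub/qstat behaviour are assumptions, and Globus calls could replace the ssh commands:)

    # remote_proxy.py - sketch of a proxy job that submits to TeraGrid over
    # ssh and waits for completion.  Host name, script name and the PBS-style
    # qsub/qstat behaviour are assumptions, not TeraGrid specifics.
    import subprocess
    import time

    LOGIN_NODE = "login.teragrid.example.org"
    POLL_INTERVAL = 300   # long queue waits make frequent polling pointless

    def ssh_output(args):
        """Run a command on the remote login node and return its stdout."""
        return subprocess.Popen(["ssh", LOGIN_NODE] + args,
                                stdout=subprocess.PIPE).communicate()[0]

    def job_in_queue(job_id):
        """PBS-style qstat exits non-zero once the job has left the queue."""
        return subprocess.call(["ssh", LOGIN_NODE, "qstat", job_id]) == 0

    def main():
        # PBS-style qsub prints the job id on stdout.
        job_id = ssh_output(["qsub", "run_mgtaxa.sh"]).strip()
        while job_in_queue(job_id):
            time.sleep(POLL_INTERVAL)
        # Pull the results back; from Galaxy's point of view this local
        # process has simply taken a long time to finish.
        subprocess.check_call(["scp", "-r", LOGIN_NODE + ":results", "."])

    if __name__ == "__main__":
        main()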
The alternatives are: 1) write another runner (in addition to local, sge, torque) - how much work would that be?
This would actually be the cleanest route, and you could probably just take the existing sge module and strip out all of the DRMAA code. Have it generate the submission script, write it to the cluster_files_directory, and collect the outputs from the same directory as usual. Instead of submitting the job directly, it doesn't need to do anything further, since your backend process will do that. The loop that monitors job status can simply check for the existence of the output files (assuming their appearance is atomic, i.e. once they exist they have been fully written).
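A rough sketch of what the runner's job handling reduces to once the DRMAA calls are stripped out (this is not the actual Galaxy runner interface, just the drop-and-poll pieces; the ".submit" marker convention is invented for illustration):

    # Core logic of a "file drop" runner: write the submission script into the
    # shared cluster_files_directory, then poll for the outputs instead of
    # talking to DRMAA.  Names and the ".submit" marker are illustrative.
    import os
    import time

    def drop_submission_script(cluster_files_dir, job_id, script_text):
        """Write the job script where the backend process will find it."""
        script = os.path.join(cluster_files_dir, "galaxy_%s.sh" % job_id)
        open(script, "w").write(script_text)
        # A marker file tells the backend that the script is complete.
        open(script + ".submit", "w").close()
        return script

    def outputs_ready(output_files):
        """The status-monitoring loop only needs to check that every output
        exists (assuming files appear atomically, e.g. written then renamed)."""
        return all(os.path.exists(f) for f in output_files)

    def monitor(output_files, poll_interval=30):
        while not outputs_ready(output_files):
            time.sleep(poll_interval)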
2) write a fake SGE python interface and make Galaxy think it is using local SGE
This is probably more work than it'd be worth.
- What repo is best to clone, given the scope of our activity described above? We will likely need to mess a bit with the Galaxy internals, not just the tool definition. Should we clone galaxy-central or galaxy-dist? What workflow would you recommend for updating, submitting patches, etc.?
galaxy-dist would be advisable here. Ry4an Brase gave a lightning talk on Mercurial for Galaxy Admins at our recent developer conference that explains how to update Galaxy; his slides are on our wiki here:
http://bitbucket.org/galaxy/galaxy-central/wiki/DevConf2010
For patches, either email them to us on the dev list (if they're not too big), or set up a patch queue repository on Bitbucket and send us a link to those patches.
--nate
I will be very grateful for answers to the above, and also for any alternative recommendations. Andrey
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev