Galaxy with non-SGE/non-Torque distributed jobs
We have decided to use a local Galaxy install as a front-end to our metagenomic binning tool MGTAXA ( http://andreyto.github.com/mgtaxa/ ). I need some guidance from the Galaxy developers on the best way to proceed:

1) The server will be in a DMZ, with no direct access to the internal network, where the computes will run on a local SGE cluster. The best our IT would allow is for a script on the internal cluster to monitor a directory on the web server, pull inputs/tasks from there when they appear, and put the results back. My current idea is to have the Galaxy "local runner" start "proxy jobs": each proxy job is a local process that does "put the input into the watched dir; until results appear in the watched dir: sleep(30); loop; finish". In other words, Galaxy thinks it is running jobs locally, but in fact those jobs are just waiting for the remote results to come back. Does that look like a sane solution? How will it scale on the Galaxy side? E.g., how many such simultaneous tasks can the local runner support? Any anticipated gotchas?

Additionally, we will also be trying to run computes on our TeraGrid account. I was thinking that the solution above could be applied to that scenario as well, except that the proxy job would be polling qsub on TeraGrid through ssh, or calling the Globus API. One problem here is that a job often has to wait in a TeraGrid queue for 24 hours or so. Will my proxy jobs on Galaxy time out or get killed by any chance? The alternatives are: (a) write another runner (in addition to local, sge, torque); how much work would that be? Or (b) write a fake SGE Python interface and make Galaxy think it is using local SGE.

2) What repo is best to clone, given the scope of our activity described above? We will likely need to mess a bit with the Galaxy internals, not just the tool definitions. Should we clone galaxy-central or galaxy-dist? What workflow would you recommend for updating, submitting patches, etc.?

I will be very grateful for answers to the above, and also for any alternative recommendations.

Andrey
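[Editorial note: the "proxy job" described above boils down to a stage-then-poll loop. A minimal sketch in Python follows; the directory layout, file names, and the assumption that the cluster side writes results atomically are all hypothetical placeholders, not anything prescribed by Galaxy itself.]

#!/usr/bin/env python
# proxy_job.py - sketch of a "proxy job": stage the input into a directory
# watched by the internal cluster script, then block until the result appears.
# All paths and file names here are hypothetical placeholders.
import os
import shutil
import sys
import time

def run_proxy(input_path, watched_dir, result_name, poll_seconds=30):
    # 1. Drop the input where the internal cluster script will pick it up.
    shutil.copy(input_path, os.path.join(watched_dir, os.path.basename(input_path)))
    result_path = os.path.join(watched_dir, result_name)
    # 2. Poll until the result file shows up; this assumes the cluster side
    #    writes it atomically (e.g. writes to a temporary name, then renames).
    while not os.path.exists(result_path):
        time.sleep(poll_seconds)
    # 3. The result is back; when this process exits, Galaxy's local runner
    #    treats the job as finished.
    return result_path

if __name__ == "__main__":
    # usage: proxy_job.py <input_file> <watched_dir> <expected_result_name>
    print(run_proxy(sys.argv[1], sys.argv[2], sys.argv[3]))

[Note that each such proxy occupies one local-runner worker slot for the entire wait, which is why increasing the number of local runner workers, as suggested in the reply below, matters.]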
We have a similar scenario. In their infinite wisdom, our system admins have insisted on installing PBS Pro, which we do not think will be compatible with the TORQUE libs. We want to run normal processes locally and outsource specific tools to the compute cluster by writing a tool that submits a PBS script. If we could just run a monitor on a working directory, waiting for the output files to turn up, with a timeout, that would be good enough.

Cheers,
Dennis
--
Dennis Gascoigne
0407 639 995
dennis.gascoigne@gmail.com
Hi Galaxy Devs,

I ran into a small problem when using a conditional tag set in my tool config. The conditional works as expected when I use a select param with the default popup menu display type. But when I change the display type to radio buttons, the view in the web browser does not update straight away when I choose another option for the conditional param. I can get the display to update if I select another item from my history for the input parameter though... Below is the code for a simple tool that can be used to reproduce this (radio button version commented out).

Cheers,

Pi

---------------------------------------------------------------------------------------
<tool id="Tab2NonRedundantTab1" version="1.2" name="Remove redundancy">
  <description>from tabular data sets</description>
  <command>
    #if $score_based_record_selection.enable_scoring=="Yes"
    #sort -t \$'\t' -k $ucol,$ucol -k $score_based_record_selection.scol,$score_based_record_selection.scol\n$score_based_record_selection.bigger_is_better $input | sort -u -t \$'\t' -k $ucol,$ucol > $output
    #else
    #sort -u -t \$'\t' -k $ucol,$ucol $input > $output
    #end if
  </command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Input file to filter for redundancy" help="(in TAB delimited format)"/>
    <param name="ucol" type="data_column" value="c1" data_ref="input" label="Remove redundancy from column"/>
    <conditional name="score_based_record_selection">
      <!-- Radio buttons don't work yet: refresh problem!
      <param name="enable_scoring" type="select" label="Use score to determine which redundant records to preserve" display="radio">
      -->
      <param name="enable_scoring" type="select" label="Use score to determine which redundant records to preserve">
        <option value="No" selected="true">No</option>
        <option value="Yes">Yes</option>
      </param>
      <when value="Yes">
        <param name="scol" type="data_column" value="" data_ref="input" label="Score column used to decide which of the redundant entries to keep"/>
        <param name="bigger_is_better" type="select" optional="true">
          <label>Best entry has the</label>
          <option value="r">Highest score</option>
          <option value="">Lowest score</option>
        </param>
      </when>
      <when value="No"/>
    </conditional>
  </inputs>
  <outputs>
    <data name="output" format="tabular" label="Non-redundant output"/>
  </outputs>
</tool>
---------------------------------------------------------------
Biomolecular Mass Spectrometry & Proteomics group
Utrecht University

Visiting address:
H.R. Kruyt building room O607
Padualaan 8
3584 CH Utrecht
The Netherlands

Mail address:
P.O. box 80.082
3508 TB Utrecht
The Netherlands

phone: +31 6 143 66 783
email: pieter.neerincx@gmail.com
skype: pieter.online
---------------------------------------------------------------
Andrey Tovchigrechko wrote:
We have decided to use a local Galaxy install as a front-end to our metagenomic binning tool MGTAXA ( http://andreyto.github.com/mgtaxa/ ). I need some guidance from the Galaxy developers on the best way to proceed:
1) The server will be in a DMZ, with no direct access to the internal network, where the computes will run on a local SGE cluster. The best our IT would allow is for a script on the internal cluster to monitor a directory on the web server, pull inputs/tasks from there when they appear, and put the results back. My current idea is to have the Galaxy "local runner" start "proxy jobs": each proxy job is a local process that does "put the input into the watched dir; until results appear in the watched dir: sleep(30); loop; finish". In other words, Galaxy thinks it is running jobs locally, but in fact those jobs are just waiting for the remote results to come back. Does that look like a sane solution? How will it scale on the Galaxy side? E.g., how many such simultaneous tasks can the local runner support? Any anticipated gotchas?
Hi Andrey,

This will work, but one of the problems you'll run into is that all those jobs will be considered "running" even while they are only queued in SGE, which ties up the local job runner and shows a false status to your users. To prevent backup, though, you could increase the number of available local runner workers, since a bunch of sleeping scripts probably won't impact performance too much.
Additionally, we will also be trying to run computes on our TeraGrid account. I was thinking that the solution above could be applied to that scenario as well, except that the proxy job would be polling qsub on TeraGrid through ssh, or calling the Globus API. One problem here is that a job often has to wait in a TeraGrid queue for 24 hours or so. Will my proxy jobs on Galaxy time out or get killed by any chance?
No, jobs can be queued indefinitely.
The alternatives are: (a) write another runner (in addition to local, sge, torque); how much work would that be?
This would actually be the cleanest route, and you could probably just take the existing sge module and strip out all of the DRMAA code. Have it generate the submission script, write it to the cluster_files_directory, and collect the outputs from the same directory as usual; instead of submitting the job directly, it does not need to do anything, since your backend process will do that. The loop that monitors job status can simply check for the existence of the output files (assuming their appearance is atomic, i.e. once they exist they have been fully written).
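[Editorial note: the stripped-down runner described here amounts to "write the submission script into the shared directory, then watch for the outputs instead of calling DRMAA". A schematic Python sketch follows; the class and method names are hypothetical and do not reproduce Galaxy's actual runner interface, which you would get by copying the existing sge module under lib/galaxy/jobs/runners/ and removing the DRMAA calls.]

import os
import time

class DropboxRunnerSketch(object):
    """Schematic only: stage a submission script into a shared directory and
    poll for the declared output files. A real implementation would reuse the
    structure of Galaxy's existing SGE runner minus the DRMAA calls; none of
    these names match Galaxy's actual API."""

    def __init__(self, staging_dir, poll_seconds=30):
        self.staging_dir = staging_dir
        self.poll_seconds = poll_seconds

    def submit(self, job_id, script_text):
        # Write the job script where the backend process on the cluster side
        # will find and submit it; Galaxy itself never talks to the scheduler.
        script_path = os.path.join(self.staging_dir, "galaxy_%s.sh" % job_id)
        with open(script_path, "w") as fh:
            fh.write(script_text)
        return script_path

    def outputs_ready(self, output_paths):
        # The monitor loop only needs to check that every expected output
        # exists, assuming the backend writes them atomically.
        return all(os.path.exists(p) for p in output_paths)

    def wait_for_outputs(self, output_paths):
        while not self.outputs_ready(output_paths):
            time.sleep(self.poll_seconds)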
(b) write a fake SGE Python interface and make Galaxy think it is using local SGE
This is probably more work than it'd be worth.
2) What repo is best to clone, given the scope of our activity described above? We will likely need to mess a bit with the Galaxy internals, not just the tool definitions. Should we clone galaxy-central or galaxy-dist? What workflow would you recommend for updating, submitting patches, etc.?
galaxy-dist would be advisable here. Ry4an Brase did a lightning talk on Mercurial for Galaxy Admins at our recent developer conference that explains how to update Galaxy; his slides are on our wiki here: http://bitbucket.org/galaxy/galaxy-central/wiki/DevConf2010

For patches, either email them to us on the dev list (if they're not too big), or set up a patch queue repository on Bitbucket and send us a link to those patches.

--nate
I will be very grateful for answers to the above, and also for any alternative recommendations. Andrey
participants (4)
- Andrey Tovchigrechko
- Dennis Gascoigne
- Nate Coraor
- Pieter Neerincx