Hi Galaxy developers,

In our ELIXIR project (http://www.elixir-europe.org/) in Norway, we have five geographically distributed Galaxy instances. Each runs on a local cluster (mostly SLURM). We are planning to interconnect those five clusters using the meta-scheduler ARC (http://www.nordugrid.org/arc/) to achieve load balancing, so that a Galaxy job can be reallocated to an external cluster if the local cluster is saturated. ARC manages the interconnection very well; what we need is to create a Galaxy job handler for ARC.

Is there a general template or interface for a job handler, i.e. for defining job submission commands, etc.? And how would we build such a new job handler and integrate it into a Galaxy installation?

Thank you,

Yours sincerely,
Abdulrahman Azab

Head engineer, ELIXIR.NO / The Genomic HyperBrowser team
Department of Informatics, University of Oslo, Boks 1072 Blindern, NO-0316 OSLO, Norway
Email: azab@ifi.uio.no, Cell phone: +47 46797339
----
Senior Lecturer in Computer Engineering
Faculty of Engineering, University of Mansoura, 35516-Mansoura, Egypt
Email: abdulrahman.azab@mans.edu.eg
This sounds like a fun and challenging project - good luck!

The route I would recommend pursuing largely hinges on whether all five Galaxy instances have a shared file system and run as a single user. If they do, I would recommend implementing a Galaxy "job runner" - all the runners bundled with Galaxy can be found in lib/galaxy/jobs/runners/. The standard cluster runners in there are drmaa.py, pbs.py, and condor.py. drmaa.py and pbs.py demonstrate hooking Galaxy up to a library for submitting jobs to a cluster, while condor.py demonstrates wrapping CLI tools. Along similar lines, there is cli.py, which is something of a general framework for submitting jobs via CLI tools and can even SSH to another host before running the submission scripts, if that is useful. That approach largely hinges on having a large shared cluster if you want to submit many different tools.

If you don't mind modifying the tools themselves, the logic for staging files and submitting to clusters could be moved into the tools - I can send some links to example tools that have done this.

If you don't have the shared cluster and do have many different tools you would like to manage this way, I would suggest looking at Pulsar (https://github.com/galaxyproject/pulsar). It can be used to distribute jobs to remote clusters/machines. Pulsar has the concept of "job managers" instead of "job runners" - they have a simpler interface that would need to be implemented for ARC; examples are here: https://github.com/galaxyproject/pulsar/tree/master/pulsar/managers. Pulsar has a bunch of options for staging files (file system copies/HTTP/scp/rsync), and these can be configured on a per-path basis for each Galaxy instance, allowing you to optimize the data transfer for your five setups. Pulsar can be deployed as a RESTful web service (in this case you could probably run one web service for all five instances) or by monitoring a message queue (without some small changes, you would probably need to stand up one Pulsar server for each of the five Galaxy instances in this case).

I like to give the warning that Galaxy is designed for large shared file systems - Pulsar and other distributed strategies require more effort to deploy (and in your case will definitely require novel development time as well).

It is probably out of scope, but I would also note that it might be significantly easier to deploy one Galaxy instance that routes jobs to the local clusters, let them all share one large file system, and just provide five different "faces" to Galaxy. That probably isn't possible due to hardware/institutional politics/etc., but I just wanted to make sure. Along the same lines, it is worth considering whether writing a DRMAA layer for ARC, or plugging it into Condor somehow, might be a more robust solution that Galaxy could leverage without locking your development efforts into Galaxy-specific solutions.

-John
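To give a concrete feel for the runner interface described above, here is a minimal sketch of what an ARC runner dropped into lib/galaxy/jobs/runners/ could look like. It follows the pattern of the bundled runners (drmaa.py, condor.py), but it is only a sketch: exact class and method signatures vary between Galaxy releases, the module name arc.py, the class ArcJobRunner, and the _write_job_description helper are made up for illustration, and the arcsub/arcstat/arckill calls and their output parsing are placeholders that would need to be checked against the ARC client documentation.

# lib/galaxy/jobs/runners/arc.py -- rough sketch modeled on the bundled runners
# (drmaa.py, condor.py); verify class/method signatures against your Galaxy version.
import subprocess

from galaxy import model
from galaxy.jobs.runners import AsynchronousJobRunner, AsynchronousJobState

__all__ = ['ArcJobRunner']


class ArcJobRunner(AsynchronousJobRunner):
    """Submit Galaxy jobs to clusters federated behind ARC via the arc* CLI tools."""
    runner_name = "ArcRunner"

    def queue_job(self, job_wrapper):
        # Let the base runner build the command line, working directory, etc.
        if not self.prepare_job(job_wrapper):
            return
        # Placeholder: write an xRSL/ADL job description wrapping the prepared job script.
        description_path = self._write_job_description(job_wrapper)
        # Submit through the ARC client; the returned job URL acts as the external ID.
        output = subprocess.check_output(['arcsub', description_path])
        external_id = output.strip()  # placeholder: parse "Job submitted with jobid: ..." properly

        job_wrapper.set_job_destination(job_wrapper.job_destination, external_id)
        job_wrapper.change_state(model.Job.states.QUEUED)

        job_state = AsynchronousJobState(
            files_dir=job_wrapper.working_directory,
            job_wrapper=job_wrapper,
            job_id=external_id,
            job_destination=job_wrapper.job_destination,
        )
        # Hand the job off to the runner's monitor thread.
        self.monitor_queue.put(job_state)

    def check_watched_item(self, job_state):
        # Poll ARC for the job's state (placeholder parsing of arcstat output).
        status = subprocess.check_output(['arcstat', job_state.job_id])
        if b'Finished' in status:
            self.mark_as_finished(job_state)
            return None
        if b'Failed' in status or b'Killed' in status:
            self.mark_as_failed(job_state)
            return None
        return job_state  # still queued or running; keep watching

    def stop_job(self, job):
        # Called when the user stops the job from Galaxy.
        subprocess.call(['arckill', job.job_runner_external_id])

    def recover(self, job, job_wrapper):
        # Re-attach to jobs that were in flight when Galaxy was restarted.
        job_state = AsynchronousJobState(
            files_dir=job_wrapper.working_directory,
            job_wrapper=job_wrapper,
            job_id=job.job_runner_external_id,
        )
        self.monitor_queue.put(job_state)

    def _write_job_description(self, job_wrapper):
        # Placeholder helper (not part of the runner interface): generate an ARC
        # job description pointing at the script prepared for job_wrapper.
        raise NotImplementedError()

There is no separate compilation step: Galaxy loads runner plugins from job_conf.xml, so something like this would be wired in with a <plugin> entry whose load attribute points at the (hypothetical) module and class above, e.g. galaxy.jobs.runners.arc:ArcJobRunner, plus one or more destinations that reference that plugin.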