Peter has it right in that we need to do this internally to ensure functionality across a range of job runners. A side benefit is that it gives us direct access to the tasks, so that we can eventually do interesting things with scheduling, resubmission, feedback, etc. If the overhead looks to be a performance issue, I could see having an override that would allow pushing task scheduling to the underlying cluster, but that functionality would come later.

-Dannon

On Feb 20, 2012, at 3:13 AM, Peter Cock wrote:
On Mon, Feb 20, 2012 at 8:08 AM, Bram Slabbinck <brsla@psb.vib-ugent.be> wrote:
Hi Dannon,
If I may elaborate further on this issue, I would like to mention that this kind of functionality is also supported by Sun Grid Engine in the form of 'array jobs'. With array jobs you can execute a job multiple times independently, with each run differing only in, for instance, its parameter settings. From your description below, this seems similar to the Galaxy parallelism tag. Is there, or do you foresee, an implementation of this SGE functionality through the drmaa interface in Galaxy? If not, has anybody achieved this through some custom coding? We would be highly interested in this.
thanks Bram
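For concreteness, here is a minimal sketch of the kind of array-job submission Bram describes, written against the Python drmaa bindings (the same DRMAA layer Galaxy's drmaa job runner builds on). The worker script, parameter file, and log paths are hypothetical placeholders; runBulkJobs is the DRMAA call that corresponds to SGE's 'qsub -t start-end:step'.

import drmaa

s = drmaa.Session()
s.initialize()

jt = s.createJobTemplate()
jt.remoteCommand = "/path/to/process_chunk.sh"  # hypothetical worker script
jt.args = ["params.txt"]                        # hypothetical parameter file
# Give each task its own log; PARAMETRIC_INDEX expands to the task number.
jt.outputPath = ":/path/to/logs/chunk." + drmaa.JobTemplate.PARAMETRIC_INDEX
jt.joinFiles = True

# runBulkJobs(template, start, end, step) is the DRMAA equivalent of
# 'qsub -t 1-10:1': one array job whose tasks read $SGE_TASK_ID at run time.
job_ids = s.runBulkJobs(jt, 1, 10, 1)

# Block until every task has finished before any downstream step runs.
s.synchronize(job_ids, drmaa.Session.TIMEOUT_WAIT_FOREVER, True)

s.deleteJobTemplate(jt)
s.exit()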
I was wondering why Galaxy submits N separate jobs to SGE after splitting (identical bar their working directory). I'm not sure whether all the other supported cluster back ends can do this, but basic job dependencies are possible with SGE. That means the cluster itself could take care of scheduling the split job, the N processing jobs, and the final merge job (i.e. three stages, where, for example, the merge won't run until all N processing jobs have finished).
My hunch is that Galaxy does a lot of this 'housekeeping' internally in order to remain flexible regarding the cluster back end.
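To illustrate Peter's point, here is a rough sketch, again with the Python drmaa bindings, of letting SGE itself sequence the three stages (split, N processing tasks, merge) via job dependencies rather than having Galaxy track each job. The script names are hypothetical, and '-hold_jid' is SGE's native dependency flag passed through nativeSpecification, so this particular trick would not carry over unchanged to other back ends.

import drmaa

N = 8  # number of chunks, chosen arbitrarily for the example

s = drmaa.Session()
s.initialize()

# Stage 1: split the input into N chunks.
split_jt = s.createJobTemplate()
split_jt.remoteCommand = "/path/to/split_input.sh"   # hypothetical
split_id = s.runJob(split_jt)

# Stage 2: an array job of N processing tasks, held until the split finishes.
work_jt = s.createJobTemplate()
work_jt.remoteCommand = "/path/to/process_chunk.sh"  # hypothetical
work_jt.nativeSpecification = "-hold_jid %s" % split_id
work_ids = s.runBulkJobs(work_jt, 1, N, 1)

# Stage 3: merge, held until every processing task has finished.
# (Array task ids come back as e.g. '12345.1'; SGE accepts the bare job number.)
merge_jt = s.createJobTemplate()
merge_jt.remoteCommand = "/path/to/merge_outputs.sh"  # hypothetical
merge_jt.nativeSpecification = "-hold_jid %s" % work_ids[0].split(".")[0]
s.runJob(merge_jt)

for jt in (split_jt, work_jt, merge_jt):
    s.deleteJobTemplate(jt)
s.exit()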
Peter