Peter has it right in that we need to do this internally to ensure functionality across a range of job runners. A side benefit is that it gives us direct access to the tasks, so that we can eventually do interesting things with scheduling, resubmission, feedback, etc. If the overhead looks to be a performance issue, I could see having an override that would allow pushing task scheduling to the underlying cluster, but that functionality would come later.

-Dannon

On Feb 20, 2012, at 3:13 AM, Peter Cock wrote:
On Mon, Feb 20, 2012 at 8:08 AM, Bram Slabbinck <brsla@psb.vib-ugent.be> wrote:
Hi Dannon,
If I may elaborate further on this issue, I would like to mention that this kind of functionality is also supported by Sun Grid Engine in the form of 'array jobs'. With array jobs you can execute a job multiple times independently, with each run differing only in, for instance, its parameter settings. From your description below, this seems similar to the Galaxy parallelism tag. Is there, or do you foresee, an implementation of this SGE functionality through the drmaa interface in Galaxy? If not, has anybody achieved this through some custom coding? We would be highly interested in this.
thanks Bram
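For concreteness, here is a minimal sketch of the kind of array-job submission Bram describes, written against the Python drmaa bindings (the same DRMAA layer Galaxy's drmaa job runner builds on). The worker script, parameter file, and log paths are hypothetical placeholders; runBulkJobs is the DRMAA call that corresponds to SGE's 'qsub -t start-end:step'.

import drmaa

s = drmaa.Session()
s.initialize()

jt = s.createJobTemplate()
jt.remoteCommand = "/path/to/process_chunk.sh"  # hypothetical worker script
jt.args = ["params.txt"]                        # hypothetical parameter file
# Give each task its own log; PARAMETRIC_INDEX expands to the task number.
jt.outputPath = ":/path/to/logs/chunk." + drmaa.JobTemplate.PARAMETRIC_INDEX
jt.joinFiles = True

# runBulkJobs(template, start, end, step) is the DRMAA equivalent of
# 'qsub -t 1-10:1': one array job whose tasks read $SGE_TASK_ID at run time.
job_ids = s.runBulkJobs(jt, 1, 10, 1)

# Block until every task has finished before any downstream step runs.
s.synchronize(job_ids, drmaa.Session.TIMEOUT_WAIT_FOREVER, True)

s.deleteJobTemplate(jt)
s.exit()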
I was wondering why Galaxy submits N separate jobs to SGE after splitting (identical bar their working directory). I'm not sure whether all the other supported cluster back ends can do this, but basic job dependencies are possible with SGE. That means the cluster itself could take care of scheduling the split job, the N processing jobs, and the final merge job (i.e. three stages, where, for example, the merge won't run until all N processing jobs have finished).
My hunch is that Galaxy does a lot of this 'housekeeping' internally in order to remain flexible regarding the cluster back end.
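To illustrate Peter's point, here is a rough sketch, again with the Python drmaa bindings, of letting SGE itself sequence the three stages (split, N processing tasks, merge) via job dependencies rather than having Galaxy track each job. The script names are hypothetical, and '-hold_jid' is SGE's native dependency flag passed through nativeSpecification, so this particular trick would not carry over unchanged to other back ends.

import drmaa

N = 8  # number of chunks, chosen arbitrarily for the example

s = drmaa.Session()
s.initialize()

# Stage 1: split the input into N chunks.
split_jt = s.createJobTemplate()
split_jt.remoteCommand = "/path/to/split_input.sh"   # hypothetical
split_id = s.runJob(split_jt)

# Stage 2: an array job of N processing tasks, held until the split finishes.
work_jt = s.createJobTemplate()
work_jt.remoteCommand = "/path/to/process_chunk.sh"  # hypothetical
work_jt.nativeSpecification = "-hold_jid %s" % split_id
work_ids = s.runBulkJobs(work_jt, 1, N, 1)

# Stage 3: merge, held until every processing task has finished.
# (Array task ids come back as e.g. '12345.1'; SGE accepts the bare job number.)
merge_jt = s.createJobTemplate()
merge_jt.remoteCommand = "/path/to/merge_outputs.sh"  # hypothetical
merge_jt.nativeSpecification = "-hold_jid %s" % work_ids[0].split(".")[0]
s.runJob(merge_jt)

for jt in (split_jt, work_jt, merge_jt):
    s.deleteJobTemplate(jt)
s.exit()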
Peter