On Mon, Feb 20, 2012 at 8:08 AM, Bram Slabbinck <brsla@psb.vib-ugent.be> wrote:
Hi Dannon,
If I may elaborate further on this issue, I would like to mention that this kind of functionality is also supported by the Sun Grid Engine in the form of 'array jobs'. With array jobs you can execute a job multiple times, each run independent of the others and differing only in, for instance, its parameter settings. From your description below, this seems similar to the Galaxy parallelism tag. Is there, or do you foresee, any implementation of this SGE functionality through the drmaa interface in Galaxy? If not, has anybody achieved this through custom coding? We would be highly interested in this.
Thanks,
Bram
I was wondering why Galaxy submits N separate jobs to SGE after splitting (identical bar their working directory). I'm not sure whether all the other supported cluster back ends can do this, but basic job dependencies are possible with SGE. That means the cluster could take care of scheduling the split job, the N processing jobs, and the final merge job (i.e. three stages where, for example, it won't run the merge until all N processing jobs have finished). My hunch is that Galaxy does a lot of this 'housekeeping' internally in order to remain flexible regarding the cluster back end.
Peter
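
To make the three-stage idea concrete, here is a rough sketch of how the split, array and merge stages could be chained with plain qsub options. This is not how Galaxy does it: the script names (split.sh, process.sh, merge.sh), the job-name prefix and the helper function are all made up for illustration. The only SGE features assumed are `-N` (job name), `-t` (array job task range) and `-hold_jid` (wait for the named job to finish). The code only builds the command lines and submits nothing.

```python
from typing import Dict, List


def sge_pipeline_commands(n_tasks: int, prefix: str = "galaxy_42") -> Dict[str, List[str]]:
    """Build qsub argument lists for a split / process / merge pipeline.

    Hypothetical helper: split.sh, process.sh and merge.sh are placeholder
    wrapper scripts, and the prefix is an arbitrary job-name prefix.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")

    split_name = f"{prefix}_split"
    work_name = f"{prefix}_work"
    merge_name = f"{prefix}_merge"

    return {
        # Stage 1: a single job that splits the input into n_tasks chunks.
        "split": ["qsub", "-N", split_name, "split.sh"],
        # Stage 2: one array job with n_tasks tasks, held until the split
        # job has finished. Each task can read its index from SGE_TASK_ID.
        "process": [
            "qsub", "-N", work_name,
            "-hold_jid", split_name,
            "-t", f"1-{n_tasks}",
            "process.sh",
        ],
        # Stage 3: the merge job, held until every array task has finished.
        "merge": ["qsub", "-N", merge_name, "-hold_jid", work_name, "merge.sh"],
    }


if __name__ == "__main__":
    for stage, argv in sge_pipeline_commands(4).items():
        print(stage, "->", " ".join(argv))
```

With this layout the scheduler, rather than Galaxy, would decide when the merge may start. The catch is that it relies on SGE-specific options, which is presumably why a back-end-neutral approach handles the bookkeeping itself.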