Re: [galaxy-dev] Galaxy not killing split cluster jobs

3 May 2012


      ...
On a related point, I've noticed sometimes one child job from a split task can fail, yet the rest of the child jobs continue to run on the cluster wasting CPU time. As soon as one child job dies (assuming there are no plans for attempting a retry), I would like the parent task to kill all the other children, and fail itself. I suppose you could merge the output of any children which did finish... but it would be simpler not to bother.
Right now, yes, this would make sense. I'll see about adding it. Ultimately, we want to build in a mechanism for retrying child tasks that fail due to cluster errors, etc., so it isn't necessary to rerun the entire job.
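
For illustration only, the fail-fast behaviour discussed above could be sketched roughly as follows. This is not Galaxy's actual code: the ChildTask and ParentTask classes, the state names, and the check_children() method are all hypothetical, and a real job runner would ask the cluster scheduler to delete the job inside kill().

```python
from dataclasses import dataclass, field

# Hypothetical state names, chosen for this sketch only.
RUNNING, OK, ERROR, KILLED = "running", "ok", "error", "killed"


@dataclass
class ChildTask:
    """One piece of a split job (hypothetical stand-in for a cluster job)."""
    name: str
    state: str = RUNNING

    def kill(self):
        # A real runner would ask the scheduler to delete the job here.
        # Children that already finished or failed are left untouched.
        if self.state == RUNNING:
            self.state = KILLED


@dataclass
class ParentTask:
    """The parent of a split job; fails as soon as any child fails."""
    children: list = field(default_factory=list)
    state: str = RUNNING

    def check_children(self):
        """Poll the children once and update the parent's state."""
        if any(c.state == ERROR for c in self.children):
            # One child died: stop the others so they do not waste CPU time.
            for c in self.children:
                c.kill()
            self.state = ERROR
        elif all(c.state == OK for c in self.children):
            # All children finished; merging outputs would happen here.
            self.state = OK
        return self.state


# Small demonstration: one child finished, one failed, one still running.
parent = ParentTask([ChildTask("part0"), ChildTask("part1"), ChildTask("part2")])
parent.children[0].state = OK
parent.children[1].state = ERROR
print(parent.check_children())             # error
print([c.state for c in parent.children])  # ['ok', 'error', 'killed']
```

A retry mechanism of the kind mentioned above would presumably hook in before the kill step, resubmitting the failed child instead of failing the parent.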

-Dannon

Dannon Baker