Hi,
Can Galaxy resubmit a job if the node where the job is running fails? I know SGE can do that by using qsub -r.
It would be very useful if Galaxy could do that.
Thank you,
Cai
I have not seen any reply to this question from last year, so I wanted to bring it up again...
I also run into this issue quite often, and with the recent introduction of the (half-completed?) paused-jobs feature, it seems we are getting very close to having this work...
I know I can re-run a failed job, but I don't think a paused job that depends on that first job "knows" to wait for the new job to finish when it is restarted, does it?
But I suspect that is what the intended functionality is, since otherwise paused jobs are not very useful....
I am still using the Feb-8 version of Galaxy, but don't think I saw anything in the April version that addresses this issue, right?
Maybe it would be useful to have one particular error state ("Job did not return any result from the cluster", or something of that sort, which I see when a cluster node fails) make Galaxy simply resubmit the job (with a fixed number of tries of course; 3 seems a decent number) and keep going, rather than immediately putting the job into the error state...
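To make the idea concrete, here is a rough sketch in plain Python of what such a bounded resubmit could look like. This is not Galaxy's job runner code: NodeFailure, JobResult, run_with_resubmit and make_flaky_job are all hypothetical names made up for illustration, and the real change would have to hook into wherever Galaxy detects that a job returned no result from the cluster.

```python
# Hypothetical sketch: none of these names are part of Galaxy's real API.
from dataclasses import dataclass
from typing import Callable

MAX_TRIES = 3  # fixed retry budget, as suggested above


class NodeFailure(Exception):
    """Stand-in for the 'job did not return any result from the cluster' case."""


@dataclass
class JobResult:
    state: str     # "ok" or "error"
    attempts: int  # how many submissions were made


def run_with_resubmit(submit: Callable[[], None],
                      max_tries: int = MAX_TRIES) -> JobResult:
    """Call submit() until it succeeds or the retry budget is used up.

    Only NodeFailure triggers a resubmit; any other exception propagates,
    so genuine tool errors still fail immediately.
    """
    for attempt in range(1, max_tries + 1):
        try:
            submit()
            return JobResult(state="ok", attempts=attempt)
        except NodeFailure:
            continue  # node died: resubmit on the next iteration
    return JobResult(state="error", attempts=max_tries)


def make_flaky_job(failures_before_success: int) -> Callable[[], None]:
    """Build a fake job that raises NodeFailure a given number of times."""
    remaining = [failures_before_success]

    def submit() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise NodeFailure("node went away")

    return submit
```

With this, a job whose node fails twice still ends up "ok" on the third try, while one that keeps failing goes to "error" only after the budget is exhausted. Dependent (paused) jobs would then only need to look at the final state.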
Thanks,
Thon
On Apr 19, 2012, at 09:39 AM, zhengqiu cai caizhq2005@yahoo.com.cn wrote:
Hi,
Can Galaxy resubmit a job if the node where the job is running fails? I know sge can do that by using qsub -r.
It should be very useful if Galaxy can do that.
Thank you,
Cai
This is related, though it does not cover automatic retries after node failures: you can follow the progress of workflow continuation (without rerunning, etc.) here: https://trello.com/c/kpARiWl5