Jason,

 

Are the affected workflow steps actually failing, or are they being falsely reported as “failed”? (Have you checked whether correct output exists for the affected step?) Once a step/job is marked “failed”, you can’t use its output (even if it exists) in any subsequent step.

 

If you are using a cluster for your local Galaxy install with NFS disk mounts, this may happen because of write cache delays. If that is the case, increasing the value of the “retry_job_output_collection” parameter in universe_wsgi.ini should help you get around the problem. It fixed the problem in our local Galaxy, where some jobs were being reported as “failed” even though the correct output was there.
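
For reference, the change would look roughly like the fragment below. This is only a sketch: I am assuming the parameter sits in the [app:main] section of your universe_wsgi.ini, and the value 5 is purely illustrative, not a recommendation. You will need to pick a number that suits your cluster and NFS setup.

```ini
# universe_wsgi.ini (assumed section; check where the setting appears in your copy)
[app:main]

# Number of times Galaxy retries collecting a job's output before
# marking the job as failed. The value below is an illustrative guess.
retry_job_output_collection = 5
```

Restart Galaxy after editing the file so the new value is picked up.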

 

--Hemant

 

From: galaxy-dev-bounces@lists.bx.psu.edu [mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of J. Greenbaum
Sent: Thursday, May 24, 2012 9:18 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] possible to resume failed workflow?

 

Hi,

 

I've created a few workflows and have been having issues with some steps randomly failing.  This would not be an issue if I could simply resume the workflow from the failed step, but it seems that this is not possible.  Instead, I'm forced to restart the workflow from the beginning.  Is this true or am I missing something?

Thanks,

 

Jason

 

--

Jason Greenbaum, Ph.D.
Manager, Bioinformatics Core | jgbaum@liai.org
La Jolla Institute for Allergy and Immunology