Re: [galaxy-dev] possible to resume failed workflow?

25 May 2012

Hi Hemant,

Thanks for the suggestion. I've tried setting that parameter to 5, but it has not helped. I've noticed in the galaxy server output that I'm getting the following error: 

galaxy.jobs.runners.pbs DEBUG 2012-05-25 08:13:17,957 (96) pbs_submit failed, PBS error 15031: Protocol (ASN.1) error 
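To gauge how often this error shows up, the server output can be scanned for lines of this shape. The following is a minimal Python sketch, not part of Galaxy: the function name `count_pbs_submit_errors` and the regular expression are illustrative assumptions based only on the single log line above, and the number in parentheses is assumed to be the Galaxy job id.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted above (an assumption, not a
# documented Galaxy log format). The parenthesised number is assumed to be
# the Galaxy job id.
PBS_SUBMIT_ERROR = re.compile(
    r"\((?P<job_id>\d+)\) pbs_submit failed, "
    r"PBS error (?P<code>\d+): (?P<message>.*)"
)


def count_pbs_submit_errors(lines):
    """Count pbs_submit failures per PBS error code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PBS_SUBMIT_ERROR.search(line)
        if match:
            counts[int(match.group("code"))] += 1
    return counts
```

Passing an open log file (any iterable of lines) returns a `Counter` keyed by error code, which makes it easy to see whether 15031 accounts for all of the submission failures or only some of them.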

I believe this is why certain jobs are failing. I've googled a bit for this error, and found many threads where similar problems were reported. Here are a couple: 

http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-February/004336.html 
http://osdir.com/ml/galaxy-development-source-control/2011-02/msg00148.html 

Is anyone aware of a solution to this issue? 

Thanks, 

J 

-- Jason Greenbaum, Ph.D. 
Manager, Bioinformatics Core | jgbaum@liai.org 
La Jolla Institute for Allergy and Immunology 

----- Original Message -----
From: "Hemant Kelkar" <hkelkar@unc.edu>
To: "J. Greenbaum" <jgbaum@liai.org>, galaxy-dev@lists.bx.psu.edu
Sent: Friday, May 25, 2012 4:34:12 AM
Subject: RE: [galaxy-dev] possible to resume failed workflow?

Jason,

Are the affected workflow steps actually failing or are they falsely
being reported as “failed” (have you checked if correct output
exists for the affected step)? Once a step/job is marked “failed”
you can’t use the output (even if it exists) for any subsequent
step.

If you are using a cluster for your local galaxy install and NFS disk
mounts then this may happen because of write cache delays. If that
is the case, increasing the value for the
“retry_job_output_collection” parameter to a higher number in the
universe_wsgi.ini should help you get around the problem. It fixed
the problem in our local galaxy where some jobs were being reported
as “failed” though the correct output was there.
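The idea behind this setting can be sketched as a simple retry loop: instead of declaring a job failed the first time its output file is missing, check again a few times so that a delayed NFS write has a chance to become visible. This is a minimal illustration in Python, not Galaxy's actual implementation; `wait_for_output`, `retries` and `delay` are hypothetical names chosen for the example.

```python
import os
import time


def wait_for_output(path, retries=5, delay=1.0):
    """Return True if `path` becomes visible within `retries` extra checks.

    The file is checked once immediately and then up to `retries` more
    times, sleeping `delay` seconds between checks. Returns False if the
    file never appears.
    """
    for attempt in range(retries + 1):
        if os.path.exists(path):
            return True
        if attempt < retries:
            # Give the shared filesystem time to flush cached writes.
            time.sleep(delay)
    return False
```

A larger `retries` value trades a longer wait on genuinely failed jobs for fewer false "failed" reports when the output is merely late to appear.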
--Hemant

From: galaxy-dev-bounces@lists.bx.psu.edu
[mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of J.
Greenbaum
Sent: Thursday, May 24, 2012 9:18 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] possible to resume failed workflow?

Hi,

I've created a few workflows and have been having issues with some
steps randomly failing. This would not be an issue if I could simply
resume the workflow from the failed step, but it seems that this is
not possible. Instead, I'm forced to restart the workflow from the
beginning. Is this true or am I missing something?

Thanks,

Jason

--
Jason Greenbaum, Ph.D.
Manager, Bioinformatics Core | jgbaum@liai.org
La Jolla Institute for Allergy and Immunology