Hi James,

We have made some progress in understanding the workflow-specific job crashes. It seems that 'parallel' workflows are sending jobs simultaneously, and this is problematic for torque. We get this error:

10/18/2012 10:06:18;0080;PBS_Server;Req;req_reject;Reject reply code=15058(Bad DIS based Request Protocol MSG=cannot decode message), aux=0, type=Connect, from @

There is a thread here:

http://osdir.com/ml/galaxy-source-control/2011-08/msg00136.html

which is very similar to what we are experiencing. In the post linked above, the author indicates he found a fix (pasted below). Would you recommend we make the same change?

Thanks!
Todd

"To deal with this I modified the lib/galaxy/jobs/runners/pbs.py script to make multiple attempts at submitting in the following way:

@@ -286,6 +286,12 @@ class PBSJobRunner( BaseJobRunner ):
         log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
         log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
         job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+        ##Modified to give ten tries for qsubbing a job
+        num_try=0
+        while(not job_id and num_try<10):
+            job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+            num_try+=1
+
         pbs.pbs_disconnect(c)

         # check to see if it submitted
"

On 10/17/2012 9:40 AM, James Taylor wrote:
Todd, this is definitely unusual. Can you post (or send directly) relevant sections from the Galaxy log?
-- jt
On Tue, Oct 16, 2012 at 8:15 PM, Todd Oakley <todd.oakley@lifesci.ucsb.edu> wrote:
Hello, We just did a few tweaks to improve Galaxy performance, and a new issue popped up that I would like advice on troubleshooting.
When we run workflows, we see that tools later in the workflow run and crash before the results they depend on have completed running.
We can re-run the crashed jobs later and they work fine, suggesting that they are only failing in the context of running workflows.
I'd appreciate any advice on how to start troubleshooting this problem.
Thanks much! Todd
--
*************************************** Todd Oakley, Professor Ecology Evolution and Marine Biology University of California, Santa Barbara Santa Barbara, CA 93106 USA ***************************************
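For anyone reading this thread later: the retry idea in the patch quoted at the top can be sketched as a small standalone helper. This is only an illustration under stated assumptions, not the Galaxy code. The name submit_with_retries and its parameters are hypothetical, `submit` stands in for the pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None) call, and a failed submission is taken to return a falsy job id, as the quoted patch assumes. The short pause between attempts is an addition of this sketch; the quoted patch retries immediately.

```python
import time


def submit_with_retries(submit, max_tries=10, delay=0.5):
    """Call submit() until it returns a truthy job id or max_tries is reached.

    `submit` is a hypothetical zero-argument callable standing in for
    pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None).
    Returns (job_id, attempts); job_id is falsy if every attempt failed.
    """
    job_id = None
    attempts = 0
    while not job_id and attempts < max_tries:
        if attempts:
            # Pause before each retry (an assumption of this sketch;
            # the quoted patch retries without waiting).
            time.sleep(delay)
        job_id = submit()
        attempts += 1
    return job_id, attempts


if __name__ == "__main__":
    # Fake submitter that fails twice, then returns a made-up job id.
    replies = iter([None, None, "123.example"])
    print(submit_with_retries(lambda: next(replies), delay=0.0))
```

Returning the attempt count alongside the job id makes it easy to log how often the server rejected a connection, which may help show whether simultaneous submissions are really the cause.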