Re: [galaxy-dev] Job execution order mixed-up

15 Nov 2013

On Wed, Nov 13, 2013 at 10:10 AM, Jean-Francois Payotte <jean-francois.payotte@dnalandmarks.ca> wrote:
...
Hi John,
Thank you for your answer and for trying to help. This is greatly
appreciated!
I haven't really made any progress in tracking down this error, and
hopefully this weird behaviour will no longer happen with the November
4th distribution.
But here are my answers to your questions, in case they ring a bell:
1. Has this behaviour been reported with any other workflow?
   It has been reported with 2 different workflows as of now. These 2
   workflows don't have anything in common, except that they are huge (one
   of them has 37 steps, producing a total of about 110 datasets).
2. Are you running Galaxy as a single process or multiple processes?
   If multiple processes, how many web, handler and manager processes do you
   have and are they all on the same machine?
   We are running Galaxy in multiple processes with 5 web servers, 3 job
   handlers and no manager (I believe the manager was rendered obsolete in one
   of the latest Galaxy distributions). All these processes are run on the
   same machine.
I did not catch that a manager is no longer needed. Great.
...
3. Have you made any modifications to Galaxy that could result in
   this behaviour?
   No.
4. What is the value of track_jobs_in_database in your
   universe_wsgi.ini configuration file?
   We never touched this part of the configuration file and the line
   still reads: "#track_jobs_in_database = None".
   After reading your answer, I've decided to modify this line to:
   "track_jobs_in_database = True"
   Unfortunately, running one of the faulty workflows several times (5x),
   I noticed that one of them was still showing this strange behaviour where
   some jobs were executed before their inputs were ready.
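For anyone wanting to double-check which value of track_jobs_in_database is
actually written in their configuration, here is a minimal sketch using
Python's standard configparser. It assumes the option lives in the
[app:main] section of universe_wsgi.ini; the function name and the sample
text are hypothetical and only for illustration. A commented-out line is
reported as None, meaning Galaxy's own default applies.

```python
import configparser


def effective_track_jobs(ini_text, section="app:main"):
    """Return track_jobs_in_database as written in the ini text,
    or None if the option is absent or commented out.

    Assumption: the option is read from the [app:main] section.
    """
    # interpolation=None so values containing '%' are not expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, "track_jobs_in_database", fallback=None)


# Hypothetical excerpt mirroring the untouched line quoted above.
SAMPLE = """
[app:main]
#track_jobs_in_database = None
"""

print(effective_track_jobs(SAMPLE))
```

To run it against a real file, read the file's text first and pass it in,
e.g. effective_track_jobs(open("universe_wsgi.ini").read()).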
Do you think this issue could be related to the fact that we are using
Galaxy with the multiple processes configuration? We implemented this
configuration some time ago because some of our users were complaining
about the slow responsiveness of the web interface.
Would you recommend using Galaxy without the multiple processes
configuration? (Let's say, if updating to the November 4th distribution
doesn't fix this issue.)
I guess you are probably using the multiple processes configuration as
well on Galaxy main?
Right, a lot of institutions run in the multiple process configuration,
including usegalaxy.org, so it is probably not explicitly caused by
having multiple processes. If I had to guess, though, I would guess it is
some sort of caching problem - one of these processes is marking this job
as complete in the database but then another handler has a different view
of the database or something - that would potentially go away in
single process mode. Obviously single process mode is not a long term
solution, but if that fixed the problem it would tell us a lot.

Are you using postgres?

-John
...
Thanks again for your help!
Jean-François
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Posted by *John Chilton* on *Nov 09, 2013; 2:50pm*
Hello Jean-François,
Have you made any progress tracking down this error? This appears very
serious, but to tell you the truth I have no clue what could cause it. The
distribution you are using is pretty old at this point, and I feel that if
it were a bug that is exhibited under relatively standard parameter
combinations, someone else would have reported it by now.
Can you tell me some things: has this been reported with any other
workflows? Is there anything special about this workflow? Can you rebuild
the workflow and see if the error occurs again?
Additional questions if the problem is not restricted to the workflow:
are you running Galaxy as a single process or multiple processes? If
multiple processes, how many web, handler, and manager processes do you
have? Are they all on the same machine? Have you made any modifications to
Galaxy that could result in this behavior? What is the value of
track_jobs_in_database in your universe_wsgi.ini configuration file?
-John
On Thu, Nov 7, 2013 at 10:34 AM, Jean-Francois Payotte <[hidden email]> wrote:
Dear Galaxy mailing-list,
Once again I come seeking your help. I hope someone has already had this
issue or will have an idea of where to look to solve it. :)
One of our users reported workflows failing because some steps were
executed before all their inputs were ready.
You can find a screenshot attached, where we can see that step (42) "Sort
on data 39" has been executed while step (39) is still waiting to run (gray
box).
This behaviour has been reproduced with at least two different Galaxy
tools (one custom, and the sort tool which comes standard with Galaxy).
This behaviour seems to be somewhat random: when a workflow where this
issue occurs was run twice, steps were executed in the wrong order in only
one of the two runs.
I could be wrong, but I don't think this issue is grid-related as, from my
understanding, Galaxy does not use SGE's job dependency functionality.
I believe all jobs stay in internal queues (within Galaxy) until all
input files are ready, and only then is the job submitted to the cluster.
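To make the expected behaviour concrete, here is a toy Python model of
"hold each job until all of its inputs are ready". It is only a sketch of
the idea as described above, not Galaxy's actual scheduling code; the Job
class, the state names and the function names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical dataset states for this sketch; not Galaxy's real model.
OK = "ok"


@dataclass
class Job:
    name: str
    inputs: list = field(default_factory=list)  # datasets the job reads
    output: str = ""                            # dataset the job produces


def ready_jobs(waiting, dataset_state):
    """Return the waiting jobs whose inputs are all in the OK state."""
    return [job for job in waiting
            if all(dataset_state.get(d) == OK for d in job.inputs)]


def run_workflow(jobs, dataset_state):
    """Dispatch jobs only once their inputs are ready.

    Returns the job names in the order they were dispatched.
    """
    waiting = list(jobs)
    order = []
    while waiting:
        batch = ready_jobs(waiting, dataset_state)
        if not batch:
            raise RuntimeError("no job is ready: missing input or cycle")
        for job in batch:
            order.append(job.name)
            dataset_state[job.output] = OK  # pretend the job finished
            waiting.remove(job)
    return order


# The sort step is listed first but must wait for step 39's output.
jobs = [
    Job("sort_on_data_39", inputs=["data39"], output="data42"),
    Job("step_39", inputs=["input"], output="data39"),
]
print(run_workflow(jobs, {"input": OK}))
```

In this model the sort step can never be dispatched before step 39
completes, which is the ordering that appears to be violated in the
screenshot.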
Any help or any hint on what to look at to solve this issue would be
greatly appreciated.
We updated our Galaxy instance to the August 12th distribution on October
1st, and I believe we never experienced this issue before the update.
Many thanks for your help,
Jean-François