'Never ending' jobs in history
Hi,

Occasionally, jobs remain stuck in the 'job running' state in the history even though the job completed correctly. The paster.log file shows that Galaxy has registered that the job ended, but the UI has not been updated. Clicking the eye icon on the 'running' history items does show the results of the run.

This does not happen all the time. Is this a known issue, and how can I resolve it?

Thanks,
Chris
Chris Cole wrote:
> Occasionally, we see jobs that are constantly in the 'job running' state in the history. This is despite the fact that the job completed correctly. I can also see in the paster.log file that Galaxy has picked up on the fact that the job ended, but it just hasn't updated the UI.

Hi Chris,

Is there anything unusual displayed in the paster log when this happens?

--nate
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev
On 19/03/10 12:51, Nate Coraor wrote:

Hi Nate,

Sorry for not getting back to you earlier; I've been waiting for another one to occur.

> Is there anything unusual displayed in the paster log when this happens?

Not as far as I can see. This is the progress from paster.log of a job that has finished but is still 'running' in the history:

galaxy.jobs DEBUG 2010-03-26 13:43:27,245 job 1493 put in policy queue
galaxy.jobs.schedulingpolicy.roundrobin DEBUG 2010-03-26 13:43:27,245 RoundRobin queue: retrieving job from job queue for session = 369
galaxy.jobs DEBUG 2010-03-26 13:43:27,245 dispatching job 1493 to sge runner
10.31.3.203 - - [26/Mar/2010:13:43:26 +0100] "POST /tool_runner/index HTTP/1.1" 200 - "http://wsdev.compbio.dundee.ac.uk:3216/tool_runner/rerun?id=1533" "Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"
10.31.3.203 - - [26/Mar/2010:13:43:27 +0100] "GET /history HTTP/1.1" 200 - "http://wsdev.compbio.dundee.ac.uk:3216/tool_runner/index" "Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"
galaxy.jobs.runners.sge DEBUG 2010-03-26 13:43:28,624 (1493) submitting file /homes/www-galaxy/galaxy_devel/database/pbs/galaxy_1493.sh
galaxy.jobs.runners.sge DEBUG 2010-03-26 13:43:28,625 (1493) command is: python /homes/www-galaxy/galaxy_devel/tools/sr_mapping/bowtie_wrapper.py --threads="4" --dataType="solexa" --output=/homes/www-galaxy/galaxy_devel/database/files/001/dataset_1619.dat --suppressHeader=false --genomeSource=indexed --snpphred="None" --snpfrac="None" --keepends="None" --ref=/db/bowtie/dicty_genomic --indexSettings="None" --iautoB="None" --ipacked="None" --ibmax="None" --ibmaxdivn="None" --idcv="None" --inodc="None" --inoref="None" --ioffrate="None" --iftab="None" --intoa="None" --iendian="None" --iseed="None" --icutoff="None" --paired=single --input1=/homes/www-galaxy/galaxy_devel/database/files/001/dataset_1593.dat --input2="None" --params=full --skip=0 --alignLimit=-1 --trimH=0 --trimL=0 --mismatchSeed=2 --mismatchQual=70 --seedLen=18 --rounding=noRound --maqSoapAlign=-1 --tryHard=noTryHard --valAlign=1 --allValAligns=doAllValAligns --suppressAlign=40 --best=doBest --maxBacktracks=800 --strata=doStrata --offrate=-1 --seed=-1 --minInsert="None" --maxInsert="None" --mateOrient="None" --maxAlignAttempt="None" --forwardAlign="None" --reverseAlign="None"
galaxy.jobs.runners.sge DEBUG 2010-03-26 13:43:28,640 (1493) queued in 64bit-pri.q queue as 789899
galaxy.jobs DEBUG 2010-03-26 13:43:28,793 job 1493 dispatched
galaxy.jobs.runners.sge DEBUG 2010-03-26 13:43:29,775 (1493/789899) state change: job is queued and waiting to be scheduled
10.31.3.203 - - [26/Mar/2010:13:43:32 +0100] "POST /root/history_item_updates HTTP/1.1" 200 - "http://wsdev.compbio.dundee.ac.uk:3216/history" "Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"
10.31.3.203 - - [26/Mar/2010:13:43:35 +0100] "POST /root/history_item_updates HTTP/1.1" 200 - "http://wsdev.compbio.dundee.ac.uk:3216/history" "Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"
galaxy.jobs.runners.sge DEBUG 2010-03-26 13:43:37,790 (1493/789899) state change: job is running
10.31.3.203 - - [26/Mar/2010:13:43:38 +0100] "POST /root/history_item_updates HTTP/1.1" 200 - "http://wsdev.compbio.dundee.ac.uk:3216/history" "Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"
galaxy.jobs.schedulingpolicy.roundrobin DEBUG 2010-03-26 13:43:38,846 RoundRobin queue clean up: Removed job queue entry from dictionary for session = 369

Then, a while later:

galaxy.jobs.runners.sge DEBUG 2010-03-26 13:49:06,988 (1493/789899) state change: job finished normally
galaxy.jobs DEBUG 2010-03-26 13:49:07,424 job 1493 ended

Yet here I am at 14:34, and the job is still 'running' in the history. Any ideas?

Chris

p.s. Typical! Just as I was about to click send, it turned green... Still, is there any reason for the delay?
Chris Cole wrote:
> galaxy.jobs.runners.sge DEBUG 2010-03-26 13:49:06,988 (1493/789899) state change: job finished normally
> galaxy.jobs DEBUG 2010-03-26 13:49:07,424 job 1493 ended
>
> Yet, here I am at 14:34 and the job is still 'running' in the history. Any ideas?
>
> p.s. Typical! Just as I was about to click send, it turned green... Still, is there any reason for the delay?

Hi Chris,

Are you still seeing hits to /root/history_item_updates after the "job X ended" message? Also, I assume you are not using SQLite?

Thanks,
--nate
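A minimal sketch, not part of the original thread, of one way to answer Nate's first question from paster.log: count the POST /root/history_item_updates hits that are logged after the "job <id> ended" line. It relies on the order of lines in the file rather than on timestamps, because the Galaxy DEBUG lines and the access-log lines use different timestamp formats, and it assumes the message formats match Chris's excerpt. The function name updates_after_job_end is hypothetical.

```python
import re
from typing import Iterable

# Galaxy's end-of-job DEBUG message, e.g.
# "galaxy.jobs DEBUG 2010-03-26 13:49:07,424 job 1493 ended"
JOB_ENDED = re.compile(r"\bjob (\d+) ended\b")

# Access-log entry produced each time the history panel polls for updates.
UPDATES_HIT = re.compile(r'"POST /root/history_item_updates ')


def updates_after_job_end(lines: Iterable[str], job_id: int) -> int:
    """Count history_item_updates requests logged after 'job <job_id> ended'.

    Uses file order only. Returns -1 if the job-ended line never appears.
    """
    seen_end = False
    count = 0
    for line in lines:
        if not seen_end:
            match = JOB_ENDED.search(line)
            # Compare as integers so that job 1493 does not match job 14930.
            if match and int(match.group(1)) == job_id:
                seen_end = True
        elif UPDATES_HIT.search(line):
            count += 1
    return count if seen_end else -1
```

For example, `with open("paster.log") as fh: print(updates_after_job_end(fh, 1493))`. A non-zero count only shows that the history panel was still polling after the job ended; as Chris points out in his reply, it cannot tell which history item the poll was for.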
On 26/03/10 14:41, Nate Coraor wrote:
> Are you still seeing hits to /root/history_item_updates after the "job X ended" message? Also I assume you are not using SQLite?

Yes, and correct (we use MySQL). However, I can't be sure the history_item_updates hits aren't for another history item that's running.

Cheers,
Chris
Chris Cole wrote:
> On 26/03/10 14:41, Nate Coraor wrote:
>> Are you still seeing hits to /root/history_item_updates after the "job X ended" message? Also I assume you are not using SQLite?
> Yes and correct (MySQL). Although, I can't be sure the history_item_updates aren't for another history item that's running.

All of the history item updates are sent in a single request.

When this happens, can you have a look at the 'state' column in the 'dataset' table in the database and see whether it has actually been updated? The dataset id for the WHERE clause can be found in the output file path in the command line (in the output you sent, it's 1619).

Thanks,
--nate
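A minimal sketch, not from the thread, of the check Nate describes: take the dataset id from the --output=.../dataset_<id>.dat argument in the logged command line, then read the 'state' column of the 'dataset' table for that id. The helper name output_dataset_id is hypothetical, and the %s placeholder assumes a MySQL DB-API driver such as MySQLdb, whose parameter style is "format".

```python
import re

# The logged command line carries the output path, e.g.
# --output=/homes/www-galaxy/galaxy_devel/database/files/001/dataset_1619.dat
OUTPUT_PATH = re.compile(r"--output=\S*/dataset_(\d+)\.dat\b")

# The query Nate suggests: the 'state' column of the 'dataset' table.
# %s is the "format" paramstyle placeholder used by MySQL DB-API drivers.
STATE_QUERY = "SELECT id, state FROM dataset WHERE id = %s"


def output_dataset_id(command: str) -> int:
    """Return the dataset id encoded in the --output path of a logged command."""
    match = OUTPUT_PATH.search(command)
    if match is None:
        raise ValueError("no --output=.../dataset_<id>.dat argument found")
    return int(match.group(1))
```

With an open MySQL cursor, `cursor.execute(STATE_QUERY, (output_dataset_id(command),))` followed by `cursor.fetchone()` returns the row to inspect. For the command Chris posted, output_dataset_id returns 1619; the --input1 path (dataset_1593.dat) is ignored because the pattern is anchored on --output=.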
On 26/03/10 14:59, Nate Coraor wrote:
> When this happens, can you go have a look at the 'state' column in the 'dataset' table in the database and see if it's actually been updated? The dataset id for the WHERE clause can be found in the output file path in the command line (in the output you sent, it's 1619).

Will do, next time I have a misbehaving job.

Cheers,
Chris
participants (2)
- Chris Cole
- Nate Coraor