On Thu, Jul 28, 2011 at 11:03 PM, Shantanu Pavgi <pavgi(a)uab.edu> wrote:
We experienced an issue where some of the Galaxy jobs were sitting in the
'new' state for quite a long time. They were not waiting for cluster resources
to become available; they had not even been queued through DRMAA.
We are currently running in non-debug mode, and these were my observations:
* No indication of the new jobs in the paster.log file
* The database/pbs directory didn't contain any associated job scripts
* In the backend database, the job table contained their Galaxy job IDs but
no command_line was recorded
Also, not all jobs are stuck in the 'new' state. Many jobs submitted after
the stalled ones completed successfully on the cluster. Is there any job
submission logic within Galaxy that controls how jobs get submitted?
Any clues on how to debug this issue would be really helpful.
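
The database observation above can be turned into a repeatable check. What
follows is only a sketch under assumptions: it uses an in-memory SQLite table
as a stand-in for the real backend, the `state` and `create_time` column
names are assumed (only the job table and command_line are mentioned in the
message), the helper name find_unqueued_jobs is hypothetical, and the rows
are made-up demo data.

```python
import sqlite3


def find_unqueued_jobs(conn):
    """Return (id, create_time) rows for jobs in state 'new' that have no
    recorded command line, oldest first."""
    cur = conn.execute(
        "SELECT id, create_time FROM job "
        "WHERE state = 'new' AND command_line IS NULL "
        "ORDER BY create_time"
    )
    return cur.fetchall()


# Stand-in table with made-up rows; a real check would connect to the
# actual Galaxy database instead of an in-memory one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job ("
    "id INTEGER PRIMARY KEY, state TEXT, command_line TEXT, create_time TEXT)"
)
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?, ?)",
    [
        (1, "ok", "cat in.txt > out.txt", "2011-07-28 10:00:00"),
        (2, "new", None, "2011-07-28 10:05:00"),
        (3, "new", None, "2011-07-28 09:55:00"),
        (4, "running", "sort in.txt", "2011-07-28 10:10:00"),
    ],
)

stalled = find_unqueued_jobs(conn)
for job_id, created in stalled:
    print(f"job {job_id} has been in 'new' since {created}")
```

Sorting by creation time puts the longest-waiting jobs first, which makes it
easier to see whether later jobs overtook them.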
I've just been searching the archives for any other cases of new jobs not
getting queued (either with the local runner or via DRMAA) but sitting
in the 'new' state, and I found your query.
Did you ever solve your issue, Shantanu?
We had something similar just happen, but it affected all new jobs,
unlike what you described. Fortunately I could work out what the cause
was - our Galaxy partition had run out of disk space. I did some
cleanup, and then I could submit and run new jobs - but the existing
stalled jobs remained stalled.
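
A quick free-space check on the Galaxy partition is one way to rule this
cause out. The following is a minimal Python sketch, not part of Galaxy
itself; the function name free_space_report, the path ".", and the 1 GiB
threshold are placeholders chosen for illustration.

```python
import shutil


def free_space_report(path, min_free_bytes=1024**3):
    """Return (free_bytes, ok) for the filesystem that contains `path`.

    `ok` is True when at least `min_free_bytes` are still available.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free, usage.free >= min_free_bytes


if __name__ == "__main__":
    # "." is a placeholder; point this at the Galaxy installation directory.
    free, ok = free_space_report(".")
    status = "OK" if ok else "LOW"
    print(f"free space: {free / 1024**2:.0f} MiB [{status}]")
```

Running something like this from cron would flag a filling partition before
new jobs start stalling.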
Peter