On Thu, Jul 28, 2011 at 11:03 PM, Shantanu Pavgi email@example.com wrote:
We experienced an issue where some of the Galaxy jobs sat in the 'new' state for quite a long time. They were not waiting for cluster resources to become available; they had not even been queued through DRMAA. We are currently running in non-debug mode, and these were my observations:
- No indication of new jobs in paster.log file
- database/pbs script didn't contain any associated job scripts
- In the backend database, the job table contained their Galaxy job id but no command_line input was recorded
Also, not all jobs are waiting in the 'new' state. Many jobs submitted after the waiting jobs above completed successfully on the cluster. Is there any job submission logic within Galaxy that is used for submitting jobs? Any clues on how to debug this issue would be really helpful.
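For anyone wanting to repeat the database check described above, here is a rough sketch of the kind of query involved. It runs against an in-memory SQLite table filled with made-up demo rows, not a real Galaxy database. The `job` table and `command_line` column are mentioned above; the `state` and `create_time` column names and the helper names are my assumptions, so check them against your own schema first.

```python
import sqlite3

# Assumed column names: only the job table, the job id and command_line
# are mentioned in the thread; state and create_time are guesses.
STALLED_QUERY = """
SELECT id, create_time
FROM job
WHERE state = 'new'
  AND (command_line IS NULL OR command_line = '')
ORDER BY create_time
"""


def find_stalled_jobs(conn):
    """Return (id, create_time) rows for jobs in 'new' with no command line."""
    return conn.execute(STALLED_QUERY).fetchall()


def demo_connection():
    """Build an in-memory table with made-up rows, purely for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, state TEXT, "
        "command_line TEXT, create_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?, ?)",
        [
            (1, "new", None, "2011-07-27 09:00:00"),
            (2, "ok", "python tool.py", "2011-07-27 09:05:00"),
            (3, "new", None, "2011-07-27 09:10:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    for job_id, created in find_stalled_jobs(demo_connection()):
        print(f"job {job_id} has been in 'new' since {created}")
```

Against a real installation you would open a connection to whatever database Galaxy is configured to use instead of calling the hypothetical `demo_connection()`.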
I've just been searching the archives for any other cases of new jobs not getting queued (either with the local runner or via DRMAA) but sitting in the 'new' state, and I found your query.
Did you ever solve your issue Shantanu?
We had something similar just happen, but it affected all new jobs, unlike what you described. Fortunately I could work out what the cause was - our Galaxy partition had run out of disk space. I did some cleanup, and then I could submit and run new jobs - but the existing stalled jobs remained stalled.
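In case it helps others rule out the same cause, below is a minimal sketch of a free-space check using Python's standard `shutil.disk_usage`. The 5% threshold, the function names and the path passed in are all my own assumptions; point it at whichever partition holds your Galaxy files.

```python
import shutil


def is_low_on_space(total_bytes, free_bytes, min_free_fraction=0.05):
    """Return True when the free share of the partition is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_partition(path, min_free_fraction=0.05):
    """Summarise free space on the partition that contains `path`."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "free_gib": usage.free / 2**30,
        "low": is_low_on_space(usage.total, usage.free, min_free_fraction),
    }


if __name__ == "__main__":
    # "." is a placeholder; use the directory of your Galaxy installation.
    report = check_partition(".")
    status = "LOW" if report["low"] else "ok"
    print(f"{report['path']}: {report['free_gib']:.1f} GiB free ({status})")
```

Running something like this from cron would only warn about a full partition; it does nothing for jobs that have already stalled.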