I noticed the recent conversation about the "not-enough-memory" issue and the suggestion that it may be caused by the latest drmaa library. I wonder whether another problem of mine is caused by the same drmaa library:
We do not (yet) have many users for Galaxy; I am still installing tools and loading datasets. So there may be periods of several hours when nobody runs any tool from the Galaxy UI. When somebody finally does, the job is submitted but never finishes. As far as I can see, it has not actually been submitted to LSF yet: the "bjobs" command does not know about it at all. The log just says:
- dispatching job 244 to drmaa runner
- job 244 dispatched
- (244) submitting file /home/galaxy/galaxy-dist/database/pbs/galaxy_244.sh
- (244) command is: seqret ...
But it does not continue with the usual:
- (244) queued as 222391
- (244/222391) state change: job is queued and active
It simply waits forever. However, and here is the interesting point, if at this moment I manually submit any job to LSF (not through Galaxy, but using the same LSF queue that Galaxy uses), e.g. something like this:
bsub -q galaxy -o $HOME/tmp/test.log "/usr/bin/env > $HOME/tmp/test.output"
this simple job is queued and completed AND the other jobs, those started earlier from the Galaxy UI and so far waiting (somewhere), are suddenly queued and executed normally. Strange, isn't it?
My workaround for now is a bit silly: I have a cron job that runs my simple bsub command (as in the example above) every hour, and I have no problem starting jobs from Galaxy even after a prolonged period of inactivity. But I wonder whether anybody has noticed similar behaviour, or whether it would be worth using drmaa library 1.0.3?
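For anybody who wants to try the same workaround, a crontab entry along the following lines should do. This is only a sketch: the schedule (minute 0 of every hour) is an assumption, and so is bsub being on the PATH that cron uses. On many LSF installations the LSF profile has to be sourced first, or the full path to bsub given.

```
# Assumed schedule: at minute 0 of every hour, submit a trivial job to the
# same LSF queue that Galaxy uses, so that jobs waiting in Galaxy get picked up.
0 * * * * bsub -q galaxy -o $HOME/tmp/test.log "/usr/bin/env > $HOME/tmp/test.output"
```

The entry goes into the crontab of the same user that runs Galaxy (edited with "crontab -e").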
Thanks for any help and cheers,
Martin
--
Martin Senger
email: martin.senger@gmail.com,martin.senger@kaust.edu.sa
skype: martinsenger