Hello everyone,
I posted this earlier, but I am afraid it did not go through. I hope this one does :)
I was able to set up Galaxy to work with our HPC cluster using the LSF scheduler. So far so good, with a few exceptions:
1)
I noticed that when I submit a job after a long idle period (for example, overnight), the job does not get executed and, moreover, does not show up in the queue when I run the “bjobs” command from the command line. It is as if the job was never submitted to LSF. However, if I then submit a job from the command line (i.e. >bsub sleep -5) and check the queue with the bjobs command, I see this job as well as the earlier jobs that had been submitted but that I could not see before.
Weird...
Has anyone seen this behavior before? Is it related to the Galaxy setup? Is there anything I should try in order to get rid of it?
2)
This is also related to the LSF setup. Every time I restart Galaxy, it crashes instead of restarting. If I then start it again, it comes up normally. Here is the error I keep seeing after the first restart:
“galaxy.jobs.runners.state_handler_factory DEBUG 2015-08-04 08:12:17,484 Loaded 'failure' state handler from module galaxy.jobs.runners.state_handlers.resubmit”
Any idea how to get rid of this as well? Is it caused by a job still in the database that I need to clean up manually? If so, can you tell me which table(s) I should look into and clear out?
3)
Finally, how do I control the resources (e.g. the number of cores) given to a job submitted from Galaxy?
Thank you in advance for any tips or hints to resolve these issues.
Best regards,
Hak