LSF cluster wierd behaviours!

4 Aug 2015

      Hello everyone,

I have posted this earlier but I am afraid it did not go through I hope :).

I was able to setup galaxy to work with our HPC cluster using the LSF scheduler.  So far so good except with few exceptions:

1)      I noticed one thing that submitting a job after a long period (for example overnight) the jobs do not get executed and more will not show up as jobs in the queue when I execute the "bjobs" command from the command line.   As if the jobs were never submitted to the LSF. However, if I submit a job from the command (i.e >bsub sleep -5), then I check the jobs in the queue using the bjobs command I see this job as well as the other jobs that were submitted and could not see them before.

Weird ....

Has anyone seen this behavior before?  Is this related to galaxy setup?  Is there anything I should try out to get rid of such behavior?

2)       Also related to LSF setup.  Every time I restart galaxy it will not restart rather it will crash.  Then if I start it again it will start after that.  Here is the error I keep seeing after the first restart
"galaxy.jobs.runners.state_handler_factory DEBUG 2015-08-04 08:12:17,484 Loaded 'failure' state handler from module galaxy.jobs.runners.state_handlers.resubmit  "
Any idea to get rid of this as well?  Is this a job still in the database that I need to clean manually?  If so can you tell me what table(s) to look into to clear out.

3.  Finally, how do I control the resources (i.e cores for a job ) given to a submitted job on Galaxy?

Thank you in advance for any tips or hints to resolve these issues.

Best regards,

Hak

Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center.

Hakeem Almabrazi

Nicola Soranzo

Hakeem Almabrazi

Nicola Soranzo

Hakeem Almabrazi

Nicola Soranzo

tags

participants (2)