Round-Robin Scheduling
Hello,

I have another question regarding the local job scheduler: Is it possible to limit the number of jobs *per user*? That is, can any given user have at most X jobs running concurrently, regardless of the value of local_job_queue_workers?

Imagine the following situation:

local_job_queue_workers = 5
job_scheduler_policy = galaxy.jobs.schedulingpolicy.roundrobin:UserRoundRobin

This means that at any given moment, Galaxy can run only five jobs.

Now, Galaxy is completely idle; no jobs are running. One user starts 7 very long-running jobs (each job will take about two hours). If I understand correctly, since no jobs are running, 5 of the user's jobs will be started immediately, even with the round-robin policy, right?

And this means that for the next two hours, the job of every other user who starts one will be either new or limbo-running, but none will actually be started, right?

I think I'm experiencing this situation on my Galaxy server. Users are complaining that their jobs have been running 'forever' or have not even started for a long, long time.

Close examination shows that there are 5 running jobs (all from the same user) which have been running for three hours, and they are hogging all the worker threads.

To make a long story short, I would like to make sure a single user can't hog Galaxy. Is it possible with the local job runner, and if not, is it possible with the SGE job runner?

Thanks for reading so far,
Gordon.
Hello Assaf,

I've opened the following ticket for this issue. As usual, thanks very much for your message.

http://bitbucket.org/galaxy/galaxy-central/issue/62/round-robin-job-scheduli...

Greg Von Kuster
Galaxy Development Team
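
The scenario described in the question can be illustrated with a small simulation. This is only a sketch and not Galaxy's UserRoundRobin implementation: the function `dispatch`, its `per_user_limit` parameter, and the user name are hypothetical, and it models a single dispatch pass on an idle server (jobs that are already running are not tracked).

```python
from collections import OrderedDict, deque


def dispatch(queued, workers, per_user_limit=None):
    """Pick jobs to start, taking one job per user in turn (round-robin).

    queued: list of (user, job_id) tuples in submission order.
    workers: number of free worker slots (cf. local_job_queue_workers).
    per_user_limit: hypothetical cap on jobs started per user; None = no cap.
    """
    # Group waiting jobs by user, preserving submission order.
    per_user = OrderedDict()
    for user, job_id in queued:
        per_user.setdefault(user, deque()).append(job_id)

    started = []
    running = {user: 0 for user in per_user}
    progress = True
    # Keep cycling over users until the workers are full or nothing can start.
    while len(started) < workers and progress:
        progress = False
        for user, jobs in per_user.items():
            if len(started) == workers:
                break
            if not jobs:
                continue
            if per_user_limit is not None and running[user] >= per_user_limit:
                continue
            started.append((user, jobs.popleft()))
            running[user] += 1
            progress = True
    return started


# One user submits 7 long jobs to an idle server with 5 workers.
queue = [("alice", n) for n in range(7)]
no_cap = dispatch(queue, workers=5)
capped = dispatch(queue, workers=5, per_user_limit=2)
print(len(no_cap), len(capped))  # 5 2
```

Without a cap, round-robin has no other user to alternate with, so all five workers go to the same user. With the hypothetical cap of 2, three workers stay free for jobs submitted later by other users, at the cost of leaving slots idle in the meantime.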
Participants (2): Assaf Gordon, Greg Von Kuster