I have 7 running jobs (of user-A), 35 queued jobs from user-A, and 2 queued jobs from user-B, and Galaxy consistently chooses to run the queued jobs from user-A instead of user-B's (presumably because they were queued before user-B submitted the jobs).
You've nailed it. The "queuing policy" determines the order in which jobs are handed to the runner. If a user queues a bunch of jobs while no one else is waiting, and then another user comes along, that user has to wait. This is definitely not ideal; however, unless you are running *only* local job runners, it is tricky to deal with, because it is really the underlying queuing system that decides the order in which jobs actually run. That said, I thought you were already mapping Galaxy users onto cluster users, which should result in the cluster scheduler dealing with some of these issues.
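To make the FIFO behavior concrete, here's a minimal sketch contrasting the current first-come-first-served hand-off with a round-robin-by-user alternative; the `Job` class and queue structures here are hypothetical stand-ins for illustration, not Galaxy's actual dispatcher code:

```python
from collections import OrderedDict, deque
from dataclasses import dataclass

@dataclass
class Job:
    id: int
    user: str

def fifo_next(queue: deque) -> Job:
    # Current behavior: strictly first-in-first-out, so one user's
    # backlog starves everyone queued behind it.
    return queue.popleft()

def round_robin_next(per_user: "OrderedDict[str, deque]") -> Job:
    # Alternative: take one job from each non-empty per-user queue in turn.
    user, jobs = next(iter(per_user.items()))
    job = jobs.popleft()
    per_user.move_to_end(user)  # this user goes to the back of the rotation
    if not jobs:
        del per_user[user]      # drop users with nothing left queued
    return job

# With your numbers, round-robin would interleave user-B's two jobs with
# user-A's backlog instead of making them wait behind all 35 of them:
per_user = OrderedDict({
    "user-A": deque(Job(i, "user-A") for i in range(35)),
    "user-B": deque(Job(i, "user-B") for i in range(35, 37)),
})
```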
Is there a way to work around this issue? A different configuration, maybe?
Or, an old request: is it possible to limit the number of jobs per user at any single time (i.e., a single user can run at most 3 jobs at any given time, even if no other users are running jobs and there are 7 workers ready)?
I'd like to hear other suggestions, but I think some substantial rewriting is the only way to deal with this. Right now, the dispatcher has no notion of what is actually going on down in the runners; in particular, it doesn't know how many jobs a user already has running. We need to think about how to re-architect this. Fortunately, we're in the midst of a big rewrite of the job manager and scheduling, so this will definitely be something we work on. I'll try to think of an easy way to get the current framework to do what you want.
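For what it's worth, the per-user cap you describe comes down to bookkeeping like the following. This is just a rough sketch: the names (`ready_jobs`, `USER_JOB_LIMIT`) are made up for this example and don't correspond to anything in the current framework.

```python
from collections import Counter

USER_JOB_LIMIT = 3  # assumed configurable cap on concurrent jobs per user

def ready_jobs(queued_jobs, running_jobs, limit=USER_JOB_LIMIT):
    """Yield queued jobs in order, skipping users already at the cap.

    queued_jobs and running_jobs are sequences of objects with a .user
    attribute; skipped jobs simply stay queued for the next pass.
    """
    running_per_user = Counter(job.user for job in running_jobs)
    for job in queued_jobs:
        if running_per_user[job.user] < limit:
            running_per_user[job.user] += 1  # count it as dispatched
            yield job
```

The catch is exactly the one above: for this check to work, the dispatcher has to know the set of running jobs per user, which is the information it currently doesn't have.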