I have a question regarding the job queuing/scheduling in Galaxy,
because although I assured my users that the scheduling is fair (using the "round robin" scheduler), what I'm seeing is not exactly fair (and so users wait a long time for their jobs to run).
I'll try to explain the situation as best as I can, and I'll appreciate any feedback.
The attached diagram shows a situation similar to what I have.
At time "t0", user-A submits a workflow with many steps (the lines represent input-output connections, the numbers are job numbers).
After a while, jobs 1,2,3 are done (green) and jobs 4,5 are running (yellow), assuming I can run only two jobs at a time.
If I understand correctly, jobs 6-12 are queued (i.e. their input datasets are "ready", so they are ready to run and are put on my local runner queue).
Job 13 is still "new" because its input datasets (11,12,13) are not ready.
At time "t1", (while jobs 4,5 are still running), user-B submits a single job (#14).
If I understand correctly, it will go from "new" to "queued" immediately, because it isn't dependent on any input dataset and is ready to run.
My question is:
Once job 4 or 5 completes, which job will run next?
If everything is "fair", I'd expect job #14 (from the second user) to run.
But what I think I'm seeing is that the "queued jobs" queue is actually FIFO, and jobs 6-12 will run before job 14.
(Job 14, however, will run before job 13.)
Does the "round-robin" scheduling refer only to the transition from "new" to "queued"?
I could be completely wrong about this whole thing, but what I'm experiencing is that user-A submitted 5 workflows in parallel (all very similar to the attached diagram).
I have 7 running jobs (all user-A's), 35 queued jobs from user-A, and 2 queued jobs from user-B, and Galaxy consistently chooses to run user-A's queued jobs instead of user-B's (presumably because they were queued before user-B's jobs were submitted).
Is there a way to work around this issue? A different configuration, maybe?
Or, an old request: is it possible to limit the number of jobs per user at any single time (i.e. a single user can run at most 3 jobs at any given time, even if no other users are running jobs and there are 7 workers ready)? A sketch of the kind of per-user pick I have in mind follows.
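To make the distinction concrete, here is a minimal sketch (my own illustration with hypothetical names, not Galaxy's actual scheduler code) of a per-user round-robin pick with an optional cap on concurrently running jobs per user, as opposed to a plain FIFO pop of the oldest queued job:

    from collections import deque, defaultdict

    class FairDispatcher:
        """Illustrative per-user round-robin pick with an optional cap
        on concurrently running jobs per user (hypothetical names)."""

        def __init__(self, max_running_per_user=None):
            self.queues = defaultdict(deque)   # user -> queued job ids
            self.user_order = deque()          # round-robin order of users
            self.running = defaultdict(int)    # user -> running job count
            self.max_running_per_user = max_running_per_user

        def enqueue(self, user, job_id):
            if user not in self.user_order:
                self.user_order.append(user)
            self.queues[user].append(job_id)

        def finish(self, user):
            self.running[user] -= 1

        def pick_next(self):
            """Rotate through users; a plain FIFO would instead pop the
            single oldest queued job regardless of who submitted it."""
            for _ in range(len(self.user_order)):
                user = self.user_order[0]
                self.user_order.rotate(-1)     # move this user to the back
                cap = self.max_running_per_user
                if cap is not None and self.running[user] >= cap:
                    continue                   # user at capacity, skip
                if self.queues[user]:
                    self.running[user] += 1
                    return user, self.queues[user].popleft()
            return None                        # nothing runnable

    d = FairDispatcher(max_running_per_user=3)
    for j in range(6, 13):
        d.enqueue("user-A", j)
    d.enqueue("user-B", 14)
    print(d.pick_next())   # ('user-A', 6)
    print(d.pick_next())   # ('user-B', 14)

With a picker like this, user-B's job #14 would be chosen as soon as job 4 or 5 frees a slot, even though user-A's jobs 6-12 were queued earlier; a plain FIFO would drain jobs 6-12 first, which matches what I'm seeing.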
Thanks for reading so far,
A minor UI bug when switching histories (in a long history list):
When clicking one of the page links (e.g. "7"), the history list grid is faded out - but only the upper part of it, not the entire list.
It looks like the fade-out region only covers the area that would have been visible had I not scrolled all the way down (and I have to scroll down to reach the page links).
See attached image for half faded / half not-faded history.
This happens if I have a long history list (longer than fits on screen), scroll all the way down, click a page link (to switch to another history page), and quickly scroll up (before the page updates).
The fade-out by itself isn't critical, but the non-faded history items are still clickable.
Some of my Galaxy users are seeing histories without being able to "expand" a dataset (see attached image).
The attached image is from Google Chrome on Windows XP.
The user tried Firefox (on Windows) and everything worked fine.
I tried Google Chrome on Linux and it worked fine,
so reproducing it is a bit hard - but it has happened more than once.
Any ideas on how to pinpoint the problem?
Is it network-related (the connection dropping before completion)?
I'm an employee of the Dutch bioinformatics center (NBIC). We
decided to set up a national Galaxy server with a focus on Next Generation
Sequencing data. For this we need to acquire some hardware and services.
I would like to have an idea of what kind of hardware is necessary to operate
smoothly. We expect ±30 (serious) users in the first month, growing
to 100 users within 3 months.
I would like to get a rough estimate of:
- number of CPUs
- hard drive space needed
I know it is hard to predict what kind of hardware I need, because it
depends on the user input. However, I would like to get an indication.
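Purely to illustrate how such an estimate could be ballparked (every number below is an assumption made up for illustration, not a recommendation):

    # Back-of-the-envelope sizing; all figures are assumptions for
    # illustration only - substitute your own measurements.
    users = 100               # assumed user count after 3 months
    gb_per_user = 50          # assumed: inputs + intermediates kept per user
    overhead = 1.5            # assumed: headroom for temp files and spikes
    concurrent_fraction = 0.2 # assumed: share of users running jobs at once
    cores_per_job = 2         # assumed: typical footprint of one NGS tool run

    disk_tb = users * gb_per_user * overhead / 1024.0
    cores = users * concurrent_fraction * cores_per_job
    print("~%.1f TB storage, ~%d cores" % (disk_tb, cores))
    # -> ~7.3 TB storage, ~40 cores (under these assumptions)

NGS data in particular can inflate the per-user figure by an order of magnitude, so the storage assumption is the one to pin down first.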
I am experimenting with Galaxy, and adding a module itself is no problem. Furthermore, a lot of datatypes are supported (FASTA etc.), but it seems NetCDF is not yet 'supported'. I found out that you can add your own datatype ( http://bitbucket.org/galaxy/galaxy-central/wiki/AddingDatatypes ), but before doing that, I was wondering how to support reading the NetCDF datatype - especially for mass spectrometry data.
Or is there another datatype I can use to read mass spectrometry NetCDF files in Galaxy?
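In case it's useful as a starting point, here is a rough, untested sketch of what a custom NetCDF datatype could look like, following the AddingDatatypes wiki page; I'm assuming the Binary base class and the usual sniff/set_peek hooks, so adapt it to your Galaxy version:

    # Sketch of a custom Galaxy datatype for classic-format NetCDF files,
    # e.g. ANDI-MS mass spectrometry data. Assumes the Binary base class
    # described on the AddingDatatypes wiki page; untested, illustrative.
    from galaxy.datatypes.binary import Binary

    class NetCDF(Binary):
        file_ext = "netcdf"

        def sniff(self, filename):
            # Classic NetCDF files start with the magic bytes "CDF"
            # followed by a version byte (\x01 or \x02).
            with open(filename, "rb") as handle:
                magic = handle.read(4)
            return magic in (b"CDF\x01", b"CDF\x02")

        def set_peek(self, dataset, is_multi_byte=False):
            if not dataset.dataset.purged:
                dataset.peek = "Binary NetCDF file"
                dataset.blurb = "NetCDF (e.g. ANDI-MS)"
            else:
                dataset.peek = "file does not exist"
                dataset.blurb = "file purged from disk"

It would also need an entry in datatypes_conf.xml along the lines of <datatype extension="netcdf" type="galaxy.datatypes.binary:NetCDF" mimetype="application/octet-stream"/> (again, adjust to your tree). Note this only recognizes classic-format NetCDF (the "CDF" magic), which is what ANDI-MS mass spectrometry files use; NetCDF-4 files are HDF5-based and start with different magic bytes.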
Thanks in advance and with kind regards,
UMCU Metabolomics Centre
Dept. Metabolic and Endocrine Diseases
Wilhelmina Childrens Hospital, UMC Utrecht
Standard CentOS 5.4 install
[root@mako1 scripts]# python -V
Following the instructions,
but it fails at "sh setup.sh" because "check_python.py" throws an exception
and returns "python -ES". Not sure what this means.