Hi,

Are there options available to track the actual runtime of jobs on a cluster and store them in the database? Or are there fields in the database that approximate the job execution duration? This might be useful for fine-grained wall-time estimation in a crowded cluster environment.

What I'd like to do is fetch the average runtime per MB of input data for a specific tool from the database, and then use this for wall-time estimation of new jobs in a dynamic job runner script.

Has this been done before?

Best,
Geert

--
Geert Vandeweyer, Ph.D.
Department of Medical Genetics
University of Antwerp
Prins Boudewijnlaan 43
2650 Edegem
Belgium
Tel: +32 (0)3 275 97 56
E-mail: geert.vandeweyer@ua.ac.be
http://ua.ac.be/cognitivegenetics
http://www.linkedin.com/pub/geert-vandeweyer/26/457/726
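The estimation described above could look roughly like the following sketch. It assumes the historical (runtime, input size) pairs for one tool have already been fetched from the database; the function names, the safety factor and the minimum walltime are hypothetical choices for illustration, not anything Galaxy provides. It averages the per-job seconds-per-MB ratios, which is only one of several reasonable estimators.

```python
from statistics import mean


def seconds_per_mb(history):
    """Mean of per-job runtime/size ratios.

    history: iterable of (runtime_seconds, input_size_mb) for finished jobs.
    """
    rates = [runtime / size_mb for runtime, size_mb in history if size_mb > 0]
    if not rates:
        raise ValueError("no usable job history for this tool")
    return mean(rates)


def estimate_walltime(history, input_size_mb, safety_factor=1.5, minimum_seconds=60):
    """Scale the historical rate to the new input and pad it (hypothetical policy)."""
    estimate = seconds_per_mb(history) * input_size_mb * safety_factor
    return max(minimum_seconds, int(round(estimate)))


def format_walltime(seconds):
    """Format seconds as HH:MM:SS, the form PBS accepts for walltime requests."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up history for one tool: (runtime in seconds, input size in MB).
history = [(120, 100), (250, 200), (330, 300)]
walltime = estimate_walltime(history, input_size_mb=400)
print(format_walltime(walltime))
```

The padding factor matters in practice: a job killed for exceeding its walltime costs more than a slightly pessimistic request.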
+1 for me!

Alex

________________________________________
From: galaxy-dev-bounces@lists.bx.psu.edu [galaxy-dev-bounces@lists.bx.psu.edu] on behalf of Peter Cock [p.j.a.cock@googlemail.com]
Sent: Wednesday 8 May 2013 12:06
To: Geert Vandeweyer
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Track Job Runtime

On Wed, May 8, 2013 at 10:08 AM, Geert Vandeweyer <geert.vandeweyer2@ua.ac.be> wrote:
Hi,
Are there options available to track the actual runtime of jobs on a cluster and store them in the database?

Not yet, but I'd really like to have that information too.

Peter

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi all,

I'm fiddling with this, and I have a proof of principle working for PBS jobs using a very ugly SQLAlchemy hack.

The idea is:
- if a job goes from queued to running: store the seconds since epoch in 'runtime'
- if a job goes from running to finished: compare with the current time and store the difference as the runtime.

I've created an extra field 'runtime' for holding this info. While a job is running it holds the start time in seconds since epoch; once the job has finished it holds the duration in seconds. When querying afterwards, one should filter for 'OK' jobs and discard jobs that are still running.

Right now, I have these statements to add timestamps to the database (somewhere in the check_watched_items function in pbs.py):

self.sa_session.execute('UPDATE job SET runtime = :runtime WHERE id = :id',{'runtime':runtime,'id':galaxy_job_id})

Does anybody know how to translate this to a proper SQLAlchemy statement, such as one of the following (neither of which works):

self.sa_session.query(self.model.Job).filter_by(id=galaxy_job_id).update({"runtime":runtime},synchronize_session=False)

or

sa_session.execute(self.sa_session.Table('job').update().values(runtime=runtime).where(id=galaxy_job_id))

If I can figure this out, I'll try to polish it and create a pull request.

Best,
Geert
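For reference, a minimal sketch of how the textual UPDATE can be written with SQLAlchemy's expression language and run through a session. The `job_table` definition below is a stand-in with only three columns, not Galaxy's real table, and the in-memory SQLite engine and the helper names `set_runtime` and `get_runtime` are assumptions made to keep the example runnable. Note that sessions have no `Table` attribute, and that `where()` takes a column expression such as `job_table.c.id == job_id` rather than a keyword argument. A likely prerequisite, not confirmed in this thread, is that the `runtime` column is declared in the table definition SQLAlchemy sees; adding it only in the database is enough for textual SQL but not for this form.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine
from sqlalchemy.orm import sessionmaker

# Stand-in for the job table; only the columns needed here (assumption).
metadata = MetaData()
job_table = Table(
    "job",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("state", String),
    Column("runtime", Integer),  # the extra column described in the message above
)


def set_runtime(session, job_id, runtime):
    """Expression-language equivalent of the textual UPDATE statement."""
    stmt = job_table.update().where(job_table.c.id == job_id).values(runtime=runtime)
    session.execute(stmt)
    session.commit()


def get_runtime(session, job_id):
    """Return the stored runtime value, or None if the job does not exist."""
    row = session.execute(job_table.select().where(job_table.c.id == job_id)).first()
    return None if row is None else row.runtime


# In-memory SQLite database, used only to make the sketch self-contained.
engine = create_engine("sqlite://")
metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.execute(job_table.insert().values(id=1, state="running"))
session.commit()

# queued -> running: store the start time (seconds since epoch).
set_runtime(session, 1, 1368027480)
# running -> finished: replace the start time with the elapsed seconds.
start = get_runtime(session, 1)
set_runtime(session, 1, 1368027600 - start)
```

Inside a job runner the session would come from the application rather than from `sessionmaker`, and a flush may be preferred over an immediate commit.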
For what it is worth, I think the right way to handle this is to add a new table that tracks job state changes. Whenever the state of a job changes a row would be written with the job id, new state, and timestamp. (You could then try to populate it for existing jobs in exactly the way you propose as part of a migration).

-- James Taylor, Assistant Professor, Biology/CS, Emory University
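A state-change table of the kind suggested here could be sketched as follows, using Python's built-in sqlite3 module so the example runs on its own. The table name `job_state_history`, its column names and both helper functions are hypothetical; the thread does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_state_history ("
    " job_id INTEGER NOT NULL,"
    " state TEXT NOT NULL,"
    " changed_at INTEGER NOT NULL)"  # seconds since epoch
)


def record_state_change(conn, job_id, new_state, timestamp):
    """Write one row per state transition: job id, new state, timestamp."""
    conn.execute(
        "INSERT INTO job_state_history (job_id, state, changed_at) VALUES (?, ?, ?)",
        (job_id, new_state, timestamp),
    )
    conn.commit()


def runtime_seconds(conn, job_id):
    """Seconds between 'running' and 'ok'; None if either row is missing."""
    row = conn.execute(
        "SELECT MAX(CASE WHEN state = 'ok' THEN changed_at END)"
        "     - MIN(CASE WHEN state = 'running' THEN changed_at END)"
        " FROM job_state_history WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    return row[0]


# Job 1 runs to completion; job 2 is still running.
record_state_change(conn, 1, "queued", 1000)
record_state_change(conn, 1, "running", 1030)
record_state_change(conn, 1, "ok", 1150)
record_state_change(conn, 2, "queued", 1100)
record_state_change(conn, 2, "running", 1140)
```

Because every transition is kept, the same table also yields queue waiting time (queued to running) without any further schema change.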
On 05/08/2013 05:38 PM, Geert Vandeweyer wrote:
self.sa_session.execute('UPDATE job SET runtime = :runtime WHERE id = :id',{'runtime':runtime,'id':galaxy_job_id})

Does anybody have a solution to convert this statement to proper SQLAlchemy syntax, for use in the check_watched_items function in pbs.py?

Regarding James Taylor's suggestion: a separate table is also an option, but it would take more queries and joins to estimate the walltime at startup: join the table with the job table on the job id to get the job type, fetch two rows per "finished" job id, subtract the start timestamp from the end timestamp, and average. An extra column in the job table only needs one query on one table (select runtime from job where type = 'x' and state = 'ok').

Best,
Geert
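The two query shapes compared above can be put side by side in a small sketch. Everything here is a stand-in: the schemas, the `tool_id` column used for the job type, the `job_state_history` table and the sample rows are assumptions for illustration, with sqlite3 used only so the example runs without a Galaxy database. In this sketch the history variant is written as a single statement with two joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job (id INTEGER PRIMARY KEY, tool_id TEXT, state TEXT, runtime INTEGER);
    CREATE TABLE job_state_history (job_id INTEGER, state TEXT, changed_at INTEGER);
    """
)
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?, ?)",
    [
        (1, "bwa", "ok", 120),
        (2, "bwa", "ok", 180),
        (3, "bwa", "running", 1368027480),  # still running: holds the start time
        (4, "sort", "ok", 60),
    ],
)
conn.executemany(
    "INSERT INTO job_state_history VALUES (?, ?, ?)",
    [
        (1, "running", 1000), (1, "ok", 1120),
        (2, "running", 2000), (2, "ok", 2180),
        (3, "running", 1368027480),
        (4, "running", 3000), (4, "ok", 3060),
    ],
)


def avg_runtime_from_column(conn, tool_id):
    """Extra-column variant: one query on one table."""
    return conn.execute(
        "SELECT AVG(runtime) FROM job WHERE tool_id = ? AND state = 'ok'",
        (tool_id,),
    ).fetchone()[0]


def avg_runtime_from_history(conn, tool_id):
    """State-history variant: join the start and end rows of each finished job."""
    return conn.execute(
        "SELECT AVG(done.changed_at - started.changed_at)"
        " FROM job"
        " JOIN job_state_history AS started"
        "   ON started.job_id = job.id AND started.state = 'running'"
        " JOIN job_state_history AS done"
        "   ON done.job_id = job.id AND done.state = 'ok'"
        " WHERE job.tool_id = ?",
        (tool_id,),
    ).fetchone()[0]
```

Both functions return the same average for the sample data; the still-running job is excluded by the state filter in one case and by the missing 'ok' row in the other.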
participants (4)
- Bossers, Alex
- Geert Vandeweyer
- James Taylor
- Peter Cock