Re: [galaxy-dev] [galaxy-bugs] Some issues when running jobs through drmaa job runner.

Luobin Yang

16 Aug 2011

9:49 p.m.

Hi, Nate,

On Tue, Aug 16, 2011 at 12:38 PM, Nate Coraor <nate@bx.psu.edu> wrote:

Hi Luobin,

Sorry for the delay in response. I would suggest moving this discussion to the galaxy-dev mailing list since it does not contain any private data. The wide audience on galaxy-dev may be able to come up with additional ideas than what we on the Galaxy Team come up with.

Please see my responses inline below.

Luobin Yang wrote:

Hi,

Thanks for Martin Dahlo's excellent blog ( http://mdahlo.blogspot.com/2011/06/galaxy-on-uppmax.html) on making SLURM work with GALAXY, I am able to run galaxy jobs on a cluster, but I've got a couple of issues:

1. Sometimes a job is in waiting state and it won't start to run until I restart galaxy.

When this happens, is the job stuck in Galaxy or in the queueing system (PBS, SGE, something else?)?

When this happens, the job is stuck in Galaxy, it doesn't submit the job to SLURM queueing system.

In your config file (universe_wsgi.ini), is track_jobs_in_database = True or False? Are you using SQLite or another database? If you watch

I am using PostgreSQL database system. track_jobs_in_database is False in my config file. Which option is better when I have a queueing system?

the Galaxy log, are any exceptions or other errors logged when a job becomes stuck?

Galaxy log doesn't show any exception or errors.

2. Sometimes a job is in running state even though it is already finished and restarting galaxy can make the job's state change from running to finished.

Most likely, Galaxy is setting metadata on the job's outputs. You can probably speed up this process by setting:

set_metadata_externally = True

in the config file.

Thanks, --nate

I am wondering what's causing those issues and how they can be fixed.

Thanks, Luobin

Roman Valls

17 Aug 2011

8:59 a.m.

New subject: [galaxy-bugs] Some issues when running jobs through drmaa job runner.

On 2011-08-16 23:49, Luobin Yang wrote:

Hi, Nate,

On Tue, Aug 16, 2011 at 12:38 PM, Nate Coraor <nate@bx.psu.edu> wrote:

Hi Luobin,

Sorry for the delay in response. I would suggest moving this discussion to the galaxy-dev mailing list since it does not contain any private data. The wide audience on galaxy-dev may be able to come up with additional ideas than what we on the Galaxy Team come up with.

Please see my responses inline below.

Luobin Yang wrote:
> Hi,
>
> Thanks for Martin Dahlo's excellent blog
> (http://mdahlo.blogspot.com/2011/06/galaxy-on-uppmax.html) on making SLURM
> work with GALAXY, I am able to run galaxy jobs on a cluster, but I've got a
> couple of issues:
>
> 1. Sometimes a job is in waiting state and it won't start to run until I
> restart galaxy.

When this happens, is the job stuck in Galaxy or in the queueing system (PBS, SGE, something else?)?

When this happens, the job is stuck in Galaxy, it doesn't submit the job to SLURM queueing system.

Could you send the output of "squeue -u youruser" and (if possible) the logs from SLURM? You can ask your sysadmins for them; give them the exact timestamp so that they can look it up easily. Perhaps the most useful logs are the ones Galaxy prints while it is running (the output of ./run.sh); paste them as well.

In your config file (universe_wsgi.ini), is track_jobs_in_database = True or False? Are you using SQLite or another database? If you watch

I am using PostgreSQL database system.

track_jobs_in_database is False in my config file. Which option is better when I have a queueing system?

the Galaxy log, are any exceptions or other errors logged when a job becomes stuck?

Galaxy log doesn't show any exception or errors.

> 2. Sometimes a job is in running state even though it is already finished
> and restarting galaxy can make the job's state change from running to
> finished.

Most likely, Galaxy is setting metadata on the job's outputs. You can probably speed up this process by setting:

set_metadata_externally = True

in the config file.

Thanks, --nate

> I am wondering what's causing those issues and how they can be fixed.
>
> Thanks,
> Luobin

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/

Luobin Yang

18 Aug 2011

5:42 p.m.

New subject: [galaxy-bugs] Some issues when running jobs through drmaa job runner.

On Wed, Aug 17, 2011 at 2:59 AM, Roman Valls <brainstorm@nopcode.org> wrote:

On 2011-08-16 23:49, Luobin Yang wrote:

Hi, Nate,

On Tue, Aug 16, 2011 at 12:38 PM, Nate Coraor <nate@bx.psu.edu> wrote:

Hi Luobin,

Sorry for the delay in response. I would suggest moving this discussion to the galaxy-dev mailing list since it does not contain any private data. The wide audience on galaxy-dev may be able to come up with additional ideas than what we on the Galaxy Team come up with.

Please see my responses inline below.

Luobin Yang wrote:
> Hi,
>
> Thanks for Martin Dahlo's excellent blog
> (http://mdahlo.blogspot.com/2011/06/galaxy-on-uppmax.html) on making SLURM
> work with GALAXY, I am able to run galaxy jobs on a cluster, but I've got a
> couple of issues:
>
> 1. Sometimes a job is in waiting state and it won't start to run until I
> restart galaxy.

When this happens, is the job stuck in Galaxy or in the queueing system (PBS, SGE, something else?)?

When this happens, the job is stuck in Galaxy, it doesn't submit the job to SLURM queueing system.

Could you send the output of "squeue -u youruser" and (if possible) the logs from SLURM? You can ask your sysadmins for them; give them the exact timestamp so that they can look it up easily.

Perhaps the most useful logs are the ones Galaxy prints while it is running (the output of ./run.sh); paste them as well.

Hi, Roman,

I switched our cluster from SLURM to Torque PBS a while ago because I had to get a working system in a limited time. It turned out that the PBS job runner in Galaxy doesn't have any such issues when working with Torque PBS. Even though Torque PBS lacks some features that are available in SLURM, it works fine for our cluster at the moment. Thanks anyway!

In your config file (universe_wsgi.ini), is track_jobs_in_database = True or False? Are you using SQLite or another database? If you watch

I am using PostgreSQL database system.

track_jobs_in_database is False in my config file. Which option is better when I have a queueing system?

the Galaxy log, are any exceptions or other errors logged when a job becomes stuck?

Galaxy log doesn't show any exception or errors.

> 2. Sometimes a job is in running state even though it is already finished
> and restarting galaxy can make the job's state change from running to
> finished.

Most likely, Galaxy is setting metadata on the job's outputs. You can probably speed up this process by setting:

set_metadata_externally = True

in the config file.

Thanks, --nate

> I am wondering what's causing those issues and how they can be fixed.
>
> Thanks,
> Luobin

Nate Coraor

17 Aug 2011

3:14 p.m.

New subject: [galaxy-bugs] Some issues when running jobs through drmaa job runner.

Luobin Yang wrote:

Hi, Nate,

On Tue, Aug 16, 2011 at 12:38 PM, Nate Coraor <nate@bx.psu.edu> wrote:

Hi Luobin,

Sorry for the delay in response. I would suggest moving this discussion to the galaxy-dev mailing list since it does not contain any private data. The wide audience on galaxy-dev may be able to come up with additional ideas than what we on the Galaxy Team come up with.

Please see my responses inline below.

Luobin Yang wrote:

Hi,

Thanks for Martin Dahlo's excellent blog ( http://mdahlo.blogspot.com/2011/06/galaxy-on-uppmax.html) on making SLURM work with GALAXY, I am able to run galaxy jobs on a cluster, but I've got a couple of issues:

1. Sometimes a job is in waiting state and it won't start to run until I restart galaxy.

When this happens, is the job stuck in Galaxy or in the queueing system (PBS, SGE, something else?)?

When this happens, the job is stuck in Galaxy, it doesn't submit the job to SLURM queueing system.

In your config file (universe_wsgi.ini), is track_jobs_in_database = True or False? Are you using SQLite or another database? If you watch

I am using PostgreSQL database system.

track_jobs_in_database is False in my config file. Which option is better when I have a queueing system?

I'd use 'True' for this; it allows you to run multiple Galaxy processes.

the Galaxy log, are any exceptions or other errors logged when a job becomes stuck?

Galaxy log doesn't show any exception or errors.

In this case, you'll probably have to add debugging statements to the JobQueue class in lib/galaxy/jobs/__init__.py. Presumably the "ready to run" check is returning JOB_WAIT. The heartbeat may also be helpful if the process is sticking; see enable_heartbeat = True in the config.

--nate
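For anyone following the thread, here is one way the "add debugging statements" suggestion could look. It is only a rough sketch and not Galaxy's actual code: the logger name, the `log_result` decorator, the `ready_check` function, and the string values used for JOB_WAIT and JOB_READY are all hypothetical stand-ins. The idea is simply to wrap whichever check decides if a job may run, so that every return value ends up in the log.

```python
import functools
import logging

log = logging.getLogger("job_debug")  # hypothetical logger name


def log_result(func):
    """Wrap func so each call and its return value are logged at DEBUG level."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        log.debug("%s%r returned %r", func.__name__, args, result)
        return result
    return wrapper


# Stand-ins for illustration only; these are NOT Galaxy's real constants.
JOB_WAIT, JOB_READY = "wait", "ready"


@log_result
def ready_check(inputs_ready):
    """Hypothetical 'ready to run' check: wait until all inputs are ready."""
    return JOB_READY if inputs_ready else JOB_WAIT


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    ready_check(False)  # logs the call and the JOB_WAIT result
    ready_check(True)   # logs the call and the JOB_READY result
```

If the log keeps showing the wait value for a job whose inputs are finished, that would point at the check itself rather than at the queueing system.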

2. Sometimes a job is in running state even though it is already finished and restarting galaxy can make the job's state change from running to finished.

Most likely, Galaxy is setting metadata on the job's outputs. You can probably speed up this process by setting:

set_metadata_externally = True

in the config file.
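To check what a given universe_wsgi.ini currently sets for the three options mentioned in this thread, something like the following could be used. It is a sketch under assumptions: it assumes the options live in an `[app:main]` section, and the `report_settings` helper and the sample text are made up for illustration.

```python
import configparser

# Options mentioned in this thread and the values suggested for them.
SUGGESTED = {
    "track_jobs_in_database": "True",
    "set_metadata_externally": "True",
    "enable_heartbeat": "True",
}


def report_settings(ini_text, section="app:main"):
    """Return {option: (current_value_or_None, suggested_value)}.

    The section name is an assumption; adjust it if your file differs.
    """
    # interpolation=None so that any '%' in other options is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    report = {}
    for option, suggested in SUGGESTED.items():
        current = parser.get(section, option, fallback=None)
        report[option] = (current, suggested)
    return report


# Made-up sample; in practice, read the text of your own universe_wsgi.ini.
sample = """
[app:main]
track_jobs_in_database = False
set_metadata_externally = True
"""

for option, (current, suggested) in report_settings(sample).items():
    print(f"{option}: current={current} suggested={suggested}")
```

An option reported as `None` is simply not set in the file, so Galaxy would fall back to its default for it.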

Thanks, --nate

I am wondering what's causing those issues and how they can be fixed.

Thanks, Luobin
