Persistent jobs in cluster queue even after canceling job in galaxy

Hi All, I've been able to submit jobs to the cluster through Galaxy, and it works great. But when a job is queued to run (it is grey in the Galaxy history pane) and I cancel it, the job still remains in the queue on the cluster. Why does this happen, and how can I delete the queued jobs as well? I tried qdel <job_id> as the galaxy user, but it says I am not authorized to delete the job. Any help would be greatly appreciated. Thanks, Ravi.
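For reference, here is a minimal way to check who owns the stuck job before trying to delete it by hand (a sketch, assuming an SGE- or Torque-style scheduler where qstat and qdel are available; <job_id> is a placeholder):

    qstat | grep <job_id>    # output includes the owning user; qdel only works
                             # for that user or a queue operator/manager
    qdel <job_id>            # run as the user shown by qstat

If the job was submitted by a different account than the one running qdel, the scheduler will refuse the request, which matches the error above.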

What kind of cluster is it, e.g. SGE? Are you using task splitting (parallelization)? Peter

Hi All, I have installed the recent version of Galaxy and am starting multiple web and job handlers (six of each) on a CentOS 5.1 machine. It is working almost perfectly, with two problems.

1. The first problem is that sometimes jobs never start and stay in the grey state. After I kill them and start them again, they work fine. There are no log messages about why they haven't started, so if you could direct me to the reason it would be great.

2. The second problem is a UCSC visualization issue when using the links from the history. I haven't found a solution yet; my XSendFile setup is all set and was working in the previous version, but not in this one. The only differences are that I am using the new version and a balancer in Apache, since I run multiple instances. So the balancer might not be passing XSendFile through properly, or there is a configuration problem.
http://lists.bx.psu.edu/pipermail/galaxy-user/2012-November/005508.html
https://wiki.galaxyproject.org/Admin/Config/ApacheProxy
Let me know if anybody has similar settings and has solved this problem with Apache. Best,
Alper Kucukural, PhD, Bioinformatics Core, University of Massachusetts Medical School, 368 Plantation St., Room AS4.2067, Worcester, MA 01605-2324, Phone: 774-312-4493, E-mail: alper@kucukural.com
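For comparison, here is a minimal sketch of combining mod_xsendfile with a mod_proxy balancer, following the pattern in the ApacheProxy wiki page linked above (the ports, member count, and dataset path are hypothetical, not taken from the thread):

    # Balance across the Galaxy web processes (one member per process)
    <Proxy balancer://galaxy>
        BalancerMember http://localhost:8080
        BalancerMember http://localhost:8081
    </Proxy>
    RewriteEngine on
    RewriteRule ^(.*) balancer://galaxy$1 [P]

    # X-Sendfile headers from the backend are acted on by the front-end
    # Apache, so it must be enabled there with the dataset directory allowed:
    XSendFile on
    XSendFilePath /path/to/galaxy-dist/database

One thing worth checking is that XSendFile on and XSendFilePath apply in the same virtual host that does the proxying; if they only appear in another context, Apache will pass the X-Sendfile header through to the client without serving the file.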

Hmm... unfortunately I have no particular guess about either of these issues. It sounds like you upgraded Galaxy and made large changes to your configuration at the same time. Can you try just one process and no load balancer, so you can determine whether the issue is with the Apache configuration or with the latest version of Galaxy? If that works, can you run just one web process and one handler and see if the problems persist? The goal would be to scale back up, but figure out at what point the problems start. -John
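To make that suggestion concrete, a scaled-down setup would mean a single web section and a single handler in the config, something like the sketch below (section names and ports are hypothetical; this assumes the multi-process pattern from the Galaxy scaling documentation of that era, configured in universe_wsgi.ini and job_conf.xml):

    ; universe_wsgi.ini -- exactly one web process and one job handler
    [server:web0]
    use = egg:Paste#http
    port = 8080
    host = 127.0.0.1

    [server:handler0]
    use = egg:Paste#http
    port = 8090
    host = 127.0.0.1

with job_conf.xml routing all jobs to the single handler:

    <handlers default="handler0">
        <handler id="handler0"/>
    </handlers>

If the grey-jobs problem disappears in this configuration, adding processes back one at a time should show where it reappears.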

We've had the same problem since updating from the April 2013 stable to the Feb 2014 stable. Our jobs go off to a slurm cluster via the SlurmJobRunner plugin (though this was happening with the DRMAAJobRunner plugin too, if I remember right). Removing pending datasets occasionally removes the entry in Admin/Jobs, but not reliably, and regardless the jobs stay queued in slurm with no noise in galaxy.log at DEBUG or higher. We're not using parallelism either. I noticed this around the same time as I noticed the new in-progress animation in the history pane; perhaps there's an ajax-y callback that's not firing? Otherwise I'd expect to see something in the log about Galaxy at least trying to cancel the job.
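(For reference, the orphaned jobs can be confirmed and cleared by hand on the slurm side; a minimal sketch, assuming the jobs are queued under the galaxy user:

    squeue -u galaxy          # list jobs still pending/running for the galaxy user
    scancel <slurm_job_id>    # cancel one directly, as the submitting user

That at least keeps the cluster queue clean while the Galaxy-side cancellation is sorted out.)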
-- Brian Claywell, Systems Analyst/Programmer Fred Hutchinson Cancer Research Center bclaywel@fhcrc.org

I believe this problem was fixed by Nate after the latest dist release and pushed to the stable branch of galaxy-central: https://bitbucket.org/galaxy/galaxy-central/commits/1298d3f6aca59825d0eb3d32... If you are eager for this bug fix, you can track the latest stable branch of galaxy-central instead of the galaxy-dist tag mentioned in the dev news. Right now it has some other good bug fixes that are not in the latest release. -John

Ah, got it, thanks! Is it infeasible to push bug fixes like those back to galaxy-dist/stable, so those of us who prefer stable to bleeding-edge don't have to cherry-pick commits? -- Brian Claywell, Systems Analyst/Programmer, Fred Hutchinson Cancer Research Center, bclaywel@fhcrc.org

What are the best practices for updating Galaxy? Also, is there a quick command I can run to see which version of Galaxy I am running?
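On the version question, the standard distribution at the time was a mercurial checkout, so one quick sketch (run from the galaxy-dist directory; this assumes you installed via hg rather than from a tarball) is:

    cd galaxy-dist
    hg parents    # shows the changeset your working copy is based on
    hg branch     # shows the branch you are tracking (e.g. stable or default)

The changeset hash from hg parents can then be matched against the release tags on Bitbucket.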

Erg... I am pretty ignorant about mercurial, so I should probably not respond to this, but I will try. It is pretty common practice for the Galaxy team to push bug fixes for the last release to the stable branch of galaxy-central - which is very different from the default branch of galaxy-central, which contains active development. These don't go out to galaxy-dist automatically, to avoid ever having to strip truly egregious commits out of the stable branch that the galaxy-dev news tells people to target. A quirk of this, however, is that the stable branch of galaxy-central is actually a good deal more stable than the stable branch of galaxy-dist. It is what usegalaxy.org targets, and at least a few other high-profile Galaxy maintainers have caught on to this trick as well. I think you can update (or merge) to the latest stable branch by doing something like the following:

    hg pull https://bitbucket.org/galaxy/galaxy-central#stable
    hg update stable

We should probably do a better job of keeping the stable branch of galaxy-dist up to date - but right now we just push out updates at releases and for major security issues, as far as I know. -John
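Expanding those two commands into a fuller update pass (a sketch only; the backup step is an assumption and pg_dump applies only if you use PostgreSQL, but manage_db.sh and run.sh ship with galaxy-dist):

    # stop Galaxy, then back up the database before touching the code
    pg_dump galaxy > galaxy-backup.sql

    hg pull https://bitbucket.org/galaxy/galaxy-central#stable
    hg update stable

    sh manage_db.sh upgrade    # apply any new schema migrations
    sh run.sh                  # restart Galaxy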
participants (5)
- Alper Kucukural
- Brian Claywell
- John Chilton
- Peter Cock
- Ravi Alla