CloudMan: Autoscaling =Unable to run this job due to a cluster error
Hello Galaxy Team!
I've set up a vanilla CloudMan instance using AWS. I've set the head node to not handle jobs and have set AutoScaling to a minimum of 0 and a maximum of 4 worker nodes.
Upon submitting a job, it fails with the following error: Unable to run this job due to a cluster error, please retry it later
Now if I set AutoScaling to a minimum of 1 worker node it works fine. But I would rather not always have 1 worker node up.
Any recommendations?
Thank you very much, Robert
________________________________
This e-mail message (including any attachments) is for the sole use of the intended recipient(s) and may contain confidential and privileged information. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this message (including any attachments) is strictly prohibited.
If you have received this message in error, please contact the sender by reply e-mail message and destroy all copies of the original message (including attachments).
Hi Robert,
Can you check on CloudMan's Admin page if the following is shown (towards the bottom of the page): "Switch master to not run jobs"
If it says, "Switch master to run jobs", just click on the text and the master will become an execution host on the cluster, which should then give Galaxy some resources to run the jobs on.
Sorry for the trouble and let us know if this does not fix it, Enis
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Enis,
I have indeed set the master node to not run jobs. Does there have to be at least one execution host (either the master or an always-on worker) for auto-scaling to work?
Thanks for the help, Robert
There needs to be at least one exec host for jobs to actually run, irrespective of whether auto-scaling is on or off. In an attempt to offload work from the master instance when there are workers, CloudMan will automatically toggle the master as an exec host as the first worker is added or the last one removed. However, I'm starting to think there's a bug in how that's implemented. If this causes you more problems, let us know.
On Thu, Oct 9, 2014 at 11:35 AM, Petit III, Robert A. <robert.petit@emory.edu> wrote:
Hi Enis,
I have indeed set the master node to not run jobs. Does there have to be at least one execution host (either the master or an always-on worker) for auto-scaling to work?
Thanks for the help, Robert
------------------------------
From: Enis Afgan [afgane@gmail.com]
Sent: Thursday, October 09, 2014 11:30 AM
To: Petit III, Robert A.
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] CloudMan: Autoscaling =Unable to run this job due to a cluster error
Hi Robert, Can you check on CloudMan's Admin page if the following is shown (towards the bottom of the page): "Switch master to not run jobs" If it says, "Switch master to run jobs", just click on the text and the master will become an execution host on the cluster, which should then give Galaxy some resources to run the jobs on.
Sorry for the trouble and let us know if this does not fix it, Enis
On Wed, Oct 8, 2014 at 6:34 PM, Petit III, Robert A. <robert.petit@emory.edu> wrote:
Hello Galaxy Team!
I've set up a vanilla CloudMan instance using AWS. I've set the head node to not handle jobs and have set AutoScaling to a minimum of 0 and a maximum of 4 worker nodes.
Upon submitting a job, it fails with the following error: Unable to run this job due to a cluster error, please retry it later
Now if I set AutoScaling to a minimum of 1 worker node it works fine. But I would rather not always have 1 worker node up.
Any recommendations?
Thank you very much, Robert
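The behaviour Enis describes in his last reply (at least one exec host must exist, and the master is toggled when the first worker is added or the last one removed) can be sketched roughly as follows. This is only an illustrative model: the `Cluster` class and its fields and methods are hypothetical names made up for this sketch, not CloudMan's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical model of a cluster; NOT CloudMan's real data structure."""
    worker_count: int = 0
    master_runs_jobs: bool = True

    def exec_host_count(self) -> int:
        # Every worker is an exec host; the master counts only if enabled.
        return self.worker_count + (1 if self.master_runs_jobs else 0)

    def can_run_jobs(self) -> bool:
        # Jobs need at least one exec host, with or without auto-scaling.
        return self.exec_host_count() > 0

    def add_worker(self) -> None:
        self.worker_count += 1
        if self.worker_count == 1:
            # First worker added: offload job execution from the master.
            self.master_runs_jobs = False

    def remove_worker(self) -> None:
        if self.worker_count == 0:
            raise ValueError("no workers to remove")
        self.worker_count -= 1
        if self.worker_count == 0:
            # Last worker removed: the master becomes an exec host again.
            self.master_runs_jobs = True


# The configuration from the original report: master not running jobs and
# an auto-scaling minimum of 0 workers, so nothing is left to run the job.
reported = Cluster(worker_count=0, master_runs_jobs=False)
print(reported.can_run_jobs())
```

In this model, the reported setup has zero exec hosts at the moment a job is submitted, which would be consistent with the cluster error; raising the minimum to 1 worker, or letting the master run jobs, gives the job somewhere to run.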
participants (2)
- Enis Afgan
- Petit III, Robert A.