Cloudman/Galaxy autoscaling

Jim McCusker

25 Apr 2014

2:37 p.m.

What are the exact conditions that will trigger autoscaling in a Galaxy/CloudMan instance? I'm seeing 100% utilization on the head node, but CloudMan isn't attempting to spin up new instances. What sort of state is supposed to trigger that?

Thanks,
Jim


Dannon Baker

25 Apr 2014

2:43 p.m.

You can review the exact code here (see 'slow_job_turnover'): https://bitbucket.org/galaxy/cloudman/src/7b8f04895ad309e0168cb3de66446ae20f...

But basically, load on any particular node isn't a very useful autoscaling signal in this context, because most jobs cannot be split across multiple nodes: adding a node doesn't speed up a job that is already running, it only helps jobs that are still queued. Instead, we use a heuristic based on job churn and on jobs waiting to run. Generally speaking, if you have jobs waiting to run for a bit, your cluster should scale.

-Dannon

___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
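A minimal sketch of the kind of queue-based check described above, for readers who don't want to dig through the linked source. It is not CloudMan's actual `slow_job_turnover` code: the `QueuedJob` type, the `should_scale_up` function, and the 60-second default threshold are all hypothetical. It scales up only when some job has waited past the threshold and nothing has completed within that window (low churn); node CPU load is deliberately not an input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedJob:
    # Hypothetical record: a job waiting to run and when it entered the queue.
    job_id: str
    queued_at: float  # seconds since the epoch


def should_scale_up(queued: List[QueuedJob], now: float,
                    last_completion: Optional[float],
                    wait_threshold: float = 60.0) -> bool:
    """Hypothetical sketch, not CloudMan's slow_job_turnover.

    Returns True when at least one job has waited at least wait_threshold
    seconds and no job has finished within that same window (low churn).
    Node CPU load is intentionally ignored.
    """
    # Jobs that have been sitting in the queue past the threshold.
    stuck = [job for job in queued if now - job.queued_at >= wait_threshold]
    if not stuck:
        return False
    # No completion has ever been recorded, so the queue is not draining.
    if last_completion is None:
        return True
    # Recent completions mean the existing nodes are still turning jobs over.
    return now - last_completion >= wait_threshold
```

Lowering `wait_threshold` (for example to 15 seconds) would make a check like this react sooner, at the cost of launching nodes for short-lived queue spikes.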

Jim McCusker

3:01 p.m.

Thanks. Is the idle time configurable?

Dannon Baker

28 Apr 2014

2:05 p.m.

The time after which it culls instances? It is not. Because of the way Amazon bills for instances, you never want to kill an instance before the end of its billing hour: regardless of when during the hour you kill an instance, you're billed for the remainder of that hour.

-Dannon
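The end-of-hour reasoning can be sketched as simple arithmetic, assuming per-hour billing where a started hour is charged in full, as described above. The function names and the 5-minute safety margin are hypothetical, not CloudMan settings.

```python
# Assumed billing model: each started hour is billed in full, as described above.
BILLING_PERIOD_S = 3600


def seconds_left_in_billed_hour(launched_at: float, now: float) -> float:
    """Seconds remaining in the hour the instance is currently being billed for."""
    elapsed = now - launched_at
    # Time already used within the current (already paid-for) hour.
    into_hour = elapsed % BILLING_PERIOD_S
    return BILLING_PERIOD_S - into_hour


def should_cull(idle: bool, launched_at: float, now: float,
                margin_s: float = 300.0) -> bool:
    """Hypothetical policy: terminate an idle instance only near the end of its billed hour."""
    if not idle:
        return False
    return seconds_left_in_billed_hour(launched_at, now) <= margin_s
```

With a 5-minute margin, an instance launched at t=0 that goes idle at t=600 s is kept until t=3300 s, since the hour up to t=3600 s is already paid for and the instance can still pick up new jobs in the meantime.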

Jim McCusker

2:19 p.m.

I was thinking of the job idle time: spinning up more instances if jobs sit waiting for, say, 15 seconds instead of 60.

Jim

Dannon Baker

2:33 p.m.

Ah, OK. No, that isn't exposed currently, though we've talked about better autoscaling control before and I can see how it would be valuable. If this is something you'd want to change in the near term, you could always run a custom version of CloudMan (launching from your own bucket instead of our primary CloudMan bucket). I'll go ahead and add it to the Trello board as a future improvement idea, though.

-Dannon


participants (2)

Dannon Baker
Jim McCusker