Re: [galaxy-dev] [galaxy-user] Using Galaxy Cloudman for a workshop

1 Dec 2011


Right! I did think to look for a 'share this cluster' command, I just
failed to find it. It all makes sense now, thanks.

On Thu, Dec 1, 2011 at 7:34 PM, Enis Afgan <eafgan@emory.edu> wrote:
...
Hi Clare,
The share string is generated when you share a cluster. The string is
accessible on the shared cluster, when you click the green 'Share a cluster'
icon next to the cluster name and then the top link "Shared instances". You
will get a list of the point-in-time shares of the cluster you have created.
The share string will look something like this:
cm-cd53Bfg6f1223f966914df347687f6uf32/shared/2011-10-19--03-14
You simply paste that string into the new cluster box you mentioned.
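
As a rough illustration of what that string contains, here is a minimal Python sketch that splits it into its parts. The layout (a "cm-" prefixed name, the literal "shared", then a date and time stamp) is inferred only from the single example above, and the names ShareString and parse_share_string are hypothetical, not part of CloudMan.

```python
from datetime import datetime
from typing import NamedTuple


class ShareString(NamedTuple):
    bucket: str         # the "cm-..." component; assumed to identify the source cluster
    snapshot: datetime  # the point-in-time share, parsed from the last component


def parse_share_string(share: str) -> ShareString:
    """Split '<cm-name>/shared/<YYYY-MM-DD--HH-MM>' into its parts.

    The format is an assumption based on one example string.
    """
    # Unpacking raises ValueError if there are not exactly three components.
    bucket, marker, stamp = share.strip().split("/")
    if marker != "shared" or not bucket.startswith("cm-"):
        raise ValueError(f"unexpected share string layout: {share!r}")
    return ShareString(bucket, datetime.strptime(stamp, "%Y-%m-%d--%H-%M"))


example = "cm-cd53Bfg6f1223f966914df347687f6uf32/shared/2011-10-19--03-14"
parsed = parse_share_string(example)
print(parsed.bucket, parsed.snapshot.isoformat())
```

A check like this is only useful for catching a mangled copy and paste before launching; CloudMan itself decides whether the string is valid.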
Enis
On Thu, Dec 1, 2011 at 6:31 AM, Clare Sloggett <sloc@unimelb.edu.au> wrote:
...
Hi Enis, Jeremy, and all,
Thanks so much for all your help. I have another question which I
suspect is just me missing something obvious.
I'm guessing that when you cloned the cluster for your workshop, you
used CloudMan's 'share-an-instance' functionality?
When I launch a new cluster which I want to be a copy of an existing
cluster, and select the share-an-instance option, it asks for the
"cluster share-string". How can I find this string for my existing
cluster?
Or have I got completely the wrong idea - did you actually clone the
instance using AWS functionality?
Thanks,
Clare
On Mon, Nov 21, 2011 at 5:37 PM, Enis Afgan <eafgan@emory.edu> wrote:
...
Hi Clare,
I don't recall what instance type we used earlier, but I think an Extra
Large Instance is going to be fine. Do note that the master node is also
being used to run jobs. However, if it's loaded by just the web server, SGE
will typically just not schedule jobs to it.
As far as the core/thread/slot concern goes, SGE sees each core as a slot.
Each job in Galaxy simply requires 1 slot, even if it uses multiple threads
(i.e., cores). What this means is that nodes will probably get overloaded if
only the same type of job is being run (BWA), but if analyses are being run
that use multiple tools, jobs will get spread over the cluster to balance
the overall load a bit better than by simply looking at the number of slots.
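
To make the slot accounting concrete, here is a small Python sketch of the worst case described above. It is a toy model, not SGE: it places jobs purely by free slots (one slot per core, one slot per job) and ignores the load-based balancing that SGE also applies. The function name threads_per_node and the 2-node, 4-core cluster are assumptions for illustration.

```python
from typing import List


def threads_per_node(cores_per_node: int, n_nodes: int,
                     job_threads: List[int]) -> List[int]:
    """Return the total number of threads running on each node.

    Every job takes exactly one slot regardless of how many threads it
    starts, and each node offers one slot per core.
    """
    load = [0] * n_nodes        # threads actually running on each node
    slots_used = [0] * n_nodes  # slots the scheduler believes are taken
    for threads in job_threads:
        free = [i for i in range(n_nodes) if slots_used[i] < cores_per_node]
        if not free:
            break  # no free slot anywhere: remaining jobs wait in the queue
        node = min(free, key=lambda i: slots_used[i])  # least-used node first
        slots_used[node] += 1
        load[node] += threads
    return load


# Eight BWA jobs at 4 threads each on two 4-core nodes.
print(threads_per_node(4, 2, [4] * 8))                   # [16, 16]
# Four BWA jobs plus four single-threaded jobs on the same cluster.
print(threads_per_node(4, 2, [4, 4, 4, 4, 1, 1, 1, 1]))  # [10, 10]
```

In the first case every slot is full and each 4-core node is asked to run 16 threads; mixing in single-threaded tools lowers that to 10 in this model.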
Enis
On Mon, Nov 21, 2011 at 4:34 AM, Clare Sloggett <sloc@unimelb.edu.au> wrote:
...
Hi Jeremy,
Also if you do remember what kind of Amazon node you used,
particularly for the cluster's master node (e.g. an 'xlarge' 4-core
15GB or perhaps one of the 'high-memory' nodes?), that would be a
reassuring sanity check for me!
Cheers,
Clare
On Mon, Nov 21, 2011 at 10:37 AM, Clare Sloggett <sloc@unimelb.edu.au> wrote:
...
Hi Jeremy, Enis,
That makes sense. I know I can configure how many threads BWA uses in
its wrapper, with bwa -t. But is there somewhere that I need to tell
Galaxy the corresponding information, i.e. that this command-line task
will make use of up to 4 cores?
Or does this imply that there is always exactly one job per node? So
if I have (for instance) a cluster made of 4-core nodes, and a
single-threaded task (e.g. samtools), are the other 3 cores just going
to waste, or will the scheduler allocate multiple single-threaded jobs
to one node?
I've cc'd galaxy-dev instead of galaxy-user as I think the
conversation has gone that way!
Thanks again,
Clare
On Fri, Nov 18, 2011 at 2:36 PM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
...
> On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
>
>> Scalability issues are more likely to arise on the back end than the
>> front end, so you'll want to ensure that you have enough compute nodes. BWA
>> uses four nodes by default--Enis, does the cloud config change this
>> parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to
>> be able to run a BWA job simultaneously.
>>
>
> Actually, one other question - this paragraph makes me realise that I
> don't really understand how Galaxy is distributing jobs. I had thought
> that each job would only use one node, and in some cases take
> advantage of multiple cores within that node. I'm taking a "node" to
> be a set of cores with their own shared memory, so in this case a VM
> instance, is this right? If some types of jobs can be distributed over
> multiple nodes, can I configure, in Galaxy, how many nodes they should
> use?
You're right -- my word choices were poor. Replace 'node' with 'core'
in my paragraph to get an accurate suggestion for resources.
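
With that correction, the sizing arithmetic can be sketched in a few lines of Python. The figure of 50 simultaneous users is read off the "4x50" in the quoted paragraph, and the 4-core node size is taken from the 'xlarge' example mentioned elsewhere in this thread; both are assumptions here, as is the helper name nodes_needed.

```python
import math


def nodes_needed(users: int, threads_per_job: int, cores_per_node: int) -> int:
    """Nodes required to give every job one core per thread at the same time."""
    total_cores = users * threads_per_job
    return math.ceil(total_cores / cores_per_node)


users = 50            # assumed workshop size, from "4x50"
threads_per_job = 4   # BWA's default thread count as stated in the thread
cores_per_node = 4    # assumed node size (e.g. a 4-core instance)

print(users * threads_per_job)                               # 200 cores in total
print(nodes_needed(users, threads_per_job, cores_per_node))  # 50 nodes
```

This sizes the cluster so that no core is oversubscribed; it is an upper bound for a workshop where everyone starts a BWA job at once.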
Galaxy uses a job scheduler--SGE on the cloud--to distribute jobs to
different cluster nodes. Jobs that require multiple cores typically run on a
single node. Enis can chime in on whether CloudMan supports job submission
over multiple nodes; this would require setup of an appropriate parallel
environment and a tool that can make use of this environment.
Good luck,
J.
--
E: sloc@unimelb.edu.au
P: 03 903 53357
M: 0414 854 759