On Mon, Jun 22, 2015 at 4:45 PM, Nicholas Dickens <Nick.Dickens@glasgow.ac.uk> wrote:

Dear Enis,

Thanks for your help with this. I can confirm that if I use a password that doesn’t start with an exclamation mark, adding nodes in CloudMan works fine. The password can contain an ! but just cannot start with one.

Knowing where to look logs-wise will really help – is there a schematic somewhere that shows the CloudMan startup procedure? I’m working on one for my own understanding but I’m a firm believer in not reinventing the wheel.

Best wishes,

Nick

--

Nick Dickens

DPhil BSc ARCS

Bioinformatics Team Leader

Wellcome Trust Centre for Molecular Parasitology

B6-21 SGDB

120 University Place

Glasgow

G12 8TA

Tel: +44 141 330 8282

http://fb.me/WTCMPbix

@WTCMPbix

http://www.gla.ac.uk/researchinstitutes/iii/staff/nickdickens/

http://www.gla.ac.uk/researchinstitutes/iii/staff/jeremymottram/comparativegenomicsofleishmania/

From: Nick Dickens <nick.dickens@glasgow.ac.uk>
Date: Monday, 22 June 2015 17:02
To: Enis Afgan <enis.afgan@irb.hr>
Cc: "galaxy-dev@lists.galaxyproject.org" <galaxy-dev@lists.galaxyproject.org>
Subject: Re: [galaxy-dev] Cloudman cluster not starting workers

Aha - so neither /tmp/cm/cm_boot.py.log nor /mnt/cm/paster.log exists, but ec2autorun.log showed a reasonable error: I used a setup password beginning with an exclamation mark, which it seems not to like. I realised that I accidentally posted this password to the list previously, so don't worry about it being here; I've killed that particular cluster. When I was trying different configurations, etc., I consistently used the same password format (which I will no longer use now that I've posted it to a public mailing list like a moron).

I assume that since ec2autorun is first in the bootstrap sequence, if it fails then so does everything else. I'll try a different password format and get back to you (I have a meeting just now). But this looks like an issue with the password format to me... and possibly a bug in the script?

Best wishes,

Nick

[INFO] ec2autorun:57 2015-06-22 15:38:42,207: Getting user data from 'http://169.254.169.254/latest/user-data', attempt 0
[DEBUG] ec2autorun:61 2015-06-22 15:38:42,210: Saving user data in its original format to file '/tmp/cm/original_userData.yaml'
[DEBUG] ec2autorun:65 2015-06-22 15:38:42,211: Got user data
[INFO] ec2autorun:416 2015-06-22 15:38:42,211: Handling user data in YAML format.
Traceback (most recent call last):
File "/usr/bin/ec2autorun.py", line 516, in <module>
    main()
File "/usr/bin/ec2autorun.py", line 512, in main
    _parse_user_data(ud)
File "/usr/bin/ec2autorun.py", line 504, in _parse_user_data
    _handle_yaml(ud)
File "/usr/bin/ec2autorun.py", line 417, in _handle_yaml
    ud = _load_user_data(user_data)
File "/usr/bin/ec2autorun.py", line 402, in _load_user_data
    ud = yaml.load(user_data)
File "/usr/lib/python2.7/dist-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 133, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 414, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!galaxySATDEVZGK'
in "<string>", line 4, column 13:
    freenxpass: !galaxySATDEVZGK
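
A minimal sketch of why the leading exclamation mark matters, assuming the user data is plain YAML as the traceback above suggests. In YAML, an unquoted value that starts with ! is read as a tag rather than as text, which is what the ConstructorError reports. Quoting the value avoids that. The helper names below (quote_yaml_scalar, build_user_data) and the example password are hypothetical; only the freenxpass key comes from the traceback, and nothing here is taken from ec2autorun.py or cloudlaunch.

```python
def quote_yaml_scalar(value):
    """Return value as a single-quoted YAML scalar (hypothetical helper)."""
    # Inside single quotes YAML reads every character literally; the only
    # escape is a doubled single quote.
    return "'" + value.replace("'", "''") + "'"


def build_user_data(password):
    """Build a one-line user-data fragment (illustrative, not the real format)."""
    # 'freenxpass' is the key shown in the traceback; all other keys of the
    # real user data are omitted here.
    return "freenxpass: " + quote_yaml_scalar(password) + "\n"


if __name__ == "__main__":
    quoted = build_user_data("!examplePassword")
    print(quoted, end="")

    try:
        import yaml  # PyYAML is optional for this demonstration
    except ImportError:
        yaml = None

    if yaml is not None:
        try:
            # Unquoted: the leading '!' is parsed as a tag, not as text.
            yaml.safe_load("freenxpass: !examplePassword\n")
        except yaml.YAMLError as exc:
            print("unquoted value fails:", type(exc).__name__)
        # Quoted: the value is loaded as an ordinary string.
        print(yaml.safe_load(quoted))
```

Whether the quoting belongs in the launch form or in the boot script is not settled in this thread; the sketch only shows that a quoted value passes through a YAML parser unchanged.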

On 22/06/15 14:45, Enis Afgan wrote:

Hmm - /mnt definitely should not be empty. There's nothing unusual in the log you sent, so could you please send me the one from the worker? It's in the same location (/mnt/cm/paster.log).
If it's not there, please check the boot procedure logs in the following order and send those logs:

1. /usr/bin/ec2autorun.log

2. /tmp/cm/cm_boot.py.log

3. /mnt/cm/paster.log
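
The order above can be walked with a short script run on the instance. This is only a sketch: it assumes that each stage writes its log when it starts, so the first missing log marks the point where the boot stopped. The function name last_stage_reached is hypothetical; the three paths are the ones listed above.

```python
import os

# Boot logs in the order the stages run, as listed above.
BOOT_LOGS = [
    "/usr/bin/ec2autorun.log",
    "/tmp/cm/cm_boot.py.log",
    "/mnt/cm/paster.log",
]


def last_stage_reached(paths=BOOT_LOGS, exists=os.path.exists):
    """Return (last_existing_log, first_missing_log) in boot order.

    Either element is None when there is no such log.
    """
    last_found = None
    for path in paths:
        if not exists(path):
            # Assumption: a missing log means this stage never started.
            return last_found, path
        last_found = path
    return last_found, None


if __name__ == "__main__":
    found, missing = last_stage_reached()
    print("last log present: ", found)
    print("first log missing:", missing)
```

The log named as "last log present" is the one most likely to contain the error that stopped the boot.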

Thanks,

Enis

On Fri, Jun 19, 2015 at 5:02 PM, Nicholas Dickens <Nick.Dickens@glasgow.ac.uk> wrote:

Thanks – I’ve attached the log. I just tried to start a worker, let it go to the first reboot, and then copied this log. I logged into the worker and it looks OK (dmesg, etc.); the only noticeable thing was that /mnt is empty (just a lost+found directory), whereas I was expecting to see an NFS mount for the Galaxy export or something. But I’m still finding my way round the system. It may also have been the point in the reboot cycle at which I looked.

Best wishes,

Nick

From: Enis Afgan <enis.afgan@irb.hr>
Date: Friday, 19 June 2015 17:00
To: Nick Dickens <nick.dickens@glasgow.ac.uk>
Cc: "galaxy-dev@lists.galaxyproject.org" <galaxy-dev@lists.galaxyproject.org>
Subject: Re: [spam?] [galaxy-dev] Cloudman cluster not starting workers

Hi Nick,
Sorry to hear you're having trouble. I just tried a couple of scenarios and they all worked as expected (e.g., with and without elastic IPs, different instance types).

The main CloudMan log is located in /mnt/cm/paster.log, on both master and worker instances (if you didn't download the ssh key from cloudlaunch, you can ssh with ubuntu username and the same password as provided on the cloudlaunch form). The log is also available from the UI if you go to Admin page and then click 'Show CloudMan log' under 'System controls'. If you can share that, we can hopefully figure out what's going on.

Best,

Enis

On Wed, Jun 17, 2015 at 11:47 AM, Nicholas Dickens <Nick.Dickens@glasgow.ac.uk> wrote:

Hi All,

First-time post. As a quick intro: I have some reasonable experience with EC2 & AWS for developing our own pipelines, I’m comfortable in Python and *nix flavours, and I am developing a completely custom Galaxy in AWS for members of the WTCMP, Glasgow. In the meantime, I have thrown up a quick CloudMan Galaxy using the cloudstart and the CloudMan 2.3 AMI (ami-a7dbf6ce) in us-east-1. Auto-scaling didn’t seem to work, so I’ve switched it off and added nodes manually. I tried various sizes, including ‘same as master’ instances, but they just don’t start: in the EC2 console I can see them running, but they’re constantly pending in the /cloud interface, and according to the log they reboot 4 times and then terminate, apparently not responding: "10:16:56 - Instance i-xxxxxx not responding after 4 reboots. Terminating instance".

It’s out of the box: I edited the universe_wsgi.ini… file to disallow user registration and allow me to impersonate users, but didn’t change anything else. The only other configuration I’ve done is to associate an elastic IP with the master instance so I can have a more static URL for a couple of test users (if I need to destroy it and start again, etc.).

I’m new to the system so don’t know which logs are best to check… and am I missing something obvious? Is there a known bug when using elastic IPs? I’ve googled but with no joy.

Thanks for your help and best wishes,

Nick

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
