Cloudman cluster not starting workers
Hi All,
This is my first post, so a quick intro: I have reasonable experience with EC2 and AWS from developing our own pipelines, I’m comfortable in Python and *nix flavours, and I am developing a completely custom Galaxy in AWS for members of the WTCMP, Glasgow. In the meantime, I have thrown up a quick CloudMan Galaxy using the cloudstart and the CloudMan 2.3 AMI (ami-a7dbf6ce) in us-east-1. Auto-scaling didn’t seem to work, so I’ve switched it off and added nodes manually. I tried various sizes, including ‘same as master’ instances, but they just don’t start. In the EC2 console I can see them running, but they’re constantly pending in the /cloud interface, and in the log they reboot 4 times and then terminate, apparently not responding: "10:16:56 - Instance i-xxxxxx not responding after 4 reboots. Terminating instance".
It’s out of the box: I edited the universe_wsgi.ini… file to disallow user registration and allow me to impersonate users, but didn’t change anything else. The only other configuration I’ve done is to associate an elastic IP with the master instance so I can have a more static URL for a couple of test users (if I need to destroy it and start again, etc).
I’m new to the system, so I don’t know which logs are best to check. Am I missing something obvious? Is there a known bug when using elastic IPs? I’ve googled but with no joy.
Thanks for your help and best wishes,
Nick
-- Nick Dickens DPhil BSc ARCS
Bioinformatics Team Leader Wellcome Trust Centre for Molecular Parasitology B6-21 SGDB 120 University Place Glasgow G12 8TA
Tel: +44 141 330 8282
http://fb.me/WTCMPbix @WTCMPbix http://www.gla.ac.uk/researchinstitutes/iii/staff/nickdickens/
http://www.gla.ac.uk/researchinstitutes/iii/staff/jeremymottram/comparativeg...
Hi Nick,
Sorry to hear you're having trouble. I just tried a couple of scenarios and they all worked as expected (e.g., with and without elastic IPs, different instance types).
The main CloudMan log is located in /mnt/cm/paster.log, on both master and worker instances (if you didn't download the ssh key from cloudlaunch, you can ssh in with the ubuntu username and the same password you provided on the cloudlaunch form). The log is also available from the UI: go to the Admin page and click 'Show CloudMan log' under 'System controls'. If you can share that, we can hopefully figure out what's going on.
Best, Enis
Thanks, I’ve attached the log. I just tried to start a worker, let it go to the first reboot, and then copied this log. I logged into the worker and it looks OK (dmesg, etc). The only noticeable thing was that /mnt is empty (just a lost+found directory), whereas I was expecting to see an NFS mount for the Galaxy export or something similar. But I’m still finding my way round the system, and it may also have been the point in the reboot cycle at which I was there.
Best wishes,
Nick
Hmm - /mnt definitely should not be empty. There's nothing unusual in the log you sent, so could you please send me the one from the worker? It's in the same location (/mnt/cm/paster.log). If it's not there, please track the boot procedure through the following logs, in this order, and send those:
1. /usr/bin/ec2autorun.log
2. /tmp/cm/cm_boot.py.log
3. /mnt/cm/paster.log
Thanks, Enis
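The log order above can be turned into a quick check on a worker. This is only a minimal sketch: the three paths come from the message above, but the function name and the `root` parameter are hypothetical, and it assumes each boot stage writes its log only once it has started, which the thread suggests but does not confirm.

```python
from pathlib import Path

# Boot-procedure logs, in the order listed in the message above.
BOOT_LOGS = [
    "/usr/bin/ec2autorun.log",
    "/tmp/cm/cm_boot.py.log",
    "/mnt/cm/paster.log",
]


def furthest_boot_log(logs=BOOT_LOGS, root="/"):
    """Return the last log in the sequence that exists, or None.

    Walks the logs in boot order and stops at the first missing one,
    so the returned file is the one most likely to hold the error.
    `root` is a hypothetical hook for pointing at a copied filesystem.
    """
    furthest = None
    for log in logs:
        path = Path(root) / log.lstrip("/")
        if not path.is_file():
            break
        furthest = path
    return furthest


if __name__ == "__main__":
    log = furthest_boot_log()
    if log is None:
        print("No boot logs found; the first boot script may not have run.")
    else:
        print(f"Boot got as far as {log}; check the end of that file.")
```

On a worker where only the first log exists, this points at /usr/bin/ec2autorun.log as the file to read.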
Aha, so neither /tmp/cm/cm_boot.py.log nor /mnt/cm/paster.log exists, but ec2autorun.log showed a reasonable error: I used a setup password beginning with an exclamation mark, which it seems not to like. I realised that I accidentally posted this password to the list previously, so don't worry about it being here; I've killed that particular cluster. When I was trying different configurations, I consistently used the same password format (which I will no longer use now that I've posted it to a public mailing list like a moron).
I assume that since ec2autorun is first in the bootstrap setup, if it fails then so does everything else. I'll try it with a different password format and get back to you (I have a meeting just now). But this looks like an issue with the password format to me, and possibly a bug in the script?
Best wishes,
Nick
[INFO] ec2autorun:57 2015-06-22 15:38:42,207: Getting user data from 'http://169.254.169.254/latest/user-data', attempt 0
[DEBUG] ec2autorun:61 2015-06-22 15:38:42,210: Saving user data in its original format to file '/tmp/cm/original_userData.yaml'
[DEBUG] ec2autorun:65 2015-06-22 15:38:42,211: Got user data
[INFO] ec2autorun:416 2015-06-22 15:38:42,211: Handling user data in YAML format.
Traceback (most recent call last):
  File "/usr/bin/ec2autorun.py", line 516, in <module>
    main()
  File "/usr/bin/ec2autorun.py", line 512, in main
    _parse_user_data(ud)
  File "/usr/bin/ec2autorun.py", line 504, in _parse_user_data
    _handle_yaml(ud)
  File "/usr/bin/ec2autorun.py", line 417, in _handle_yaml
    ud = _load_user_data(user_data)
  File "/usr/bin/ec2autorun.py", line 402, in _load_user_data
    ud = yaml.load(user_data)
  File "/usr/lib/python2.7/dist-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 133, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 414, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!galaxySATDEVZGK'
  in "<string>", line 4, column 13:
    freenxpass: !galaxySATDEVZGK
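For anyone hitting the same traceback: in YAML, an unquoted value that starts with `!` is read as a tag rather than as a string, which matches the ConstructorError above. The sketch below shows one way to quote such a value so it parses as plain text. It assumes the user data is written as simple `key: value` lines; how CloudMan actually assembles its user data is not shown in this thread. The helper names and the example password are hypothetical, and the PyYAML check only runs if PyYAML happens to be installed.

```python
def single_quote(value: str) -> str:
    """Wrap a string as a YAML single-quoted scalar.

    Inside single quotes the only escape YAML defines is a doubled
    quote (''), so a leading '!' is kept as literal text.
    """
    return "'" + value.replace("'", "''") + "'"


def user_data_line(key: str, value: str) -> str:
    """Render one 'key: value' line with the value always quoted (hypothetical helper)."""
    return f"{key}: {single_quote(value)}"


# Example password is made up for illustration.
line = user_data_line("freenxpass", "!examplePassword")
print(line)  # freenxpass: '!examplePassword'

try:
    import yaml  # PyYAML, optional for this sketch
except ImportError:
    yaml = None

if yaml is not None:
    try:
        yaml.safe_load("freenxpass: !examplePassword")
    except yaml.YAMLError as exc:
        # The unquoted form is treated as an unknown tag, as in the log above.
        print("unquoted value rejected:", type(exc).__name__)
    # The quoted form loads back as an ordinary string.
    assert yaml.safe_load(line) == {"freenxpass": "!examplePassword"}
```

Always quoting the value is a deliberately blunt choice here: it avoids having to enumerate every character that YAML treats specially at the start of a plain scalar.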
Dear Enis,
Thanks for your help with this. I can confirm that if I use a password that doesn’t start with an exclamation mark, adding nodes in CloudMan works fine. The password can contain an ! but just not start with one.
Knowing where to look logs-wise will really help. Is there a schematic somewhere that shows the CloudMan startup procedure? I’m working on one for my own understanding, but I’m a firm believer in not reinventing the wheel.
Best wishes,
Nick
Glad to hear it's working now. Sorry about the trouble. The startup procedure is described here: http://cloudman.irb.hr/blog/2013/03/06/cloudman-startup-procedure/ As part of a larger documentation redo, I'll add it somewhere on the main wiki as well.
On Mon, Jun 22, 2015 at 4:45 PM, Nicholas Dickens <Nick.Dickens@glasgow.ac.uk> wrote:
Dear Enis,
Thanks for your help with this. I can confirm that if I use a password that doesn’t start with an exclamation mark, adding nodes in CloudMan works fine. The password can contain an ! but just can’t start with one.
Knowing where to look logs-wise will really help – is there a schematic at all somewhere that shows the Cloudman startup procedure? I’m working on one for my own understanding but I’m a firm believer in not reinventing the wheel.
Best wishes,
Nick
From: Nick Dickens <nick.dickens@glasgow.ac.uk>
Date: Monday, 22 June 2015 17:02
To: Enis Afgan <enis.afgan@irb.hr>
Cc: "galaxy-dev@lists.galaxyproject.org" <galaxy-dev@lists.galaxyproject.org>
Subject: Re: [galaxy-dev] Cloudman cluster not starting workers
Aha - so neither /tmp/cm/cm_boot.py.log nor /mnt/cm/paster.log exists, but ec2autorun.log showed a reasonable error: I used a setup password beginning with an exclamation mark, which it seems not to like. I realised that I accidentally posted this to the list previously, so don't worry about it being here; I've killed that particular cluster. When I was trying different configurations, etc., I also consistently used the same password format (which I will no longer use now that I've posted it to a public mailing list like a moron).
I assume that since ec2autorun is first in the bootstrap setup, if it fails then so does everything else. I'll try it with a different password format and get back to you (I have a meeting just now). But this looks like an issue with the password format to me... and possibly a bug in the script?
Best wishes,
Nick
[INFO] ec2autorun:57 2015-06-22 15:38:42,207: Getting user data from 'http://169.254.169.254/latest/user-data', attempt 0
[DEBUG] ec2autorun:61 2015-06-22 15:38:42,210: Saving user data in its original format to file '/tmp/cm/original_userData.yaml'
[DEBUG] ec2autorun:65 2015-06-22 15:38:42,211: Got user data
[INFO] ec2autorun:416 2015-06-22 15:38:42,211: Handling user data in YAML format.
Traceback (most recent call last):
  File "/usr/bin/ec2autorun.py", line 516, in <module>
    main()
  File "/usr/bin/ec2autorun.py", line 512, in main
    _parse_user_data(ud)
  File "/usr/bin/ec2autorun.py", line 504, in _parse_user_data
    _handle_yaml(ud)
  File "/usr/bin/ec2autorun.py", line 417, in _handle_yaml
    ud = _load_user_data(user_data)
  File "/usr/bin/ec2autorun.py", line 402, in _load_user_data
    ud = yaml.load(user_data)
  File "/usr/lib/python2.7/dist-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 133, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 414, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!galaxySATDEVZGK'
  in "<string>", line 4, column 13:
    freenxpass: !galaxySATDEVZGK
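The traceback above is consistent with how YAML treats a leading `!`: at the start of a value it introduces a tag, so the loader looks for a constructor for that tag instead of reading a string. An `!` later in the value is ordinary text, which would match the observation that only a leading exclamation mark causes trouble. The sketch below is a hypothetical illustration and is not part of ec2autorun.py or CloudMan; the helper names and the placeholder password are assumptions. It shows how a single-line value could be single-quoted before being written into YAML user data.

```python
# Hypothetical helpers for illustration only; not taken from ec2autorun.py.

# Characters that YAML treats as indicators when they begin an unquoted scalar.
_YAML_LEADING_INDICATORS = set("!&*-?:,[]{}#|>'\"%@`")


def needs_quoting(value):
    """Return True if a single-line `value` is unsafe as an unquoted YAML scalar.

    Deliberately conservative: it flags some values that YAML would accept.
    """
    if value == "" or value != value.strip():
        return True
    if value[0] in _YAML_LEADING_INDICATORS:
        return True
    # ": " starts a mapping value and " #" starts a comment inside a plain scalar.
    return ": " in value or " #" in value or value.endswith(":")


def yaml_single_quote(value):
    """Wrap `value` in YAML single quotes; a literal ' is escaped by doubling it."""
    return "'" + value.replace("'", "''") + "'"


def user_data_line(key, value):
    """Format one `key: value` line, quoting the value only when needed."""
    rendered = yaml_single_quote(value) if needs_quoting(value) else value
    return "{}: {}".format(key, rendered)


# Placeholder passwords, not real credentials.
print(user_data_line("freenxpass", "!examplePass"))  # freenxpass: '!examplePass'
print(user_data_line("freenxpass", "example!Pass"))  # freenxpass: example!Pass
```

Quoting every value unconditionally would be the simpler choice in practice, since unquoted values such as `true` or `123` also do not load as strings; the conditional check is kept here only to show why a leading `!` differs from one in the middle.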
On 22/06/15 14:45, Enis Afgan wrote:
Hmm - /mnt definitely should not be empty. There's nothing unusual in the log you sent, so could you please send me the one from the worker? It's in the same location (/mnt/cm/paster.log). If it's not there, please track the boot procedure logs in the following order and send those logs:
1. /usr/bin/ec2autorun.log
2. /tmp/cm/cm_boot.py.log
3. /mnt/cm/paster.log
Thanks, Enis
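The three log paths listed in the message above can be checked in order with a short script run on the instance. This is a minimal sketch, assuming each boot stage creates its log only once it starts, so the last log that exists hints at how far the boot got; the function name `last_log_reached` is hypothetical and not part of CloudMan.

```python
import os

# Boot-stage logs, in the order given in the message above.
BOOT_LOGS = [
    "/usr/bin/ec2autorun.log",
    "/tmp/cm/cm_boot.py.log",
    "/mnt/cm/paster.log",
]


def last_log_reached(paths, exists=os.path.exists):
    """Return the last path that exists before the first missing one.

    Returns None if even the first log is missing. `exists` can be swapped
    out for testing without touching the filesystem.
    """
    reached = None
    for path in paths:
        if not exists(path):
            break
        reached = path
    return reached


if __name__ == "__main__":
    log = last_log_reached(BOOT_LOGS)
    if log is None:
        print("None of the expected boot logs exist.")
    else:
        print("Boot got as far as:", log)
```

On a worker where only ec2autorun.log exists, this would point at that file as the one to read first.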
On Fri, Jun 19, 2015 at 5:02 PM, Nicholas Dickens <Nick.Dickens@glasgow.ac.uk> wrote:
Thanks – I’ve attached the log. I just tried to start a worker, let it go to the first reboot, and then copied this log. I logged into the worker and it looks OK (dmesg, etc.); the only noticeable thing was that /mnt is empty (just a lost+found directory), whereas I was expecting to see an NFS mount for the Galaxy export or something similar. But I’m still finding my way round the system. It may also have been down to the point in the reboot cycle at which I logged in.
Best wishes,
Nick
Participants (3): Enis Afgan, Nicholas Dickens, Nick Dickens