Hi all, I'm new to CloudMan and am trying to launch a cluster via GVL (3 or 4) on NeCTAR. I can get a head node running without trouble via launch.genome.edu.au, but launching worker nodes from the CloudMan interface appears to fail: CloudMan reboots the worker repeatedly before giving up. I logged into the worker to inspect the log files and found the following, but it's not obvious to me what to do next. Hopefully this is something simple?

ubuntu@server-fbbd9a10-fb58-48d8-89cd-5ddd22821648:~$ cat /mnt/cm/paster.log
Python version: (2, 7)
Image configuration suports: {'apps': ['cloudman', 'galaxy']}
2015-10-15 14:15:24,973 DEBUG app:73 Initializing app
2015-10-15 14:15:24,973 DEBUG ec2:109 Gathering instance zone, attempt 0
2015-10-15 14:15:25,140 DEBUG ec2:115 Instance zone is 'NCI'
2015-10-15 14:15:25,140 DEBUG ec2:44 Gathering instance ami, attempt 0
2015-10-15 14:15:25,459 DEBUG app:76 Running on 'openstack' type of cloud in zone 'NCI' using image 'ami-00003484'.
2015-10-15 14:15:25,459 DEBUG app:98 Getting pd.yaml
2015-10-15 14:15:25,459 DEBUG openstack:99 Establishing a boto Swift connection.
2015-10-15 14:15:25,459 DEBUG openstack:109 Got boto Swift connection.
2015-10-15 14:15:26,112 DEBUG misc:578 Retrieved file 'persistent_data.yaml' from bucket 'cm-45b53bf5024e962bd27e15fd81fcc07d' on host 'swift.rc.nectar.org.au' to 'pd.yaml'.
2015-10-15 14:15:26,118 INFO app:119 Worker starting
2015-10-15 14:15:26,136 DEBUG ec2:76 Gathering instance id, attempt 0
2015-10-15 14:15:26,338 DEBUG ec2:82 Instance ID is 'i-0019a2fc'
2015-10-15 14:16:29,488 DEBUG comm:134 AMQP Connection Failure: [Errno 110] Connection timed out
2015-10-15 14:16:29,492 DEBUG base:57 Enabling 'root' controller, class: CM
2015-10-15 14:16:29,494 DEBUG buildapp:93 Enabling 'httpexceptions' middleware
2015-10-15 14:16:29,496 DEBUG buildapp:99 Enabling 'recursive' middleware
2015-10-15 14:16:29,499 DEBUG buildapp:119 Enabling 'print debug' middleware
2015-10-15 14:16:29,506 DEBUG buildapp:133 Enabling 'error' middleware
2015-10-15 14:16:29,507 DEBUG buildapp:143 Enabling 'config' middleware
2015-10-15 14:16:29,508 DEBUG buildapp:147 Enabling 'x-forwarded-host' middleware
2015-10-15 14:16:29,517 DEBUG misc:768 'cp /etc/hosts /etc/hosts.orig' command OK
2015-10-15 14:16:29,528 DEBUG misc:768 'cp /tmp/tmpuV3NTJ /etc/hosts' command OK
Starting server in PID 2825.
2015-10-15 14:16:29,533 DEBUG misc:768 'chmod 644 /etc/hosts' command OK
2015-10-15 14:16:29,533 DEBUG worker:558 Trying to setup AMQP connection; conn = '<cm.util.comm.CMWorkerComm object at 0x2743950>'
serving on 0.0.0.0:42284 view at http://127.0.0.1:42284
2015-10-15 14:17:32,656 DEBUG comm:134 AMQP Connection Failure: [Errno 110] Connection timed out
2015-10-15 14:17:32,656 DEBUG worker:558 Trying to setup AMQP connection; conn = '<cm.util.comm.CMWorkerComm object at 0x2743950>'
2015-10-15 14:18:35,760 DEBUG comm:134 AMQP Connection Failure: [Errno 110] Connection timed out
2015-10-15 14:18:35,760 DEBUG worker:558 Trying to setup AMQP connection; conn = '<cm.util.comm.CMWorkerComm object at 0x2743950>'
2015-10-15 14:19:38,864 DEBUG comm:134 AMQP Connection Failure: [Errno 110] Connection timed out
2015-10-15 14:19:38,864 DEBUG worker:558 Trying to setup AMQP connection; conn = '<cm.util.comm.CMWorkerComm object at 0x2743950>'

ubuntu@server-fbbd9a10-fb58-48d8-89cd-5ddd22821648:~$ cat /tmp/cm/cm_boot.py.log
2015-10-15 14:23:43,713 DEBUG cm_boot:430 - virtual-burrito seems to be installed
2015-10-15 14:23:44,037 DEBUG cm_boot:25 - Successfully ran '/bin/bash -l -c 'VIRTUALENVWRAPPER_LOG_DIR=/tmp/; HOME=/home/ubuntu; . /home/ubuntu/.venvburrito/startup.sh; lsvirtualenv | grep CM''
2015-10-15 14:23:44,037 DEBUG cm_boot:433 - 'CM' virtualenv found
2015-10-15 14:23:44,049 DEBUG cm_boot:493 - Fixing /etc/hosts on NeCTAR
2015-10-15 14:23:44,930 INFO cm_boot:244 - << Starting nginx >>
2015-10-15 14:23:44,931 DEBUG cm_boot:169 - Reconfiguring nginx conf
2015-10-15 14:23:44,931 INFO cm_boot:286 - Attempting to configure max_client_body_size in /usr/nginx/conf/nginx.conf
2015-10-15 14:23:44,934 DEBUG cm_boot:25 - Successfully ran 'cp /usr/nginx/conf/nginx.conf /tmp/cm/original_nginx.conf'
2015-10-15 14:23:44,936 DEBUG cm_boot:25 - Successfully ran 'uniq /tmp/cm/original_nginx.conf > /usr/nginx/conf/nginx.conf'
2015-10-15 14:23:44,937 DEBUG cm_boot:25 - Successfully ran 'grep 'client_max_body_size' /usr/nginx/conf/nginx.conf'
2015-10-15 14:23:44,938 DEBUG cm_boot:265 - Creating tmp dir for nginx /mnt/galaxy/upload_store
2015-10-15 14:23:44,938 DEBUG cm_boot:68 - Checking /usr/local/sbin/nginx
2015-10-15 14:23:44,938 DEBUG cm_boot:58 - /usr/local/sbin/nginx is file: False; it's executable: False
2015-10-15 14:23:44,938 DEBUG cm_boot:68 - Checking /usr/local/bin/nginx
2015-10-15 14:23:44,938 DEBUG cm_boot:58 - /usr/local/bin/nginx is file: False; it's executable: False
2015-10-15 14:23:44,938 DEBUG cm_boot:68 - Checking /usr/bin/nginx
2015-10-15 14:23:44,938 DEBUG cm_boot:58 - /usr/bin/nginx is file: False; it's executable: False
2015-10-15 14:23:44,938 DEBUG cm_boot:68 - Checking /usr/sbin/nginx
2015-10-15 14:23:44,938 DEBUG cm_boot:58 - /usr/sbin/nginx is file: False; it's executable: False
2015-10-15 14:23:44,938 DEBUG cm_boot:68 - Checking /sbin/nginx
2015-10-15 14:23:44,939 DEBUG cm_boot:58 - /sbin/nginx is file: False; it's executable: False
2015-10-15 14:23:44,939 DEBUG cm_boot:68 - Checking /bin/nginx
2015-10-15 14:23:44,939 DEBUG cm_boot:58 - /bin/nginx is file: False; it's executable: False
2015-10-15 14:23:44,939 DEBUG cm_boot:68 - Checking /usr/sbin/nginx
2015-10-15 14:23:44,939 DEBUG cm_boot:58 - /usr/sbin/nginx is file: False; it's executable: False
2015-10-15 14:23:44,939 DEBUG cm_boot:68 - Checking /usr/nginx/sbin/nginx
2015-10-15 14:23:44,939 DEBUG cm_boot:58 - /usr/nginx/sbin/nginx is file: True; it's executable: True
2015-10-15 14:23:44,939 DEBUG cm_boot:270 - Using '/usr/nginx/sbin/nginx' as the nginx executable
2015-10-15 14:23:44,946 ERROR cm_boot:31 - Error running 'ps xa | grep nginx | grep -v grep'. Process returned code '1' and following stderr: ''
2015-10-15 14:23:44,964 DEBUG cm_boot:25 - Successfully ran '/usr/nginx/sbin/nginx'
2015-10-15 14:23:44,966 DEBUG cm_boot:25 - Successfully ran 'rm -rf /mnt/galaxy/upload_store'
2015-10-15 14:23:44,966 DEBUG cm_boot:281 - Deleting tmp dir for nginx /mnt/galaxy/upload_store
2015-10-15 14:23:44,966 INFO cm_boot:339 - << Downloading CloudMan >>
2015-10-15 14:23:44,966 DEBUG cm_boot:43 - Checking existence of directory '/mnt/cm'
2015-10-15 14:23:44,966 DEBUG cm_boot:52 - Directory '/mnt/cm' exists.
2015-10-15 14:23:44,966 DEBUG cm_boot:344 - Using user-provided default bucket: cloudman-gvl-304
2015-10-15 14:23:44,966 INFO cm_boot:324 - Connecting to a custom Object Store
2015-10-15 14:23:44,967 DEBUG cm_boot:333 - Got boto S3 connection: S3Connection:swift.rc.nectar.org.au
2015-10-15 14:23:44,967 DEBUG cm_boot:210 - Checking if key 'cm.tar.gz' exists in bucket 'cm-45b53bf5024e962bd27e15fd81fcc07d'
2015-10-15 14:23:45,276 INFO cm_boot:356 - CloudMan found in cluster bucket 'cm-45b53bf5024e962bd27e15fd81fcc07d'.
2015-10-15 14:23:45,276 DEBUG cm_boot:190 - Getting file cm.tar.gz from bucket cm-45b53bf5024e962bd27e15fd81fcc07d
2015-10-15 14:23:45,276 DEBUG cm_boot:194 - Attempting to retrieve file 'cm.tar.gz' from bucket 'cm-45b53bf5024e962bd27e15fd81fcc07d'
2015-10-15 14:23:45,726 INFO cm_boot:197 - Successfully retrieved file 'cm.tar.gz' from bucket 'cm-45b53bf5024e962bd27e15fd81fcc07d' via connection 'swift.rc.nectar.org.au' to '/mnt/cm/cm.tar.gz'
2015-10-15 14:23:45,727 DEBUG cm_boot:388 - Getting metadata 'revision' for file 'cm.tar.gz' from bucket 'cm-45b53bf5024e962bd27e15fd81fcc07d'
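
The only failure that repeats in paster.log is 'AMQP Connection Failure: [Errno 110] Connection timed out', which suggests the worker cannot open a TCP connection to the message broker it is trying to reach. One way to check that from the worker is a plain socket test. The sketch below is only an illustration: the function name can_reach is made up here, the port 5672 is the standard AMQP port and is an assumption (the logs do not show which host or port CloudMan uses), and the head node address has to be supplied by hand.

```python
import socket
import sys


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python check_amqp.py HEAD_NODE_IP [PORT]")
    else:
        head_node = sys.argv[1]
        # 5672 is the standard AMQP port; assumed here, not taken from the logs.
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 5672
        reachable = can_reach(head_node, port)
        print("%s:%d reachable: %s" % (head_node, port, reachable))
```

If this reports the port as unreachable from the worker while the same check succeeds on the head node itself, the security group or firewall rules between the two instances would be the first thing to look at; if it reports the port as reachable, the timeout probably has another cause.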