CloudMan - Can't start nodes
Hi guys,
I created a new Galaxy instance (probably around early July) with the web launcher (https://biocloudcentral.herokuapp.com/launch). I've been coming back and re-using it since then. However for the past week at least I haven't been able to launch new nodes. They show up as red on the indicators, and below I've pasted the error messages. (Could this be related to a new version of cloudman being released?)
Thanks,
Greg
This is the cluster status log from my last attempt:
15:55:02 - Retrieved file 'persistent_data.yaml' from bucket 'cm-[redacted]' to 'pd.yaml'.
15:55:02 - Master starting
15:55:05 - Completed initial cluster configuration.
15:55:25 - Prerequisites OK; starting service 'SGE'
15:55:37 - Configuring SGE...
15:55:37 - Setting up SGE did not go smoothly, running command 'cd /opt/sge; ./inst_sge -m -x -auto /opt/sge/galaxyEC2.conf' returned code '1' and following stderr: ''
15:55:57 - Saved file 'persistent_data.yaml' to bucket 'cm-[redacted]'
15:55:57 - Trouble comparing local (/mnt/cm/post_start_script) and remote (post_start_script) file modified times: [Errno 2] No such file or directory: '/mnt/cm/post_start_script'
15:55:58 - Adding 2 instance(s)...
15:57:32 - Instance 'i-56ba942e' reported alive
15:57:33 - Successfully generated root user's public key.
15:57:33 - Sent master public key to worker instance 'i-56ba942e'.
15:57:47 - Adding instance 'i-56ba942e' as SGE administrative host.
15:57:47 - Process encountered problems adding instance 'i-56ba942e' as administrative host. Process returned code 2
15:57:47 - Adding instance 'i-56ba942e' to SGE execution host list.
15:57:47 - Process encountered problems adding instance 'i-56ba942e' as execution host. Process returned code 2
15:57:47 - Problems updating @allhosts aimed at adding 'i-56ba942e', running command 'export SGE_ROOT=/opt/sge;. $SGE_ROOT/default/common/settings.sh; /opt/sge/bin/lx24-amd64/qconf -Mhgrp /tmp/ah_add_15_57_47' returned code '2' and following stderr: '/bin/sh: 1: .: Can't open /opt/sge/default/common/settings.sh '
15:57:47 - Waiting on worker instance 'i-56ba942e' to configure itself...
15:57:47 - Instance 'i-54ba942c' reported alive
15:57:47 - Sent master public key to worker instance 'i-54ba942c'.
15:58:01 - Adding instance 'i-54ba942c' as SGE administrative host.
15:58:01 - Process encountered problems adding instance 'i-54ba942c' as administrative host. Process returned code 2
15:58:01 - Adding instance 'i-54ba942c' to SGE execution host list.
15:58:01 - Process encountered problems adding instance 'i-54ba942c' as execution host. Process returned code 2
15:58:01 - Problems updating @allhosts aimed at adding 'i-54ba942c', running command 'export SGE_ROOT=/opt/sge;. $SGE_ROOT/default/common/settings.sh; /opt/sge/bin/lx24-amd64/qconf -Mhgrp /tmp/ah_add_15_58_01' returned code '2' and following stderr: '/bin/sh: 1: .: Can't open /opt/sge/default/common/settings.sh '
15:58:01 - Waiting on worker instance 'i-54ba942c' to configure itself...
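For anyone reading a status log like this one, a rough way to separate the failing steps from the noise is to split the text into entries and keep only those that report a non-zero return code. This is only a sketch: the "HH:MM:SS - message" entry format and the "returned code" wording are taken from the log in this post, and the function names are made up for illustration, not part of CloudMan.

```python
import re

# An entry starts with a timestamp followed by " - ", e.g. "15:55:37 - Configuring SGE...".
ENTRY_START = re.compile(r"(?<!\S)(\d{2}:\d{2}:\d{2}) - ")
# Failed steps in this log say "returned code '1'" or "returned code 2" (any non-zero code).
FAILURE = re.compile(r"returned code '?([1-9]\d*)'?")


def split_entries(raw):
    """Split a (possibly flattened) status log into (timestamp, message) pairs."""
    parts = ENTRY_START.split(raw)
    # re.split with one capture group yields [prefix, ts1, msg1, ts2, msg2, ...].
    return [(ts, msg.strip()) for ts, msg in zip(parts[1::2], parts[2::2])]


def failures(raw):
    """Return only the entries that report a non-zero return code."""
    return [(ts, msg) for ts, msg in split_entries(raw) if FAILURE.search(msg)]


# A few entries copied from the log in this post, flattened onto one line.
SAMPLE = (
    "15:55:25 - Prerequisites OK; starting service 'SGE' "
    "15:55:37 - Configuring SGE... "
    "15:55:37 - Setting up SGE did not go smoothly, running command "
    "'cd /opt/sge; ./inst_sge -m -x -auto /opt/sge/galaxyEC2.conf' "
    "returned code '1' and following stderr: '' "
    "15:55:57 - Saved file 'persistent_data.yaml' to bucket 'cm-[redacted]'"
)

for timestamp, message in failures(SAMPLE):
    print(timestamp, message[:60])
```

Run over the full log, the earliest entry this flags is the 15:55:37 inst_sge step, which comes before any of the per-worker qconf errors.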
Greg;
> I created a new Galaxy instance (probably around early July) with the web launcher (https://biocloudcentral.herokuapp.com/launch).
> I've been coming back and re-using it since then. However for the past week at least I haven't been able to launch new nodes. They show up as red on the indicators, and below I've pasted the error messages.
BioCloudCentral recently updated to the latest Ubuntu release, 12.04, so I'd guess this is the source of your error. CloudMan required some changes to handle SGE compatibility with the updated libraries in 12.04.
> (Could this be related to a new version of cloudman being released?)
Did you upgrade your version of CloudMan? It should give you an option to 'Update CloudMan' in the upper panel on the cloud console page when it's out of date.
Hopefully the updated CloudMan will take care of the issue,
Brad
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Thanks for the reply, Brad.
I tried clicking "update cloudman" but it now says "there was an error updating cloudman".
Here are the contents of the Cluster status log but I'm not sure what the error is:
18:06:43 - Retrieved file 'persistent_data.yaml' from bucket 'cm-xxx' to 'pd.yaml'.
18:06:43 - Master starting
18:06:44 - Completed initial cluster configuration.
18:07:04 - Prerequisites OK; starting service 'SGE'
18:07:24 - Configuring SGE...
18:07:24 - Setting up SGE did not go smoothly, running command 'cd /opt/sge; ./inst_sge -m -x -auto /opt/sge/galaxyEC2.conf' returned code '1' and following stderr: ''
18:07:38 - Saved file 'persistent_data.yaml' to bucket 'cm-xxx'
18:07:38 - Trouble comparing local (/mnt/cm/post_start_script) and remote (post_start_script) file modified times: [Errno 2] No such file or directory: '/mnt/cm/post_start_script'
18:09:23 - Updating CloudMan application source file in cluster's bucket 'cm-xxx'. It will be automatically available the next this cluster is instantiated.
Thanks again,
Greg
On Mon, Jul 23, 2012 at 3:49 PM, Brad Chapman <chapmanb@50mail.com> wrote:
> Did you upgrade your version of CloudMan? It should give you an option to 'Update CloudMan' in the upper panel on the cloud console page when it's out of date.
Greg;
Did you try rebooting or restarting the cluster? The SGE errors in the log come from the original, older CloudMan, but it looks like cm.tar.gz got an update, so it should work going forward. You can double-check by looking in your S3 console at the cm-yourinstance bucket and checking the date on the cm.tar.gz file.
Hope this fixes it for you,
Brad
Hi Brad,
I clicked terminate cluster and restarted it but it didn't seem to help.
Here's what's in S3:
Name                      Size        Last Modified
cm.tar.gz                 502.7 KB    Thu Jun 21 14:03:05 GMT-400 2012
cm.tar.gz_2012-07-24      502.7 KB    Tue Jul 24 14:21:20 GMT-400 2012
cm_boot.py                9.9 KB      Thu Jun 21 14:03:05 GMT-400 2012
persistent_data.yaml      170 bytes   Tue Jul 24 14:53:10 GMT-400 2012
post_start_script         0 bytes     Wed Jun 27 15:06:02 GMT-400 2012
resistance.clusterName    0 bytes     Thu Jun 21 13:52:27 GMT-400 2012
What do you think happened? It appears to have downloaded a new cm.tar.gz and added a date suffix to the name.
Should I delete cm.tar.gz and rename cm.tar.gz_2012-07-24 to cm.tar.gz?
-Greg
On Tue, Jul 24, 2012 at 9:45 PM, Brad Chapman <chapmanb@50mail.com> wrote:
> Did you try rebooting or restarting the cluster?
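The date check Brad suggested can also be done on a copy of the S3 listing rather than by eye. The sketch below is only illustrative: the rows are copied from Greg's listing, the function names are hypothetical, and the cutoff of 24 July 2012 is an assumed stand-in for when "Update CloudMan" was clicked (it is the date in the cm.tar.gz_2012-07-24 suffix).

```python
from datetime import datetime

# (name, last modified) rows copied from the S3 listing in Greg's message.
LISTING = [
    ("cm.tar.gz", "Thu Jun 21 14:03:05 GMT-400 2012"),
    ("cm.tar.gz_2012-07-24", "Tue Jul 24 14:21:20 GMT-400 2012"),
    ("cm_boot.py", "Thu Jun 21 14:03:05 GMT-400 2012"),
]


def parse_modified(text):
    """Parse a console timestamp; the fixed 'GMT-400' offset is matched literally."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S GMT-400 %Y")


def stale_files(listing, cutoff):
    """Return the names whose last-modified time is earlier than the cutoff."""
    return [name for name, stamp in listing if parse_modified(stamp) < cutoff]


# Assumption: the update was attempted on 24 July 2012 (same GMT-4 clock as the listing).
UPDATE_ATTEMPTED = datetime(2012, 7, 24)

print(stale_files(LISTING, UPDATE_ATTEMPTED))
```

With this listing and cutoff, both cm.tar.gz and cm_boot.py come back as stale, which suggests (but does not prove) that the update never replaced the files CloudMan actually loads.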
Greg;
I have to appeal to Enis for a detailed explanation of the cases where it can fail like that, but doing it manually seems like a great approach at this point. You can get the latest cm.tar.gz and cm_boot.py via:
  $ wget https://s3.amazonaws.com/cloudman/cm.tar.gz
  $ wget https://s3.amazonaws.com/cloudman/cm_boot.py
and then upload these into your cm-xxx bucket. This is what the automated update script should do for you. Before doing this I'd download backups of the current cm.tar.gz/cm_boot.py.
Hopefully this'll do it,
Brad
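The manual steps above (back up the current files, fetch the latest ones, upload them to the bucket) could be scripted roughly as follows. This is a sketch under stated assumptions, not what CloudMan's own update does: `manual_update` is a hypothetical helper, `client` is assumed to behave like a boto3 S3 client (only its `download_file` and `upload_file` methods are used), and the download URLs are the ones from Brad's message, which may no longer be current.

```python
import os
from urllib.request import urlretrieve

BASE_URL = "https://s3.amazonaws.com/cloudman"  # from the wget commands above
FILES = ("cm.tar.gz", "cm_boot.py")


def manual_update(client, bucket, workdir, fetch=urlretrieve):
    """Back up the bucket's current files locally, then replace them with the latest.

    `client` is assumed to offer boto3-style download_file/upload_file methods;
    `fetch(url, path)` downloads a URL to a local path (urlretrieve by default).
    """
    backup_dir = os.path.join(workdir, "backup")
    latest_dir = os.path.join(workdir, "latest")
    os.makedirs(backup_dir, exist_ok=True)
    os.makedirs(latest_dir, exist_ok=True)

    # 1. Keep a local copy of what is in the bucket right now.
    for name in FILES:
        client.download_file(
            Bucket=bucket, Key=name, Filename=os.path.join(backup_dir, name)
        )

    # 2. Fetch the latest public copies, then 3. overwrite the bucket's copies.
    for name in FILES:
        local_path = os.path.join(latest_dir, name)
        fetch(f"{BASE_URL}/{name}", local_path)
        client.upload_file(Filename=local_path, Bucket=bucket, Key=name)
```

Assuming boto3 is installed and AWS credentials are configured, this could be called as `manual_update(boto3.client("s3"), "cm-xxx", "/tmp/cm-update")`, with cm-xxx replaced by the real bucket name. All backups are taken before anything is uploaded, so a failed download leaves the bucket untouched.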
Ok, yes, that seems to have fixed it. Thanks for the help! -Greg On Wed, Jul 25, 2012 at 9:16 AM, Brad Chapman <chapmanb@50mail.com> wrote:
Greg; I have to appeal to Enis for a detailed explanation of the cases where it can fail like that, but doing it manually seems like a great approach at this point. You can get the latest cm.tar.gz and cm_boot.py via:
$ wget https://s3.amazonaws.com/cloudman/cm.tar.gz $ wget https://s3.amazonaws.com/cloudman/cm_boot.py
and then upload these into your cm-xxx bucket. This is what the automated update script should do for you. Before doing this I'd download backups of the current cm.tar.gz/cm_boot.py.
Hopefully this'll do it, Brad
Hi Brad,
I clicked terminate cluster and restarted it but it didn't seem to help.
Here's what's in S3:
Name Size Last Modified cm.tar.gz 502.7 KB Thu Jun 21 14:03:05 GMT-400 2012 cm.tar.gz_2012-07-24 502.7 KB Tue Jul 24 14:21:20 GMT-400 2012 cm_boot.py 9.9 KB Thu Jun 21 14:03:05 GMT-400 2012 persistent_data.yaml 170 bytes Tue Jul 24 14:53:10 GMT-400 2012 post_start_script 0 bytes Wed Jun 27 15:06:02 GMT-400 2012 resistance.clusterName 0 bytes Thu Jun 21 13:52:27 GMT-400 2012
What do you think happened? It appears to have downloaded a new cm.tar.gz and added a date suffix to the name.
Should I delete cm.tar.gz and rename cm.tar.gz_2012-07-24 to cm.tar.gz?
-Greg
On Tue, Jul 24, 2012 at 9:45 PM, Brad Chapman <chapmanb@50mail.com> wrote:
Greg; Did you try rebooting or restarting the cluster? The error logs complain about SGE due to the original older CloudMan but it seems like cm.tar.gz got an update so will work going forward. You can double check by looking at your S3 console for the cm-yourinstance and seeing the date on the cm.tar.gz module.
Hope this fixes it for you, Brad
Thanks for the reply, Brad.
I tried clicking "update cloudman" but it now says "there was an error updating cloudman".
Here are the contents of the Cluster status log but I'm not sure what the error is:
18:06:43 - Retrieved file 'persistent_data.yaml' from bucket 'cm-xxx' to 'pd.yaml'.
18:06:43 - Master starting
18:06:44 - Completed initial cluster configuration.
18:07:04 - Prerequisites OK; starting service 'SGE'
18:07:24 - Configuring SGE...
18:07:24 - Setting up SGE did not go smoothly, running command 'cd /opt/sge; ./inst_sge -m -x -auto /opt/sge/galaxyEC2.conf' returned code '1' and following stderr: ''
18:07:38 - Saved file 'persistent_data.yaml' to bucket 'cm-xxx'
18:07:38 - Trouble comparing local (/mnt/cm/post_start_script) and remote (post_start_script) file modified times: [Errno 2] No such file or directory: '/mnt/cm/post_start_script'
18:09:23 - Updating CloudMan application source file in cluster's bucket 'cm-xxx'. It will be automatically available the next this cluster is instantiated.
Thanks again,
Greg
On Mon, Jul 23, 2012 at 3:49 PM, Brad Chapman <chapmanb@50mail.com> wrote:
Greg;
I created a new Galaxy instance (probably around early July) with the web launcher (https://biocloudcentral.herokuapp.com/launch).
I've been coming back and re-using it since then. However for the past week at least I haven't been able to launch new nodes. They show up as red on the indicators, and below I've pasted the error messages.
BioCloudCentral updated to the latest Ubuntu release, 12.04, recently so I'd guess this is the source of your error. CloudMan required some changes to handle SGE compatibility with the updated libraries in 12.04.
(Could this be related to a new version of cloudman being released?)
Did you upgrade your version of CloudMan? It should give you an option to 'Update CloudMan' in the upper panel on the cloud console page when it's out of date.
Hopefully the updated CloudMan will take care of the issue,
Brad