cloud instance missing /opt/sge/default/common directory - galaxy-dev - lists.galaxyproject.org

cloud instance missing /opt/sge/default/common directory

Joseph Hargitai

11 Sep 2011

5:35 a.m.

Hi,

Upon restarting a saved cloud instance I am missing:

-bash: /opt/sge/default/common/settings.sh: No such file or directory
-bash: /opt/sge/default/common/settings.sh: No such file or directory

All the other mounts are there and well preserved. Is this pulled from a special place I may not have saved?

The instance now does not boot beyond this point. I have login and admin console access.

joe


Enis Afgan

13 Sep 2011

1:50 p.m.

Hi Joe,

If you look in /mnt/cm/paster.log on the instance, are there any indications as to what went wrong? They should be toward the top of the log, after the server gets started.

SGE gets installed each time an instance is rebooted, so simply rebooting it again may do the trick. You can also choose to manually remove/clean SGE before rebooting. To do so, you can follow the basic approach captured in this method: https://bitbucket.org/galaxy/cloudman/src/862d1087080f/cm/services/apps/sge....

Enis
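As a rough illustration of the log check suggested above, the following Python sketch pulls the problem entries out of log text. It assumes the "[LEVEL] source:line timestamp: message" layout seen in the excerpt posted later in this thread; the function name, the sample text (with the error message abbreviated), and the idea of also reporting a WARNING level are assumptions made for illustration, not part of CloudMan.

```python
import re

# Assumed line layout, based on the excerpt posted later in the thread:
# [ERROR] misc:514 2011-09-22 00:03:35,651: message
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\]\s+(?P<source>\S+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):\s*(?P<message>.*)$"
)


def find_problems(log_text, levels=("ERROR", "WARNING")):
    """Return (timestamp, source, message) for each entry at the given levels."""
    problems = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            problems.append(
                (match.group("timestamp"), match.group("source"), match.group("message"))
            )
    return problems


# Hypothetical sample; the ERROR message is shortened from the one in the thread.
SAMPLE = """\
[INFO] sge:96 2011-09-22 00:03:35,557: Configuring SGE...
[DEBUG] sge:112 2011-09-22 00:03:35,558: Setting up SGE.
[ERROR] misc:514 2011-09-22 00:03:35,651: Setting up SGE did not go smoothly
"""

if __name__ == "__main__":
    for timestamp, source, message in find_problems(SAMPLE):
        print(timestamp, source, message)
```

To use it on the instance, the contents of /mnt/cm/paster.log would be read into a string and passed to find_problems in place of SAMPLE.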

___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/


Joseph Hargitai

22 Sep 2011

6:16 a.m.


The error is:

[DEBUG] galaxy:139 2011-09-22 00:03:21,055: Galaxy UI does not seem to be accessible.
[DEBUG] master:1491 2011-09-22 00:03:21,055: S&S: SGE..Shut down; FS-galaxyIndices..OK; FS-galaxyTools..OK; FS-galaxyData..OK; Postgres..OK; Galaxy..Starting;
[DEBUG] root:354 2011-09-22 00:03:24,724: Managing services: []
[INFO] galaxy:30 2011-09-22 00:03:24,724: Removing 'Galaxy' service
[INFO] galaxy:122 2011-09-22 00:03:24,724: Shutting down Galaxy...
[DEBUG] misc:511 2011-09-22 00:03:26,067: Successfully stopped Galaxy.
[DEBUG] root:354 2011-09-22 00:03:33,936: Managing services: []
[DEBUG] sge:61 2011-09-22 00:03:33,937: Unpacking SGE from '/opt/galaxy/pkg/ge6.2u5'
[DEBUG] sge:76 2011-09-22 00:03:33,937: Cleaning '/opt/sge' directory.
[DEBUG] sge:82 2011-09-22 00:03:34,117: Unpacking SGE to '/opt/sge'.
[INFO] sge:96 2011-09-22 00:03:35,557: Configuring SGE...
[DEBUG] sge:104 2011-09-22 00:03:35,558: Created SGE install template as file '/opt/sge/galaxyEC2.conf'
[DEBUG] sge:112 2011-09-22 00:03:35,558: Setting up SGE.
[ERROR] misc:514 2011-09-22 00:03:35,651: Setting up SGE did not go smoothly, running command 'cd /opt/sge; ./inst_sge -m -x -auto /opt/sge/galaxyEC2.conf' returned code '2' and following stderr: '[: 359: 11: unexpected operator
[: 359: 11: unexpected operator
[: 359: 11: unexpected operator
[: 359: 11: unexpected operator
error resolving local host: can't resolve host name (h_errno = HOST_NOT_FOUND)

j


Enis Afgan

6:18 p.m.

Hi Joe,

And this is happening on a freshly booted instance (from a previously existing cluster) using the same AMI? The order of execution seems a bit odd, seeing Galaxy being removed before SGE is set up; SGE should be the first thing that gets set up, so I'm wondering...

If you log into the instance, what is in /etc/hosts? Does it match the instance DNS?

And if you try executing that same command (cd /opt/sge; ./inst_sge -m -x -auto /opt/sge/galaxyEC2.conf) by hand (as root), is any more info produced? Also, the qmaster log should be available under /opt/sge/ge6 (or something like this) /default/spool/qmaster/, so please take a look there as well and see if more info is available.
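The /etc/hosts question above can be checked mechanically. Below is a minimal Python sketch that tests whether a given host name appears in hosts-format text. The function names and the sample entries are hypothetical, and the check only looks at the file contents: it does not consult DNS, so it approximates rather than reproduces the lookup that the SGE installer performs.

```python
def hostnames_in_hosts(hosts_text):
    """Collect every name that a hosts-format text maps to an address."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # fields[0] is the address itself
    return names


def hostname_is_listed(hostname, hosts_text):
    """True if hostname appears as a name or alias in the hosts text."""
    return hostname in hostnames_in_hosts(hosts_text)


# Hypothetical file contents, for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
# internal address of this node
10.0.0.5 ip-10-0-0-5.example.internal ip-10-0-0-5
"""

if __name__ == "__main__":
    for name in ("ip-10-0-0-5", "ip-10-0-0-9"):
        print(name, hostname_is_listed(name, SAMPLE_HOSTS))
```

On a real instance one would pass the value of socket.gethostname() and the contents of /etc/hosts. A missing entry would be consistent with the HOST_NOT_FOUND error in the log, though it would not by itself prove that this is the cause.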


Joseph Hargitai

6:51 p.m.


Hi,

Correct. Same instance. It is slowly deteriorating even more: at first only SGE and Galaxy hung. Now, when rebooting, only CM starts. No fs, postgres...

The knee-jerk reaction is to just start a brand new instance, but that would not help anyone who wants to use this AMI in the future for production.

I'll check your recommendation shortly.

j


Joseph Hargitai

23 Sep 2011

6:39 p.m.


Hi,

the mount is no longer there:

ubuntu@ip-10-68-42-15:~$ more /etc/hosts
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

ubuntu@ip-10-68-42-15:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 9.5G 5.6G 63% /
devtmpfs 7.3G 128K 7.3G 1% /dev
none 7.6G 0 7.6G 0% /dev/shm
none 7.6G 96K 7.6G 1% /var/run
none 7.6G 0 7.6G 0% /var/lock
none 7.6G 0 7.6G 0% /lib/init/rw
/dev/sdb 414G 201M 393G 1% /mnt

ubuntu@ip-10-68-42-15:~$ cd /mnt
ubuntu@ip-10-68-42-15:/mnt$ ls
cm lost+found

ubuntu@ip-10-68-42-15:/mnt$ more /etc/fstab
# /etc/fstab: static file system information.
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc nodev,noexec,nosuid 0 0
/dev/sda1 / xfs defaults 0 0
/dev/sdb /mnt auto defaults,nobootwait,comment=cloudconfig 0 0
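A check like the df listing above can be automated. The Python sketch below parses "df -h" style output and reports which expected mount points are absent. The expected paths under /mnt are an assumption inferred from the filesystem names in the log (galaxyData, galaxyTools, galaxyIndices) and are not stated anywhere in this thread; the function names are hypothetical as well. The parser takes the last whitespace-separated field of each row as the mount point, so it would misread mount points that contain spaces.

```python
def mounted_paths(df_output):
    """Return the set of mount points listed in `df -h` style output."""
    rows = df_output.strip().splitlines()[1:]  # skip the header row
    paths = set()
    for row in rows:
        fields = row.split()
        if len(fields) >= 6:  # device, size, used, avail, use%, mount point
            paths.add(fields[-1])
    return paths


def missing_mounts(df_output, expected):
    """Return the expected mount points that df does not list, sorted."""
    return sorted(set(expected) - mounted_paths(df_output))


# Two rows taken from the df listing in the thread.
DF_SAMPLE = """\
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 9.5G 5.6G 63% /
/dev/sdb 414G 201M 393G 1% /mnt
"""

# Assumed mount points; adjust to wherever the cluster mounts its volumes.
EXPECTED = ["/mnt", "/mnt/galaxyData", "/mnt/galaxyTools", "/mnt/galaxyIndices"]

if __name__ == "__main__":
    print(missing_mounts(DF_SAMPLE, EXPECTED))
```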


Enis Afgan

26 Sep 2011

6:22 p.m.

Are you continuously running this instance and just restarting it, or do you start a clean instance each time?

I'd suggest ensuring that the persistent_data.yaml in your cluster's bucket is correct (see below), terminating this instance, and starting a new one with the same user data as used for this cluster. That way you should get a clean (i.e., standardized) start, which should follow the process correctly.

persistent_data.yaml contains the data required to reestablish an existing cluster. It must follow the format below but contain the proper values for your cluster (especially if you customized your cluster via custom snapshots). This file is in your cluster's bucket on S3, so you can download it, check it, and re-upload it if anything was changed or got out of sync.

data_filesystems:
  galaxyData:
  - size: 50
    vol_id: vol-909342f8
services:
- service: SGE
- service: Postgres
- service: Galaxy
static_filesystems:
- filesystem: galaxyIndices
  size: 700
  snap_id: !!python/unicode 'snap-5b030634'
- filesystem: galaxyTools
  size: 2
  snap_id: !!python/unicode 'snap-1688b978'
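As a rough aid for the check described above, here is a Python sketch that validates the shape of an already loaded persistent_data.yaml. It works on a plain dictionary and leaves the loading step out, since the file uses !!python/unicode tags that need a YAML loader able to handle them. The function name and the specific rules (three top-level keys, IDs starting with "vol-" and "snap-") are assumptions drawn from the example in this message, not a specification of what CloudMan requires.

```python
def check_persistent_data(data):
    """Return a list of problems found; an empty list means the shape looks right."""
    problems = []
    for key in ("data_filesystems", "services", "static_filesystems"):
        if key not in data:
            problems.append("missing top-level key: %s" % key)
    # Each data filesystem maps to a list of volumes with a vol_id.
    for name, volumes in data.get("data_filesystems", {}).items():
        for volume in volumes:
            if not str(volume.get("vol_id", "")).startswith("vol-"):
                problems.append("%s: vol_id should look like 'vol-...'" % name)
    # Each static filesystem is expected to reference a snapshot.
    for fs in data.get("static_filesystems", []):
        if not str(fs.get("snap_id", "")).startswith("snap-"):
            problems.append(
                "%s: snap_id should look like 'snap-...'" % fs.get("filesystem", "?")
            )
    return problems


# The example from this message, written out as a Python dictionary.
EXAMPLE = {
    "data_filesystems": {"galaxyData": [{"size": 50, "vol_id": "vol-909342f8"}]},
    "services": [{"service": "SGE"}, {"service": "Postgres"}, {"service": "Galaxy"}],
    "static_filesystems": [
        {"filesystem": "galaxyIndices", "size": 700, "snap_id": "snap-5b030634"},
        {"filesystem": "galaxyTools", "size": 2, "snap_id": "snap-1688b978"},
    ],
}

if __name__ == "__main__":
    print(check_persistent_data(EXAMPLE) or "no problems found")
```

A check like this only confirms that the file is well formed. Whether the volume and snapshot IDs actually belong to the cluster still has to be verified against the AWS account.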


Joseph Hargitai

11:35 p.m.


It is there:

data_filesystems:
  galaxyData:
  - size: 200
    vol_id: vol-5b059a30
services:
- service: SGE
- service: Postgres
- service: Galaxy
static_filesystems:
- filesystem: galaxyIndices
  size: 700
  snap_id: !!python/unicode 'snap-5b030634'
- filesystem: galaxyTools
  size: 2
  snap_id: !!python/unicode 'snap-7dad9712'
