Re: [galaxy-dev] Error with setuptools version in Galaxy installation on Cluster
Hi Sonali,

Since this is a local installation question, I have moved the discussion to galaxy-dev. Further responses are inline below.

Sonali Amonkar wrote:
Hi,
First of all, I appreciate the clean and neat documentation written for setting up Galaxy on a cluster.
I am attempting to install Galaxy on a Penguin cluster with the Torque PBS manager. While following the steps I came across the error below:
===========================================================================
[user@server pbs_python]$ /galaxy/Python-2.4.6/python scramble.py
---------------------------------------------------------------------------
This script requires setuptools version 0.6c12 to run (even to display help). I will attempt to download it for you (from http://pypi.python.org/packages/2.4/s/setuptools/), but you may need to enable firewall access for this script first. I will start the download in 8 seconds.

(Note: if this machine does not have network access, please obtain the file

    http://pypi.python.org/packages/2.4/s/setuptools/setuptools-0.6c12-py2.4.egg

and place it in this directory before rerunning this script.)
---------------------------------------------------------------------------
Downloading http://pypi.python.org/packages/2.4/s/setuptools/setuptools-0.6c12-py2.4.egg
Traceback (most recent call last):
  File "scramble.py", line 14, in ?
    from scramble_lib import *
  File "../../../lib/scramble_lib.py", line 150, in ?
    use_setuptools( download_delay=8, to_dir=os.path.dirname( __file__ ) )
  File "/galaxy/galaxy-dist/scripts/scramble/build/py2.4-linux-x86_64-ucs2/pbs_python/ez_setup.py", line 92, in use_setuptools
    return do_download()
  File "/galaxy/galaxy-dist/scripts/scramble/build/py2.4-linux-x86_64-ucs2/pbs_python/ez_setup.py", line 70, in do_download
    egg = download_setuptools(version, download_base, to_dir, download_delay)
  File "/galaxy/galaxy-dist/scripts/scramble/build/py2.4-linux-x86_64-ucs2/pbs_python/ez_setup.py", line 131, in download_setuptools
    src = urllib2.urlopen(url)
  File "/galaxy/Python-2.4.6/Lib/urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "/galaxy/Python-2.4.6/Lib/urllib2.py", line 364, in open
    response = meth(req, response)
  File "/galaxy/Python-2.4.6/Lib/urllib2.py", line 471, in http_response
    response = self.parent.error(
  File "/galaxy/Python-2.4.6/Lib/urllib2.py", line 402, in error
    return self._call_chain(*args)
  File "/galaxy/Python-2.4.6/Lib/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/galaxy/Python-2.4.6/Lib/urllib2.py", line 480, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
===========================================================================
The scramble script here is trying to download an egg that does not exist at that location. Opening the link in a browser returns a 404, and the parent directory (http://pypi.python.org/packages/2.4/s/setuptools/) shows that the last available version is 0.6c11.
Hi Sonali,

0.6c12 is still in development and I'm not sure where this is coming from. The version of ez_setup.py in:

    /galaxy/galaxy-dist/scripts/scramble/lib/

references 0.6c11, and this file is copied to the build directory in:

    galaxy-dist/scripts/scramble/build/py2.4-linux-x86_64-ucs2/pbs_python/

What's the value of DEFAULT_VERSION in:

    /galaxy/galaxy-dist/scripts/scramble/build/py2.4-linux-x86_64-ucs2/pbs_python/ez_setup.py ?
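One quick way to answer the DEFAULT_VERSION question is to read the value out of both copies of ez_setup.py and compare them. The following is only a minimal sketch: it assumes the file assigns DEFAULT_VERSION on a single line as a quoted string, the helper name read_default_version is made up for illustration, and it is written for a current Python 3 rather than the Python 2.4 interpreter used in this thread.

```python
import re

# Matches a line such as: DEFAULT_VERSION = "0.6c11"
_VERSION_RE = re.compile(
    r"""^DEFAULT_VERSION\s*=\s*["']([^"']+)["']""", re.MULTILINE
)


def read_default_version(source_text):
    """Return the DEFAULT_VERSION string found in ez_setup.py source, or None."""
    match = _VERSION_RE.search(source_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Paths taken from the thread; adjust them to your installation.
    for path in (
        "/galaxy/galaxy-dist/scripts/scramble/lib/ez_setup.py",
        "/galaxy/galaxy-dist/scripts/scramble/build/py2.4-linux-x86_64-ucs2/pbs_python/ez_setup.py",
    ):
        try:
            with open(path) as handle:
                print(path, "->", read_default_version(handle.read()))
        except OSError as exc:
            print(path, "-> could not read:", exc)
```

If the two values differ, the copy in the build directory is not the one shipped in scripts/scramble/lib/.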
If we try to download the last version of setuptools 0.6c11 and rename it to the latest version (desperate attempt to make it work), that doesn't work either.
===========================================================================
[user@server pbs_python]$ mv setuptools-0.6c11-py2.4.egg setuptools-0.6c12-py2.4.egg
[user@server pbs_python]$ /galaxy/Python-2.4.6/python scramble.py
checking for pbs-config... /usr/lib64/../bin/pbs-config
Found torque version: 2.4.0-snap.200812091621
checking for python... /galaxy/Python-2.4.6/python
checking for python version... 2.4
checking for python platform... linux2
checking for python script directory... ${prefix}/lib/python2.4/site-packages
checking for python extension module directory... ${exec_prefix}/lib/python2.4/site-packages
configure: creating ./config.status
config.status: creating Makefile
config.status: creating setup.py
scramble(): Patching setup.py
Traceback (most recent call last):
  File "scramble.py", line 49, in ?
    execfile( "setup.py", globals(), locals() )
  File "setup.py", line 32, in ?
    build_version = int(tmp[2])
ValueError: invalid literal for int(): 0-snap
===========================================================================
This is a problem with your TORQUE version and pbs_python's assumptions about its version numbering. pbs_python expects it to be all-numeric whereas yours is a development snapshot with a non-integer value in the revision portion of the version:

    2.4.0-snap.200812091621
        ^^^^^^^^^^^^^^^^^^^

There are a few ways to proceed. One would be to upgrade your TORQUE client to a release version. Another would be to try the drmaa job runner instead of pbs, since TORQUE provides a DRMAA C library, although I don't believe anyone has used it with TORQUE yet.

The last would be to modify pbs_python's build process. Start by changing directories to:

    /galaxy/galaxy-dist/scripts/scramble/build/py2.4-linux-x86_64-ucs2/pbs_python

Edit setup.py and comment all of the following:

    #VERSION = "2.4.0-snap.200812091621"
    #tmp = VERSION.split('.')
    #major_version = int(tmp[0])
    #minor_version = int(tmp[1])
    #build_version = int(tmp[2])
    #if major_version >= 2 and minor_version >= 4 and build_version >= 7:
    #    os.symlink('pbs_wrap_2.4.c', 'pbs_wrap.c')
    #    os.symlink('pbs_2.4.py', 'pbs.py')
    #    TORQUE_VERSION='TORQUE_2_4'
    #else:

And then force the old version by un-indenting or doing something silly like:

    if True:
        os.symlink('pbs_wrap_2.1.c', 'pbs_wrap.c')
        os.symlink('pbs_2.1.py', 'pbs.py')
        TORQUE_VERSION='TORQUE_OLD'

However, running scramble.py again will overwrite your changes, so you will also need to modify scramble.py (in the same directory, not the main scramble.py in /galaxy/galaxy-dist/scripts) and comment out:

    #run( 'sh configure --with-pbsdir=%s' % os.environ['LIBTORQUE_DIR'], os.getcwd(), 'Running pbs_python configure script' )

Once this is done you should be able to run the *LOCAL* scramble.py in the build directory:

    $ /galaxy/Python-2.4.6/python ./scramble.py

If this succeeds, the egg will be built and placed in the dist/ subdirectory. The egg can then be copied to:

    /galaxy/galaxy-dist/eggs/

Please let us know how this goes.

--nate
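The ValueError comes from calling int() on the component '0-snap'. As an alternative illustration only (this is not what pbs_python's setup.py does, and it is not the fix applied in this thread), a more tolerant parse could keep just the leading digits of each dotted component. A minimal sketch in current Python 3, where parse_torque_version is a hypothetical helper name:

```python
import re


def parse_torque_version(version):
    """Return (major, minor, build) using only the leading digits of each part.

    Components with no leading digits count as 0, so snapshot strings such as
    "2.4.0-snap.200812091621" do not raise ValueError.
    """
    numbers = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        numbers.append(int(match.group()) if match else 0)
    # Pad short versions such as "2.4" to three components.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


print(parse_torque_version("2.4.0-snap.200812091621"))  # (2, 4, 0)
print(parse_torque_version("2.4.7"))                    # (2, 4, 7)
```

With the comparison quoted above (major >= 2, minor >= 4 and build >= 7), a snapshot parsed as (2, 4, 0) would fall through to the else branch, which appears to be the same 2.1 wrapper that the manual edit forces.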
Do let me know if anyone else has faced this issue of the scramble script trying to download a wrong version? Greatly appreciate your time.
Warm Regards, Sonali Amonkar
DISCLAIMER ========== This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user
Hi Nate,

The last option you gave worked like a charm! Thank you for your assistance!

Warm regards,
Sonali Amonkar

-----Original Message-----
From: Nate Coraor [mailto:nate@bx.psu.edu]
Sent: Thursday, January 27, 2011 7:21 PM
To: Sonali Amonkar
Cc: Galaxy Dev
Subject: Re: Error with setuptools version in Galaxy installation on Cluster
To be more precise I made the following change in scripts/scramble/build/py2.4-linux-x86_64-ucs2/pbs_python/scramble.py

    # version string in 2.9.4 setup.py is wrong
    print "scramble(): Patching setup.py"
    if not os.path.exists( 'setup.py.orig' ):
        shutil.copyfile( 'setup.py', 'setup.py.orig' )
    i = open( 'setup.py.orig', 'r' )
    o = open( 'setup.py', 'w' )
    for line in i.readlines():
        if line == " version = '4.0.0',\n":
            line = " version = '4.1.0',\n"
        print >>o, line,
    i.close()
    o.close()

I am currently facing another issue. When I run my Workflow, I am seeing the following error on the server log. This error is not consistent, and occurs in an erratic manner.

galaxy.jobs INFO 2011-02-03 05:17:03,522 job 151 dispatched
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,755 (150/69156.<primaryserver>) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,879 (151) submitting file galaxy-dist/database/pbs/151.sh
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) command is: java -cp galaxy-dist/tools/my_tools/jars/PreRef1.jar RefFilterModule galaxy-dist/database/files/000/dataset_192.dat galaxy-dist/database/files/000/dataset_194.dat galaxy-dist/database/files/000/dataset_195.dat
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) pbs_submit failed, PBS error 15031: Protocol (ASN.1) error
galaxy.jobs DEBUG 2011-02-03 05:17:13,363 job 150 ended
galaxy.jobs ERROR 2011-02-03 05:17:15,816 Unable to cleanup job 152

Any help/pointer would be appreciated for this issue. Thank you very much for your time Nate.

Regards,
Sonali

-----Original Message-----
From: Sonali Amonkar
Sent: Wednesday, February 02, 2011 7:54 PM
To: 'Nate Coraor'
Cc: Galaxy Dev
Subject: RE: Error with setuptools version in Galaxy installation on Cluster
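Because the pbs_submit failure is intermittent, it may help to count how often it occurs and which Galaxy jobs are affected. The following is a minimal sketch in current Python 3, assuming the log lines have the shape shown in the excerpt above; find_pbs_submit_failures is a hypothetical helper and not part of Galaxy.

```python
import re

# Matches the tail of lines like:
# ... (151) pbs_submit failed, PBS error 15031: Protocol (ASN.1) error
_FAILURE_RE = re.compile(
    r"\((?P<job>\d+)\) pbs_submit failed, PBS error (?P<errno>\d+): (?P<text>.*)$"
)


def find_pbs_submit_failures(lines):
    """Return a list of (galaxy_job_id, pbs_errno, message) for failed submits."""
    failures = []
    for line in lines:
        match = _FAILURE_RE.search(line.rstrip("\n"))
        if match:
            failures.append(
                (int(match.group("job")), int(match.group("errno")), match.group("text"))
            )
    return failures


sample = [
    "galaxy.jobs INFO 2011-02-03 05:17:03,522 job 151 dispatched",
    "galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) pbs_submit failed, "
    "PBS error 15031: Protocol (ASN.1) error",
    "galaxy.jobs DEBUG 2011-02-03 05:17:13,363 job 150 ended",
]
print(find_pbs_submit_failures(sample))
# [(151, 15031, 'Protocol (ASN.1) error')]
```

Running this over the full server log would show whether the failures cluster around particular times or jobs.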
Sonali Amonkar wrote:
I am currently facing another issue. When I run my Workflow, I am seeing the following error on the server log. This error is not consistent, and occurs in an erratic manner.
galaxy.jobs INFO 2011-02-03 05:17:03,522 job 151 dispatched
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,755 (150/69156.<primaryserver>) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,879 (151) submitting file galaxy-dist/database/pbs/151.sh
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) command is: java -cp galaxy-dist/tools/my_tools/jars/PreRef1.jar RefFilterModule galaxy-dist/database/files/000/dataset_192.dat galaxy-dist/database/files/000/dataset_194.dat galaxy-dist/database/files/000/dataset_195.dat
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) pbs_submit failed, PBS error 15031: Protocol (ASN.1) error
galaxy.jobs DEBUG 2011-02-03 05:17:13,363 job 150 ended
galaxy.jobs ERROR 2011-02-03 05:17:15,816 Unable to cleanup job 152
Hi Sonali,

I am pretty sure this problem is somehow specific to the TORQUE setup, and I see you also posted this to the torquedev list, but unfortunately received no response.

I am not sure what is up here, but you may want to try adjusting the 'tcp_timeout' server setting. (qmgr -c 'set server tcp_timeout = X')

--nate
Any help/pointer would be appreciated for this issue. Thank you very much for your time Nate.
Regards, Sonali
Nate Coraor wrote:
Also, you may want to see if PBS is making an attempt to queue this job on a particular node, and if so, check the mom_logs for that node.
--nate
_______________________________________________ To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Hi Nate,

We went through the mom_logs. We could see the jobs that completed successfully in Galaxy. However, the jobs/tool runs that failed did not appear there: they were never submitted to the PBS queue, so nothing was logged for them in the mom_logs.

On another note, is the error "pbs_submit failed, PBS error 15031: Protocol (ASN.1) error" related to the combination of the pbs_python version in Galaxy and the Torque version?

Also, please find attached a workaround we adopted to get Galaxy working on our PBS Torque version. Could this problem be related to one of those workarounds? Is there any specific PBS Torque version on which Galaxy has been tested?

Thank you for your time Nate,

Warm Regards,
Sonali Amonkar

-----Original Message-----
From: Nate Coraor [mailto:nate@bx.psu.edu]
Sent: Saturday, February 12, 2011 2:10 AM
To: Sonali Amonkar
Cc: Galaxy Dev
Subject: Re: [galaxy-dev] Error with setuptools version in Galaxy installation on Cluster
Sonali Amonkar wrote:
I am currently facing another issue. When I run my Workflow, I am seeing the following error on the server log. This error is not consistent, and occurs in an erratic manner.
galaxy.jobs INFO 2011-02-03 05:17:03,522 job 151 dispatched galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,755 (150/69156.<primaryserver>) PBS job has left queue galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,879 (151) submitting file galaxy-dist/database/pbs/151.sh galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) command is: java -cp galaxy-dist/tools/my_tools/jars/PreRef1.jar RefFilterModule galaxy-dist/database/files/000/dataset_192.dat galaxy-dist/database/files/000/dataset_194.dat galaxy-dist/database/files/000/dataset_195.dat galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) pbs_submit failed, PBS error 15031: Protocol (ASN.1) error galaxy.jobs DEBUG 2011-02-03 05:17:13,363 job 150 ended galaxy.jobs ERROR 2011-02-03 05:17:15,816 Unable to cleanup job 152
Hi Sonali,
I am pretty sure this problem is somehow specific to the TORQUE setup, and I see you also posted this to the torquedev list, but unfortunately received no response.
I am not sure what is up here, but you may want to try adjusting the 'tcp_timeout' server setting. (qmgr -c 'set server tcp_timeout = X')
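If the timeout needs to be tried at several values, the qmgr call above could be scripted. This is only a sketch: the helper names are hypothetical, the value passed in is a placeholder for the X in the command above, and it assumes qmgr is on the PATH of a host with TORQUE manager privileges.

```python
import subprocess


def tcp_timeout_command(seconds):
    """Build the qmgr argument list for: set server tcp_timeout = <seconds>."""
    seconds = int(seconds)
    if seconds <= 0:
        raise ValueError("tcp_timeout must be a positive number of seconds")
    return ["qmgr", "-c", "set server tcp_timeout = %d" % seconds]


def apply_tcp_timeout(seconds):
    """Run qmgr with the built command; raises CalledProcessError if qmgr fails."""
    subprocess.check_call(tcp_timeout_command(seconds))
```

Keeping the command construction separate from the call makes it possible to print and review the exact qmgr invocation before running it against a shared server.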
Also, you may want to see if PBS is making an attempt to queue this job on a particular node, and if so, check the mom_logs for that node.
--nate
Any help/pointer would be appreciated for this issue. Thank you very much for your time Nate.
Regards, Sonali
DISCLAIMER ========== This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
On further digging, we found that the script is failing in the following part of $GALAXY_HOME/lib/galaxy/jobs/runners/pbs.py:

    # submit
    galaxy_job_id = job_wrapper.job_id
    log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
    log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
    job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
    pbs.pbs_disconnect(c)

    # check to see if it submitted
    if not job_id:
        errno, text = pbs.error()
        log.debug( "(%s) pbs_submit failed, PBS error %d: %s" % (galaxy_job_id, errno, text) )
        job_wrapper.fail( "Unable to run this job due to a cluster error" )
        return

Could this be a problem related to the pbs_python egg (v. pbs_python-4.1.0) being used by Galaxy, or is it a Torque-specific issue? Just to reiterate, we are on a development snapshot of Torque, which is hard to replace as many other people are using it.

Also, could you please advise which Torque and pbs_python version combinations you have successfully tested against?

Regards,
Sonali

PS: pbs_python has a new version 4.3 out (https://subtrac.sara.nl/oss/pbs_python/wiki/TorqueInstallation). Why is this not in the PSU egg repository yet? Would that make a difference?
Sonali Amonkar wrote:
On further digging, we found that the script is failing in the following part of $GALAXY_HOME/lib/galaxy/jobs/runners/pbs.py:
# submit
galaxy_job_id = job_wrapper.job_id
log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
This is the line here, it's failing to submit the job.
pbs.pbs_disconnect(c)
# check to see if it submitted
if not job_id:
    errno, text = pbs.error()
    log.debug( "(%s) pbs_submit failed, PBS error %d: %s" % (galaxy_job_id, errno, text) )
    job_wrapper.fail( "Unable to run this job due to a cluster error" )
    return
Could this be a problem related to the pbs_python egg (v. pbs_python-4.1.0) being used by Galaxy, or is it a Torque-specific issue? Just to reiterate, we are on a development snapshot of Torque, which is hard to replace as many other people are using it.
It's possible that pbs_python is generating code which is incompatible, but since it's linked against your version of TORQUE this should not be the case. It's hard to say exactly what's causing this since it's outside of Galaxy. I'm not sure if TORQUE has any client-side debugging that would help with this issue but that's where I'd start.
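Since the failure is intermittent, one way to gather client-side evidence is to record the PBS error on every failed attempt and retry the submission a few times. The sketch below is not Galaxy's actual behavior: `submit_with_retry`, its parameters, and the retry policy are assumptions, and `submit` and `get_error` are injected stand-ins for calls such as `pbs.pbs_submit(...)` and `pbs.error()` from the snippet quoted above.

```python
import time


def submit_with_retry(submit, get_error, attempts=3, delay=1.0, sleep=time.sleep):
    """Call submit() until it returns a job id or the attempts are used up.

    Returns (job_id, errors), where errors holds whatever get_error()
    reported after each failed attempt, in order.
    """
    errors = []
    for attempt in range(1, attempts + 1):
        job_id = submit()
        if job_id:
            return job_id, errors
        # Record the error immediately, before another call can overwrite it.
        errors.append(get_error())
        if attempt < attempts:
            sleep(delay)
    return None, errors
```

In pbs.py this might be wired up as `submit_with_retry(lambda: pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None), pbs.error)`, placed before the disconnect. Whether a retry over the same connection is valid for a given TORQUE version is untested here, so the collected error list is the more reliable part of the result.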
Also, could you please advise which Torque and pbs_python version combinations you have successfully tested against?
We're using an older version (2.1.11) on our submission hosts since we saw performance problems when using pbs_python with the newer 2.4.x versions. The TORQUE server and execution hosts run 2.4.9.
Regards, Sonali
PS: pbs_python has a new version 4.3 out (https://subtrac.sara.nl/oss/pbs_python/wiki/TorqueInstallation), why is this not in the PSU egg repository yet? Would that make a difference?
I'm not sure if it would make a difference. I upgrade the pbs_python egg as necessary or when it's particularly far out of date.
--nate
Hi Nate,

We are still awaiting replies about the error from the Torque community. As for debugging, we did try tracejob; however, since the job was never actually submitted, Torque had nothing logged for it (it wasn't even a job yet).

Meanwhile, we are retrying the deployment of Galaxy on a different version of Torque (2.3.6) with pbs_python (2.6), but now face a new error:

galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,345 (34/2519.server) Removed from PBS queue before job completion
galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,344 (34/2519.server) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,351 Job output not returned by PBS: the output datasets were deleted while the job was running, the job was manually dequeued or there was a cluster error.

One particular job gets removed, failing the entire workflow. Please let me know if you have any information or have come across this error before.

Many thanks for your time.

Regards,
Sonali
Sonali Amonkar wrote:
Hi Nate,
We are still awaiting replies about the error from the Torque community. As for debugging, we did try tracejob; however, since the job was never actually submitted, Torque had nothing logged for it (it wasn't even a job yet). Meanwhile, we are retrying the deployment of Galaxy on a different version of Torque (2.3.6) with pbs_python (2.6), but now face a new error:
galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,345 (34/2519.server) Removed from PBS queue before job completion
This would indicate that the job is being stopped either by a user or by the job walltime or job output size limit configured in universe_wsgi.ini.
--nate
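To check whether either limit is set, the configuration file can be read programmatically. This is a rough sketch under stated assumptions: the option names `job_walltime` and `output_size_limit` and the `[app:main]` section are recalled from Galaxy's sample configuration and should be verified against your own universe_wsgi.ini; the function name `job_limits` is hypothetical.

```python
from configparser import RawConfigParser


def job_limits(ini_text, section="app:main"):
    """Return the two limit options from the ini text, or None where unset.

    The option names are assumptions; commented-out options count as unset.
    """
    # RawConfigParser avoids interpolating values such as %(here)s.
    parser = RawConfigParser(strict=False)
    parser.read_string(ini_text)
    limits = {}
    for option in ("job_walltime", "output_size_limit"):
        if parser.has_option(section, option):
            limits[option] = parser.get(section, option)
        else:
            limits[option] = None
    return limits
```

Calling `job_limits(open("universe_wsgi.ini").read())` would show at a glance whether a walltime or output size limit could be responsible for the job being removed; if both come back as None, a manual dequeue or a cluster-side cause becomes the more likely explanation.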
galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,344 (34/2519.server) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,351 Job output not returned by PBS: the output datasets were deleted while the job was running, the job was manually dequeued or there was a cluster error.
One particular job gets removed, failing the entire workflow. Please let me know if you have any information or have come across this error before.
Many thanks for your time.
Regards, Sonali
participants (2)
- Nate Coraor
- Sonali Amonkar