I added this snippet to the top of my extract_dataset_part.py:

    import os
    import time

    import pkg_resources

    pkg_resources.require("simplejson")

    # wait until this process' PID is the first PID of all processes with
    # the same name, then import
    while True:
        with os.popen("ps ax|grep extract_dataset_part.py |grep -v grep|awk '{print $1}'") as allpids:
            if os.getpid() == int(allpids.readline().strip()):
                break
        time.sleep(1)  # poll once a second instead of spawning ps in a tight loop
    import simplejson

The file will wait its turn based on its PID (lower PIDs show up higher in the ps output). Problems may, however, arise when an extract_dataset_part.py becomes a zombie or something, but since it's a small script, this may do the job. If anyone sees more problems, I'd be happy to know.

cheers,
jorrit

On 09/19/2012 09:16 AM, Jorrit Boekel wrote:
For completeness, here are two tracebacks (there were more similar ones) from the same job:
/mnt/galaxyData/tmp/job_working_directory/000/75/task_4:
Traceback (most recent call last):
  File "./scripts/extract_dataset_part.py", line 25, in <module>
    import galaxy.model.mapping #need to load this before we unpickle, in order to setup properties assigned by the mappers
  File "/mnt/galaxyTools/galaxy-central/lib/galaxy/model/__init__.py", line 13, in <module>
    import galaxy.datatypes.registry
  File "/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/registry.py", line 8, in <module>
    from display_applications.application import DisplayApplication
  File "/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/display_applications/application.py", line 9, in <module>
    from util import encode_dataset_user
  File "/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/display_applications/util.py", line 3, in <module>
    from Crypto.Cipher import Blowfish
  File "/mnt/galaxyTools/galaxy-central/eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg/Crypto/Cipher/Blowfish.py", line 7, in <module>
  File "/mnt/galaxyTools/galaxy-central/eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg/Crypto/Cipher/Blowfish.py", line 4, in __bootstrap__
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 882, in resource_filename
    self, resource_name
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1351, in get_resource_filename
    self._extract_resource(manager, self._eager_to_zip(name))
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1373, in _extract_resource
    self.egg_name, self._parts(zip_path)
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 962, in get_cache_path
    self.extraction_error()
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 928, in extraction_error
    raise err
pkg_resources.ExtractionError: Can't extract file(s) to egg cache
The following error occurred while trying to extract file(s) to the Python egg cache:
[Errno 17] File exists: '/home/galaxy/.python-eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg-tmp'
The Python egg cache directory is currently set to:
/home/galaxy/.python-eggs
Perhaps your account does not have write access to this directory? You can change the cache directory by setting the PYTHON_EGG_CACHE environment variable to point to an accessible directory.

/mnt/galaxyData/tmp/job_working_directory/000/75/task_5:
Traceback (most recent call last):
  File "./scripts/extract_dataset_part.py", line 22, in <module>
    import simplejson
  File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/__init__.py", line 111, in <module>
  File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/decoder.py", line 7, in <module>
  File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/scanner.py", line 10, in <module>
  File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/scanner.py", line 6, in _import_c_make_scanner
  File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/_speedups.py", line 7, in <module>
  File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/_speedups.py", line 4, in __bootstrap__
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 882, in resource_filename
    self, resource_name
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1351, in get_resource_filename
    self._extract_resource(manager, self._eager_to_zip(name))
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1373, in _extract_resource
    self.egg_name, self._parts(zip_path)
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 962, in get_cache_path
    self.extraction_error()
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 928, in extraction_error
    raise err
pkg_resources.ExtractionError: Can't extract file(s) to egg cache
The following error occurred while trying to extract file(s) to the Python egg cache:
[Errno 17] File exists: '/home/galaxy/.python-eggs'
The Python egg cache directory is currently set to:
/home/galaxy/.python-eggs
Perhaps your account does not have write access to this directory? You can change the cache directory by setting the PYTHON_EGG_CACHE environment variable to point to an accessible directory.
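The error message's own suggestion can also be applied per task: if every process gets a private PYTHON_EGG_CACHE, no two processes ever try to create the same directory. A minimal sketch, assuming it runs at the very top of extract_dataset_part.py, before anything that can trigger extraction (import simplejson, import galaxy.model.mapping). The helper name use_private_egg_cache and the use of tempfile.mkdtemp are illustrative choices, not part of Galaxy or pkg_resources.

```python
import os
import tempfile


def use_private_egg_cache(base_dir=None):
    """Point PYTHON_EGG_CACHE at a fresh directory owned by this process only.

    tempfile.mkdtemp() creates a uniquely named directory atomically, so two
    tasks started at the same moment can never end up with the same path.
    """
    cache_dir = tempfile.mkdtemp(prefix="python-eggs-", dir=base_dir)
    os.environ["PYTHON_EGG_CACHE"] = cache_dir
    return cache_dir


if __name__ == "__main__":
    # Call this before `import simplejson` / `import galaxy.model.mapping`.
    print(use_private_egg_cache())
```

The trade-off is that every task re-extracts the .so files and the directories have to be cleaned up afterwards; passing the task's job working directory as base_dir would keep the cache next to the task's other files.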
On 09/18/2012 05:24 PM, James Taylor wrote:
Interesting. If I'm reading this correctly, the problem is happening inside pkg_resources? (galaxy.eggs unzips eggs, but I think it does so at install [fetch_eggs] time, not at run time, which would avoid this.) If so, this would seem to be a locking bug in pkg_resources. Dannon, we could put a guard around the imports in extract_dataset_part.py as an (overly aggressive and hacky) fix.
-- jt
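A guard around the imports could also be an exclusive file lock rather than polling ps. A minimal sketch, assuming a Unix node and a lock file on a local (non-NFS) filesystem shared by all tasks on that node; the lock path and the import_lock helper are hypothetical, and json stands in for the egg-backed imports. Because the kernel releases an flock() lock when the holder's descriptor is closed, including when the process exits, a dead or zombie task (which still shows up in ps ax) does not keep the others waiting.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location: must be shared by all tasks on one node and live
# on a local filesystem (flock() on NFS is unreliable).
DEFAULT_LOCK_PATH = os.path.join(tempfile.gettempdir(), "extract_dataset_part.import.lock")


@contextmanager
def import_lock(path=DEFAULT_LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    The kernel drops the lock as soon as the descriptor is closed, which also
    happens when the holding process exits or crashes.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other task holds it
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


if __name__ == "__main__":
    with import_lock():
        # Stand-in for the egg-backed imports (import simplejson,
        # import galaxy.model.mapping) that trigger .so extraction.
        import json  # noqa: F401
    print("imports done")
```

Unlike the PID-ordering loop, this only serializes the imports themselves: the next task proceeds as soon as the previous one leaves the with block, rather than when it exits.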
On Tue, Sep 18, 2012 at 10:37 AM, Jorrit Boekel <jorrit.boekel@scilifelab.se> wrote:
- which led to unzipping .so libraries from Python eggs into the nodes' /home/galaxy/.python-eggs
- this runs into lib/pkg_resources.py and its _bypass_ensure_directory method, which creates the temporary dir for the egg unzip
- since there are 8 processes on the node, this method sometimes tries to mkdir a directory that another process created just after this one's isdir check.
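The failure described above is a check-then-act race: isdir() and mkdir() are two separate system calls, and another process can create the directory in between. The sketch below replays that losing interleaving deterministically, with a callback standing in for the competing process, and contrasts it with the usual fix of simply attempting mkdir() and treating EEXIST as success. The function names are hypothetical, and the naive version is a simplified model of the behaviour described in this thread, not a copy of pkg_resources.

```python
import errno
import os
import shutil
import tempfile


def naive_ensure_directory(path, between=None):
    """Check-then-create (simplified model, not the actual pkg_resources code)."""
    if not os.path.isdir(path):
        if between is not None:
            between()  # another process creates `path` right here
        os.mkdir(path)


def tolerant_ensure_directory(path, between=None):
    """Create `path`, treating 'it already exists as a directory' as success."""
    if between is not None:
        between()
    try:
        os.mkdir(path)
    except OSError as exc:
        # Re-raise anything other than "already exists", and also re-raise if
        # whatever exists at `path` is not a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def demo():
    """Replay the losing interleaving against both versions."""
    base = tempfile.mkdtemp()
    target = os.path.join(base, "pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg-tmp")

    def other_process():
        os.mkdir(target)

    try:
        naive_errno = None
        try:
            naive_ensure_directory(target, between=other_process)
        except OSError as exc:
            naive_errno = exc.errno  # EEXIST, i.e. "[Errno 17] File exists" on Linux

        os.rmdir(target)
        tolerant_ensure_directory(target, between=other_process)  # no exception
        return naive_errno, os.path.isdir(target)
    finally:
        shutil.rmtree(base)


if __name__ == "__main__":
    print(demo())
```

On Python 3.2 and later, os.makedirs(path, exist_ok=True) gives the same tolerance; on Python 2.7, as used here, the explicit errno check is needed.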