python egg cache exists error
Dear list,

I am running galaxy-dist on Amazon EC2 through Cloudman, and am using enable_tasked_jobs to run jobs in parallel. Yes, I know it's not recommended in production. My jobs usually get split into 72 parts, and sometimes (but not always, maybe in 30-50% of cases), errors are returned concerning the python egg cache, usually:

  [Errno 17] File exists: '/home/galaxy/.python-eggs'

or something like

  [Errno 17] File exists: '/home/galaxy/.python-eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg-tmp'

As far as I know, the errors arise when scripts/extract_dataset_part.py is run. I am guessing that the tmp python egg dir is created for every task of the mentioned 72, that these creations sometimes coincide, and that this leads to an error.

I would like to solve this problem, but before doing so, I'd like to know if someone else has already fixed it in a galaxy-central changeset.

cheers, jorrit
Hi again,

I have looked into this matter a little bit more, and it looks like this is happening:

- tasked job is split
- task commands are sent to workers (I am running 8-core high cpu extra large workers on EC2)
- per task, worker runs env.sh for the respective tool
- per task, worker runs scripts/extract_dataset_part.py
- this script issues import statements (the ones for simplejson and galaxy.model.mapping have caused me problems)
- which lead to unzipping .so libraries from python eggs into the nodes' /home/galaxy/.python-eggs
- this runs into lib/pkg_resources.py and its _bypass_ensure_directory method that creates the temporary dir for the egg unzip
- since there are 8 processes on the node, sometimes this method tries to mkdir a directory that another process created just after this one's isdir check.

That last point is my guess. I don't really know how to solve this in a non-hackish way, so until someone finds out, I may read from an 'eggs_extracted.txt' file to determine if the eggs have been extracted. And lock the file when writing to it, of course.

cheers, jorrit
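The suspected race (an isdir check followed by a mkdir that another process wins) can be illustrated with a minimal sketch. This is not the pkg_resources code; `ensure_directory` and `demo` are hypothetical names, and threads stand in for the 8 worker processes. The idea is to attempt the creation unconditionally and treat "already exists" as success when the path really is a directory.

```python
import errno
import os
import tempfile
import threading


def ensure_directory(path):
    """Create path if needed, tolerating a concurrent creator (hypothetical helper)."""
    try:
        os.makedirs(path)
    except OSError as err:
        # EEXIST is harmless only if what exists is actually a directory.
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def demo(workers=8):
    """Let several threads try to create the same directory at once."""
    target = os.path.join(tempfile.mkdtemp(), "python-eggs", "example.egg-tmp")
    errors = []

    def work():
        try:
            ensure_directory(target)
        except OSError as err:
            errors.append(err)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return os.path.isdir(target), errors
```

Calling `demo()` should return a true first element and an empty error list, whichever thread happens to create the directory first.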
Interesting. If I'm reading this correctly, the problem is happening inside pkg_resources? (galaxy.eggs unzips eggs, but I think it does so at install [fetch_eggs] time, not at run time, which would avoid this.) If so, this would seem to be a locking bug in pkg_resources.

Dannon, we could put a guard around the imports in extract_dataset_part.py as an (overly aggressive and hacky) fix.

-- jt
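One way such a guard around the imports might look is an exclusive file lock held while the egg-backed modules are imported. The sketch below is an assumption, not the fix that went into Galaxy: `import_lock` and `LOCK_PATH` are hypothetical, `json` stands in for simplejson or galaxy.model.mapping, and `fcntl.flock` is Unix-only and only reliable when the lock file lives on a node-local filesystem.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def import_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        fcntl.flock(handle, fcntl.LOCK_UN)
        handle.close()


# Hypothetical lock location; any node-local path writable by the galaxy user would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "egg_import.lock")

with import_lock(LOCK_PATH):
    # Only one process per node runs this block at a time, so only one
    # process at a time can trigger an egg extraction.
    import json  # stand-in for the egg-backed imports
```

The lock serializes only the import step, so the tasks still run in parallel once their modules are loaded.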
For completeness, here's two tracebacks (there were more similar ones) from the same job:

/mnt/galaxyData/tmp/job_working_directory/000/75/task_4:

  Traceback (most recent call last):
    File "./scripts/extract_dataset_part.py", line 25, in <module>
      import galaxy.model.mapping #need to load this before we unpickle, in order to setup properties assigned by the mappers
    File "/mnt/galaxyTools/galaxy-central/lib/galaxy/model/__init__.py", line 13, in <module>
      import galaxy.datatypes.registry
    File "/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/registry.py", line 8, in <module>
      from display_applications.application import DisplayApplication
    File "/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/display_applications/application.py", line 9, in <module>
      from util import encode_dataset_user
    File "/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/display_applications/util.py", line 3, in <module>
      from Crypto.Cipher import Blowfish
    File "/mnt/galaxyTools/galaxy-central/eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg/Crypto/Cipher/Blowfish.py", line 7, in <module>
    File "/mnt/galaxyTools/galaxy-central/eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg/Crypto/Cipher/Blowfish.py", line 4, in __bootstrap__
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 882, in resource_filename
      self, resource_name
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1351, in get_resource_filename
      self._extract_resource(manager, self._eager_to_zip(name))
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1373, in _extract_resource
      self.egg_name, self._parts(zip_path)
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 962, in get_cache_path
      self.extraction_error()
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 928, in extraction_error
      raise err
  pkg_resources.ExtractionError: Can't extract file(s) to egg cache

  The following error occurred while trying to extract file(s) to the Python egg cache:

    [Errno 17] File exists: '/home/galaxy/.python-eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg-tmp'

  The Python egg cache directory is currently set to:

    /home/galaxy/.python-eggs

  Perhaps your account does not have write access to this directory? You can change the cache directory by setting the PYTHON_EGG_CACHE environment variable to point to an accessible directory.

/mnt/galaxyData/tmp/job_working_directory/000/75/task_5:

  Traceback (most recent call last):
    File "./scripts/extract_dataset_part.py", line 22, in <module>
      import simplejson
    File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/__init__.py", line 111, in <module>
    File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/decoder.py", line 7, in <module>
    File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/scanner.py", line 10, in <module>
    File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/scanner.py", line 6, in _import_c_make_scanner
    File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/_speedups.py", line 7, in <module>
    File "/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/_speedups.py", line 4, in __bootstrap__
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 882, in resource_filename
      self, resource_name
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1351, in get_resource_filename
      self._extract_resource(manager, self._eager_to_zip(name))
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1373, in _extract_resource
      self.egg_name, self._parts(zip_path)
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 962, in get_cache_path
      self.extraction_error()
    File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 928, in extraction_error
      raise err
  pkg_resources.ExtractionError: Can't extract file(s) to egg cache

  The following error occurred while trying to extract file(s) to the Python egg cache:

    [Errno 17] File exists: '/home/galaxy/.python-eggs'

  The Python egg cache directory is currently set to:

    /home/galaxy/.python-eggs

  Perhaps your account does not have write access to this directory? You can change the cache directory by setting the PYTHON_EGG_CACHE environment variable to point to an accessible directory.
I added this snippet to the top of my extract_dataset_part.py:

    import os
    import pkg_resources

    pkg_resources.require("simplejson")
    # wait until this process' PID is the first PID of all processes
    # with the same name, then import
    while True:
        # list the PIDs of all running extract_dataset_part.py processes
        with os.popen("ps ax|grep extract_dataset_part.py |grep -v grep|awk '{print $1}'") as allpids:
            # the first line holds the PID at the top of the ps table
            if os.getpid() == int(allpids.readline().strip()):
                break
    import simplejson

The script waits its turn based on its PID (lower PIDs show up higher in the table). Problems may arise, however, when an extract_dataset_part.py process becomes a zombie or something, but since it's a small script, this may do the job. If anyone sees more problems, I'd be happy to know.

cheers, jorrit
For Test/Main, I have the user's ~/.bash_profile set $PYTHON_EGG_CACHE on a per-node basis. This could also be done per-node and per-pty to ensure uniqueness per job.

--nate
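The same idea can be sketched from inside Python rather than from ~/.bash_profile. The sketch assumes the variable is set before the first egg-backed import, since PYTHON_EGG_CACHE is the variable named in the error message above. `set_private_egg_cache` is a hypothetical helper, and keying the directory on host name plus PID is a variation on the per-node and per-pty scheme described above, not the Test/Main configuration.

```python
import os
import socket
import tempfile


def set_private_egg_cache(base_dir=None):
    """Point PYTHON_EGG_CACHE at a directory unique to this host and process."""
    if base_dir is None:
        base_dir = tempfile.gettempdir()
    cache = os.path.join(
        base_dir,
        "python-eggs-%s-%d" % (socket.gethostname(), os.getpid()),
    )
    # Must happen before any egg is extracted for the setting to take effect.
    os.environ["PYTHON_EGG_CACHE"] = cache
    return cache
```

A per-process cache avoids the collision entirely, at the cost of each task extracting its own copy of the eggs and leaving a directory behind that needs cleaning up.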
participants (3)
- James Taylor
- Jorrit Boekel
- Nate Coraor