Re: [galaxy-dev] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/###/galaxy_624.o'
Just a warning: -noac has a pretty severe impact on performance in my (and others' on this list) experience. You might also want to try messing with the 'lookupcache' mount option.

--nate

On Thu, Mar 6, 2014 at 2:20 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
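For anyone tuning this later, mount options along these lines are what is being discussed; the server and export names below are placeholders, not values from this thread:

```
# /etc/fstab -- hypothetical NFS server and export names
# Option 1: disable attribute caching entirely (what -noac does; can be slow)
nfsserver:/export/nextgen3  /nextgen3  nfs  noac             0 0
# Option 2: keep attribute caching but disable name-lookup (dentry) caching
nfsserver:/export/nextgen3  /nextgen3  nfs  lookupcache=none 0 0
```

lookupcache=none only stops caching of directory-entry lookups, so it is usually a lighter touch than disabling all attribute caching with noac.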
Hello Nate,
I had that parameter set to 1, but I upped it to 5. I also added -noac to the NFS mounts for /nextgen3.
That appears to have fixed it.
Thank you!!!
On 3/6/14, 1:57 PM, Nate Coraor wrote:
Hi Pete,
I'd suggest setting retry_job_output_collection > 0 in universe_wsgi.ini. This is usually a symptom of attribute caching on network filesystems.
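To illustrate the idea behind that option: with NFS attribute caching, a file written by a cluster node may not be visible to the Galaxy server immediately, so retrying the read a few times often succeeds where a single check fails. A minimal sketch of that pattern (this is an illustration only, not Galaxy's actual implementation; the function name and defaults are invented):

```python
import errno
import time


def collect_output(path, retries=3, delay=0.1):
    """Read a job output file, retrying briefly on ENOENT.

    Simplified picture of what retry_job_output_collection > 0
    enables: tolerate a short window where an NFS client has not
    yet noticed a file created on another host.
    """
    for attempt in range(retries + 1):
        try:
            with open(path) as fh:
                return fh.read()
        except (IOError, OSError) as e:
            # Only retry "no such file"; re-raise anything else,
            # and give up once the retry budget is exhausted.
            if e.errno != errno.ENOENT or attempt == retries:
                raise
            time.sleep(delay)
```

Galaxy's real collection logic differs in detail; this just shows why a retry count greater than zero helps on attribute-caching filesystems.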
--nate
On Wed, Mar 5, 2014 at 8:06 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
In trying something simple with Galaxy, I downloaded data from UCSC Main. The data gets downloaded, but the job errors out. I verified that the job actually ran and completed successfully according to the scheduler, but I get errors like this:
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 18:17:35,941 (624/46.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 18:17:36,060 (624/46.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/624/galaxy_624.o'
There are no directories being created below the 000 directory. I verified that the directory tree is owned by galaxy and that the galaxy user can run jobs from the command line as a normal user.
I set the parameter "cleanup_job = never"; it was set to "always", which is probably why the files were never there. Now the files are there, including the galaxy_###.o file, but Galaxy still errors as above.
I had set the parameter "cluster_files_directory = database/pbs", but that doesn't seem to work any longer. The .o and .e files used to end up there.
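For reference, the options being discussed live in universe_wsgi.ini; a fragment matching this thread's settings might look like the following (the values are the ones mentioned in the thread, not recommendations):

```ini
# universe_wsgi.ini -- values from this thread
cleanup_job = never
cluster_files_directory = database/pbs
retry_job_output_collection = 5
```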
Here is an example:
(galaxyvenv)[galaxy@dirigo 630]$ ll
total 16
-rw------- 1 galaxy galaxy    0 Mar  5 19:29 galaxy_630.e
-rw-rw-r-- 1 galaxy galaxy    2 Mar  5 19:29 galaxy_630.ec
-rw------- 1 galaxy galaxy  940 Mar  5 19:29 galaxy_630.o
-rwxr-xr-x 1 galaxy galaxy 2429 Mar  5 19:29 galaxy_630.sh
-rw-rw-r-- 1 galaxy galaxy  138 Mar  5 19:29 galaxy.json
-rw-rw-r-- 1 galaxy galaxy 2139 Mar  5 19:29 metadata_in_HistoryDatasetAssociation_1182_o830e3
-rw-rw-r-- 1 galaxy galaxy   20 Mar  5 19:29 metadata_kwds_HistoryDatasetAssociation_1182_hOhPp7
-rw-rw-r-- 1 galaxy galaxy   55 Mar  5 19:29 metadata_out_HistoryDatasetAssociation_1182_Ynb70M
-rw-rw-r-- 1 galaxy galaxy    2 Mar  5 19:29 metadata_override_HistoryDatasetAssociation_1182_HsMljG
-rw-rw-r-- 1 galaxy galaxy   44 Mar  5 19:29 metadata_results_HistoryDatasetAssociation_1182_LxdsAZ
(galaxyvenv)[galaxy@dirigo 630]$ pwd
/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630
Here is the error from this:
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:37,731 (630/51.dirigo.mdibl.org) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:49,119 (630/51.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 19:31:50,225 (630/51.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_630.o'
galaxy.jobs DEBUG 2014-03-05 19:31:50,252 finish(): Moved /nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_dataset_856.dat to /nextgen3/galaxy/galaxy-dist/database/files/000/dataset_856.dat
galaxy.jobs DEBUG 2014-03-05 19:31:50,351 job 630 ended
On the Galaxy page, the history entry shows in pink:

1 UCSC Main on Human: knownGene (chr22:1-51304566) error
An error occurred with this dataset: Job output not returned from cluster
But the dataset is there.
On 3/5/14, 3:33 PM, Nate Coraor wrote:
The old-style URL syntax is supposed to continue to work; if you have any details on what's not working, I can look into it. That said, job_conf.xml is the way forward, and a job_conf.xml for the drmaa runner would be a pretty trivial change from the one you have for the pbs runner, e.g.:
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
    </plugins>
    <handlers>
        <handler id="dirigo"/>
    </handlers>
    <destinations default="pbs_default">
        <destination id="pbs_default" runner="pbs">
            <param id="nativeSpecification">-l walltime=72:00:00,nodes=1:ppn=4</param>
        </destination>
    </destinations>
</job_conf>
Just make sure you set $DRMAA_LIBRARY_PATH in your environment to the correct libdrmaa.so.
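For example, in the shell (or startup script) that launches Galaxy; the install path below is hypothetical and should point wherever your PSNC pbs-drmaa was installed:

```shell
# Full path to the DRMAA shared library for Torque; adjust to your install.
export DRMAA_LIBRARY_PATH=/opt/pbs-drmaa/lib/libdrmaa.so
```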
--nate
On Wed, Mar 5, 2014 at 3:27 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Hello Nate,
I have that version installed and was using it in the older versions of galaxy for a few years. Once I loaded this new version, it no longer worked with the old definitions in the universe file using: default_cluster_job_runner = drmaa:///
Do I need a job_conf.xml that uses the drmaa runner?
On 3/5/14, 3:21 PM, Nate Coraor wrote:
Hi Pete,
The latest error is pretty strange and not one I've encountered before. It suggests that scramble is not loading setuptools in place of distutils and thus does not have access to the setuptools extensions (notably, egg-related functionality). Something abnormal still seems to be going on with your python environment.
You can use drmaa if you like (this is known to work well). You will want to use the libdrmaa for Torque that's maintained by the Poznan Supercomputing and Networking Center, rather than the libdrmaa that can be built directly with the Torque source. PSNC libdrmaa for Torque/PBS can be found here: http://apps.man.poznan.pl/trac/pbs-drmaa
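A build of the PSNC library typically follows the standard autotools sequence; the version number, install prefix, and configure flag below are assumptions, so check ./configure --help in the actual tarball:

```shell
# Sketch only -- verify flag names against the pbs-drmaa documentation.
tar zxf pbs-drmaa-1.0.tar.gz && cd pbs-drmaa-1.0
./configure --prefix=/opt/pbs-drmaa --with-pbs=/opt/torque/active
make && make install
```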
--nate
On Wed, Mar 5, 2014 at 3:10 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Are there any alternatives to pbs_python for interfacing with a Torque scheduler? This method appears to be a dead end.
On 3/4/14, 9:51 AM, Nate Coraor wrote:
Pete,
Is it possible that `python` as the Galaxy user is calling a python other than /opt/python/2.7.6/bin/python (e.g. the system version without the -dev/-devel package installed)? The safest bet for ensuring you're using the right python, and that it won't have conflicting modules, is a Python virtualenv. This is easy to set up; just make sure you create it with the correct python executable, for example:
% wget https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.11.4.tar.gz
% tar zxf virtualenv-1.11.4.tar.gz
% /opt/python/2.7.6/bin/python ./virtualenv-1.11.4/virtualenv.py galaxyvenv
% . ./galaxyvenv/bin/activate
% cd galaxy-dist
% python ./scripts/fetch_eggs.py
% LIBTORQUE_DIR=/opt/torque/active/lib python scripts/scramble.py -e pbs_python
--nate
On Mon, Mar 3, 2014 at 5:19 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
I uninstalled pbs_python 4.4.0 and reinstalled 4.3.5 as root (not using scramble)
When I try this method as the galaxy user: LIBTORQUE_DIR=/opt/torque/active/lib python scripts/scramble.py -e pbs_python
I get the following output:
src/pbs_wrap.c:2813: warning: function declaration isn't a prototype
gcc -pthread -shared build/temp.linux-x86_64-2.7/src/pbs_wrap.o -L/opt/torque/4.2.7/lib -L. -ltorque -lpython2.7 -o build/lib.linux-x86_64-2.7/_pbs.so -L/opt/torque/4.2.7/lib -ltorque -Wl,-rpath -Wl,/opt/torque/4.2.7/lib
/usr/bin/ld: cannot find -lpython2.7
LD_LIBRARY_PATH="/opt/python/2.7.6/lib:/opt/torque/active/lib:/usr/local/lib"
I'm not sure why it can't find the libpython2.7.so file. When I built it as root there is a -L/opt/python/2.7.6/lib in that gcc line.
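One thing worth noting (general gcc/ld behavior, not Galaxy-specific): LD_LIBRARY_PATH is consulted by the runtime loader, not by ld when it resolves -lpython2.7 at link time; link-time search uses -L flags and the LIBRARY_PATH environment variable. A possible workaround, assuming the scramble build inherits the environment:

```shell
# LIBRARY_PATH is searched by gcc when linking, so this may let the build
# find libpython2.7.so (path is this site's python prefix):
export LIBRARY_PATH=/opt/python/2.7.6/lib
LIBTORQUE_DIR=/opt/torque/active/lib python scripts/scramble.py -e pbs_python
```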
On 3/3/14, 4:13 PM, Nate Coraor wrote:
Hi Pete,
Your subject says you are unable to build pbs_python using scramble. Could you provide details on what's not working there?
Galaxy is not going to work with a different version of pbs_python unless a bit of hacking is done to make it attempt to do so. We test Galaxy against specific versions of its dependencies, which is why we pin those versions and provide the scramble script to (hopefully) make it painless to build them yourself when necessary, as is always the case with pbs_python.
--nate
On Mon, Mar 3, 2014 at 3:57 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Following the directions from here: https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#PBS
I'm trying to get pbs_python to work as I'm using torque for scheduling galaxy jobs.
Note: This is a fresh install of galaxy from galaxy-dist on CentOS 5.10
I have pbs_python 4.4.0 module installed into a source-built version of python/2.7.6
I get the following error in the output of run.sh:

galaxy.jobs INFO 2014-03-03 15:46:45,485 Handler 'main' will load all configured runner plugins
Traceback (most recent call last):
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/webapps/galaxy/buildapp.py", line 39, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/app.py", line 130, in __init__
    self.job_manager = manager.JobManager( self )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/manager.py", line 31, in __init__
    self.job_handler = handler.JobHandler( app )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/handler.py", line 30, in __init__
    self.dispatcher = DefaultJobDispatcher( app )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/handler.py", line 568, in __init__
    self.job_runners = self.app.job_config.get_job_runner_plugins( self.app.config.server_name )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/__init__.py", line 449, in get_job_runner_plugins
    module = __import__( module_name )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/runners/pbs.py", line 31, in <module>
    raise Exception( egg_message % str( e ) )
Exception:
The 'pbs' runner depends on 'pbs_python' which is not installed or not configured properly. Galaxy's "scramble" system should make this installation simple, please follow the instructions found at:
http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
Additional errors may follow: pbs-python==4.3.5
This is the job_conf.xml file:
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner"/>
    </plugins>
    <handlers>
        <handler id="dirigo"/>
    </handlers>
    <destinations default="pbs_default">
        <destination id="pbs_default" runner="pbs">
            <param id="Resource_List">walltime=72:00:00,nodes=1:ppn=4</param>
        </destination>
    </destinations>
</job_conf>
I did not use the scramble system to install the pbs_python module. I downloaded the latest version available and installed it from the root account.
--
Pete Schmitt
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
--
Pete Schmitt
Technical Director: Discovery Cluster
NH INBRE Grid
Computational Genetics Lab
Institute for Quantitative Biomedical Sciences
Dartmouth College, HB 6203 L12 Berry/Baker Library
Hanover, NH 03755
Phone: 603-646-8109
http://discovery.dartmouth.edu
http://columbia.dartmouth.edu/grid
http://www.epistasis.org
http://iQBS.org