Re: [galaxy-dev] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/###/galaxy_624.o'
Just a warning: -noac has a pretty severe impact on performance in my (and others' on this list) experience. You might also want to try messing with the 'lookupcache' mount option.

--nate

On Thu, Mar 6, 2014 at 2:20 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
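For anyone tuning this later, mount options along these lines are what is being discussed; the server and export names below are placeholders, not values from this thread:

```
# /etc/fstab -- hypothetical NFS server and export names
# Option 1: disable attribute caching entirely (what -noac does; can be slow)
nfsserver:/export/nextgen3  /nextgen3  nfs  noac             0 0
# Option 2: keep attribute caching but disable name-lookup (dentry) caching
nfsserver:/export/nextgen3  /nextgen3  nfs  lookupcache=none 0 0
```

lookupcache=none only stops caching of directory-entry lookups, so it is usually a lighter touch than disabling all attribute caching with noac.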
Hello Nate,
I had that parameter set to 1, but I upped it to 5. I also added -noac to the NFS mounts for /nextgen3.
That appears to have fixed it.
Thank you!!!
On 3/6/14, 1:57 PM, Nate Coraor wrote:
Hi Pete,
I'd suggest setting retry_job_output_collection > 0 in universe_wsgi.ini. This is usually a symptom of attribute caching on network filesystems.
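To illustrate the idea behind that option: with NFS attribute caching, a file written by a cluster node may not be visible to the Galaxy server immediately, so retrying the read a few times often succeeds where a single check fails. A minimal sketch of that pattern (this is an illustration only, not Galaxy's actual implementation; the function name and defaults are invented):

```python
import errno
import time


def collect_output(path, retries=3, delay=0.1):
    """Read a job output file, retrying briefly on ENOENT.

    Simplified picture of what retry_job_output_collection > 0
    enables: tolerate a short window where an NFS client has not
    yet noticed a file created on another host.
    """
    for attempt in range(retries + 1):
        try:
            with open(path) as fh:
                return fh.read()
        except (IOError, OSError) as e:
            # Only retry "no such file"; re-raise anything else,
            # and give up once the retry budget is exhausted.
            if e.errno != errno.ENOENT or attempt == retries:
                raise
            time.sleep(delay)
```

Galaxy's real collection logic differs in detail; this just shows why a retry count greater than zero helps on attribute-caching filesystems.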
--nate
On Wed, Mar 5, 2014 at 8:06 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
In trying something simple with Galaxy, I downloaded data from UCSC Main. The data gets downloaded, but the job errors out. I verified that the job actually ran and completed successfully according to the scheduler, but I get errors like this:
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 18:17:35,941 (624/46.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 18:17:36,060 (624/46.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/624/galaxy_624.o'
There are no directories being created below the 000 directory. I verified that the directory tree is owned by galaxy and that the galaxy user can run jobs from the command line as a normal user.
I set the parameter "cleanup_job = never"; it was set to "always", which is probably why the files were never there. Now the files are there, including the galaxy_###.o file, but Galaxy still errors as above.
I had set the parameter "cluster_files_directory = database/pbs", but that doesn't seem to work any longer. The .o and .e files used to end up there.
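For reference, the options being discussed live in universe_wsgi.ini; a fragment matching this thread's settings might look like the following (the values are the ones mentioned in the thread, not recommendations):

```ini
# universe_wsgi.ini -- values from this thread
cleanup_job = never
cluster_files_directory = database/pbs
retry_job_output_collection = 5
```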
Here is an example:
(galaxyvenv)[galaxy@dirigo 630]$ ll
total 16
-rw------- 1 galaxy galaxy    0 Mar  5 19:29 galaxy_630.e
-rw-rw-r-- 1 galaxy galaxy    2 Mar  5 19:29 galaxy_630.ec
-rw------- 1 galaxy galaxy  940 Mar  5 19:29 galaxy_630.o
-rwxr-xr-x 1 galaxy galaxy 2429 Mar  5 19:29 galaxy_630.sh
-rw-rw-r-- 1 galaxy galaxy  138 Mar  5 19:29 galaxy.json
-rw-rw-r-- 1 galaxy galaxy 2139 Mar  5 19:29 metadata_in_HistoryDatasetAssociation_1182_o830e3
-rw-rw-r-- 1 galaxy galaxy   20 Mar  5 19:29 metadata_kwds_HistoryDatasetAssociation_1182_hOhPp7
-rw-rw-r-- 1 galaxy galaxy   55 Mar  5 19:29 metadata_out_HistoryDatasetAssociation_1182_Ynb70M
-rw-rw-r-- 1 galaxy galaxy    2 Mar  5 19:29 metadata_override_HistoryDatasetAssociation_1182_HsMljG
-rw-rw-r-- 1 galaxy galaxy   44 Mar  5 19:29 metadata_results_HistoryDatasetAssociation_1182_LxdsAZ
(galaxyvenv)[galaxy@dirigo 630]$ pwd
/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630
Here is the error from this:
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:37,731 (630/51.dirigo.mdibl.org) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:49,119 (630/51.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 19:31:50,225 (630/51.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_630.o'
galaxy.jobs DEBUG 2014-03-05 19:31:50,252 finish(): Moved /nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_dataset_856.dat to /nextgen3/galaxy/galaxy-dist/database/files/000/dataset_856.dat
galaxy.jobs DEBUG 2014-03-05 19:31:50,351 job 630 ended
On the Galaxy page, the history entry shows in pink:

1 UCSC Main on Human: knownGene (chr22:1-51304566) error
An error occurred with this dataset: Job output not returned from cluster
But the dataset is there.
On 3/5/14, 3:33 PM, Nate Coraor wrote:
The old-style URL syntax is supposed to continue to work; if you have any details on what's not working, I can look into it. That said, job_conf.xml is the way forward, and a job_conf.xml for the drmaa runner would be a pretty trivial change from the one you have for the pbs runner, e.g.:
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
    </plugins>
    <handlers>
        <handler id="dirigo"/>
    </handlers>
    <destinations default="pbs_default">
        <destination id="pbs_default" runner="pbs">
            <param id="nativeSpecification">-l walltime=72:00:00,nodes=1:ppn=4</param>
        </destination>
    </destinations>
</job_conf>
Just make sure you set $DRMAA_LIBRARY_PATH in your environment to the correct libdrmaa.so.
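For example, in the shell (or startup script) that launches Galaxy; the install path below is hypothetical and should point wherever your PSNC pbs-drmaa was installed:

```shell
# Full path to the DRMAA shared library for Torque; adjust to your install.
export DRMAA_LIBRARY_PATH=/opt/pbs-drmaa/lib/libdrmaa.so
```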
--nate
On Wed, Mar 5, 2014 at 3:27 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Hello Nate,
I have that version installed and was using it in the older versions of galaxy for a few years. Once I loaded this new version, it no longer worked with the old definitions in the universe file using: default_cluster_job_runner = drmaa:///
Do I need a job_conf.xml that uses the drmaa runner?
On 3/5/14, 3:21 PM, Nate Coraor wrote:
Hi Pete,
The latest error is pretty strange and not one I've encountered before. It suggests that scramble is not loading setuptools in place of distutils and thus does not have access to the setuptools extensions (notably, egg-related functionality). Something abnormal still seems to be going on with your python environment.
You can use drmaa if you like (this is known to work well). You will want to use the libdrmaa for Torque that's maintained by the Poznan Supercomputing and Networking Center, rather than the libdrmaa that can be built directly with the Torque source. PSNC libdrmaa for Torque/PBS can be found here: http://apps.man.poznan.pl/trac/pbs-drmaa
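A build of the PSNC library typically follows the standard autotools sequence; the version number, install prefix, and configure flag below are assumptions, so check ./configure --help in the actual tarball:

```shell
# Sketch only -- verify flag names against the pbs-drmaa documentation.
tar zxf pbs-drmaa-1.0.tar.gz && cd pbs-drmaa-1.0
./configure --prefix=/opt/pbs-drmaa --with-pbs=/opt/torque/active
make && make install
```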
--nate
On Wed, Mar 5, 2014 at 3:10 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Are there any alternatives to pbs_python for interfacing with a Torque scheduler? This method appears to be a dead end.
On 3/4/14, 9:51 AM, Nate Coraor wrote:
Pete,
Is it possible that `python` as the Galaxy user is calling a python other than /opt/python/2.7.6/bin/python (e.g. the system version without the -dev/-devel package installed)? The safest bet for ensuring you're using the right python, and that it won't have conflicting modules, is a Python virtualenv. This is easy to set up; just make sure you create it with the correct python executable, for example:
% wget https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.11.4.tar.gz
% tar zxf virtualenv-1.11.4.tar.gz
% /opt/python/2.7.6/bin/python ./virtualenv-1.11.4/virtualenv.py galaxyvenv
% . ./galaxyvenv/bin/activate
% cd galaxy-dist
% python ./scripts/fetch_eggs.py
% LIBTORQUE_DIR=/opt/torque/active/lib python scripts/scramble.py -e pbs_python
--nate
On Mon, Mar 3, 2014 at 5:19 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
I uninstalled pbs_python 4.4.0 and reinstalled 4.3.5 as root (not using scramble)
When I try this method as the galaxy user: LIBTORQUE_DIR=/opt/torque/active/lib python scripts/scramble.py -e pbs_python
I get the following output:
src/pbs_wrap.c:2813: warning: function declaration isn't a prototype
gcc -pthread -shared build/temp.linux-x86_64-2.7/src/pbs_wrap.o -L/opt/torque/4.2.7/lib -L. -ltorque -lpython2.7 -o build/lib.linux-x86_64-2.7/_pbs.so -L/opt/torque/4.2.7/lib -ltorque -Wl,-rpath -Wl,/opt/torque/4.2.7/lib
/usr/bin/ld: cannot find -lpython2.7
LD_LIBRARY_PATH="/opt/python/2.7.6/lib:/opt/torque/active/lib:/usr/local/lib"
I'm not sure why it can't find the libpython2.7.so file. When I built it as root there is a -L/opt/python/2.7.6/lib in that gcc line.
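One thing worth noting (general gcc/ld behavior, not Galaxy-specific): LD_LIBRARY_PATH is consulted by the runtime loader, not by ld when it resolves -lpython2.7 at link time; link-time search uses -L flags and the LIBRARY_PATH environment variable. A possible workaround, assuming the scramble build inherits the environment:

```shell
# LIBRARY_PATH is searched by gcc when linking, so this may let the build
# find libpython2.7.so (path is this site's python prefix):
export LIBRARY_PATH=/opt/python/2.7.6/lib
LIBTORQUE_DIR=/opt/torque/active/lib python scripts/scramble.py -e pbs_python
```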
On 3/3/14, 4:13 PM, Nate Coraor wrote:
Hi Pete,
Your subject says you are unable to build pbs_python using scramble. Could you provide details on what's not working there?
Galaxy is not going to work with a different version of pbs_python unless a bit of hacking is done to make it attempt to do so. We test Galaxy against specific versions of its dependencies, which is why we pin those versions and provide the scramble script to (hopefully) make it painless to build them yourself when necessary, as is always the case with pbs_python.
--nate
On Mon, Mar 3, 2014 at 3:57 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Following the directions from here: https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#PBS
I'm trying to get pbs_python to work as I'm using torque for scheduling galaxy jobs.
Note: This is a fresh install of galaxy from galaxy-dist on CentOS 5.10
I have pbs_python 4.4.0 module installed into a source-built version of python/2.7.6
I get the following error in the output of run.sh:

galaxy.jobs INFO 2014-03-03 15:46:45,485 Handler 'main' will load all configured runner plugins
Traceback (most recent call last):
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/webapps/galaxy/buildapp.py", line 39, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/app.py", line 130, in __init__
    self.job_manager = manager.JobManager( self )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/manager.py", line 31, in __init__
    self.job_handler = handler.JobHandler( app )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/handler.py", line 30, in __init__
    self.dispatcher = DefaultJobDispatcher( app )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/handler.py", line 568, in __init__
    self.job_runners = self.app.job_config.get_job_runner_plugins( self.app.config.server_name )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/__init__.py", line 449, in get_job_runner_plugins
    module = __import__( module_name )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/runners/pbs.py", line 31, in <module>
    raise Exception( egg_message % str( e ) )
Exception:
The 'pbs' runner depends on 'pbs_python' which is not installed or not configured properly. Galaxy's "scramble" system should make this installation simple, please follow the instructions found at:
http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
Additional errors may follow: pbs-python==4.3.5
This is the job_conf.xml file:
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner"/>
    </plugins>
    <handlers>
        <handler id="dirigo"/>
    </handlers>
    <destinations default="pbs_default">
        <destination id="pbs_default" runner="pbs">
            <param id="Resource_List">walltime=72:00:00,nodes=1:ppn=4</param>
        </destination>
    </destinations>
</job_conf>
I did not use the scramble system to install the pbs_python module. I downloaded the latest version available and installed it from the root account.
--
Pete Schmitt
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
--
Pete Schmitt
Technical Director: Discovery Cluster
NH INBRE Grid
Computational Genetics Lab
Institute for Quantitative Biomedical Sciences
Dartmouth College, HB 6203 L12 Berry/Baker Library
Hanover, NH 03755
Phone: 603-646-8109
http://discovery.dartmouth.edu
http://columbia.dartmouth.edu/grid
http://www.epistasis.org
http://iQBS.org