Just a warning, -noac has a pretty severe impact on performance in my (and others' on this list) experience. You might also want to try messing with the 'lookupcache' mount option.
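
To see which of these options a mount is actually using, something along these lines might help. It is only a sketch: it assumes the usual /proc/mounts layout (device, mount point, filesystem type, options, dump, pass) and that the NFS client reports noac and lookupcache there, which is worth confirming on your kernel. The sample text and the fileserver name are made up for illustration.

```python
# Hypothetical sample in /proc/mounts format; the server and export names are invented.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw 0 0
fileserver:/export/nextgen3 /nextgen3 nfs rw,noac,lookupcache=pos,vers=3 0 0
"""


def nfs_options(mounts_text, mount_point):
    """Return the option list for an NFS mount point, or None if it is not found."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point and fields[2].startswith("nfs"):
            return fields[3].split(",")
    return None


def caching_summary(options):
    """Pick out the cache-related options discussed in this thread."""
    lookupcache = None
    for option in options:
        if option.startswith("lookupcache="):
            lookupcache = option.split("=", 1)[1]
    return {"noac": "noac" in options, "lookupcache": lookupcache}


print(caching_summary(nfs_options(SAMPLE_MOUNTS, "/nextgen3")))
```

On a real host you would pass the contents of /proc/mounts instead of SAMPLE_MOUNTS.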

--nate


On Thu, Mar 6, 2014 at 2:20 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Hello Nate,

I had that parameter set to 1, but I raised it to 5. I also added -noac to the NFS mounts for /nextgen3.

That appears to have fixed it.

Thank you!!!


On 3/6/14, 1:57 PM, Nate Coraor wrote:
Hi Pete,

I'd suggest setting retry_job_output_collection > 0 in universe_wsgi.ini. This is usually a symptom of attribute caching on network filesystems.
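
To illustrate the idea behind that setting: the job's output file can exist on the node that wrote it while the NFS client on the Galaxy host still reports it as missing, so retrying a few times with a short pause often succeeds. The following is a rough sketch of that pattern, not Galaxy's actual code; the function name and parameters are hypothetical, and it is written for Python 3.

```python
import time


def read_with_retries(path, retries=5, delay=1.0, sleep=time.sleep):
    """Read `path`, retrying when the file is not visible yet.

    `retries` is the number of extra attempts after the first one.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    for attempt in range(retries + 1):
        try:
            with open(path) as handle:
                return handle.read()
        except FileNotFoundError:
            if attempt == retries:
                # Out of attempts: report the failure to the caller.
                raise
            # Give the NFS client time to refresh its cached attributes.
            sleep(delay)
```

With retries=0 this behaves like a plain open(), which matches the failure in the log: one attempt, then an immediate "No such file or directory".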

--nate


On Wed, Mar 5, 2014 at 8:06 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:


To try something simple, I used Galaxy to download data from UCSC Main. The data gets downloaded but the job errors out. I verified that the job actually ran and completed successfully according to the scheduler, but I get errors like this:

galaxy.jobs.runners.drmaa DEBUG 2014-03-05 18:17:35,941 (624/46.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 18:17:36,060 (624/46.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/624/galaxy_624.o'

There are no directories being created below the 000 directory.   I verified that the directory tree is owned by galaxy and that the galaxy user can run jobs from the command line as a normal user.

I set the parameter "cleanup_job = never".  It was set to "always" which is probably why the files were never there.  Now the files are there, including the galaxy_###.o file but galaxy still errors like above.

I had set the parameter "cluster_files_directory = database/pbs", but that doesn't seem to work any longer.  The .o and .e files used to end up there.

Here is an example:

(galaxyvenv)[galaxy@dirigo 630]$ ll
total 16
-rw------- 1 galaxy galaxy    0 Mar  5 19:29 galaxy_630.e
-rw-rw-r-- 1 galaxy galaxy    2 Mar  5 19:29 galaxy_630.ec
-rw------- 1 galaxy galaxy  940 Mar  5 19:29 galaxy_630.o
-rwxr-xr-x 1 galaxy galaxy 2429 Mar  5 19:29 galaxy_630.sh
-rw-rw-r-- 1 galaxy galaxy  138 Mar  5 19:29 galaxy.json
-rw-rw-r-- 1 galaxy galaxy 2139 Mar  5 19:29 metadata_in_HistoryDatasetAssociation_1182_o830e3
-rw-rw-r-- 1 galaxy galaxy   20 Mar  5 19:29 metadata_kwds_HistoryDatasetAssociation_1182_hOhPp7
-rw-rw-r-- 1 galaxy galaxy   55 Mar  5 19:29 metadata_out_HistoryDatasetAssociation_1182_Ynb70M
-rw-rw-r-- 1 galaxy galaxy    2 Mar  5 19:29 metadata_override_HistoryDatasetAssociation_1182_HsMljG
-rw-rw-r-- 1 galaxy galaxy   44 Mar  5 19:29 metadata_results_HistoryDatasetAssociation_1182_LxdsAZ
(galaxyvenv)[galaxy@dirigo 630]$ pwd
/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630

Here is the error from this:

galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:37,731 (630/51.dirigo.mdibl.org) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:49,119 (630/51.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 19:31:50,225 (630/51.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_630.o'
galaxy.jobs DEBUG 2014-03-05 19:31:50,252 finish(): Moved /nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_dataset_856.dat to /nextgen3/galaxy/galaxy-dist/database/files/000/dataset_856.dat
galaxy.jobs DEBUG 2014-03-05 19:31:50,351 job 630 ended

On the galaxy page in the history you get in pink:
1 UCSC Main on Human: knownGene (chr22:1-51304566)
error
An error occurred with this dataset:
Job output not returned from cluster

But the dataset is there.

On 3/5/14, 3:33 PM, Nate Coraor wrote:
The old-style URL syntax is supposed to continue to work. If you have any details on what's not working, I can look into it. That said, job_conf.xml is the way forward, and a job_conf.xml for the drmaa runner would be a pretty trivial change from the one you have for the pbs runner, e.g.:

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
    </plugins>
    <handlers>
        <handler id="dirigo"/>
    </handlers>
    <destinations default="pbs_default">
        <destination id="pbs_default" runner="pbs">
            <param id="nativeSpecification">-l walltime=72:00:00,nodes=1:ppn=4</param>
        </destination>
    </destinations>
</job_conf>

Just make sure you set $DRMAA_LIBRARY_PATH in your environment to the correct libdrmaa.so.
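
A quick way to sanity-check that variable before starting Galaxy might look like the sketch below. It only checks that the variable is set and names an existing file; it does not try to load the library. The function name is made up for illustration.

```python
import os


def check_drmaa_library(environ=os.environ):
    """Return (ok, message) describing the DRMAA_LIBRARY_PATH setting."""
    path = environ.get("DRMAA_LIBRARY_PATH")
    if not path:
        return False, "DRMAA_LIBRARY_PATH is not set"
    if not os.path.isfile(path):
        return False, "DRMAA_LIBRARY_PATH does not point to a file: %s" % path
    return True, "found %s" % path


if __name__ == "__main__":
    print(check_drmaa_library())
```

Run it as the galaxy user, in the same shell environment that run.sh will inherit.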

--nate


On Wed, Mar 5, 2014 at 3:27 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Hello Nate,

I have that version installed and was using it in the older versions of Galaxy for a few years. Once I loaded this new version, it no longer worked with the old definitions in the universe file using: default_cluster_job_runner = drmaa:///

Do I need a job_conf.xml that uses the drmaa runner?



On 3/5/14, 3:21 PM, Nate Coraor wrote:
Hi Pete,

The latest error is pretty strange and not one I've encountered before. It suggests that scramble is not loading setuptools in place of distutils and thus does not have access to the setuptools extensions (notably, egg-related functionality). Something abnormal still seems to be going on with your python environment.

You can use drmaa if you like (this is known to work well). You will want to use the libdrmaa for Torque that's maintained by the Poznan Supercomputing and Networking Center, rather than the libdrmaa that can be built directly with the Torque source. PSNC libdrmaa for Torque/PBS can be found here: http://apps.man.poznan.pl/trac/pbs-drmaa

--nate


On Wed, Mar 5, 2014 at 3:10 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Are there any other alternatives to pbs_python for interfacing with a Torque scheduler? This method appears to be a dead end.



On 3/4/14, 9:51 AM, Nate Coraor wrote:
Pete,

Is it possible that `python` as the Galaxy user is calling a python other than /opt/python/2.7.6/bin/python (e.g. the system version without the -dev/-devel package installed)? To ensure you're using the right python and that it's not going to have conflicting modules, I'd suggest using a Python virtualenv. This is easy to set up; just make sure you run it with the correct python executable, for example:

% wget https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.11.4.tar.gz
% tar zxf virtualenv-1.11.4.tar.gz
% /opt/python/2.7.6/bin/python ./virtualenv-1.11.4/virtualenv.py galaxyvenv
% . ./galaxyvenv/bin/activate
% cd galaxy-dist
% python ./scripts/fetch_eggs.py
% LIBTORQUE_DIR=/opt/torque/active/lib python scripts/scramble.py -e pbs_python
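
To confirm which interpreter `python` resolves to once the virtualenv is active, a small script like this could be run as the Galaxy user. It is a sketch with hypothetical function names; the Python.h check is just one rough indicator that the development headers are installed for that interpreter.

```python
import os
import sys
import sysconfig


def interpreter_report():
    """Collect facts that identify the running interpreter."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "include_dir": sysconfig.get_paths()["include"],
    }


def has_dev_headers(include_dir):
    """True if Python.h is present, which building C extensions requires."""
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    report = interpreter_report()
    for key in sorted(report):
        print("%s: %s" % (key, report[key]))
    print("Python.h present: %s" % has_dev_headers(report["include_dir"]))
```

If the reported prefix is not under /opt/python/2.7.6 (or the virtualenv built from it), the build is picking up a different python.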

--nate


On Mon, Mar 3, 2014 at 5:19 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
I uninstalled pbs_python 4.4.0 and reinstalled 4.3.5 as root (not using scramble)

When I try this method as the galaxy user:
LIBTORQUE_DIR=/opt/torque/active/lib python scripts/scramble.py -e pbs_python

I get the following output:

src/pbs_wrap.c:2813: warning: function declaration isn't a prototype
gcc -pthread -shared build/temp.linux-x86_64-2.7/src/pbs_wrap.o -L/opt/torque/4.2.7/lib -L. -ltorque -lpython2.7 -o build/lib.linux-x86_64-2.7/_pbs.so -L/opt/torque/4.2.7/lib -ltorque -Wl,-rpath -Wl,/opt/torque/4.2.7/lib
/usr/bin/ld: cannot find -lpython2.7

LD_LIBRARY_PATH="/opt/python/2.7.6/lib:/opt/torque/active/lib:/usr/local/lib"

I'm not sure why it can't find the libpython2.7.so file. When I built it as root, there was a -L/opt/python/2.7.6/lib in that gcc line.
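
One thing that may be relevant here: as far as I understand it, LD_LIBRARY_PATH is consulted by the dynamic loader at run time, while the -lpython2.7 lookup at link time depends on the -L directories in the gcc line and the linker's defaults. The sketch below asks the running interpreter where it believes its libpython lives, so the answer can be compared with the -L flags above. The function name is hypothetical, and the config variables can be empty on some builds.

```python
import os
import sysconfig


def libpython_location():
    """Report where this interpreter's build says its libpython is installed."""
    libdir = sysconfig.get_config_var("LIBDIR")
    libname = sysconfig.get_config_var("LDLIBRARY")
    path = os.path.join(libdir, libname) if libdir and libname else None
    return {
        "libdir": libdir,
        "library": libname,
        "exists": bool(path) and os.path.exists(path),
    }


if __name__ == "__main__":
    print(libpython_location())
```

If libdir is /opt/python/2.7.6/lib but no matching -L appears in the gcc line, the build is probably being driven by a different python than the one you expect.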




On 3/3/14, 4:13 PM, Nate Coraor wrote:
Hi Pete,

Your subject says you are unable to build pbs_python using scramble.
Could you provide details on what's not working there?

Galaxy is not going to work with a different version of pbs_python
unless a bit of hacking is done to make it attempt to do so. We test
Galaxy with specific versions of its dependencies, which is why we
control the versions of those dependencies and provide the scramble
script to (hopefully) make it painless to build them yourself, should
it be necessary to do so, as is always the case with pbs_python.

--nate

On Mon, Mar 3, 2014 at 3:57 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:
Following the directions from here:
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#PBS

I'm trying to get pbs_python to work as I'm using torque for scheduling
galaxy jobs.

Note: This is a fresh install of galaxy from galaxy-dist on CentOS 5.10

I have pbs_python 4.4.0 module installed into a source-built version of
python/2.7.6

I get the following error in the output of run.sh:

galaxy.jobs INFO 2014-03-03 15:46:45,485 Handler 'main' will load all configured runner plugins
Traceback (most recent call last):
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/webapps/galaxy/buildapp.py", line 39, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/app.py", line 130, in __init__
    self.job_manager = manager.JobManager( self )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/manager.py", line 31, in __init__
    self.job_handler = handler.JobHandler( app )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/handler.py", line 30, in __init__
    self.dispatcher = DefaultJobDispatcher( app )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/handler.py", line 568, in __init__
    self.job_runners = self.app.job_config.get_job_runner_plugins( self.app.config.server_name )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/__init__.py", line 449, in get_job_runner_plugins
    module = __import__( module_name )
  File "/nextgen3/galaxy/galaxy-dist-py27/lib/galaxy/jobs/runners/pbs.py", line 31, in <module>
    raise Exception( egg_message % str( e ) )
Exception:

The 'pbs' runner depends on 'pbs_python' which is not installed or not
configured properly.  Galaxy's "scramble" system should make this installation
simple, please follow the instructions found at:

    http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster

Additional errors may follow:
pbs-python==4.3.5

This is the job_conf.xml file:

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner"/>
    </plugins>
    <handlers>
        <handler id="dirigo"/>
    </handlers>
    <destinations default="pbs_default">
        <destination id="pbs_default" runner="pbs"/>
                <param id="Resource_List">walltime=72:00:00,nodes=1:ppn=4</param>
    </destinations>
</job_conf>
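
One thing worth checking in a file like the one above: the <destination> element is self-closing, so the <param> that follows is parsed as a child of <destinations> rather than of the destination. Whether Galaxy then ignores that param is an assumption on my part, but a sketch like this (using only the standard library, with a hypothetical function name and a trimmed copy of the file as sample input) can at least flag the nesting:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration above, used only as sample input.
SAMPLE = """<job_conf>
    <destinations default="pbs_default">
        <destination id="pbs_default" runner="pbs"/>
        <param id="Resource_List">walltime=72:00:00,nodes=1:ppn=4</param>
    </destinations>
</job_conf>"""


def misplaced_params(job_conf_xml):
    """Return ids of <param> elements that sit directly under <destinations>."""
    root = ET.fromstring(job_conf_xml)
    destinations = root.find("destinations")
    if destinations is None:
        return []
    # findall() with a bare tag name matches direct children only.
    return [param.get("id") for param in destinations.findall("param")]


print(misplaced_params(SAMPLE))
```

A non-empty list means those params are not attached to any destination.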

I did not use the scramble system to install the pbs_python module. I downloaded the latest version available and installed it from the root account.


--

Pete Schmitt



--
Pete Schmitt
Technical Director: 
   Discovery Cluster
   NH INBRE Grid
   Computational Genetics Lab
   Institute for Quantitative
          Biomedical Sciences
Dartmouth College, HB 6203
L12 Berry/Baker Library
Hanover, NH 03755

Phone: 603-646-8109

http://discovery.dartmouth.edu
http://columbia.dartmouth.edu/grid
http://www.epistasis.org
http://iQBS.org




