Hello again Nate,
I’ve uninstalled and reinstalled a fresh package of galaxy and torque.
I’ve also configured, built and installed pbs_drmaa.
Finally, when I start galaxy, the main works but the handler0 doesn’t.
Here is the error:
galaxy.jobs.manager DEBUG 2014-12-05 13:12:10,035 Starting job handler
galaxy.jobs INFO 2014-12-05 13:12:10,035 Handler 'handler0' will load all configured runner plugins
galaxy.jobs.runners.state_handler_factory DEBUG 2014-12-05 13:12:10,040 Loaded 'failure' state handler from module galaxy.jobs.runners.state_handlers.resubmit
Traceback (most recent call last):
File "/home/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/buildapp.py", line 44, in app_factory
app = UniverseApplication( global_conf = global_conf, **kwargs )
File "/home/galaxy/galaxy-dist/lib/galaxy/app.py", line 136, in __init__
self.job_manager = manager.JobManager( self )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/manager.py", line 23, in __init__
self.job_handler = handler.JobHandler( app )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/handler.py", line 32, in __init__
self.dispatcher = DefaultJobDispatcher( app )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/handler.py", line 715, in __init__
self.job_runners = self.app.job_config.get_job_runner_plugins( self.app.config.server_name )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 626, in get_job_runner_plugins
rval[id] = runner_class( self.app, runner[ 'workers' ], **runner.get( 'kwds', {} ) )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/runners/drmaa.py", line 61, in __init__
drmaa = __import__( "drmaa" )
File "build/bdist.linux-x86_64/egg/drmaa/__init__.py", line 63, in <module>
File "build/bdist.linux-x86_64/egg/drmaa/session.py", line 39, in <module>
File "build/bdist.linux-x86_64/egg/drmaa/helpers.py", line 36, in <module>
File "build/bdist.linux-x86_64/egg/drmaa/wrappers.py", line 56, in <module>
File "/usr/lib64/python2.6/ctypes/__init__.py", line 353, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /usr/local/pbs-drmaa/lib/: cannot read file data: Is a directory
Removing PID file handler0.pid
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
File "/usr/lib64/python2.6/threading.py", line 484, in run
File "/home/galaxy/galaxy-dist/lib/tool_shed/galaxy_install/update_repository_manager.py", line 93, in __restarter
File "/home/galaxy/galaxy-dist/lib/tool_shed/galaxy_install/update_repository_manager.py", line 133, in sleep
File "/usr/lib64/python2.6/threading.py", line 137, in release
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
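(Note on the OSError: dlopen is being handed the directory /usr/local/pbs-drmaa/lib/ rather than a shared-library file. The drmaa Python package takes the library location from the DRMAA_LIBRARY_PATH environment variable, so one likely cause is that the variable names the directory instead of the library inside it. Below is a minimal sketch of a pre-flight check. The helper name check_drmaa_library_path is hypothetical and not part of Galaxy or drmaa, and the file name libdrmaa.so in the hint is an assumption about how pbs_drmaa was installed.)

```python
import os


def check_drmaa_library_path(environ=None):
    """Return (ok, message) saying whether DRMAA_LIBRARY_PATH names a file."""
    environ = os.environ if environ is None else environ
    path = environ.get("DRMAA_LIBRARY_PATH")
    if not path:
        return False, "DRMAA_LIBRARY_PATH is not set"
    if os.path.isdir(path):
        # A directory here is what leads to "cannot read file data: Is a directory".
        # libdrmaa.so is an assumed file name; check what the lib directory contains.
        hint = os.path.join(path, "libdrmaa.so")
        return False, "%s is a directory; point it at the library file, e.g. %s" % (path, hint)
    if not os.path.isfile(path):
        return False, "%s does not exist" % path
    return True, "%s looks like a library file" % path


if __name__ == "__main__":
    ok, message = check_drmaa_library_path()
    print(message)
```

Running it as the galaxy user, in the same environment that starts handler0, shows which value the handler process would actually see.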
Please advise!
Cordialement / Regards,
Edgar Fernandez
From: Nate Coraor [mailto:nate@bx.psu.edu]
Sent: December-05-14 11:27 AM
To: Fernandez Edgar
Cc: galaxy-dev@bx.psu.edu
Subject: Re: [galaxy-dev] galaxy with torque
On Fri, Dec 5, 2014 at 9:13 AM, Fernandez Edgar <edgar.fernandez@umontreal.ca> wrote:
Hello Nate,
Thank you for this explanation, this clears up a lot and I have a better understanding
of how galaxy works.
I still have a few points I would like clarified:
1. What is the purpose of the 4 variables defined under my [server:handler0] section in the file config/galaxy.ini?
At a minimum, use = egg:Paste#http is required, as this tells PasteScript, which is used to start the Galaxy server, what web server module will be used. The other variables are optional to some degree, but you must set unique/unused ports for each [server:] section.
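(A minimal sketch of checking the "unique port per [server:] section" rule with Python's standard configparser. The function name duplicate_server_ports is hypothetical, and port 8080 for [server:main] is an assumed value used only for illustration; the handler0 values are the ones from this thread.)

```python
import configparser
from collections import defaultdict

EXAMPLE_INI = """
[server:main]
use = egg:Paste#http
port = 8080

[server:handler0]
use = egg:Paste#http
port = 9010
use_threadpool = True
threadpool_workers = 10
"""


def duplicate_server_ports(ini_text):
    """Map each port claimed by more than one [server:*] section to those sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    by_port = defaultdict(list)
    for section in parser.sections():
        if section.startswith("server:") and parser.has_option(section, "port"):
            by_port[parser.get(section, "port")].append(section)
    # Keep only ports that appear in two or more server sections.
    return {port: names for port, names in by_port.items() if len(names) > 1}


print(duplicate_server_ports(EXAMPLE_INI))
```

An empty result means no two server sections share a port; it does not tell you whether some other program on the machine is already using one of them.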
2. What does the load attribute (="galaxy.jobs.runners.pbs:PBSJobRunner") on the plugin tag define in the file config/job_conf.xml?
This instructs Galaxy to load a specific module and class as a job runner plugin. In this case it is the PBSJobRunner class in the galaxy.jobs.runners.pbs module.
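(To illustrate the "module:Class" convention, here is a minimal sketch of how such a string can be resolved in Python. This is not Galaxy's own loading code; load_plugin_class is a hypothetical helper, and a standard-library class stands in for PBSJobRunner so the example runs without Galaxy installed.)

```python
import importlib


def load_plugin_class(load_spec):
    """Split 'package.module:ClassName', import the module, return the class."""
    module_name, _, class_name = load_spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for load="galaxy.jobs.runners.pbs:PBSJobRunner", which would need
# Galaxy's lib directory on sys.path and a working pbs_python egg.
cls = load_plugin_class("collections:OrderedDict")
print(cls.__name__)
```

The import step is where the failure quoted later in this message comes from: importing galaxy.jobs.runners.pbs raises if pbs_python cannot be loaded.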
3. Finally, when I start galaxy, I see my two processes:
a. galaxy 11104 1 84 08:43 ? 00:00:05 python ./scripts/paster.py serve config/galaxy.ini --server-name=main --pid-file=main.pid --log-file=main.log --daemon
b. galaxy 11112 1 83 08:43 ? 00:00:05 python ./scripts/paster.py serve config/galaxy.ini --server-name=handler0 --pid-file=handler0.pid --log-file=handler0.log --daemon
However, the second one fails and here is the error message I’ve been getting:
galaxy.jobs.manager DEBUG 2014-12-05 08:43:19,059 Starting job handler
galaxy.jobs INFO 2014-12-05 08:43:19,059 Handler 'handler0' will load all configured runner plugins
Traceback (most recent call last):
File "/home/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/buildapp.py", line 44, in app_factory
app = UniverseApplication( global_conf = global_conf, **kwargs )
File "/home/galaxy/galaxy-dist/lib/galaxy/app.py", line 136, in __init__
self.job_manager = manager.JobManager( self )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/manager.py", line 23, in __init__
self.job_handler = handler.JobHandler( app )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/handler.py", line 32, in __init__
self.dispatcher = DefaultJobDispatcher( app )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/handler.py", line 715, in __init__
self.job_runners = self.app.job_config.get_job_runner_plugins( self.app.config.server_name )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 586, in get_job_runner_plugins
module = __import__( module_name )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 32, in <module>
raise Exception( egg_message % str( e ) )
Exception:
The 'pbs' runner depends on 'pbs_python' which is not installed or not
configured properly. Galaxy's "scramble" system should make this installation
simple, please follow the instructions found at:
http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
Additional errors may follow:
/home/galaxy/galaxy-dist/eggs/pbs_python-4.3.5-py2.6-linux-x86_64-ucs4.egg/_pbs.so: undefined symbol: log_record
Removing PID file handler0.pid
Please try the following:
In /home/galaxy/galaxy-dist/eggs.ini, change the version of pbs_python to 4.4.0. Then, re-scramble pbs_python:
% cd /home/galaxy/galaxy-dist
% rm -rf ./eggs/pbs_python-4.3.5-py2.6-linux-x86_64-ucs4.egg
% LIBTORQUE_DIR=/usr/local/torque/lib python ./scripts/scramble.py -e pbs_python
If pbs_python 4.4.0 does not work, you'll need to use the DRMAA interface to Torque instead.
--nate
Any suggestions are more than welcome since I am under a lot of pressure to make this work.
Thanks gents !!!
Cordialement / Regards,
Edgar Fernandez
From: Nate Coraor [mailto:nate@bx.psu.edu]
Sent: December-04-14 4:09 PM
To: Fernandez Edgar
Cc: John Chilton; galaxy-dev@bx.psu.edu
Subject: Re: [galaxy-dev] galaxy with torque
On Thu, Dec 4, 2014 at 11:03 AM, Fernandez Edgar <edgar.fernandez@umontreal.ca> wrote:
Good morning gents,
I found one of your previous answers on the internet,
and it helped me figure out my problem with job_conf.xml.
So I finally got galaxy to start without a glitch.
First, I added the following lines in the config/galaxy.ini file:
[server:handler0]
use = egg:Paste#http
port = 9010
use_threadpool = True
threadpool_workers = 10
I’ve also changed the config/job_conf.xml:
<?xml version="1.0"?>
<job_conf>
<plugins>
<plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner"/>
</plugins>
<handlers>
<handler id="handler0"/>
</handlers>
<destinations default="torque">
<destination id="torque" runner="pbs"/>
</destinations>
</job_conf>
Now, I’m uncertain what needs to listen on port 9010 under the [server:handler0] section.
Hi Edgar,
There are many ways to run Galaxy servers. By default, if starting using the provided run.sh script, Galaxy starts and runs in a single process, which is defined by the [server:main] section in galaxy.ini. If your intent is to run Galaxy in this default single process setup, you can remove the [server:handler0] section from galaxy.ini and set your handler id in job_conf.xml to "main" like so:
<handlers default="main">
<handler id="main"/>
</handlers>
However, for most Galaxy servers that see a moderate amount of use, it is a good idea to run multiple processes. At a minimum, this would be one process which serves web requests (which is typically proxied by a traditional webserver such as Apache) but which does not handle running Galaxy jobs, and a second process which does not serve web requests, but which handles running jobs. In that case you can keep galaxy.ini as you have it now (with both a [server:main] section and a [server:handler0] section), and with the handler in job_conf.xml defined as <handler id="handler0"/>.
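(The requirement that each handler id in job_conf.xml matches a [server:...] section in galaxy.ini can be checked mechanically. The following is a minimal sketch using only the standard library; unmatched_handlers is a hypothetical helper, not a Galaxy function, and the two strings are trimmed-down versions of the files discussed in this thread.)

```python
import configparser
import xml.etree.ElementTree as ET

JOB_CONF = """
<job_conf>
    <handlers>
        <handler id="handler0"/>
    </handlers>
</job_conf>
"""

GALAXY_INI = """
[server:main]
use = egg:Paste#http

[server:handler0]
use = egg:Paste#http
"""


def unmatched_handlers(job_conf_xml, galaxy_ini):
    """Return handler ids that have no [server:<id>] section in galaxy.ini."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(galaxy_ini)
    servers = {name.split(":", 1)[1]
               for name in parser.sections() if name.startswith("server:")}
    root = ET.fromstring(job_conf_xml.strip())
    handler_ids = [h.get("id") for h in root.findall("./handlers/handler")]
    return [hid for hid in handler_ids if hid not in servers]


print(unmatched_handlers(JOB_CONF, GALAXY_INI))
```

A handler id such as a Torque server hostname would be reported here, because no server section carries that name.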
However, run.sh run without arguments will only start the server process defined as [server:main]. To start all server processes in galaxy.ini, use:
% GALAXY_RUN_ALL=1 sh run.sh --daemon
Documentation on multiprocess Galaxy setups can be found at:
Other documentation on running a "production" Galaxy service (including using a proxy server) can be found at:
--nate
Cordialement / Regards,
Edgar Fernandez
From: John Chilton [mailto:jmchilton@gmail.com]
Sent: December-03-14 9:53 AM
To: Fernandez Edgar
Cc: galaxy-dev@bx.psu.edu
Subject: Re: [galaxy-dev] galaxy with torque
That handler id seems wrong... at least it is not what I am used to. It needs to match a server section specified in your galaxy.ini file - usually this is a simple string like handler0 or something.
It looks like the specific error is :
/home/galaxy/galaxy-dist/eggs/pbs_python-4.3.5-py2.6-linux-x86_64-ucs4.egg/_pbs.so: undefined symbol: log_record
I am not sure what causes this - some subtle incompatibility.
So the pbs_python 4.4.0 egg is available on eggs.galaxyproject.org (http://eggs.galaxyproject.org/pbs_python/pbs_python-4.4.0.tar.gz) - I think it may be needed for newer torque versions - do you want to update the version specified in eggs.ini, delete the old egg, and rescramble the egg with 4.4.0?
-John
P.S. Since you are setting up a new server I would strongly suggest using postgres instead of MySQL - but the previous comment about it not needing to be accessed on the compute servers is correct.
On Tue, Dec 2, 2014 at 3:10 PM, Fernandez Edgar <edgar.fernandez@umontreal.ca> wrote:
Hello again,
I’m very close to making pbs_python work but I’m hitting a new wall.
So I’ve created the file config/job_conf.xml, which looks like this:
<?xml version="1.0"?>
<job_conf>
<plugins>
<plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner"/>
</plugins>
<handlers default="gavroche.esi.umontreal.ca">
<handler id="gavroche.esi.umontreal.ca" tags="pbs"/>
</handlers>
<destinations default="pbs_default">
<destination id="pbs_default" runner="pbs" tags="mycluster"/>
<destination id="pbs_longjobs" runner="pbs" tags="mycluster,longjobs">
<param id="Resource_List">walltime=72:00:00</param>
</destination>
</destinations>
</job_conf>
gavroche.esi.umontreal.ca is my torque server.
I know that in your documentation it doesn’t say to put in a handlers tag but galaxy doesn’t parse the xml without it.
Now, once I try to start galaxy, I get the error you see in the file paster.log attached to this email.
Can anyone help please?
Cordialement / Regards,
Edgar Fernandez
From: Fernandez Edgar
Sent: December-02-14 1:33 PM
To: 'Rémy Dernat'
Cc: John Chilton; galaxy-dev@bx.psu.edu
Subject: RE: Re: [galaxy-dev] galaxy with torque
Thank you for that correction.
Just a small FYI (maybe it will be useful to update the wiki)…
I had to export three variables to make the scramble possible:
export PBS_PYTHON_INCLUDEDIR=/usr/local/torque/include/
export PBSCONFIG=/usr/local/torque/bin/pbs-config
export LIBTORQUE_DIR=/usr/local/torque/lib/libtorque.so
python scripts/scramble.py -e pbs_python
Cordialement / Regards,
Edgar Fernandez
From: Rémy Dernat [mailto:remy.d1@gmail.com]
Sent: December-02-14 11:49 AM
To: Fernandez Edgar
Cc: John Chilton; galaxy-dev@bx.psu.edu
Subject: Re: Re: [galaxy-dev] galaxy with torque
Sorry about answer 7: there is no benefit in doing that. Once the egg is built, there is nothing more to do unless you change your Python version. It is normal for that variable to be empty afterwards: it is not exported into your shell environment, it is only set for the duration of the following python command:
LIBTORQUE_DIR=/path/to/libtorque python scripts/scramble.py -e pbs_python
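(A minimal sketch of the same behaviour seen from Python: a variable passed to one child process is visible to that child only, and the parent's environment is left untouched. run_with_env is a hypothetical helper, and /path/to/libtorque is the placeholder from the command above, not a real path.)

```python
import os
import subprocess
import sys


def run_with_env(command, **extra_env):
    """Run command with extra variables that exist only in the child process."""
    env = dict(os.environ, **extra_env)
    return subprocess.run(command, env=env, capture_output=True, text=True)


# The child simply reports what it sees; a real call would run scramble.py instead.
child = [sys.executable, "-c", "import os; print(os.environ.get('LIBTORQUE_DIR'))"]
result = run_with_env(child, LIBTORQUE_DIR="/path/to/libtorque")
print("child sees: ", result.stdout.strip())
print("parent sees:", os.environ.get("LIBTORQUE_DIR"))
```

This is why checking LIBTORQUE_DIR in the shell after scrambling shows nothing unless the variable was exported separately.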
2014-12-02 16:27 GMT+01:00 Rémy Dernat <remy.d1@gmail.com>:
Hi Edgar,
You are right. It is very annoying...
So, to answer your questions:
1/ See the first Google / Wikipedia result for DRMAA: http://en.wikipedia.org/wiki/DRMAA
It is a standard API for talking to any DRM system (SGE, torque, whatever...)
2/ This is a library (python version) for your Torque installation.
3/ MySQL access is only needed by your galaxy frontend.
4/ Internet access is not required for your compute node (except the galaxy one), but it is better if you want to use a package manager on your compute nodes, for example...
5/ For my part, I use permissions like 760 on the galaxy directory. It depends on your needs... Some applications might need access to your galaxy installation, but you should keep the binaries, the galaxy installation and your data (datasets, libraries...) separate. Do not forget to share these folders over NFS (if needed).
6/ Sorry, no idea; but I see no reason for that to become unavailable, if your proxy is well configured.
7/ You have to put this command line into a file which will be sourced like $HOME_GALAXY/.bashrc or your environment file ("environment_setup_file" in universe_wsgi.ini or config/galaxy.ini)
Regards,
Remy
2014-12-02 13:59 GMT+01:00 Fernandez Edgar <edgar.fernandez@umontreal.ca>:
Hi everyone,
I’m guessing, Remy, that you clicked the send button by mistake on your HTC device.
It happens to me ALL the time…
I wanted to take this opportunity to add some questions to the three questions in my previous email.
So here goes:
4. Is it necessary that my torque compute nodes have internet access?
Because right now, only my torque server and my galaxy server have internet access.
However, communication between the submit node (a.k.a. galaxy server) and the torque server is enabled.
Likewise, between the torque server and the compute nodes.
5. Furthermore, would I disable any galaxy functionality if I changed the permissions on the whole galaxy installation directory like so: chmod -R 700 galaxy_install_directory
I have created a user galaxy for running everything that is galaxy.
6. I’ve actually installed my galaxy server using a proxy server (as described on your web site). Can I still use the Report Tool functionalities on galaxy?
7. Are there any further instructions for setting up PBS (running jobs via the TORQUE resource manager)? I have successfully scrambled the pbs_python egg with the command:
LIBTORQUE_DIR=/path/to/libtorque python scripts/scramble.py -e pbs_python
However, LIBTORQUE_DIR is empty.
Once again, thank you in advance for all your help!
Cordialement / Regards,
Edgar Fernandez
From: remy.d1@gmail.com [mailto:remy.d1@gmail.com]
Sent: December-02-14 6:34 AM
To: Fernandez Edgar; John Chilton
Cc: galaxy-dev@bx.psu.edu
Subject: Re: [galaxy-dev] galaxy with torque
1 a drmaa d
Sent from my HTC
----- Reply message -----
From: "Fernandez Edgar" <edgar.fernandez@umontreal.ca>
To: "John Chilton" <jmchilton@gmail.com>
Cc: "galaxy-dev@bx.psu.edu" <galaxy-dev@bx.psu.edu>
Subject: [galaxy-dev] galaxy with torque
Date: Wed., Nov. 26, 2014 19:09
Hi John,
First, thank you very much for your prompt answer.
It's extremely appreciated.
Secondly, I have some other questions; whatever answers you can provide will be greatly helpful.
Please forgive my beginner-level understanding of DRM systems.
1. Once I compile, make and install the Torque submit node code on the server running Galaxy, what exactly is the purpose of DRMAA?
2. What exactly is the PBS step described here: https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
galaxy_user@galaxy_server% LIBTORQUE_DIR=/path/to/libtorque python scripts/scramble.py -e pbs_python
What does it do exactly?
3. All servers, including my torque server, torque compute nodes and torque submit node (a.k.a. galaxy server), have the galaxy user defined and its home directory shared among them. This means all of them have access (via NFS) to the galaxy installation directory. But what about MySQL server access? Do the torque server or the compute nodes need access to that service?
I hope I'm not sending too many emails/questions...
Thank you very much!
Cordialement / Regards,
Edgar Fernandez
-----Original Message-----
From: John Chilton [mailto:jmchilton@gmail.com]
Sent: November-25-14 12:07 PM
To: Fernandez Edgar
Cc: galaxy-dev@bx.psu.edu
Subject: Re: [galaxy-dev] galaxy with torque
I am not sure we have a walkthrough for Torque specifically - but if you have Galaxy up and running and you can qsub commands to torque - hopefully you have done all of the hard parts.
You will need a DRMAA library for your torque setup - https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
suggests compiling pbs_drmaa and outlines how to set it up. After that you just need to add a plugin and default destination to your job_conf.xml file - also outlined on that wiki page.
Other good resources to consult if you are scaling up your Galaxy this way are:
https://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer
https://wiki.galaxyproject.org/Events/GCC2014/TrainingDay/AdminWalkthrough
Good luck and let us know if you encounter any problems.
-John
On Fri, Nov 21, 2014 at 2:30 PM, Fernandez Edgar <edgar.fernandez@umontreal.ca> wrote:
> Hello all,
>
>
>
> My name is Edgar Fernandez. I’m a sys. admin. at University of Montreal.
>
> I’ve contacted you a while back about installing galaxy and I’ve
> successfully done it on a redhat 6 server.
>
>
>
> I see myself in a situation where I need to utilise all my redhat
> servers (who are identical to the server running the galaxy website).
>
>
>
> I’ve also successfully installed a torque server with compute nodes
> and client nodes.
>
>
>
> What are the last steps to make the link between galaxy and torque?
>
> Also, once that connection is made, how will galaxy keep track of the
> jobs sent?
>
> I mean, how will it know that a job that just finished belongs to this user
> and not another?
>
>
>
> Also, my torque installation is so that my server running galaxy is a
> submit node and a client node.
>
> I hope this is not a problem.
>
>
>
> Please help!
>
>
>
> Cordialement / Regards,
>
>
>
> Edgar Fernandez
>
> System Administrator (Linux)
>
> Direction Générale des Technologies de l'Information et de la
> Communication
>
> Office: 1-514-343-6111 ext. 16568
>
>
>
> Université de Montréal
>
> PAVILLON ROGER-GAUDRY, bureau X-218
>
>
>
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this and other
> Galaxy lists, please use the interface at:
> https://lists.galaxyproject.org/
>
> To search Galaxy mailing lists use the unified search at:
> http://galaxyproject.org/search/mailinglists/