Hi Nate,

Yes, the directory does exist in the NFS share. I can see that galaxy_15.sh exists in this directory, but not the galaxy_15.o and galaxy_15.e files.

$ ls -l /net/rowley/ifs/data/love/love-galaxy/job_working_directory/000/15/
total 129
-rwxrwxr-x 1 love-galaxy love-galaxy 1889 Apr 8 21:15 galaxy_15.sh
-rwxrwx--- 1 love-galaxy love-galaxy 2543 Apr 8 21:15 metadata_in_HistoryDatasetAssociation_15_cJgqA4
-rwxrwx--- 1 love-galaxy love-galaxy   20 Apr 8 21:15 metadata_kwds_HistoryDatasetAssociation_15_KHQqa8
-rwxrwx--- 1 love-galaxy love-galaxy    0 Apr 8 21:15 metadata_out_HistoryDatasetAssociation_15_CztZ4c
-rwxrwx--- 1 love-galaxy love-galaxy    2 Apr 8 21:15 metadata_override_HistoryDatasetAssociation_15_3UAQKB
-rwxrwx--- 1 love-galaxy love-galaxy   41 Apr 8 21:15 metadata_results_HistoryDatasetAssociation_15_vvt6Jv

Thanks,
Jingzhi

On Apr 9, 2013, at 8:25 AM, Nate Coraor wrote:

On Apr 8, 2013, at 9:24 PM, Jingzhi Zhu wrote:

Hi Nate,

That's it! After I made that change, I could see jobs being successfully submitted to the SGE queue and then run, so this is great! I now run into another issue:

galaxy.jobs.runners.drmaa DEBUG 2013-04-08 21:16:56,386 (15/530948) state change: job finished, but failed
galaxy.jobs.runners ERROR 2013-04-08 21:16:56,526 (15/530948) Job output not returned from cluster: [Errno 2] No such file or directory: '/net/rowley/ifs/data/love/love-galaxy/job_working_directory/000/15/galaxy_15.o'
galaxy.jobs DEBUG 2013-04-08 21:16:56,559 Tool did not define exit code or stdio handling; checking stderr for success
galaxy.jobs DEBUG 2013-04-08 21:16:56,603 setting dataset state to ERROR
galaxy.jobs DEBUG 2013-04-08 21:16:56,796 job 15 ended

Google searches turn up a lot of similar errors, but it is never clear how to resolve them. While this job is running, qstat -j shows the following:

stderr_path_list: NONE:KI-GALAXY:/net/rowley/ifs/data/love/love-galaxy/job_working_directory/000/15/galaxy_15.e
stdout_path_list: NONE:KI-GALAXY:/net/rowley/ifs/data/love/love-galaxy/job_working_directory/000/15/galaxy_15.o
script_file: /net/rowley/ifs/data/love/love-galaxy/job_working_directory/000/15/galaxy_15.sh

I can see that galaxy_15.sh exists in the above directory, but galaxy_15.o and galaxy_15.e do not exist while the job runs. Where are those files saved? Do you have any idea how to fix it? Thank you so much!

Hi Jingzhi,

Please use "reply all" to keep replies on the list.

Does that directory exist on the cluster? A shared filesystem mounted at the same path is (currently) required between Galaxy and the cluster.

--nate

Jingzhi

On Apr 8, 2013, at 4:46 PM, Nate Coraor wrote:

Hi Jingzhi,

Unfortunately, there's a mistake in the sample config for that runner plugin; the line to load the plugin should be:

<plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>

The sample config was fixed in a later commit to the stable branch, and seeing as there was a security release today, I would suggest just updating your Galaxy installation to the latest stable commit anyway:

% hg pull
% hg update stable

--nate
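Regarding the missing galaxy_15.o / galaxy_15.e files above: one quick way to check whether the SGE execution hosts can actually write to that job_working_directory path is to submit a trivial test job whose stdout/stderr point at the same directory. This is only a rough sketch, not something from the thread; the test.o / test.e filenames are made up, and it assumes qsub is on the PATH of the love-galaxy user and that the 000/15 directory still exists:

    # Submit a one-line test job and wait for it (-sync y) so any problem
    # writing the output files is reported at submission/run time.
    qsub -b y -sync y \
        -o /net/rowley/ifs/data/love/love-galaxy/job_working_directory/000/15/test.o \
        -e /net/rowley/ifs/data/love/love-galaxy/job_working_directory/000/15/test.e \
        /bin/hostname

    # Back on the Galaxy server, the files should then be visible on the NFS mount:
    ls -l /net/rowley/ifs/data/love/love-galaxy/job_working_directory/000/15/

If the test job's output files never appear there (or the job goes into an error state complaining it cannot open its output file), the execution hosts most likely do not have the share mounted at the same /net/rowley/... path, which is exactly the shared-filesystem requirement Nate describes.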
On Apr 8, 2013, at 4:14 PM, Jingzhi Zhu wrote:

I have downloaded the 04_01 release and tried to configure Sun Grid Engine so that jobs can run on our cluster. I have exported the DRMAA_LIBRARY_PATH environment variable (export DRMAA_LIBRARY_PATH=/home/love-galaxy/bin/libdrmaa.so.1.0), then copied job_conf.xml.sample_advanced to job_conf.xml. I have deleted some lines in the plugins section so that it looks like this in job_conf.xml:

<plugins workers="4">
    <!-- "workers" is the number of threads for the runner's work queue.
         The default from <plugins> is used if not defined for a <plugin>. -->
    <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
    <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAARunner"/>
</plugins>

run.sh returns the following error:

galaxy.tools.imp_exp DEBUG 2013-04-08 15:57:12,465 Loaded history export tool: __EXPORT_HISTORY__
galaxy.tools.imp_exp DEBUG 2013-04-08 15:57:12,466 Loaded history import tool: __IMPORT_HISTORY__
galaxy.tools.genome_index DEBUG 2013-04-08 15:57:12,472 Loaded genome index tool: __GENOME_INDEX__
galaxy.jobs.manager DEBUG 2013-04-08 15:57:12,474 Starting job handler
galaxy.jobs.runners DEBUG 2013-04-08 15:57:12,475 Starting 4 LocalRunner workers
galaxy.jobs DEBUG 2013-04-08 15:57:12,477 Loaded job runner 'galaxy.jobs.runners.local:LocalJobRunner' as 'local'
Traceback (most recent call last):
  File "/net/rowley/ifs/data/love/love-galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/buildapp.py", line 37, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/net/rowley/ifs/data/love/love-galaxy/galaxy-dist/lib/galaxy/app.py", line 159, in __init__
    self.job_manager = manager.JobManager( self )
  File "/net/rowley/ifs/data/love/love-galaxy/galaxy-dist/lib/galaxy/jobs/manager.py", line 31, in __init__
    self.job_handler = handler.JobHandler( app )
  File "/net/rowley/ifs/data/love/love-galaxy/galaxy-dist/lib/galaxy/jobs/handler.py", line 29, in __init__
    self.dispatcher = DefaultJobDispatcher( app )
  File "/net/rowley/ifs/data/love/love-galaxy/galaxy-dist/lib/galaxy/jobs/handler.py", line 543, in __init__
    self.job_runners = self.app.job_config.get_job_runner_plugins()
  File "/net/rowley/ifs/data/love/love-galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 476, in get_job_runner_plugins
    runner_class = getattr( module, class_name )
AttributeError: 'module' object has no attribute 'DRMAARunner'

Can someone point out what is going on here? If you have configured SGE successfully with the latest 04_01 release, can you show me what job_conf.xml should look like? There are a lot of tags in this new XML file and I find it hard to get right.

Thanks!
Jingzhi
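Since the thread never shows a complete working file, here is a rough sketch of what a minimal job_conf.xml for SGE via DRMAA could look like on that release, built around the corrected plugin line Nate gives above. Everything apart from that plugin line is illustrative rather than taken from the thread: the handler id "main" assumes a single Galaxy process using the default server name, and "-q all.q" is only a placeholder for whatever SGE queue and native qsub options apply on your cluster.

    <?xml version="1.0"?>
    <job_conf>
        <plugins workers="4">
            <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
            <!-- Corrected class name: DRMAAJobRunner, not DRMAARunner -->
            <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
        </plugins>
        <handlers>
            <!-- Assumes a single Galaxy process with the default server name -->
            <handler id="main"/>
        </handlers>
        <destinations default="sge_default">
            <destination id="local" runner="local"/>
            <destination id="sge_default" runner="drmaa">
                <!-- Example only: options passed to SGE via the DRMAA native specification -->
                <param id="nativeSpecification">-q all.q</param>
            </destination>
        </destinations>
    </job_conf>

With the default destination pointing at the drmaa plugin, tools without an explicit mapping would be submitted to SGE, while the local destination remains available for anything you choose to map to it.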