---------- Forwarded message ----------
From: Wenkai Wang (Kevin) <wenkai_wang@shbiochip.com>
Date: Thu, May 11, 2017 at 8:01 PM
Subject: Galaxy Error while Using PBS


I am an engineer at a National Research and Engineering Center in Shanghai. I have recently been trying out the outstanding Galaxy system developed by your team. Galaxy is great; however, I get errors when it is connected to Torque PBS, and I would like to ask for your help and advice.

When I tried to run a tool that outputs "Hello Galaxy" from the web page, the page showed "Unable to run this job due to a cluster error, please retry it later" (see line 321 of galaxy/lib/galaxy/jobs/runners/pbs.py). Meanwhile, the Galaxy process printed the following to STDOUT in the Linux shell:

galaxy.jobs.runners.pbs DEBUG 2017-05-12 11:41:47,459 (25) submitting file /data3/wangwk/app/galaxy/database/pbs/25.sh
galaxy.jobs.runners.pbs WARNING 2017-05-12 11:41:47,460 (25) pbs_submit failed (try 1/5), PBS error 15033: No free connections
......
galaxy.jobs.runners.pbs WARNING 2017-05-12 11:41:55,470 (25) pbs_submit failed (try 5/5), PBS error 15033: No free connections
galaxy.jobs.runners.pbs ERROR 2017-05-12 11:41:57,473 (25) All attempts to submit job failed
galaxy.model.metadata DEBUG 2017-05-12 11:41:57,574 Cleaning up external metadata files
galaxy.model.metadata DEBUG 2017-05-12 11:41:57,604 Failed to cleanup MetadataTempFile temp files from /data3/wangwk/app/galaxy/database/jobs_directory/000/25/metadata_out_HistoryDatasetAssociation_26_wrVwaf: No JSON object could be decoded

Further information: (1) Linux version: RHEL 6.2, kernel 2.6.32-220.el6.x86_64; (2) Torque PBS 4.2; (3) pbs-python: 4.4.1.2 (pbs-python 4.4.2.1 requires CentOS 7 with PBS Torque 5, and thus was not chosen); (4) glibc 2.12; (5) the munge service has been installed, configured, and started on the master node and the compute nodes of the cluster; (6) the job_conf.xml file is attached to this email.

I also tried sara_nodes.py from https://oss.trac.surfsara.nl/pbs_python/wiki/TorqueExamples . It fails with the following error:

Traceback (most recent call last):
  File "./sara_nodes.py", line 644, in <module>
    print_overview_normal(args.nodes)
  File "./sara_nodes.py", line 298, in print_overview_normal
    matched, rest = print_get_nodes(hosts)
  File "./sara_nodes.py", line 199, in print_get_nodes
    pbsq         = PBSQuery.PBSQuery()
  File "/home/wangwk/app/anaconda2/lib/python2.7/site-packages/pbs/PBSQuery.py", line 137, in __init__
    self.job_server_id = list(self.get_serverinfo())[0]
IndexError: list index out of range

The error above means that self.get_serverinfo() returns nothing, so the list built from its result is empty and indexing element [0] fails. This might be useful for analysis.
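
To illustrate this failure mode without a cluster, here is a minimal sketch. It assumes only that an empty result is returned when no server answers; the plain dict is a stand-in for whatever get_serverinfo() really returns, and the helper first_server_id and the server name "master" are hypothetical, not part of pbs_python.

```python
def first_server_id(serverinfo):
    """Return the first key of the server info, or None if it is empty.

    `serverinfo` is a hypothetical stand-in (a plain dict) for the
    result of PBSQuery.get_serverinfo().
    """
    names = list(serverinfo)
    if not names:
        # Empty result: nothing to index, so report it instead of crashing.
        return None
    return names[0]


# An empty dict models the case where no server information came back.
empty_result = {}

# Unguarded indexing, as in PBSQuery.__init__, raises IndexError.
try:
    list(empty_result)[0]
    raised = False
except IndexError:
    raised = True

print(raised)
print(first_server_id(empty_result))
print(first_server_id({"master": {}}))  # "master" is a made-up server name
```

The guarded version only makes the symptom easier to read; it does not explain why the server query comes back empty.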

I would deeply appreciate any guidance on how to further track down or address this problem.

Thanks a lot!


Best Wishes,
Wenkai (Kevin)

------------------------------
Wenkai Wang, National Research and Engineering Center for Biochip
Email: wenkai_wang@shbiochip.com
Email: ww288@cantab.net