Hi guys,

Since the last upgrade we have been seeing an intermittent error. When a job is submitted, it sometimes comes back as failed with the following message:

tool error
An error occurred with this dataset: Unable to run this job due to a cluster error, please retry it later

It is not reproducible: when you relaunch the job it works. It is still very annoying when demonstrating Galaxy, though.

It looks to me as if something is wrong with the Torque/PBS integration: when one submits a number of jobs, it seems to reach an internal limit of some kind and then stops communicating with Torque (the batch system). The relevant log excerpt, which ends with "PBS error 15033: No free connections", is:
 

galaxy.jobs DEBUG 2013-09-24 14:05:08,692 (4264) Working directory for job is: /data/galaxyTools/galaxydev/database/job_working_directory/004/4264

galaxy.jobs.handler DEBUG 2013-09-24 14:05:08,700 (4264) Dispatching to pbs runner

galaxy.jobs.runners.pbs DEBUG 2013-09-24 14:05:08,885 (4263/9034.galaxy-compute) PBS job state changed from N to R

galaxy.jobs DEBUG 2013-09-24 14:05:08,925 (4264) Persisting job destination (destination id: pbs:///)

galaxy.jobs.handler INFO 2013-09-24 14:05:09,014 (4264) Job dispatched

galaxy.jobs.runners.pbs ERROR 2013-09-24 14:05:09,524 (4262) All attempts to submit job failed

galaxy.jobs.runners.pbs DEBUG 2013-09-24 14:05:15,795 (4264) submitting file /data/galaxyTools/galaxydev/database/pbs/4264.sh

galaxy.jobs.runners.pbs DEBUG 2013-09-24 14:05:15,795 (4264) command is: python /data/galaxyTools/galaxydev/tools/fastq/fastq_groomer.py '/data/galaxyTools/galaxydev/database/files/007/dataset_7546.dat' 'sanger' '/data/galaxyTools/galaxydev/database/files/007/dataset_7610.dat' 'sanger' 'ascii' 'summarize_input'; cd /data/galaxyTools/galaxydev; /data/galaxyTools/galaxydev/set_metadata.sh ./database/files /data/galaxyTools/galaxydev/database/job_working_directory/004/4264 . /data/galaxyTools/galaxydev/universe_wsgi.ini /data/galaxyTools/galaxydev/database/tmp/tmpcdfNIZ /data/galaxyTools/galaxydev/database/job_working_directory/004/4264/galaxy.json /data/galaxyTools/galaxydev/database/job_working_directory/004/4264/metadata_in_HistoryDatasetAssociation_9064_Zk1kgC,/data/galaxyTools/galaxydev/database/job_working_directory/004/4264/metadata_kwds_HistoryDatasetAssociation_9064_yBjKtV,/data/galaxyTools/galaxydev/database/job_working_directory/004/4264/metadata_out_HistoryDatasetAssociation_9064_RBQTv6,/data/galaxyTools/galaxydev/database/job_working_directory/004/4264/metadata_results_HistoryDatasetAssociation_9064_zBFTMS,,/data/galaxyTools/galaxydev/database/job_working_directory/004/4264/metadata_override_HistoryDatasetAssociation_9064_0W_zYn

galaxy.jobs.runners.pbs WARNING 2013-09-24 14:05:15,796 (4264) pbs_submit failed (try 1/5), PBS error 15033: No free connections

galaxy.jobs.runners.pbs WARNING 2013-09-24 14:05:17,798 (4264) pbs_submit failed (try 2/5), PBS error 15033: No free connections

galaxy.jobs.runners.pbs WARNING 2013-09-24 14:05:19,800 (4264) pbs_submit failed (try 3/5), PBS error 15033: No free connections
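
In case it helps anyone compare notes, here is a rough sketch for counting how often these failures occur in a Galaxy log. It is not part of Galaxy; the function name tally_submit_failures is my own, and the regular expression only assumes the line format of the pbs_submit warnings shown above.

```python
import re
from collections import Counter

# Matches the pbs_submit warning lines in the format shown in the log above.
# Assumption: other Galaxy versions may format these lines differently.
SUBMIT_FAIL = re.compile(
    r"galaxy\.jobs\.runners\.pbs WARNING \S+ \S+ \((?P<job>\d+)\) "
    r"pbs_submit failed \(try (?P<attempt>\d+)/(?P<max>\d+)\), "
    r"PBS error (?P<code>\d+): (?P<msg>.*)"
)


def tally_submit_failures(lines):
    """Count failed pbs_submit attempts per (Galaxy job id, PBS error code)."""
    counts = Counter()
    for line in lines:
        match = SUBMIT_FAIL.search(line)
        if match:
            counts[(match.group("job"), match.group("code"))] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to tally_submit_failures returns a Counter keyed by job id and PBS error code, which should show whether the failures cluster around bursts of submissions.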


Have any of you experienced the same problem, and if so, how did you solve it?

Regards,

Philippe