Issue when enabling use_tasked_jobs with torque and nfs
Hi,

In my environment, when I activate 'use_tasked_jobs' I get this message in "Info" with the ncbi_blast wrappers:

    /local/opt/galaxy/galaxy-dist.8082/database/job_working_directory/000/9/task_0: Job output not returned from cluster

Still, the dataset with the blast result is green and contains the expected information.

In the log I see:

    galaxy.jobs.runners.drmaa WARNING 2012-11-30 13:45:32,760 Job output not returned from cluster: [Errno 2] No such file or directory: '/local/opt/galaxy/galaxy-dist.8082/database/job_working_directory/000/9/task_0/9:1.drmout'

And from torque I get this email:

    PBS Job Id: 1013.head
    Job Name: g9_toolshed_g2_bx_psu_edu_repos_devteam_ncbi_blast_plus_ncbi_blastn_wrapper_0_0_14_carlos_borroto_gmail_com
    Exec host: node01/0
    An error has occurred processing your job, see below.
    Post job file processing error; job 1013.head on host node01/0

    Unable to copy file /var/spool/torque/spool/1013.head.OU to galaxy@/local/opt/galaxy/galaxy-dist.8082/database/job_working_directory/000/9/task_0/9:1.drmout
    *** error from copy
    cp: cannot create regular file `galaxy@/local/opt/galaxy/galaxy-dist.8082/database/job_working_directory/000/9/task_0/9:1.drmout': No such file or directory
    *** end error output
    Output retained on that host in: /var/spool/torque/undelivered/1013.head.OU

    Unable to copy file /var/spool/torque/spool/1013.head.ER to galaxy@/local/opt/galaxy/galaxy-dist.8082/database/job_working_directory/000/9/task_0/9:1.drmerr
    *** error from copy
    cp: cannot create regular file `galaxy@/local/opt/galaxy/galaxy-dist.8082/database/job_working_directory/000/9/task_0/9:1.drmerr': No such file or directory
    *** end error output
    Output retained on that host in: /var/spool/torque/undelivered/1013.head.ER

I think the problem is that I'm forcing torque to use my NFS-mounted shared file system:

    # cat /var/spool/torque/mom_priv/config
    $pbsserver head
    $usecp *:/home /home
    $usecp *:/local /local

I can confirm this configuration works fine with any other job not submitted through galaxy; both files are correctly copied to the working dir in that case.

I'm using the drmaa runner. I tried pbs, but in that case, even without enabling tasked_jobs, galaxy complains that it cannot find the err and out files, although they are correctly copied and everything seems fine as far as I can tell. I tried increasing "retry_job_output_collection", but that didn't help.

Cheers,
Carlos
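For readers unfamiliar with the $usecp lines quoted in the message, the following is a minimal Python sketch of the mapping they are meant to express: a destination of the form host:/path whose host matches the pattern and whose path falls under the given prefix is written locally with cp instead of a remote copy. The function name resolve_usecp and the USECP_RULES list are hypothetical, and the matching logic is a simplified assumption about the behaviour, not TORQUE's actual implementation.

```python
from fnmatch import fnmatch

# Mirrors the two $usecp lines from the mom_priv/config quoted in the message:
# (host pattern, path prefix on that host, local path prefix).
USECP_RULES = [
    ("*", "/home", "/home"),
    ("*", "/local", "/local"),
]


def resolve_usecp(destination, rules=USECP_RULES):
    """Return the local path a plain cp would write to, or None.

    `destination` is expected as "host:/path". This is a simplified,
    hypothetical model of the $usecp mapping, for illustration only.
    """
    # Split on the first colon only, so file names such as
    # "9:1.drmout" stay intact in the path part.
    host, sep, path = destination.partition(":")
    if not sep:
        return None  # no host part, nothing to map

    for host_pattern, remote_prefix, local_prefix in rules:
        under_prefix = (
            path == remote_prefix
            or path.startswith(remote_prefix.rstrip("/") + "/")
        )
        if fnmatch(host, host_pattern) and under_prefix:
            # Swap the remote prefix for the local one, keep the rest.
            return local_prefix + path[len(remote_prefix):]
    return None


if __name__ == "__main__":
    dest = ("head:/local/opt/galaxy/galaxy-dist.8082/database/"
            "job_working_directory/000/9/task_0/9:1.drmout")
    print(resolve_usecp(dest))
```

Under this model, a destination below /local or /home resolves to the same path on the shared file system, while anything else returns None and would fall back to a remote copy.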
participants (1): Carlos Borroto