History reports misconfiguration of the Galaxy job running system
Hello,

I recently encountered a problem when trying to run Cufflinks on eight BAM files on our Galaxy instance (via the multi-input toggle) and received the error "Unable to run job due to a misconfiguration of the Galaxy job running system" for some, but not all, of the Cufflinks jobs that appear in the history. These particular BAM files were copied over from the history of a larger workflow, where they were successfully run through Cufflinks.

In the problem workflow run, the Cufflinks jobs are all successfully submitted to Torque PBS and continue to run and finish, but many have this error displayed in the history. The jobs with the error displayed fail to copy files from the working directory to the database directory, despite running to completion.

We recently updated to galaxy-central changeset 6131:be6c89c33639. I have seen this error multiple times for this workflow, even after restarting Galaxy. Does anyone have any ideas of what might be going wrong?

Thanks for any help,
Andrew Warren
On Oct 25, 2011, at 2:43 AM, Andrew Warren wrote:
Hello, I recently encountered a problem when trying to run Cufflinks on eight BAM files on our Galaxy instance (via the multi-input toggle) and received the error "Unable to run job due to a misconfiguration of the Galaxy job running system" for some, but not all, of the Cufflinks jobs that appear in the history. [...]
Hi Andrew,

The error you're receiving indicates that there should also be a traceback logged to the Galaxy server log file or output when this occurs. Could you check the log/output for such a traceback?

Thanks,
--nate
Hi Nate,

I am running in daemon mode, so this is out of paster.log. The job gets submitted to the PBS queue normally, but the error shows up in the history panel right away (the working directory also shows up during the Cufflinks run):

galaxy.jobs DEBUG 2011-10-24 17:51:25,567 dispatching job 595 to pbs runner
galaxy.jobs INFO 2011-10-24 17:51:25,772 job 595 dispatched
galaxy.jobs.runners.pbs DEBUG 2011-10-24 17:51:26,153 (595) submitting file /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/pbs/595.sh
galaxy.jobs.runners.pbs DEBUG 2011-10-24 17:51:26,155 (595) command is: python /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/tools/ngs_rna/cufflinks_wrapper.py --input=/opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/001/dataset_1866.dat --assembled-isoforms-output=/opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/002/dataset_2161.dat --num-threads="6" -I 300000 -F 0.05 -j 0.05 -g /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/001/dataset_1699.dat -b --ref_file=/opt/rnaseq_data/indices/bowtie/Salmonella/14028S/Salmonella_enterica_subsp_enterica_serovar_Typhimurium_str_14028S.fna --dbkey=14028S --index_dir=/opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/tool-data
galaxy.jobs.runners.pbs DEBUG 2011-10-24 17:51:26,157 (595) queued in batch queue as 303.localhost
galaxy.jobs DEBUG 2011-10-24 17:51:26,599 dispatching job 596 to pbs runner
galaxy.jobs INFO 2011-10-24 17:51:26,738 job 596 dispatched
galaxy.jobs.runners.pbs DEBUG 2011-10-24 17:51:26,767 (595/303.localhost) PBS job state changed from N to R

Then later, with no errors in paster.log when the Cufflinks job is finishing (notice that it doesn't try to copy the contents of the working directory for this job like it normally does, because Galaxy thinks the job wasn't submitted to the queue even though it was):

galaxy.jobs.runners.pbs DEBUG 2011-10-24 19:05:57,402 (595/303.localhost) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-10-24 19:05:57,402 (596/304.localhost) PBS job state changed from Q to R
galaxy.jobs.runners.pbs DEBUG 2011-10-24 21:29:01,995 (596/304.localhost) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-10-24 21:29:01,995 (597/305.localhost) PBS job state changed from Q to R
galaxy.jobs DEBUG 2011-10-24 21:29:02,999 finish(): Moved /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/job_working_directory/596/isoforms.fpkm_tracking to /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/002/dataset_2163.dat as directed by from_work_dir
galaxy.jobs DEBUG 2011-10-24 21:29:03,555 finish(): Moved /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/job_working_directory/596/genes.fpkm_tracking to /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/002/dataset_2162.dat as directed by from_work_dir

Thanks,
Andrew
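[Editor's note: for readers unfamiliar with the from_work_dir mechanism in the finish() lines above: an output declared with from_work_dir is written into the job working directory by the tool itself, and only moved onto its dataset path when Galaxy's finish() handling runs. A minimal sketch of that move follows; the helper name and arguments are hypothetical, not Galaxy's actual finish() code.]

    # Illustration of the from_work_dir move seen in the finish() log lines
    # above. This is not Galaxy's actual finish() implementation; the helper
    # name and its arguments are hypothetical.
    import logging
    import os
    import shutil

    log = logging.getLogger('galaxy.jobs')

    def move_from_work_dir(working_directory, work_dir_name, dataset_path):
        # A tool output declared with from_work_dir (e.g. genes.fpkm_tracking)
        # sits in the job working directory until the job finishes, then is
        # moved onto the dataset file Galaxy serves.
        source = os.path.join(working_directory, work_dir_name)
        shutil.move(source, dataset_path)
        log.debug('finish(): Moved %s to %s as directed by from_work_dir'
                  % (source, dataset_path))

    # If a job is already marked failed in the history (as with the
    # misconfiguration error here), finish() never runs, so this move is
    # skipped even though the PBS job itself completed, which matches the
    # missing-output symptom described above.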
On Mon, Nov 7, 2011 at 4:00 PM, Nate Coraor <nate@bx.psu.edu> wrote:

The error you're receiving indicates that there should also be a traceback logged to the Galaxy server log file or output when this occurs. Could you check the log/output for such a traceback? [...]
On Nov 8, 2011, at 6:34 PM, Andrew Warren wrote:
Hi Nate,
I am running in daemon mode, so this is out of paster.log.
The job gets submitted to the PBS queue normally, but the error shows up in the history panel right away (the working directory also shows up during the Cufflinks run):
Hi Andrew,

The job should not even be able to be submitted to the PBS queue. The error you're seeing in the history ("Unable to run job due to a misconfiguration of the Galaxy job running system. Please contact a site administrator.") is set when the job manager tries to place the job in the queue of a job runner which has been defined in the Galaxy configuration but has not properly loaded. This is right after a statement which should log the failure:

    log.error( 'put(): (%s) Invalid job runner: %s' % ( job_wrapper.job_id, runner_name ) )

I'm not sure how that message could not appear in the log when the associated message is appearing in the history item. Can you confirm that the output below was for a job that immediately failed with the above error message?

By any chance, are you running multiple Galaxy servers simultaneously with job running enabled?

--nate
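[Editor's note: the dispatch path Nate describes looks roughly like the sketch below. This is a simplified approximation, not the exact galaxy-central source at that changeset; the class shape and the get_job_runner() accessor are assumptions, though the log.error and fail() messages are the ones quoted in this thread.]

    # Simplified sketch of the dispatch logic described above. Names are
    # approximations of lib/galaxy/jobs, not the exact source.
    import logging

    log = logging.getLogger('galaxy.jobs')

    class DefaultJobDispatcher(object):
        def __init__(self, job_runners):
            # Maps runner plugin names ('local', 'pbs', ...) to loaded plugin
            # instances. A runner that failed to load at startup is simply
            # absent from this dict.
            self.job_runners = job_runners

        def put(self, job_wrapper):
            # e.g. a tool runner URL of 'pbs:///queue' -> plugin name 'pbs'.
            # get_job_runner() is a hypothetical accessor for this sketch.
            runner_name = job_wrapper.get_job_runner().split(':', 1)[0]
            try:
                self.job_runners[runner_name].put(job_wrapper)
                log.debug('dispatching job %s to %s runner'
                          % (job_wrapper.job_id, runner_name))
            except KeyError:
                # This is the pairing described above: the log line and the
                # history message set by fail() should appear together.
                log.error('put(): (%s) Invalid job runner: %s'
                          % (job_wrapper.job_id, runner_name))
                job_wrapper.fail('Unable to run job due to a misconfiguration '
                                 'of the Galaxy job running system. Please '
                                 'contact a site administrator.')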
galaxy.jobs DEBUG 2011-10-24 17:51:25,567 dispatching job 595 to pbs runner
galaxy.jobs INFO 2011-10-24 17:51:25,772 job 595 dispatched
galaxy.jobs.runners.pbs DEBUG 2011-10-24 17:51:26,153 (595) submitting file /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/pbs/595.sh
galaxy.jobs.runners.pbs DEBUG 2011-10-24 17:51:26,155 (595) command is: python /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/tools/ngs_rna/cufflinks_wrapper.py --input=/opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/001/dataset_1866.dat --assembled-isoforms-output=/opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/002/dataset_2161.dat --num-threads="6" -I 300000 -F 0.05 -j 0.05 -g /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/001/dataset_1699.dat -b --ref_file=/opt/rnaseq_data/indices/bowtie/Salmonella/14028S/Salmonella_enterica_subsp_enterica_serovar_Typhimurium_str_14028S.fna --dbkey=14028S --index_dir=/opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/tool-data
galaxy.jobs.runners.pbs DEBUG 2011-10-24 17:51:26,157 (595) queued in batch queue as 303.localhost
galaxy.jobs DEBUG 2011-10-24 17:51:26,599 dispatching job 596 to pbs runner
galaxy.jobs INFO 2011-10-24 17:51:26,738 job 596 dispatched
galaxy.jobs.runners.pbs DEBUG 2011-10-24 17:51:26,767 (595/303.localhost) PBS job state changed from N to R
Then later, with no errors in paster.log when the Cufflinks job is finishing (notice that it doesn't try to copy the contents of the working directory for this job like it normally does, because Galaxy thinks the job wasn't submitted to the queue even though it was):
galaxy.jobs.runners.pbs DEBUG 2011-10-24 19:05:57,402 (595/303.localhost) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-10-24 19:05:57,402 (596/304.localhost) PBS job state changed from Q to R
galaxy.jobs.runners.pbs DEBUG 2011-10-24 21:29:01,995 (596/304.localhost) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-10-24 21:29:01,995 (597/305.localhost) PBS job state changed from Q to R
galaxy.jobs DEBUG 2011-10-24 21:29:02,999 finish(): Moved /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/job_working_directory/596/isoforms.fpkm_tracking to /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/002/dataset_2163.dat as directed by from_work_dir
galaxy.jobs DEBUG 2011-10-24 21:29:03,555 finish(): Moved /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/job_working_directory/596/genes.fpkm_tracking to /opt/hts_software/galaxy-parent/galaxy_server/galaxy-dist/database/files/002/dataset_2162.dat as directed by from_work_dir
Thanks, Andrew
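[Editor's note: as background on the "PBS job state changed" / "PBS job has left queue" lines quoted in this thread: the PBS runner's monitor thread periodically polls Torque and reports state transitions. The sketch below is illustrative only, using the pbs_python bindings the Galaxy PBS runner is built on; it is not the runner's actual monitor loop, and error handling is omitted.]

    # Rough sketch of polling Torque for job state, in the spirit of the
    # "(595/303.localhost) PBS job state changed from Q to R" messages above.
    # Requires the pbs_python (Torque/OpenPBS SWIG) bindings.
    import time
    import pbs

    def watch_job(job_id, interval=10):
        conn = pbs.pbs_connect(pbs.pbs_default())  # default Torque server
        last_state = None
        while True:
            statuses = pbs.pbs_statjob(conn, job_id, None, None)
            if not statuses:
                # Torque no longer knows the job: it has finished and left
                # the queue, like "PBS job has left queue" above.
                print('(%s) PBS job has left queue' % job_id)
                break
            for attrib in statuses[0].attribs:
                if attrib.name == 'job_state' and attrib.value != last_state:
                    print('(%s) PBS job state changed from %s to %s'
                          % (job_id, last_state, attrib.value))
                    last_state = attrib.value
            time.sleep(interval)
        pbs.pbs_disconnect(conn)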
participants (2)
- Andrew Warren
- Nate Coraor