cancelled by slurm -> job is fine
Hi,

I have a big problem here. Jobs that are cancelled by slurm appear to Galaxy as finished normally. For me this is especially bad because all following workflow steps go on working with corrupted or empty data.

In the job stderr I can find:

    slurmd[w4]: *** JOB 194 CANCELLED AT 2015-08-06T04:04:38 ***
    slurmd[w4]: Unable to unlink domain socket: No such file or directory
    slurmd[w4]: unlink(/tmp/slurm/slurmd_spool/job00194/slurm_script): No such file or directory
    slurmd[w4]: rmdir(/tmp/slurm/slurmd_spool/job00194): No such file or directory

In the Galaxy log for that job:

    galaxy.jobs.runners.drmaa DEBUG 2015-08-06 04:02:10,050 (4278) submitting file /mnt/galaxy/tmp/job_working_directory/004/4278/galaxy_4278.sh
    galaxy.jobs.runners.drmaa INFO 2015-08-06 04:02:10,056 (4278) queued as 192
    galaxy.jobs DEBUG 2015-08-06 04:02:10,185 (4278) Persisting job destination (destination id: slurm_cluster)
    [...]
    galaxy.jobs.runners.drmaa DEBUG 2015-08-06 04:04:39,525 (4278/192) state change: job finished normally
    galaxy.jobs DEBUG 2015-08-06 04:04:45,806 job 4278 ended (finish() executed in (5290.522 ms))
    galaxy.datatypes.metadata DEBUG 2015-08-06 04:04:45,837 Cleaning up external metadata files
    galaxy.datatypes.metadata DEBUG 2015-08-06 04:04:46,100 Failed to cleanup MetadataTempFile temp files from /mnt/galaxy/tmp/job_working_directory/004/4278/metadata_out_HistoryDatasetAssociation_9426_Ib1Niz: No JSON object could be decoded
    galaxy.datatypes.metadata DEBUG 2015-08-06 04:04:46,397 Failed to cleanup MetadataTempFile temp files from /mnt/galaxy/tmp/job_working_directory/004/4278/metadata_out_HistoryDatasetAssociation_9427_8X77j4: No JSON object could be decoded

So slurm cancels the job, but the drmaa runner still records "state change: job finished normally" and the downstream steps run on the bad output. Can someone please look into this?

Best,
Alexander
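For what it's worth, the DRMAA job info should be able to distinguish a cancelled job from a clean exit: depending on how slurm-drmaa reports the cancellation, it should surface either as "aborted" or as a terminating signal rather than a normal exit status. Below is a minimal sketch, assuming the drmaa-python bindings that Galaxy's drmaa runner builds on; the helper name and the job id "192" are only illustrative, and wait() can only be used for a job submitted through the same DRMAA session:

    import drmaa

    def report_final_state(job_id):
        # Illustrative helper: ask DRMAA how the job actually ended,
        # instead of assuming a job that left the queue finished normally.
        session = drmaa.Session()
        session.initialize()
        try:
            # wait() blocks until the job leaves the queue and returns a JobInfo record
            info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
            if info.wasAborted:
                print("job %s was aborted before it completed" % job_id)
            elif info.hasSignal:
                print("job %s was terminated by signal %s" % (job_id, info.terminatedSignal))
            elif info.hasExited:
                print("job %s exited with status %s" % (job_id, info.exitStatus))
            else:
                print("job %s ended for an unknown reason" % job_id)
        finally:
            session.exit()

    # "192" is the slurm id Galaxy logged for this job; it is only meaningful
    # inside the session that submitted it.
    report_final_state("192")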