On Jan 5, 2012, at 11:46 AM, Ryan Golhar wrote:
I have three bam files that I'm trying to merge. The job runs for a bit then errors saying there is a problem. When I look at runner0.log I see:
galaxy.jobs.runners.pbs DEBUG 2012-01-05 00:12:50,193 (19) submitting file /home/galaxy/galaxy-dist-9/database/pbs/19.sh
galaxy.jobs.runners.pbs DEBUG 2012-01-05 00:12:50,193 (19) command is: java -Xmx2G -jar /home/galaxy/galaxy-dist-9/tool-data/shared/jars/picard/MergeSamFiles.jar MSD=true VALIDATION_STRINGENCY=LENIENT O=/home/galaxy/galaxy-dist-9/database/files/000/dataset_96.dat I=/mnt/isilon/cag/ngs/hiseq/golharr/FALK05/solid5500_2_Falk05_JMF3_5_20111024_1_04-1-Idx_1-1.bam I=/mnt/isilon/cag/ngs/hiseq/golharr/FALK05/solid5500_2_Falk05_JMF3_5_20111024_1_05-2-Idx_1-1.bam I=/mnt/isilon/cag/ngs/hiseq/golharr/FALK05/solid5500_2_Falk05_JMF3_5_20111024_1_06-3-Idx_1-1.bam 2> /home/galaxy/galaxy-dist-9/database/files/000/dataset_97.dat || echo "Error running Picard MergeSamFiles" >&2; cd /home/galaxy/galaxy-dist-9; /home/galaxy/galaxy-dist-9/set_metadata.sh ./database/files ./database/tmp . datatypes_conf.xml ./database/job_working_directory/19/galaxy.json database/tmp/metadata_in_HistoryDatasetAssociation_16_MuM7TP,database/tmp/metadata_kwds_HistoryDatasetAssociation_16_7w3ZvM,database/tmp/metadata_out_HistoryDatasetAssociation_16_m9A_X4,database/tmp/metadata_results_HistoryDatasetAssociation_16_cRV8Uj,,database/tmp/metadata_override_HistoryDatasetAssociation_16_5n8tqL database/tmp/metadata_in_HistoryDatasetAssociation_15_SUEdLf,database/tmp/metadata_kwds_HistoryDatasetAssociation_15_nt8MTH,database/tmp/metadata_out_HistoryDatasetAssociation_15_EEIJ_o,database/tmp/metadata_results_HistoryDatasetAssociation_15_7PhiXX,,database/tmp/metadata_override_HistoryDatasetAssociation_15_e_bdeW
galaxy.jobs.runners.pbs DEBUG 2012-01-05 00:12:50,206 (19) queued in default queue as 29.localhost.localdomain
galaxy.jobs.runners.pbs DEBUG 2012-01-05 00:12:50,253 (19/29.localhost.localdomain) PBS job state changed from N to R
galaxy.jobs.runners.pbs DEBUG 2012-01-05 01:13:28,111 (19/29.localhost.localdomain) PBS job state changed from R to C
galaxy.jobs.runners.pbs ERROR 2012-01-05 01:13:28,111 (19/29.localhost.localdomain) PBS job failed: Unknown error: -10
galaxy.datatypes.metadata DEBUG 2012-01-05 01:13:28,272 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2012-01-05 01:13:28,286 Failed to cleanup MetadataTempFile temp files from database/tmp/metadata_out_HistoryDatasetAssociation_15_EEIJ_o: No JSON object could be decoded: line 1 column 0 (char 0)
galaxy.datatypes.metadata DEBUG 2012-01-05 01:13:28,286 Failed to cleanup MetadataTempFile temp files from database/tmp/metadata_out_HistoryDatasetAssociation_16_m9A_X4: No JSON object could be decoded: line 1 column 0 (char 0)

It looks like the job failed with "PBS job failed: Unknown error: -10". Is there a way to set up Galaxy to keep job submission files? I thought there was but don't see the option in universe_wsgi.ini for this.
Hrm, there's no PBS exit code for -10:
http://www.clusterresources.com/torquedocs/2.7jobexitstatus.shtml
In recent versions, the setting in the Galaxy config to keep job files is:
cleanup_job = never
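For what it's worth, in a stock universe_wsgi.ini this option sits in the [app:main] section. A minimal sketch of how it might look in your config (the comments are mine, and as I recall the recognized values are "always", "onsuccess", and "never"):

  [app:main]
  # Controls whether Galaxy deletes job files (scripts, working directories,
  # temporary metadata files) after a job finishes.
  # "always" cleans up after every job, "onsuccess" cleans up only after
  # successful jobs, and "never" keeps everything for inspection.
  cleanup_job = never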
I discovered the problem. My PBS queue has a wall time restriction of 3600 seconds. Is there a way to configure Galaxy to keep the job files only for failed jobs? I'd like to keep these two settings on, but I find it unnecessary when jobs complete successfully.
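For reference, the wall time limit a TORQUE/PBS queue enforces can be read back from the queue configuration itself. Assuming the jobs go to the "default" queue shown in the log above, something like this should print the configured limit:

  # Dump the queue's full attribute list and pick out the wall time limit
  qstat -Qf default | grep -i walltime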