I have three bam files that I'm trying to merge. The job runs for a bit then errors saying there is a problem. When I look at runner0.log I see:
galaxy.jobs.runners.pbs DEBUG 2012-01-05 00:12:50,193 (19) submitting file /home/galaxy/galaxy-dist-9/database/pbs/19.sh
galaxy.jobs.runners.pbs DEBUG 2012-01-05 00:12:50,193 (19) command is: java -Xmx2G -jar /home/galaxy/galaxy-dist-9/tool-data/shared/jars/picard/MergeSamFiles.jar MSD=true VALIDATION_STRINGENCY=LENIENT O=/home/galaxy/galaxy-dist-9/database/files/000/dataset_96.dat I=/mnt/isilon/cag/ngs/hiseq/golharr/FALK05/solid5500_2_Falk05_JMF3_5_20111024_1_04-1-Idx_1-1.bam I=/mnt/isilon/cag/ngs/hiseq/golharr/FALK05/solid5500_2_Falk05_JMF3_5_20111024_1_05-2-Idx_1-1.bam I=/mnt/isilon/cag/ngs/hiseq/golharr/FALK05/solid5500_2_Falk05_JMF3_5_20111024_1_06-3-Idx_1-1.bam 2> /home/galaxy/galaxy-dist-9/database/files/000/dataset_97.dat || echo "Error running Picard MergeSamFiles" >&2; cd /home/galaxy/galaxy-dist-9; /home/galaxy/galaxy-dist-9/set_metadata.sh ./database/files ./database/tmp . datatypes_conf.xml ./database/job_working_directory/19/galaxy.json database/tmp/metadata_in_HistoryDatasetAssociation_16_MuM7TP,database/tmp/metadata_kwds_HistoryDatasetAssociation_16_7w3ZvM,database/tmp/metadata_out_HistoryDatasetAssociation_16_m9A_X4,database/tmp/metadata_results_HistoryDatasetAssociation_16_cRV8Uj,,database/tmp/metadata_override_HistoryDatasetAssociation_16_5n8tqL database/tmp/metadata_in_HistoryDatasetAssociation_15_SUEdLf,database/tmp/metadata_kwds_HistoryDatasetAssociation_15_nt8MTH,database/tmp/metadata_out_HistoryDatasetAssociation_15_EEIJ_o,database/tmp/metadata_results_HistoryDatasetAssociation_15_7PhiXX,,database/tmp/metadata_override_HistoryDatasetAssociation_15_e_bdeW
galaxy.jobs.runners.pbs DEBUG 2012-01-05 00:12:50,206 (19) queued in default queue as 29.localhost.localdomain
galaxy.jobs.runners.pbs DEBUG 2012-01-05 00:12:50,253 (19/29.localhost.localdomain) PBS job state changed from N to R
galaxy.jobs.runners.pbs DEBUG 2012-01-05 01:13:28,111 (19/29.localhost.localdomain) PBS job state changed from R to C
galaxy.jobs.runners.pbs ERROR 2012-01-05 01:13:28,111 (19/29.localhost.localdomain) PBS job failed: Unknown error: -10
galaxy.datatypes.metadata DEBUG 2012-01-05 01:13:28,272 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2012-01-05 01:13:28,286 Failed to cleanup MetadataTempFile temp files from database/tmp/metadata_out_HistoryDatasetAssociation_15_EEIJ_o: No JSON object could be decoded: line 1 column 0 (char 0)
galaxy.datatypes.metadata DEBUG 2012-01-05 01:13:28,286 Failed to cleanup MetadataTempFile temp files from database/tmp/metadata_out_HistoryDatasetAssociation_16_m9A_X4: No JSON object could be decoded: line 1 column 0 (char 0)
It looks like the job failed with "PBS job failed: Unknown error: -10". Is there a way to set up Galaxy to keep job submission files? I thought there was but don't see the option in universe_wsgi.ini for this.
Do you mean this (In universe_wsgi.ini)?
...
# Debug enables access to various config options useful for development and
# debugging: use_lint, use_profile, use_printdebug and use_interactive. It
# also causes the files used by PBS/SGE (submission script, output, and error)
# to remain on disk after the job is complete. Debug mode is disabled if
# commented, but is uncommented by default in the sample config.
debug = True
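With debug on, the PBS submission script and the job's stdout/stderr should stay on disk after the job finishes, so you can inspect them directly. Something like this, reusing the paths from your log (the exact file names are my assumption about what the PBS runner writes, so check what actually lands in database/pbs/):

ls /home/galaxy/galaxy-dist-9/database/pbs/
# expect the submission script (19.sh) plus the job's output and error files
cat /home/galaxy/galaxy-dist-9/database/pbs/19.e   # job stderr, if it is named this way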
chris
On Jan 5, 2012, at 11:46 AM, Ryan Golhar wrote:
It looks like the job failed with "PBS job failed: Unknown error: -10". Is there a way to set up Galaxy to keep job submission files? I thought there was but don't see the option in universe_wsgi.ini for this.
Hrm, there's no PBS exit code for -10:
http://www.clusterresources.com/torquedocs/2.7jobexitstatus.shtml
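You could also ask Torque what it recorded for the job by running tracejob on the PBS server, if the server logs still cover it (job id taken from your log; -n searches back that many days):

tracejob -n 3 29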
In recent versions, the setting in the Galaxy config to keep job files is:
cleanup_job = never
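That goes in the main section of universe_wsgi.ini, roughly like this (the comment is mine, not the sample config's wording):

[app:main]
# keep job scripts, stdout/stderr, and working directories after jobs finish
cleanup_job = never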
--nate
I discovered the problem. My PBS queue has a wall time restriction of 3600 seconds.
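For anyone else who hits this: the usual way to ask for more time is a walltime request at submission, e.g. a directive like the one below in the PBS submission script, though whether that helps depends on how the 3600-second limit is enforced on the queue.

#PBS -l walltime=04:00:00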
Is there a way to configure Galaxy to keep the job files only for failed jobs? I'd like to keep these two settings on, but find it unnecessary when jobs complete successfully.
The other option I would ask for is to include the output of failed jobs in the runner log...
On Jan 5, 2012, at 1:10 PM, Ryan Golhar wrote:
The other option I would ask for is to include the output of failed jobs in the runner log...
This could be extremely large, depending on the output, and might make the logs difficult to parse. Hopefully the cleanup_job = onsuccess setting that Dannon provided will be a suitable alternative, however.
--nate
I discovered the problem. My PBS queue has a wall time restriction of 3600 seconds.
Is there a way to configure Galaxy to keep the job files only for failed jobs? I'd like to keep these two settings on, but find it unnecessary when jobs complete successfully.
For this, in your config you can use:
cleanup_job = onsuccess
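(If I remember the sample config correctly, the accepted values are always, onsuccess, and never, so onsuccess keeps the job files only when a job fails.)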
-Dannon