On Wed, Mar 5, 2014 at 8:06 PM, Pete Schmitt <Peter.R.Schmitt@dartmouth.edu> wrote:

As a simple test, I used Galaxy to download data from UCSC Main. The data gets downloaded, but the job errors out. I verified that the job actually ran and that, according to the scheduler, it completed successfully, but I get errors like this:

galaxy.jobs.runners.drmaa DEBUG 2014-03-05 18:17:35,941 (624/46.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 18:17:36,060 (624/46.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/624/galaxy_624.o'

No directories are being created below the 000 directory. I verified that the directory tree is owned by galaxy and that the galaxy user can run jobs from the command line as a normal user.
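To repeat that ownership check over the whole tree rather than spot-checking it, something like the following minimal Python sketch could be used. It assumes a local account named `galaxy` and the job_working_directory path from the log lines above; `paths_not_owned_by` is a hypothetical helper, not part of Galaxy.

```python
import os
import pwd


def paths_not_owned_by(root, username):
    """Return every path under root (including root) not owned by username."""
    uid = pwd.getpwnam(username).pw_uid  # KeyError if the account does not exist
    offenders = []
    if os.lstat(root).st_uid != uid:
        offenders.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are judged by their own owner
                if os.lstat(path).st_uid != uid:
                    offenders.append(path)
            except FileNotFoundError:
                # the entry vanished between listing and stat (e.g. job cleanup)
                continue
    return offenders


if __name__ == "__main__":
    # Assumed location, taken from the log lines above.
    root = "/nextgen3/galaxy/galaxy-dist/database/job_working_directory"
    if os.path.isdir(root):
        for path in paths_not_owned_by(root, "galaxy"):
            print("not owned by galaxy:", path)
    else:
        print("no such directory:", root)
```

This only checks ownership, not permission bits, so a file owned by galaxy but with restrictive mode bits would not be flagged.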

I set the parameter "cleanup_job = never". It had been set to "always", which is probably why the files were never there. Now the files are there, including the galaxy_###.o file, but Galaxy still errors as above.

I had set the parameter "cluster_files_directory = database/pbs", but it no longer seems to take effect; the .o and .e files used to end up there.
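To confirm what the configuration file actually contains for those two settings, a small Python sketch can read them back. It assumes both keys live in the `[app:main]` section of a file such as `universe_wsgi.ini` (the usual galaxy-dist layout at the time, but check your install); `read_job_settings` is a hypothetical helper.

```python
import configparser


def read_job_settings(ini_path, section="app:main"):
    """Return cleanup_job and cluster_files_directory from a Galaxy-style ini file.

    Missing keys (or a missing section) come back as None.
    """
    # interpolation=None keeps values such as %(here)s verbatim instead of
    # trying to expand them.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(ini_path):
        raise FileNotFoundError(ini_path)
    return {
        key: parser.get(section, key, fallback=None)
        for key in ("cleanup_job", "cluster_files_directory")
    }
```

Because `configparser` is strict by default, a key set twice in the same section raises `DuplicateOptionError` instead of silently keeping one value, which is a cheap way to catch a stale duplicate line.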

Here is an example:

(galaxyvenv)[galaxy@dirigo 630]$ ll
total 16
-rw------- 1 galaxy galaxy    0 Mar 5 19:29 galaxy_630.e
-rw-rw-r-- 1 galaxy galaxy    2 Mar 5 19:29 galaxy_630.ec
-rw------- 1 galaxy galaxy  940 Mar 5 19:29 galaxy_630.o
-rwxr-xr-x 1 galaxy galaxy 2429 Mar 5 19:29 galaxy_630.sh
-rw-rw-r-- 1 galaxy galaxy  138 Mar 5 19:29 galaxy.json
-rw-rw-r-- 1 galaxy galaxy 2139 Mar 5 19:29 metadata_in_HistoryDatasetAssociation_1182_o830e3
-rw-rw-r-- 1 galaxy galaxy   20 Mar 5 19:29 metadata_kwds_HistoryDatasetAssociation_1182_hOhPp7
-rw-rw-r-- 1 galaxy galaxy   55 Mar 5 19:29 metadata_out_HistoryDatasetAssociation_1182_Ynb70M
-rw-rw-r-- 1 galaxy galaxy    2 Mar 5 19:29 metadata_override_HistoryDatasetAssociation_1182_HsMljG
-rw-rw-r-- 1 galaxy galaxy   44 Mar 5 19:29 metadata_results_HistoryDatasetAssociation_1182_LxdsAZ
(galaxyvenv)[galaxy@dirigo 630]$ pwd
/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630
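Since the error is raised by the Galaxy server process, it may help to check, from that same host and account, whether the files shown above are visible at the path in the error. A minimal Python sketch, assuming the `galaxy_<job id>.o/.e/.ec` naming seen in this listing; `missing_job_outputs` is a hypothetical helper.

```python
import os


def missing_job_outputs(working_dir, job_id):
    """Return the expected galaxy_<job_id>.{o,e,ec} files absent from working_dir."""
    expected = ["galaxy_%s%s" % (job_id, ext) for ext in (".o", ".e", ".ec")]
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(working_dir, name))
    ]
```

Run on the Galaxy host, `missing_job_outputs("/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630", 630)` should return an empty list if the files in the listing are visible there; a non-empty result would mean the Galaxy process does not see what this shell session shows.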

Here is the corresponding error:

galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:37,731 (630/51.dirigo.mdibl.org) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:49,119 (630/51.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 19:31:50,225 (630/51.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_630.o'
galaxy.jobs DEBUG 2014-03-05 19:31:50,252 finish(): Moved /nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_dataset_856.dat to /nextgen3/galaxy/galaxy-dist/database/files/000/dataset_856.dat
galaxy.jobs DEBUG 2014-03-05 19:31:50,351 job 630 ended

In the history on the Galaxy page, the dataset shows up in pink:

1 UCSC Main on Human: knownGene (chr22:1-51304566)

error
An error occurred with this dataset:
Job output not returned from cluster

But the dataset is there.