Missing stdout and stderr files from SGE cluster
Hi all,

I'm trying to configure Galaxy to talk to our SGE cluster via DRMAA. I've got jobs submitting, and can verify they are in the queue and then run via qstat (from the cluster). However, something isn't right. I'm testing with blastp using the following in universe_wsgi.ini:

[galaxy:tool_runners]
ncbi_blastp_wrapper = drmaa:///
...

Job output from run.sh:

galaxy.jobs DEBUG 2012-02-02 11:49:16,707 (25) Working directory for job is: /mnt/galaxy/galaxy-central/database/job_working_directory/000/25
galaxy.jobs DEBUG 2012-02-02 11:49:16,710 dispatching job 25 to drmaa runner
galaxy.jobs INFO 2012-02-02 11:49:16,818 job 25 dispatched
galaxy.jobs.runners.drmaa DEBUG 2012-02-02 11:49:17,174 (25) submitting file /mnt/galaxy/galaxy-central/database/pbs/galaxy_25.sh
galaxy.jobs.runners.drmaa DEBUG 2012-02-02 11:49:17,174 (25) command is: blastp -version &> /mnt/galaxy/galaxy-central/database/tmp/GALAXY_VERSION_STRING_25; python /mnt/galaxy/galaxy-central/tools/ncbi_blast_plus/hide_stderr.py blastp -query "/mnt/galaxy/galaxy-central/database/files/000/dataset_5.dat" -subject "/mnt/galaxy/galaxy-central/database/files/000/dataset_2.dat" -task blastp -evalue 0.001 -out /mnt/galaxy/galaxy-central/database/files/000/dataset_25.dat -outfmt 6 -num_threads 8
galaxy.jobs.runners.drmaa INFO 2012-02-02 11:49:17,178 (25) queued as 223
galaxy.jobs.runners.drmaa DEBUG 2012-02-02 11:49:18,175 (25/223) state change: job is queued and active
galaxy.jobs.runners.drmaa DEBUG 2012-02-02 11:49:32,176 (25/223) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2012-02-02 11:49:43,367 (25/223) state change: job finished normally
galaxy.jobs DEBUG 2012-02-02 11:49:43,679 job 25 ended
galaxy.jobs.runners.drmaa WARNING 2012-02-02 11:49:43,699 Unable to cleanup: [Errno 2] No such file or directory: '/mnt/galaxy/galaxy-central/database/job_working_directory/000/25/25.drmout'
galaxy.jobs.runners.drmaa WARNING 2012-02-02 11:49:43,699 Unable to cleanup: [Errno 2] No such file or directory: '/mnt/galaxy/galaxy-central/database/job_working_directory/000/25/25.drmerr'

Interestingly, via the Galaxy interface it does seem to have captured stdout (empty) and stderr anyway:

<quote>
Error invoking command:
blastp -query /mnt/galaxy/galaxy-central/database/files/000/dataset_5.dat -subject /mnt/galaxy/galaxy-central/database/files/000/dataset_2.dat -task blastp -evalue 0.001 -out /mnt/galaxy/galaxy-central/database/files/000/dataset_25.dat -outfmt 6 -num_threads 8
[Errno 2] No such file or directory
</quote>

The error in this case is some kind of path problem, which I could 'fix' by adding 'source ~/.bashrc' to the drm_template defined in lib/galaxy/jobs/runners/drmaa.py (although there should be a more elegant solution to this!).

Anyway, why is the DRMAA code giving me these warnings about the missing stdout and stderr files, even when the job runs fine?

Thanks,

Peter
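As an aside for anyone hitting the same stderr message: the "[Errno 2] No such file or directory" reported by the wrapper is the classic symptom of the executable not being found on the job's PATH, which fits the .bashrc workaround above. A minimal standalone sketch of how that errno surfaces from Python (the executable name below is deliberately made up, and the error format only approximates what hide_stderr.py prints):

```python
# Sketch of the "[Errno 2]" failure mode: launching an executable that is
# not on PATH raises OSError with errno 2 (ENOENT) before the tool ever runs.
import subprocess

def invoke(cmd):
    """Run cmd, returning its exit code, or an error string on launch failure."""
    try:
        return subprocess.call(cmd)
    except OSError as err:
        # The executable itself could not be found/started.
        return "Error invoking command: %s\n%s" % (" ".join(cmd), err)

result = invoke(["blastp-executable-that-does-not-exist", "-version"])
print(result)
```

On a cluster node with a minimal non-interactive environment, a real blastp can fail the same way even though it works from an interactive shell, which is why sourcing ~/.bashrc papered over it.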
Peter:

I have a working SGE setup. Two ideas:

1 - Look in the SGE logs on the worker node (on my system they are in /var/spool...).
2 - Maybe a permission problem?

brad

--
Brad Langhorst
New England Biolabs
langhorst@neb.com
On Thu, Feb 2, 2012 at 1:54 PM, Langhorst, Brad <Langhorst@neb.com> wrote:
Peter:
I have a working sge setup. Two ideas: 1 - look in the sge logs on the worker node (on my system they are in /var/spool...)
No sign of any SGE logs there - could be system configuration dependent, but thanks.
2 - maybe a permission problem?
Maybe. Galaxy *is* collecting the stdout/stderr text, and the files don't exist after job completion, so my hunch is this is a redundant warning in the cleanup code, and the *.drmout and *.drmerr files were already deleted by Galaxy.

Regards,

Peter
On Feb 2, 2012, at 8:21 AM, Peter Cock wrote:

Maybe. Galaxy *is* collecting the stdout/stderr text, and the files don't exist after job completion, so my hunch is this is a redundant warning in the cleanup code, and the *.drmout and *.drmerr files were already deleted by Galaxy.

Peter,

I am having a similar issue, and it started happening only after the recent update. See: http://galaxy-development-list-archive.2308389.n4.nabble.com/job-cleanup-fil...

I haven't had a chance to follow up on my thread yet, but I do think these are redundant warning messages, probably triggered because file deletion has already occurred somewhere before this particular cleanup action.

--
Shantanu
On Feb 2, 2012, at 10:35 AM, Shantanu Pavgi wrote:

I do think these are redundant warning messages probably getting triggered as file deletion has occurred somewhere before this particular cleanup action.

Ahh, I see what's going on here. The drm out/err files are now being written to the job working directory, which is cleaned up at the end of the job wrapper's finish method, before the drmaa runner's cleanup method executes. The warnings are harmless and I'll have a fix for them shortly.

--nate
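Nate's explanation can be reproduced outside Galaxy. The following is a standalone sketch, not Galaxy's actual code: one component removes the whole job working directory, then a second cleanup pass tries to delete the per-job .drmout/.drmerr files individually and finds them already gone:

```python
# Sketch of the double-cleanup race: finish() removes the job working
# directory, then the runner's later cleanup() tries to unlink
# 25.drmout / 25.drmerr and gets ENOENT for each file.
import errno
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
drm_files = [os.path.join(workdir, name) for name in ("25.drmout", "25.drmerr")]
for path in drm_files:
    open(path, "w").close()

# Step 1: the job wrapper's finish() cleans up the whole working directory.
shutil.rmtree(workdir)

# Step 2: the runner's cleanup() runs afterwards; each unlink fails harmlessly.
warnings = []
for path in drm_files:
    try:
        os.unlink(path)
    except OSError as err:
        if err.errno == errno.ENOENT:
            warnings.append("Unable to cleanup: %s" % err)
        else:
            raise

print(len(warnings))
```

The two collected warnings correspond to the two WARNING lines in Peter's log; since the files were deleted on purpose, swallowing ENOENT during cleanup (rather than warning) is the obvious shape of a fix.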
On Feb 2, 2012, at 12:22 PM, Nate Coraor wrote:

The warnings are harmless and I'll have a fix for them shortly.

Fixed in changeset 238207122b68.

--nate
On Thu, Feb 2, 2012 at 6:29 PM, Nate Coraor <nate@bx.psu.edu> wrote:

Fixed in changeset 238207122b68.

Confirmed, thanks.

Peter

https://bitbucket.org/galaxy/galaxy-central/changeset/238207122b68
participants (4)

- Langhorst, Brad
- Nate Coraor
- Peter Cock
- Shantanu Pavgi