SLURM and hidden success
Hello all,

We are in the process of switching from SGE to SLURM for our Galaxy setup. We are currently experiencing a problem where jobs that are completely successful (no text in their stderr file and the proper exit code) are being hidden after the job completes. Any job that fails or has some text in the stderr file is not hidden (note: hidden, not deleted; they can be viewed by selecting 'Unhide Hidden Datasets').

Our drmaa.py is at changeset 10961:432999eabbaa
Our drmaa egg is at drmaa = 0.6
And our SLURM version is 2.3.5
And we are currently passing no parameters for default_cluster_job_runner = drmaa:///

We have the same code base on both clusters but only observe this behavior when using SLURM. Any pointers or advice would be greatly appreciated.

Thanks,
Andrew
This is really odd. I see no code in the job runner stuff at all that could cause this behavior outside the context of the dataset being marked hidden as part of a workflow, let alone something DRM-specific that could cause this.

Are you rerunning an existing job that has been marked this way in a workflow? Does this happen if you run new tools outside the context of workflows or past jobs?

Can you find the corresponding datasets via the history API or in the database and determine if they indeed have visible set to False? That, I guess, is what should cause a dataset to become "hidden".

-John
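For reference, a minimal sketch of the kind of check John suggests is below, using the history contents API. The Galaxy URL, API key, and history ID are placeholders, and the endpoint and field names are assumptions based on the standard Galaxy API rather than anything confirmed in this thread.

    # Minimal sketch: list each dataset in a history along with its "visible"
    # flag via Galaxy's history contents API. The URL, API key, and history ID
    # below are placeholders for your own instance.
    import requests

    GALAXY_URL = "http://localhost:8080"
    API_KEY = "YOUR_API_KEY"        # placeholder API key
    HISTORY_ID = "YOUR_HISTORY_ID"  # placeholder (encoded) history ID

    resp = requests.get(
        "%s/api/histories/%s/contents" % (GALAXY_URL, HISTORY_ID),
        params={"key": API_KEY},
    )
    resp.raise_for_status()

    for item in resp.json():
        # Datasets that only show up after 'Unhide Hidden Datasets' should
        # report visible == False here.
        print(item.get("hid"), item.get("name"), item.get("visible"))

The same information can also be checked directly in the database; as far as I know it is the visible column on the history_dataset_association table that the UI consults.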
Hi John,

Thanks so much for the reply. After investigating this more today it turns out, as you might have suspected, that SLURM was a red herring. Despite our attempts to faithfully rsync everything between the two servers, it looks like there was a problem with our workflows in the new database. Strangely, every single workflow that was previously created had a "hide action" set for its outputs, despite the past and present configuration of the tool wrappers. Any newly created workflow does not display this behavior.

It happens that the steps with errors were being displayed despite the workflow weirdness, thanks to an error check in lib/galaxy/jobs/actions/post.py.

Thanks again,
Andrew
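For reference, the check Andrew mentions looks roughly like the sketch below. This is a paraphrase of HideDatasetAction in lib/galaxy/jobs/actions/post.py rather than the verbatim source, but it shows why outputs of errored jobs escape the hide action.

    # Rough paraphrase of HideDatasetAction.execute() from
    # lib/galaxy/jobs/actions/post.py (not the verbatim Galaxy source).
    class HideDatasetAction:  # subclasses DefaultJobAction in Galaxy
        name = "HideDatasetAction"

        @classmethod
        def execute(cls, app, sa_session, action, job, replacement_dict):
            for dataset_assoc in job.output_datasets:
                # Outputs of errored jobs are skipped, which is why the failed
                # datasets stayed visible even though the old workflows had a
                # hide action attached to every output.
                if dataset_assoc.dataset.state != dataset_assoc.dataset.states.ERROR and (
                    action.output_name == "" or dataset_assoc.name == action.output_name
                ):
                    dataset_assoc.dataset.visible = False

The stray hide actions themselves should show up as HideDatasetAction rows in the post_job_action table; the table and action type names here are my recollection of the Galaxy schema, so treat them as assumptions.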
participants (2)
- Andrew Warren
- John Chilton