On 2013-10-11 17:21, Sytchev, Ilya wrote:
On 9/12/13 10:35 AM, "Peter Cock" <p.j.a.cock(a)googlemail.com> wrote:
>On Thu, Sep 12, 2013 at 2:01 PM, Mathieu Bahin <mathieu.bahin(a)irisa.fr> wrote:
>> Hi all,
>>
>> We have been developing our own Galaxy instance for a while now. We have
>> a cluster to which jobs are sent for execution; it is managed through SGE.
>> Usually, communication between SGE and DRMAA is fine and we don't have
>> any problems with it.
>>
>> When a user deletes a job, the job disappears most of the time, but
>> sometimes, for reasons we don't understand, it remains with the status
>> 'dr' within SGE. If we don't kill it 'manually', it stays forever. It is
>> not always the same tool that produces this error.
>> Do you have any idea why this happens or how to manage it?
>
>I have noticed a problem with our DRMAA/SGE setup where a user can
>cancel a large job (using the job splitter in at least some cases), but
>Galaxy does not seem to cancel the jobs on the cluster. I've not tried to
>diagnose this yet - it could be a similar issue though.
Also, in our DRMAA/LSF setup (using a fork of the latest galaxy-dist), jobs
generated by the current workflow step continue running on the cluster
after the history is deleted.
Ilya
Hi Ilya,
I also see this behaviour with DRMAA/GridEngine.
I think this has already been reported:
https://trello.com/c/1whC9did/245-currently-running-jobs-in-deleted-histo...
Please upvote it!
Best,
Nicola
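
For the 'dr' case described at the top of the thread, the manual cleanup could perhaps be scripted along the lines below. This is only a rough sketch and not something from Galaxy itself: it assumes the default plain-text `qstat` layout (job ID in the first column, state in the fifth) and that forced deletion with `qdel -f` is permitted for the account running it. The function names `stuck_job_ids` and `force_delete` are made up for illustration.

```python
import subprocess


def stuck_job_ids(qstat_output, state="dr"):
    """Return the IDs of jobs whose state column equals `state`.

    Assumes the default plain-text qstat layout:
    job-ID, prior, name, user, state, ...
    """
    ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric job ID; this skips the header
        # line and the row of dashes underneath it.
        if len(fields) >= 5 and fields[0].isdigit() and fields[4] == state:
            if fields[0] not in ids:  # array tasks repeat the job ID
                ids.append(fields[0])
    return ids


def force_delete(job_ids, dry_run=True):
    """Print (or, with dry_run=False, run) `qdel -f` for each job ID."""
    for job_id in job_ids:
        cmd = ["qdel", "-f", job_id]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)


def main():
    # List jobs of all users; the "*" is passed literally, not shell-expanded.
    result = subprocess.run(
        ["qstat", "-u", "*"], capture_output=True, text=True, check=True
    )
    force_delete(stuck_job_ids(result.stdout))
```

Calling `main()` on a submit host only prints the `qdel -f` commands it would run; pass `dry_run=False` to `force_delete` once the list looks right. This only clears the symptom and does not explain why the jobs get stuck in the first place.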