Hi,

when I run a tool on a big collection with cloudman, I get the following error in at least one of the tasks:

Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-app/lib/galaxy/jobs/runners/drmaa.py", line 151, in queue_job
    fh = file( ajs.job_file, "w" )
IOError: [Errno 11] Resource temporarily unavailable: '/mnt/galaxy/tmp/job_working_directory/000/714/galaxy_714.sh'

What can I do here? Also now the collection of the results is missing the result of this collection element, correct? How can I 'repair' my collection?

Thanks,
Alexander
Are you running jobs on the head node, or just Galaxy itself? If this is a consistent problem and you are running jobs on the head node, I would disable that.

As for resuming just the failed jobs: this is not currently possible, though ideally it should be. https://trello.com/c/lxVJy7fs

-John

On Mon, Jul 27, 2015 at 11:32 PM, Alexander Vowinkel <vowinkel.alexander@gmail.com> wrote:
Hi,
when I run a tool on a big collection with cloudman, I get the following error in at least one of the tasks:
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-app/lib/galaxy/jobs/runners/drmaa.py", line 151, in queue_job
    fh = file( ajs.job_file, "w" )
IOError: [Errno 11] Resource temporarily unavailable: '/mnt/galaxy/tmp/job_working_directory/000/714/galaxy_714.sh'
What can I do here? Also now the collection of the results is missing the result of this collection element, correct? How can I 'repair' my collection?
Thanks, Alexander
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
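The IOError in the traceback carries errno 11, which on Linux is EAGAIN and usually points to a transient condition. As a rough sketch only, and not something Galaxy is known to do, a write like the one in queue_job could be retried a few times before giving up. The helper name write_with_retry and its attempts and delay parameters are hypothetical and exist only for this illustration.

```python
import errno
import time


def write_with_retry(path, contents, attempts=5, delay=0.5):
    """Write contents to path, retrying when the OS reports EAGAIN.

    Hypothetical helper for illustration; not part of Galaxy.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "w") as fh:
                fh.write(contents)
            return attempt
        except OSError as exc:
            # Only EAGAIN is treated as transient; anything else, or
            # running out of attempts, is re-raised to the caller.
            if exc.errno != errno.EAGAIN or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff between tries
```

Whether a retry is enough depends on what causes the EAGAIN in the first place (for example, a busy shared file system), so this treats the symptom rather than the cause.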
The cluster has worker nodes, and running jobs on the main node is disabled.

2015-08-03 14:44 GMT-05:00 John Chilton <jmchilton@gmail.com>:
Are you running jobs on the head node, or just Galaxy itself? If this is a consistent problem and you are running jobs on the head node, I would disable that.
As for resuming just the failed jobs: this is not currently possible, though ideally it should be.
-John
Galaxy 15.10 checks that the written job script can be executed before the job is submitted, and it does some file system syncing that may well help with this issue. Hopefully that helps in this particular case. In general, though, Galaxy needs to get better at resuming workflows over collections and at retrying jobs that fail for transient reasons.

-John

On Mon, Aug 3, 2015 at 11:23 PM, Alexander Vowinkel <vowinkel.alexander@gmail.com> wrote:
The cluster has worker nodes, and running jobs on the main node is disabled.
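The two measures John mentions for 15.10, syncing the file system and checking that the job script is executable before submission, could look roughly like the sketch below. This is an assumption about the general technique, not Galaxy's actual code: write_job_script is a hypothetical helper, and the real implementation may differ in every detail.

```python
import os
import stat


def write_job_script(path, contents):
    """Write a job script, force it to disk, and confirm it is executable.

    Hypothetical helper for illustration; not Galaxy's 15.10 implementation.
    """
    with open(path, "w") as fh:
        fh.write(contents)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the data to storage
    # Add execute permission for the owner on top of the existing mode.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    # Check before submission rather than letting the cluster job fail.
    if not os.access(path, os.X_OK):
        raise RuntimeError("job script is not executable: %s" % path)
    return path
```

Failing early on the submitting host gives a clearer error than a job that dies on a worker because the script was not yet visible or not executable there.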
participants (2)
- Alexander Vowinkel
- John Chilton