15.10 adds a check that the job script that gets written out can
actually be executed before the job is submitted, plus some file
system syncing, which may well help with this issue. Hopefully that
covers this particular case; more generally though, Galaxy does need
to get better about resuming workflows over collections and retrying
jobs that fail for transient reasons.
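
For illustration, the sort of safeguard I mean looks roughly like the
sketch below - this is not the actual 15.10 code, and the helper name
write_job_script is just made up for the example - but it shows the
general pattern: write the script, force it out to the shared file
system, and retry briefly if a transient EAGAIN (Errno 11) comes back
before the job is handed to DRMAA.

    import errno
    import os
    import stat
    import time

    def write_job_script(path, contents, retries=5, delay=1.0):
        """Write a job script, sync it to disk, and confirm it is
        executable, retrying on the transient EAGAIN errors that
        NFS-backed working directories sometimes report."""
        for attempt in range(retries):
            try:
                with open(path, "w") as fh:
                    fh.write(contents)
                    fh.flush()
                    os.fsync(fh.fileno())
                # Mark the script executable and verify the file system
                # actually sees it that way before submitting the job.
                os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
                if os.access(path, os.X_OK):
                    return
            except (IOError, OSError) as exc:
                if exc.errno != errno.EAGAIN or attempt == retries - 1:
                    raise
            time.sleep(delay)
        raise RuntimeError("job script %s never became executable" % path)
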
On Mon, Aug 3, 2015 at 11:23 PM, Alexander Vowinkel
<vowinkel.alexander(a)gmail.com> wrote:
The cluster has workers; running jobs on the main node is disabled.
2015-08-03 14:44 GMT-05:00 John Chilton <jmchilton(a)gmail.com>:
> Are you running jobs on the head node or just Galaxy? If this is a
> consistent problem and you are running jobs on the head node, I would
> disable that.
> As to resuming just the failed jobs - this is not currently possible,
> but ideally it should be.
> On Mon, Jul 27, 2015 at 11:32 PM, Alexander Vowinkel
> <vowinkel.alexander(a)gmail.com> wrote:
> > Hi,
> > when I run a tool on a big collection with cloudman,
> > I get the following error in at least one of the tasks:
> > Traceback (most recent call last):
> > File "/mnt/galaxy/galaxy-app/lib/galaxy/jobs/runners/drmaa.py",
> > 151,
> > in queue_job
> > fh = file( ajs.job_file, "w" )
> > IOError: [Errno 11] Resource temporarily unavailable:
> > '/mnt/galaxy/tmp/job_working_directory/000/714/galaxy_714.sh'
> > What can I do here?
> > Also, the results collection is now missing the output for this
> > collection element, correct?
> > How can I 'repair' my collection?
> > Thanks,
> > Alexander