Re: [galaxy-dev] Problems with Galaxy on a mapped drive

4 Aug 2011

On Fri, Jul 29, 2011 at 5:09 PM, Duddy, John <jduddy@illumina.com> wrote:
...
We had similar problems on NFS mounts to Isilon. We traced it to
the default timeout for attribute caching on NFS mounts, which
does not force a re-read of directory contents (hence file existence
or size) for up to 30 seconds.
We worked around it by adding noac to the mount options, but this can
drastically increase the network traffic to the Isilon, so there are
tradeoffs to be made.
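
To see whether a given mount is using attribute caching, one option is to read the mount table. Below is a minimal sketch, assuming a Linux client where /proc/mounts lists each mount as "device mountpoint fstype options ...". The function name and the sample server and paths are hypothetical and do not come from the thread.

```python
import os

# NFS attribute-cache timeout options documented in nfs(5).
AC_TIMEOUT_OPTIONS = {"acregmin", "acregmax", "acdirmin", "acdirmax", "actimeo"}


def nfs_attr_cache_options(mounts_text):
    """Map each NFS mount point to its attribute-cache related options.

    mounts_text is text in /proc/mounts format. Only timeouts that are
    explicitly listed are reported; defaults may not appear at all.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type not in ("nfs", "nfs4"):
            continue
        timeouts = {}
        noac = False
        for opt in options.split(","):
            name, _, value = opt.partition("=")
            if name == "noac":
                noac = True
            elif name in AC_TIMEOUT_OPTIONS and value.isdigit():
                timeouts[name] = int(value)
        result[mount_point] = {"noac": noac, "timeouts": timeouts}
    return result


if __name__ == "__main__":
    # Hypothetical usage: report on whatever NFS mounts this host has.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as handle:
            for mount_point, info in nfs_attr_cache_options(handle.read()).items():
                print(mount_point, info)
```

A mount that reports neither noac nor short timeouts is probably using the client's default attribute caching, which is the situation described above.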
Even when you solve this, NFSv2 does not have open-close write
consistency, so it is possible for a job to complete on a node and
Galaxy to try to read the output files while the compute node is
still flushing its write cache to the file.
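
One defensive pattern for this race is to poll the output file until its size stops changing before reading it. The following is only a sketch of that idea, not something Galaxy is known to do. The function name, parameters and defaults are hypothetical, and because os.stat can itself return cached attributes on an NFS mount, it reduces the window rather than closing it.

```python
import os
import time


def wait_for_stable_file(path, timeout=60.0, interval=1.0, stable_checks=3):
    """Return True once path exists and its size has been unchanged for
    stable_checks consecutive polls; return False if timeout expires first.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            size = os.stat(path).st_size
        except FileNotFoundError:
            size = None  # not visible yet, e.g. stale directory cache
        if size is not None and size == last_size:
            unchanged += 1
            if unchanged >= stable_checks:
                return True
        else:
            unchanged = 0  # file missing or size changed: start over
        last_size = size
        time.sleep(interval)
    return False
```

The interval and stable_checks values need to be chosen with the mount's attribute-cache timeouts in mind; polling faster than the cache refreshes will just see the same cached size repeatedly.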
All of these scenarios are unlikely on a busy cluster, on which
job<->Galaxy interactions will likely occur far enough apart in
time for the caches to clear on their own.
John Duddy

Thanks for your comments, John. It's good to know others
have run into similar issues.

You may be right that on a real test load many of these issues
would go away - but at least some of the problems I was seeing
were at start-up or job submission time (and thus prior to the
cluster actually running the job).

We may need to re-organise our network topology. Right now
there are probably too many routers/hubs/switches between
the Galaxy server and the cluster and associated storage,
making the mapped drive less responsive than it could be.

Regards,

Peter

Peter Cock