Staged Method for cluster running SGE?
Hi all,

So far we've been running our local Galaxy instance on a single machine, but I would like to be able to offload (some) jobs onto our local SGE cluster. I've been reading https://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster

Unfortunately in our setup the SGE cluster head node is a different machine to the Galaxy server, and they do not (currently) have a shared file system. Once on the cluster, the head node and the compute nodes do have a shared file system.

Therefore we will need some way of copying input data from the Galaxy server to the cluster, running the job, and once the job is done, copying the results back to the Galaxy server.

The "Staged Method" on the wiki sounds relevant, but appears to be for TORQUE only (via pbs_python), not any of the other back ends (via DRMAA).

Have I overlooked anything on the "Cluster" wiki page? Has anyone attempted anything similar, and could you offer any guidance or tips?

Thanks,

Peter
On Tue, Apr 26, 2011 at 5:11 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi all,
So far we've been running our local Galaxy instance on a single machine, but I would like to be able to offload (some) jobs onto our local SGE cluster. I've been reading https://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster
Unfortunately in our setup the SGE cluster head node is a different machine to the Galaxy server, and they do not (currently) have a shared file system. Once on the cluster, the head node and the compute nodes do have a shared file system.
Therefore we will need some way of copying input data from the Galaxy server to the cluster, running the job, and once the job is done, copying the results back to the Galaxy server.
The "Staged Method" on the wiki sounds relevant, but appears to be for TORQUE only (via pbs_python), not any of the other back ends (via DRMAA).
Have I overlooked anything on the "Cluster" wiki page?
Has anyone attempted anything similar, and could you offer any guidance or tips?
Hi, Peter.

You might consider setting up a separate queue for SGE jobs. Then, you could specify a prolog and epilog script that will copy files from the galaxy machine into the cluster (in the prolog) and back to galaxy (in the epilog).

This assumes that there is a way to map from one file system to the other, but for Galaxy, that is probably the case (galaxy files on the galaxy server are "under" the galaxy instance, and galaxy jobs on the cluster will probably all run as a single user, with files under that user's home directory).

I have not done this myself, but the advantage to using prolog and epilog scripts is that galaxy jobs then do not need any special configuration--all the work is done transparently by SGE.

Sean
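Concretely, the prolog/epilog approach Sean describes might be sketched as below. The hostname (`galaxy-server`), the paths (`/srv/galaxy/jobs`, `/tmp/galaxy-staging`), and the dry-run `SCP` wrapper are all my own placeholders, not anything from this thread:

```shell
#!/bin/sh
# Sketch of SGE prolog/epilog staging scripts. Hostnames and paths are
# made-up placeholders. SCP defaults to a dry run that just prints the
# commands it would run; set SCP=scp for real use.
SCP="${SCP:-echo scp}"

GALAXY_HOST="galaxy-server"                      # assumed Galaxy server hostname
STAGE_ROOT="${STAGE_ROOT:-/tmp/galaxy-staging}"  # assumed shared dir on the cluster

# prolog: before the job starts, pull its input files from the Galaxy server.
prolog() {
    mkdir -p "$STAGE_ROOT/$JOB_ID"
    $SCP -r "$GALAXY_HOST:/srv/galaxy/jobs/$JOB_ID/inputs" "$STAGE_ROOT/$JOB_ID/"
}

# epilog: after the job finishes, push the results back to the Galaxy server.
epilog() {
    $SCP -r "$STAGE_ROOT/$JOB_ID/outputs" "$GALAXY_HOST:/srv/galaxy/jobs/$JOB_ID/"
}

# Demo with a fake job id (under SGE, JOB_ID is set in the script's environment):
JOB_ID="${JOB_ID:-42}"
prolog
epilog
```

Run as-is, the dry run just prints the two `scp` commands that would be executed, keyed on the job id, which makes the path mapping between the two machines explicit.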
On Tue, Apr 26, 2011 at 12:10 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
On Tue, Apr 26, 2011 at 5:11 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi all,
So far we've been running our local Galaxy instance on a single machine, but I would like to be able to offload (some) jobs onto our local SGE cluster. I've been reading https://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster
Unfortunately in our setup the SGE cluster head node is a different machine to the Galaxy server, and they do not (currently) have a shared file system. Once on the cluster, the head node and the compute nodes do have a shared file system.
Therefore we will need some way of copying input data from the Galaxy server to the cluster, running the job, and once the job is done, copying the results back to the Galaxy server.
The "Staged Method" on the wiki sounds relevant, but appears to be for TORQUE only (via pbs_python), not any of the other back ends (via DRMAA).
Have I overlooked anything on the "Cluster" wiki page?
Has anyone attempted anything similar, and could you offer any guidance or tips?
Hi, Peter.
You might consider setting up a separate queue for SGE jobs. Then, you could specify a prolog and epilog script that will copy files from the galaxy machine into the cluster (in the prolog) and back to galaxy (in the epilog).
I see - the prolog and epilog scripts are per SGE queue, so in order to have Galaxy-specific scripts, the simplest solution is a Galaxy-specific queue.
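For reference, prolog and epilog are attributes of the SGE queue configuration, so a Galaxy-specific queue might carry entries along these lines (the queue name and script paths are placeholders, not from the thread):

```
# Excerpt of a queue definition, as shown by: qconf -sq galaxy.q
qname      galaxy.q
prolog     /opt/galaxy/sge/prolog.sh
epilog     /opt/galaxy/sge/epilog.sh
```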
This assumes that there is a way to map from one file system to the other, but for Galaxy, that is probably the case (galaxy files on the galaxy server are "under" the galaxy instance, and galaxy jobs on the cluster will probably all run as a single user, with files under that user's home directory).
We have a user account "galaxy" on the Galaxy Server, and it would make sense to have a matching user account "galaxy" on the Cluster which would submit the SGE jobs and own their data files.
I have not done this myself, but the advantage to using prolog and epilog scripts is that galaxy jobs then do not need any special configuration--all the work is done transparently by SGE.
Sean
Thanks for the pointers - I have some reading ahead of me...

Peter
participants (2)
- Peter Cock
- Sean Davis