On Dec 23, 2010, at 4:09 AM, Mattias de Hollander wrote:
Hi,
Is it possible to configure Galaxy so it uses a remote cluster? In our setup it is not possible to install the Galaxy frontend on the cluster head/main node. I would like to know if you can connect to a compute cluster on a remote network from the main Galaxy application.
Thanks in advance!
The exact configuration depends on what resource manager you are using. We have a setup like this with TORQUE. In our case the Galaxy application server is a virtual machine, and it offloads nearly everything to our cluster. We had to configure the VM as a "submit host" in TORQUE. Here is a quick rundown of what we did:

1) Install Galaxy into a location that is shared between the Galaxy VM and the cluster.

2) Install the TORQUE libraries and client commands on the Galaxy VM (the client commands aren't really needed, but they are useful for testing the connection). We used the exact version that was running on our cluster, and we set the default server to the external hostname of our cluster.

3) Set the VM as a "submit host" on the cluster (as a TORQUE manager):

    qmgr -c "s s submit_host += galaxy_vm_hostname"

4) Configure pbs_mom "usecp". Galaxy has the stdout/stderr files from TORQUE delivered to .../galaxy-dist/database/pbs, and this directory is mounted on all of our compute nodes, so we want to tell pbs_mom to do a local copy instead of trying to scp the files to the Galaxy VM (which would require passwordless ssh to be set up for the galaxy user from the compute nodes to our Galaxy server). In the TORQUE mom_priv/config file on each compute node we add something like:

    $usecp galaxy_vm_hostname:/.../galaxy-dist/ /.../galaxy-dist/

Now submitting jobs from the Galaxy server to the cluster should work (with stdout/stderr written to the database/pbs folder).

We have one little oddity with our setup: our cluster has two Ethernet networks, 10GigE and a gigabit management network. TORQUE runs over the management network, and the job IDs are in the form jobnumber.management_primary_hostname. The primary hostname on the management network is different from the primary hostname on the external network and does not resolve outside the cluster.
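To make the job ID form concrete, here is a minimal Python sketch (not something from our setup; the function names, job ID, and hostname are made up for illustration). It splits an ID of the form jobnumber.hostname and checks whether the hostname part resolves from the machine it runs on, which is a quick way to see whether a host is affected by the same oddity.

```python
import socket


def split_job_id(job_id):
    """Split 'jobnumber.server_hostname' into (jobnumber, server_hostname).

    Only the first dot separates the two parts, so a fully qualified
    hostname stays intact. An ID without a dot yields an empty hostname.
    """
    number, _, server = job_id.partition(".")
    return number, server


def resolves(hostname):
    """Return True if this machine can resolve the given hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Hypothetical job ID; substitute one reported by your own cluster.
    number, server = split_job_id("12345.mgmt-head.cluster.internal")
    print("job number:", number)
    print("server part:", server)
    print("resolves from here:", resolves(server))
```

Run on the Galaxy VM, a result of False for the server part would match the situation described here, where the management hostname is only known inside the cluster.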
This hostname mismatch causes a few problems if you try to use the client commands that take a job ID as an argument from the Galaxy VM (I won't go into detail since this isn't a Galaxy issue), but fortunately, since Galaxy uses the pbs_python library, it isn't a problem (and we don't normally interact with TORQUE outside of Galaxy from the Galaxy server).

--
Glen L. Beane
Software Engineer
The Jackson Laboratory
Phone (207) 288-6153