Using $NSLOTS in tools to control thread number
Hello all,

I'm wondering if it is sensible to make Galaxy tools use the environment variable $NSLOTS to automatically adjust their number of threads? Using $NSLOTS works on SGE, but is it generally used on other clusters?

The idea here is that, rather than hard coding the number of threads in a tool or its XML file, which may need to be altered for different local setups, it can be specified in universe_wsgi.ini under [galaxy:tool_runners]. For example, by default our SGE allocates one slot, so with the following BWA should use one thread:

    [galaxy:tool_runners]
    bwa_wrapper = drmaa://-V/

However, if we ask SGE for 8 slots, the tool should use eight threads:

    [galaxy:tool_runners]
    bwa_wrapper = drmaa://-V -pe smp 8/

For this to be truly general, we would need a way to set environment variables for local:// runners. However, we can cope with $NSLOTS being undefined with a little magic in the XML definitions. For example, the BWA wrapper is currently hard coded to use 4 threads:

    <command interpreter="python">
    bwa_wrapper.py --threads="4" ...
    </command>

Instead, this could be something like this:

    <command interpreter="python">
    bwa_wrapper.py
    #if "$NSLOTS"=="":
    --threads="4"
    #else:
    --threads="$NSLOTS"
    #end if
    ...
    </command>

Likewise for the BLAST+ wrappers etc. i.e. If the environment variable is set (e.g. via the cluster settings) use that, otherwise keep the current hard coded default.

Would this work in principle on other cluster setups? i.e. Is $NSLOTS sufficiently general? It would be messy, but the XML if statement could be expanded to handle a second environment variable as well if need be.

Would the Galaxy team accept a pull request implementing this for the BWA and BLAST+ wrappers?

Thanks,

Peter
On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Actually, thinking about this over lunch, you wouldn't want to evaluate the $NSLOTS variable when the XML <command> is processed, as that would be done on the server, not the cluster node. In some cases, embedding $NSLOTS in the command string (suitably escaped) should work; otherwise, doing it in a wrapper script seems best.
Peter
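For illustration, the wrapper-script approach mentioned above might look roughly like the following. This is only a sketch: the names get_thread_count and DEFAULT_THREADS are hypothetical and not part of the existing bwa_wrapper.py, and the default of 4 simply mirrors the value currently hard coded in the BWA wrapper. Because the lookup happens when the wrapper runs, it sees the environment of the cluster node rather than that of the Galaxy server.

```python
import os

# Assumption: mirrors the thread count currently hard coded in the BWA wrapper.
DEFAULT_THREADS = 4


def get_thread_count(environ=None, default=DEFAULT_THREADS):
    """Return the slot count from $NSLOTS, or `default` if it is unset or invalid.

    `environ` is any mapping of environment variables; it defaults to the
    environment of the process running the wrapper (i.e. the cluster node).
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("NSLOTS", "").strip()
    try:
        slots = int(raw)
    except ValueError:
        # Unset, empty, or not a number: keep the hard-coded default.
        return default
    return slots if slots > 0 else default
```

A wrapper could then pass str(get_thread_count()) to the underlying tool's thread option instead of a fixed value.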
On Jun 15, 2012, at 9:05 AM, Peter Cock wrote:
Hi Peter,

$NSLOTS is SGE-specific. Torque uses a file whose path is set in $PBS_NODEFILE to list out the nodes you've been allocated (the node name is repeated for each slot you have on it).

A couple of DRM-agnostic solutions:

A common variable set by the job template before the tool runs.

Or, the ability to set tool parameters from the runner URL in universe_wsgi.ini.

--nate
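As a rough sketch of what reading the Torque node file could involve, assuming the format described above (one line per allocated slot, with the node name repeated): the function names count_slots and slots_from_nodefile are hypothetical, and matching on the short host name (the part before the first dot) is an assumption about how node names are written.

```python
import os


def count_slots(lines, hostname=None):
    """Count allocated slots given the lines of a Torque node file.

    Each non-empty line names one slot. If `hostname` is given, only slots on
    that host are counted (compared on the short name before the first dot).
    """
    nodes = [line.strip() for line in lines if line.strip()]
    if hostname is None:
        return len(nodes)
    short = hostname.split(".")[0]
    return sum(1 for node in nodes if node.split(".")[0] == short)


def slots_from_nodefile(environ=None, hostname=None):
    """Read the file named by $PBS_NODEFILE; return None if it is not set."""
    if environ is None:
        environ = os.environ
    path = environ.get("PBS_NODEFILE")
    if not path:
        return None
    with open(path) as handle:
        return count_slots(handle, hostname)
```

For a multi-threaded (single node) tool, passing the local host name restricts the count to slots on the node the job is actually running on.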
On Fri, Jun 15, 2012 at 4:06 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Peter,
$NSLOTS is SGE-specific.
That is a shame, it is working nicely for the tools I have tried it on. You just put "\$NSLOTS" (with a backslash to escape the dollar) in the <command> tag.
Torque uses a file whose path is set in $PBS_NODEFILE to list out the nodes you've been allocated (the node name is repeated for each slot you have on it).
A couple of DRM-agnostic solutions: A common variable set by the job template before the tool runs.
By that do you mean Galaxy could do some magic in the shell scripts it generates and submits to the cluster? i.e. Set up an environment variable, e.g. $THREADS. In the case of Torque/PBS, it could parse the $PBS_NODEFILE, which sounds nasty, or can you get this from the PBS runner URL? In the case of SGE, all the DRMAA wrapper needs to do is:

    export THREADS="$NSLOTS"
Or, the ability to set tool parameters from the runner URL in universe_wsgi.ini.
Setting things via the runner URL in universe_wsgi.ini seems better, especially as it could be used for "local" runners too.

Peter
On Jun 15, 2012, at 11:27 AM, Peter Cock wrote:
By that do you mean Galaxy could do some magic in the shell scripts it generates and submits to the cluster?
Yes, exactly.
i.e. Set up an environment variable, e.g. $THREADS. In the case of Torque/PBS, it could parse the $PBS_NODEFILE, which sounds nasty, or can you get this from the PBS runner URL?
You could, but I think it'd be easier to read the $PBS_NODEFILE than attempt to parse PBS arguments.
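To make the "common variable" idea concrete, here is a hedged sketch of how a job script generator might emit a line that sets $THREADS before the tool runs. It is not Galaxy's actual job template code: the function names and the "sge" / "torque" keys are hypothetical, and the Torque line simply counts the non-empty lines of the node file, which assumes $PBS_NODEFILE is set on the node.

```python
def thread_export_line(drm, default=1):
    """Return a shell line that sets $THREADS for the given resource manager.

    `drm` is a hypothetical key: "sge", "torque", or anything else (for
    example a local runner), in which case a fixed default is exported.
    """
    if drm == "sge":
        # SGE already provides the slot count directly.
        return 'export THREADS="$NSLOTS"'
    if drm == "torque":
        # One line per allocated slot, so count the non-empty lines.
        return 'export THREADS="$(grep -c . "$PBS_NODEFILE")"'
    return 'export THREADS="{}"'.format(int(default))


def job_script(drm, command, default=1):
    """Prepend the export line to the tool command in a minimal shell script."""
    lines = ["#!/bin/sh", thread_export_line(drm, default), command]
    return "\n".join(lines) + "\n"
```

With something like this in place, a tool's <command> could refer to "\$THREADS" regardless of which cluster system runs the job.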