
On Nov 27, 2012, at 6:06 AM, Andreas Kuntzagk <andreas.kuntzagk@mdc-berlin.de> wrote:
Hi Peter,
thanks for your replies.
On 27.11.2012 11:44, Peter Cock wrote:
On Tue, Nov 27, 2012 at 10:38 AM, Andreas Kuntzagk <andreas.kuntzagk@mdc-berlin.de> wrote:
Dear Peter,
As the author of several tool wrappers, I've been asking for a Galaxy-wide mechanism to tell the tool how many threads it can use, for example via an environment variable. The value could then be set with a general default, a per-runner default, or even per tool using the existing runner configuration under [galaxy:tool_runners] in universe_wsgi.ini.
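As a rough illustration of the environment-variable idea, a wrapper script could read the granted thread count and fall back to a safe default. This is only a sketch: GALAXY_THREADS is the variable name floated in this thread, not an existing Galaxy feature, and the helper names galaxy_threads and blastp_command are made up for the example.

```python
import os


def galaxy_threads(default=1, environ=None):
    """Return the thread count granted via GALAXY_THREADS (a proposed, hypothetical variable)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("GALAXY_THREADS", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric: fall back to the default.
        return default
    # Reject zero or negative values as well.
    return value if value >= 1 else default


def blastp_command(query, db, environ=None):
    """Build a blastp argument list using the granted thread count."""
    threads = galaxy_threads(environ=environ)
    return ["blastp", "-query", query, "-db", db, "-num_threads", str(threads)]
```

With this approach the tool XML would no longer need a hard-coded thread count; the wrapper simply uses whatever the runner configuration exported.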
This would be a possibility. Another would be to communicate the number of threads the other way: the tool tells the runner how many threads it needs, and the runner knows how to handle this.
I can imagine universe_wsgi.ini having such lines:
ncbi_blastp_wrapper = drmaa://-V -pe smp $GALAXY_THREADS
and then $GALAXY_THREADS is replaced by the value given by the wrapper. Thinking again, this is probably not going to work because the runner comes first and the wrapper after. My idea was that the wrapper could decide what resources to request. So I could use lower memory settings for small mapping jobs ...
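Leaving the ordering problem aside, the substitution step itself would be small. The sketch below assumes the thread count is already known when the runner URL is resolved, which is exactly the open question here; expand_runner_url is a hypothetical helper, not part of Galaxy.

```python
from string import Template


def expand_runner_url(url_template, threads):
    """Fill the $GALAXY_THREADS placeholder in a runner URL taken from universe_wsgi.ini."""
    # substitute() raises KeyError if the template uses a placeholder we do not supply.
    return Template(url_template).substitute(GALAXY_THREADS=threads)


runner_url = expand_runner_url("drmaa://-V -pe smp $GALAXY_THREADS", 8)
```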
There is some work on dynamic job allocation you might be interested in - have you seen this thread? http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-November/011759.html
This looks very promising. What I did not get from these messages is if that's already in galaxy-dist and where to put the dynamic job runner.
In your example, and others like the BWA and BLAST+ wrappers where the tool XML is hard coded to 8 threads, you would probably want to use a custom runner in universe_wsgi.ini setting the cluster submission to request that many slots/CPUs.
A list of all these wrappers on the Wiki would be nice.
With many tools on the Tool Shed, I'm not sure how easy that would be to co-ordinate. Doing it for the core tools would be more realistic.
I see the problem here. Especially since more and more tools are going into Tool Sheds. I was just looking for some way to reduce my workload ;-)
-- Andreas Kuntzagk
The "Right Way (TM)", I believe, would be to have a universal resource request selector that could be plugged into any wrapper simply by including an appropriate element like, say, <resources proc=x pmem=y walltime=z />. Those variables could be exported, so the corresponding DRMAA call could be made in the dynamic runner and the data could be used in the wrapper to run the underlying tool as needed.

Regards,
Alex
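To make the resources-element proposal concrete, here is a minimal sketch of how a dynamic runner might read such an element and derive both a native specification and exported variables. Everything in it is an assumption for illustration: the element does not exist in Galaxy's tool XML, the "-l nodes=1:ppn=..." string assumes a PBS/Torque-style scheduler, and the GALAXY_PROC-style variable names are invented.

```python
import xml.etree.ElementTree as ET

RESOURCE_KEYS = ("proc", "pmem", "walltime")


def parse_resources(tool_xml):
    """Extract the attributes of a (hypothetical) <resources> element from tool XML."""
    root = ET.fromstring(tool_xml)
    elem = root if root.tag == "resources" else root.find(".//resources")
    if elem is None:
        return {}
    return {key: elem.get(key) for key in RESOURCE_KEYS if elem.get(key) is not None}


def native_spec(resources):
    """Build a PBS/Torque-style resource request (assumed scheduler syntax)."""
    parts = []
    if "proc" in resources:
        parts.append("nodes=1:ppn=%s" % resources["proc"])
    if "pmem" in resources:
        parts.append("pmem=%s" % resources["pmem"])
    if "walltime" in resources:
        parts.append("walltime=%s" % resources["walltime"])
    return "-l " + ",".join(parts) if parts else ""


def exported_env(resources):
    """Map each resource to an environment variable the wrapper could read (names invented)."""
    return {"GALAXY_" + key.upper(): value for key, value in resources.items()}


example = '<tool id="demo"><resources proc="8" pmem="4gb" walltime="02:00:00" /></tool>'
requested = parse_resources(example)
```

The same parsed values feed both sides: the runner uses native_spec(requested) for the cluster submission, and the wrapper reads the exported variables to launch the underlying tool with matching settings.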