
Hi, I'm wondering how Galaxy supports tools that are multithreaded or multi-process. When working with lastz I noticed that it starts 4 parallel processes. Is that always so? Can this be adjusted? What other tools are also multi-process?
regards, Andreas
-- Andreas Kuntzagk, SystemAdministrator, Berlin Institute for Medical Systems Biology at the Max-Delbrueck-Center for Molecular Medicine, Robert-Roessle-Str. 10, 13125 Berlin, Germany, http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich

Andreas, yes, this is possible. You can also have a look at the NCBI BLAST+ tools written by Peter; the same is true there. Usually the tool wrappers (XML) have an option preconfigured for how many threads can be used. You can adjust these directly in the XML, or, as we did, add an option so it can be adjusted by a parameter directly in Galaxy: a low default number of cores, but advanced users (identified by login email) can select higher thread numbers. No experience with cluster/grid tools. Hope this helps, Alex

Howdy, Andreas, The four processes started for a Galaxy lastz job must involve post-processing the lastz output through some other shell tool. Lastz by itself doesn't support multiple threads or processes. Bob H

Hi, the four processes I saw were all called "lastz"; they ran in parallel and each consumed 100% of a core. My guess is that lastz_wrapper.py is responsible for this. Looking at it I see some code regarding queuing, and near the very beginning this line:
WORKERS = 4
and, further on, the class BaseQueue, which starts "threads". BTW, there seems to be no way to adjust this number other than editing the source file - bad. And this gets me wondering if there are other such surprises hidden in Galaxy. regards, Andreas
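(For context, a minimal sketch of the worker-queue pattern described above. This is not the actual lastz_wrapper.py code, and the GALAXY_LASTZ_WORKERS variable is invented here purely to show how the hard-coded worker count could be made overridable:)

    import os
    import subprocess
    import threading
    from queue import Queue  # the module was called "Queue" in the Python 2 Galaxy used at the time

    # lastz_wrapper.py hard-codes WORKERS = 4; reading an (illustrative)
    # environment variable would make it adjustable without editing the source.
    WORKERS = int(os.environ.get("GALAXY_LASTZ_WORKERS", 4))

    def worker(queue):
        # Pull lastz command lines off the queue and run them one at a time.
        while True:
            cmd = queue.get()
            if cmd is None:  # sentinel: no more work
                break
            subprocess.check_call(cmd, shell=True)

    def run_all(commands):
        # Run the given command lines on WORKERS parallel threads, so up to
        # WORKERS lastz processes are alive at once (which is what Andreas observed).
        queue = Queue()
        threads = [threading.Thread(target=worker, args=(queue,)) for _ in range(WORKERS)]
        for t in threads:
            t.start()
        for cmd in commands:
            queue.put(cmd)
        for _ in threads:
            queue.put(None)  # one sentinel per worker
        for t in threads:
            t.join()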

Andreas, I am not sure if you can call these "surprises". Some of Peter's tools (which I highly appreciate) have been "parallelised" to get the job done more quickly. I mentioned the NCBI BLAST+ wrappers earlier, but there the tool itself handles the multithreading. Other tools I am aware of that use a Python script/wrapper to chunk up the initial query and rejoin the results later are tools like SignalP, TMHMM and such. Usually it also involves some parsing of the output into data that Galaxy can subsequently handle. In those examples it is done using Python scripts, but for some of our custom tools we did it in Perl, some using bash parallel, or using R. I wouldn't have a solution for finding this out other than going through the individual wrappers... Alex

Hi Alex,
I am not sure if you can call these "surprises".
Well, at least it surprised me :-) I didn't want to sound too negative.
[...] In those examples it is done using Python scripts, but for some of our custom tools we did it in Perl, some using bash parallel, or using R.
While I can read Python and bash fine, it becomes more complicated with Perl and R. I don't know if I could easily spot from the code what the number of threads is. So maybe somebody could set up a list of these tools? regards, Andreas.

On Tue, Nov 27, 2012 at 10:44 AM, Andreas Kuntzagk <andreas.kuntzagk@mdc-berlin.de> wrote:
While I can read Python and bash fine, it becomes more complicated with Perl and R. I don't know if I could easily spot from the code what the number of threads is. So maybe somebody could set up a list of these tools?
The short answer is that *every* tool used in Galaxy may be multi-threaded. Sometimes this is done in the binary (e.g. BLAST); others do it in the wrapper when the underlying tool is single-threaded (e.g. my SignalP and TMHMM wrappers, which Alex mentioned). Sometimes the default is clearly defined in the XML (as a command-line switch, e.g. BLAST), sometimes it is defined in a wrapper script, and sometimes it is defined in the tool binary itself (e.g. use all available CPUs). Peter

On Tue, Nov 27, 2012 at 8:58 AM, Andreas Kuntzagk <andreas.kuntzagk@mdc-berlin.de> wrote:
[...] there seems to be no way to adjust this number other than editing the source file - bad.
As the author of several tool wrappers, I've been asking for a Galaxy-wide mechanism for Galaxy to tell the tool how many threads it can use, for example via an environment variable. The value could then be set with a general default, a per-runner default, or even per tool using the existing runner configuration under [galaxy:tool_runners] in universe_wsgi.ini. See: http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-March/009037.html and: http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-June/010153.html
In your example, and others like the BWA and BLAST+ wrappers where the tool XML is hard-coded to 8 threads, you would probably want to use a custom runner in universe_wsgi.ini setting the cluster submission to request that many slots/CPUs. For our local cluster, I modify the BLAST+ wrapper XML to use 4 threads, and have something like this in my universe_wsgi.ini file:
[galaxy:tool_runners]
ncbi_blastp_wrapper = drmaa://-V -pe smp 4/
ncbi_blastn_wrapper = drmaa://-V -pe smp 4/
ncbi_blastx_wrapper = drmaa://-V -pe smp 4/
ncbi_tblastn_wrapper = drmaa://-V -pe smp 4/
ncbi_tblastx_wrapper = drmaa://-V -pe smp 4/
Peter
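(To illustrate the environment-variable idea: a minimal sketch, assuming a variable named GALAXY_THREADS - the name suggested later in this thread, not an existing Galaxy feature - of how a wrapper script could pick up the value and fall back to a safe default:)

    import os

    def get_thread_count(default=4):
        # Read the thread count that Galaxy or the job runner exported, if any.
        # GALAXY_THREADS is a hypothetical variable name; fall back to a
        # conservative default when it is unset or not a number.
        try:
            return max(1, int(os.environ["GALAXY_THREADS"]))
        except (KeyError, ValueError):
            return default

    threads = get_thread_count()
    # e.g. hand the value to a tool that takes a thread switch, such as BLAST+:
    command = ["blastp", "-query", "input.fasta", "-db", "nr",
               "-num_threads", str(threads), "-out", "output.tabular"]
    print(" ".join(command))

The same value would then need to match whatever slot count the runner requests from the cluster (the "-pe smp" part of the drmaa URLs above).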

Dear Peter,
As the author of several tool wrappers, I've been asking for a Galaxy-wide mechanism for Galaxy to tell the tool how many threads it can use, for example via an environment variable. The value could then be set with a general default, a per-runner default, or even per tool using the existing runner configuration under [galaxy:tool_runners] in universe_wsgi.ini
This would be a possibility. Another would be to communicate the number of threads the other way around: the tool tells the runner how many threads, and the runner knows how to handle this. I can imagine universe_wsgi.ini having lines like:
ncbi_blastp_wrapper = drmaa://-V -pe smp $GALAXY_THREADS
and then $GALAXY_THREADS is replaced by the value given by the wrapper. Thinking about it again, this is probably not going to work, because the runner comes first and the wrapper after. My idea was that the wrapper could decide what resources to request, so I could use lower memory settings for small mapping jobs ...
In your example, and others like the BWA and BLAST+ wrappers where the tool XML is hard coded to 8 threads, you would probably want to use a custom runner in universe_wsgi.ini setting the cluster submission to request that many slots/CPUs.
A list of all these wrappers on the Wiki would be nice. regards, Andreas

On Tue, Nov 27, 2012 at 10:38 AM, Andreas Kuntzagk <andreas.kuntzagk@mdc-berlin.de> wrote:
This would be a possibility. Another would be to communicate the number of threads the other way around: the tool tells the runner how many threads, and the runner knows how to handle this.
I can imagine universe_wsgi.ini having such lines:
ncbi_blastp_wrapper = drmaa://-V -pe smp $GALAXY_THREADS
and then $GALAXY_THREADS is replaced by the value given by the wrapper. Thinking about it again, this is probably not going to work, because the runner comes first and the wrapper after. My idea was that the wrapper could decide what resources to request, so I could use lower memory settings for small mapping jobs ...
There is some work on dynamic job allocation you might be interested in - have you seen this thread? http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-November/011759.html
A list of all these wrappers on the Wiki would be nice.
With many tools on the Tool Shed, I'm not sure how easy that would be to co-ordinate. Doing it for the core tools would be more realistic. Peter

Hi Peter, thanks for your replies. On 27.11.2012 11:44, Peter Cock wrote:
There is some work on dynamic job allocation you might be interested in - have you seen this thread? http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-November/011759.html
This looks very promising. What I did not get from these messages is whether that's already in galaxy-dist, and where to put the dynamic job runner.
With many tools on the Tool Shed, I'm not sure how easy that would be to co-ordinate. Doing it for the core tools would be more realistic.
I see the problem here, especially since more and more tools are going into Tool Sheds. I was just looking for some way to reduce my workload ;-) -- Andreas Kuntzagk

On Nov 27, 2012, at 6:06 AM, Andreas Kuntzagk <andreas.kuntzagk@mdc-berlin.de> wrote:
I see the problem here. Especially since more and more tools are going into Tool Sheds. I was just looking for some way to reduce my workload ;-)
The "Right Way (TM)" I believe would be to have a universal resource request selector that could be plugged into any wrapper simply by including an appropriate element like say <resources proc=x pmem=y walltime=z />. Those variables could be exported, so the corresponding DRMAA call could be made in the dynamic runner and the data could be used in the wrapper to run the underlying tool as needed. Regards, Alex

On Tue, Nov 27, 2012 at 2:20 PM, Oleksandr Moskalenko <om@hpc.ufl.edu> wrote:
The "Right Way (TM)" I believe would be to have a universal resource request selector that could be plugged into any wrapper simply by including an appropriate element like say <resources proc=x pmem=y walltime=z />. Those variables could be exported, so the corresponding DRMAA call could be made in the dynamic runner and the data could be used in the wrapper to run the underlying tool as needed.
I am not convinced about that. For a simple non-dynamic setup I think the resources, like the number of threads, should be dictated by the local configuration (e.g. universe_wsgi.ini) and customised to the local compute resources, rather than by the tool wrappers, which must be sufficiently general to run on any Galaxy install. In general we need dynamic negotiation between the tool (e.g. this tool can use as many threads as you like, suggest 8), the local configuration (we want to limit this tool to just 4 threads to make maximum use of our cluster), and ideally the input data (e.g. this job will need lots of RAM and must go on the big-memory queue). Right now the dynamic runner which John Chilton and others are working on seems capable of this (although it is quite complex). Regards, Peter
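(As a rough illustration of the kind of per-job decision the dynamic runner allows: a sketch of a rule function that picks slots and memory from the input size. The function signature, attribute names and runner-URL syntax here are from memory and vary between Galaxy versions, so treat every detail as an assumption rather than a working rule:)

    # Illustrative dynamic-runner rule - not a drop-in example.
    def blast_rule(job):
        # Sum the sizes (bytes) of the job's input datasets; the exact
        # attribute names on the job object are assumed, not guaranteed.
        input_bytes = sum(inp.dataset.get_size() for inp in job.input_datasets)
        if input_bytes > 100 * 1024 * 1024:
            # Large query: request more slots and the big-memory queue.
            return "drmaa://-V -pe smp 8 -l h_vmem=8G -q bigmem/"
        # Small query: conservative default, matching the static examples above.
        return "drmaa://-V -pe smp 4/"

The tool would then be pointed at the rule instead of a fixed runner URL in universe_wsgi.ini (roughly ncbi_blastp_wrapper = dynamic:///python/blast_rule, again with the exact syntax depending on the Galaxy version).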

On Nov 27, 2012, at 9:37 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
I am not convinced about that. For a simple non-dynamic setup I think the resources like the number of threads should be dictated by the local configuration (e.g. universe_wsgi.ini) and customised to the local compute resources, rather than in the tool wrappers which must be sufficiently general to run on any Galaxy install.
In general we need dynamic negotiation between the tool (e.g. this tool can use as many threads as you like, suggest 8) and the local configuration (we want to limit this tool to just 4 threads to make maximum use of our cluster), and ideally the input data (e.g. this job will need lots of RAM and must go on the big memory queue). Right now the dynamic runner which John Chilton and others are working on seems capable of this (although quite complex).
The dynamic wrapper is capable of building a DRMAA call based on external data and is what I am using for our local production instance. I cannot praise it highly enough; John Chilton has made a wonderful addition to Galaxy. However, not being able to give users some manual control over the resource requests places the burden of figuring them out on the administrator, and dataset-based heuristics are often much worse than the knowledge of the person running the analysis. In addition, different tools use different options for setting thread numbers and cannot communicate realistic or even reasonable limits, as those are based on data from actually running the tool in different conditions and seeing how well it scales. Dynamic negotiation is unfeasible at this time, I think. The "simple non-dynamic setup" does not really work for any real-world multi-user instance any more. Regards, Alex
participants (5):
- Andreas Kuntzagk
- Bob Harris
- Bossers, Alex
- Oleksandr Moskalenko
- Peter Cock