Forgot to copy the list on this email.
On Mon, Sep 23, 2013 at 9:32 AM, Carlos Borroto carlos.borroto@gmail.com wrote:
On Fri, Sep 20, 2013 at 6:12 PM, Björn Grüning bjoern.gruening@pharmazie.uni-freiburg.de wrote:
Hi Carlos,
Can you try again? Also the new unstable version if you can. Thanks for the help! ATLAS is a beast :(
Sorry for the delayed response. Busy weekend.
Trying to install package_atlas_3_10 on Ubuntu 13.04:
STDERR:
It appears you have cpu throttling enabled, which makes timings unreliable and an ATLAS install nonsensical. Aborting.
See ATLAS/INSTALL.txt for further information.
#############################################
I think this is the other relevant part in the log:
# try to disable cpu throttling
if hash cpufreq-selector 2>/dev/null; then
    cpufreq-selector -g performance
elif hash cpupower 2>/dev/null; then
    cpupower frequency-set -g performance
else
    echo 'Please deactivate CPU throttling by your own, or install cpufreq-selector'
    exit
fi
STDERR Error calling SetGovernor: Caller is not authorized
I had to install 'cpufreq-selector' to get this far; it is not installed by default. I can confirm I get the same error when running the command directly:
$ cpufreq-selector -g performance
Error calling SetGovernor: Caller is not authorized
package_atlas_3_11 fails in exactly the same way on this box.
Somehow this is also a silent failure, and numpy, biopython and ngs-tools (my tool package depending on biopython) get "Installed" (green)
Yes, that is correct. I designed it (in theory) so that if ATLAS crashes (due to CPU throttling being enabled), every other package can still be installed without problems. Any other behaviour is a bug.
while lapack, atlas and split_by_barcode (my actual tool wrapper depending on the ngs-tools package) get "Installed, missing tool dependencies" (grey). This means that if I try to use my wrapper in this state, I get this error: /bin/sh: 1: ngs-tools: not found
On Ubuntu it gets installed without errors. Everything is green :)
However, if I do a "Repair repository" on split_by_barcode, it goes into "Installed" (green) and everything seems to work from then on.
Hmm, that seems to be a bug. I will try to reproduce it on a computer where ATLAS is crashing.
Thanks for testing this. I also believe this might be a bug. --Carlos
Hi Bjoern,
Is there anything else we (the Galaxy community) can do to help sort out the ATLAS installation problems?
Another choice might be to use OpenBLAS instead of ATLAS, e.g. http://stackoverflow.com/questions/11443302/compiling-numpy-with-openblas-in...
However, I think we should build NumPy without using ATLAS or any BLAS library. That seems like the most pragmatic solution in the short term, which I think is what Dan tried here: http://testtoolshed.g2.bx.psu.edu/view/blankenberg/package_numpy_1_7
Thanks,
Peter
In case this helps, I have the framework implemented (and committed) for handling pre-compiled binaries for tool dependencies for a supported set of architectures. Dave B has updated a lot of tool dependency definitions on both the test and main tool sheds to use this enhancement - those that he has updated are currently all owned by the devteam in preparation for a tool migration stage that he will soon be committing. Perhaps the atlas tool dependency can be updated to provide a pre-compiled binary installation.
Greg Von Kuster
Hi,
Hi Bjoern,
Is there anything else we (the Galaxy community) can do to help sort out the ATLAS installation problems?
Thanks for asking. I do indeed have a few things I would like comments on.
Another choice might be to use OpenBLAS instead of ATLAS, e.g. http://stackoverflow.com/questions/11443302/compiling-numpy-with-openblas-in...
I have no experience with it. Does it also require CPU throttling to be turned off? I would assume so; otherwise, how would it optimize itself?
However, I think we should build NumPy without using ATLAS or any BLAS library. That seems like the most pragmatic solution in the short term, which I think is what Dan tried here: http://testtoolshed.g2.bx.psu.edu/view/blankenberg/package_numpy_1_7
I can remove them if that is the consensus.
A few points:
- Fixing the atlas issue can speed up numpy, scipy and R considerably (by 400% in some cases).
- As far as I understand, that performance gain comes from ATLAS optimizing itself for the specific hardware, so for atlas there is no way around disabling CPU throttling (how about OpenBLAS?).
- It seems to be complicated to deactivate CPU throttling on OS X.
- Binary installation does not make sense in that case, because ATLAS is self-optimizing.
- Distribution-shipped ATLAS packages are not really faster.
Current state:
- Atlas tries two different commands to deactivate CPU throttling. Afaik that only works on some Ubuntu versions, where no root privileges are necessary.
- If atlas fails for some reason, the numpy/R/scipy installation should not be affected (that was at least the aim).
Questions:
- Is it worth the hassle for some speed improvement? "pip install numpy" would be so much easier.
- If we want to support ATLAS, is there a better way to implement it? Is there any Tool Shed feature that can help, e.g. interactive installation? Can we flag a tool dependency as optional, so that it is allowed to fail?
- Can anyone help with testing and fixing it?
Any opinions/comments? Bjoern
Thanks,
Peter
My recommendation would be to make the tool dependency installation work on as many platforms as you can and not try to optimize it in such a way that it is not going to work - i.e. favor reproducibility over performance. If a system administrator or institution wants to sacrifice reproducibility and optimize specific packages, they should be able to do so manually. It's not just ATLAS and CPU throttling, right? It's vendor versions of MPI, GPGPU variants of code, variants of OpenMP, etc. Even if the tool shed provided some mechanism for determining whether a particular package optimization is going to work, perhaps it is better not to enable it by default, because these optimizations frequently cause slightly different results than the unoptimized version.
The problem with this recommendation is that Galaxy currently provides no mechanism for doing so. Luckily this is easy to solve, and the solution solves other problems as well. If the tool dependency resolution code would grab the manually configured dependency instead of the tool shed variant when available, instead of favoring the opposite, then it would be really easy to add an optimized version of numpy or an MPI version of software X.
What's great is that this solves other problems as well. For instance, our genomics Galaxy web server runs Debian but the worker nodes run CentOS. This means many tool shed installed dependencies do not work. JJ, being the patient guy he is, goes in and manually updates the tool shed installed env.sh files to load modules. Even if you think not running the same version of the OS on your server and worker nodes is a bit crazy, there is the much more reasonable (and common) case of just wanting to submit to multiple different clusters. When I was talking with the guys at NCGAS they were unsure how to do this; this one change would make it a lot more tenable.
-John
Hi John,
On Sep 26, 2013, at 5:27 PM, John Chilton chilton@msi.umn.edu wrote:
My recommendation would be to make the tool dependency installation work on as many platforms as you can and not try to optimize it in such a way that it is not going to work - i.e. favor reproducibility over performance. If a system administrator or institution wants to sacrifice reproducibility and optimize specific packages, they should be able to do so manually. It's not just ATLAS and CPU throttling, right? It's vendor versions of MPI, GPGPU variants of code, variants of OpenMP, etc. Even if the tool shed provided some mechanism for determining whether a particular package optimization is going to work, perhaps it is better not to enable it by default, because these optimizations frequently cause slightly different results than the unoptimized version.
The problem with this recommendation is that Galaxy currently provides no mechanism for doing so. Luckily this is easy to solve, and the solution solves other problems as well. If the tool dependency resolution code would grab the manually configured dependency instead of the tool shed variant when available, instead of favoring the opposite, then it would be really easy to add an optimized version of numpy or an MPI version of software X.
How would you like this to happen? Would it work to provide an admin the ability to create a ToolDependency object and point it to a "manually configured dependency" in whatever location on disk the admin chooses via a new UI feature? Or do you have a different idea?
Thanks,
Greg Von Kuster
What's great is that this solves other problems as well. For instance, our genomics Galaxy web server runs Debian but the worker nodes run CentOS. This means many tool shed installed dependencies do not work. JJ, being the patient guy he is, goes in and manually updates the tool shed installed env.sh files to load modules. Even if you think not running the same version of the OS on your server and worker nodes is a bit crazy, there is the much more reasonable (and common) case of just wanting to submit to multiple different clusters. When I was talking with the guys at NCGAS they were unsure how to do this; this one change would make it a lot more tenable.
-John
I was not even thinking we needed to modify the tool shed to implement this. I was hoping (?) you could just modify:
lib/galaxy/tools/deps/__init__.py
to implement this. If some tool contains the tag
<requirement type="package" version="1.7.1">numpy</requirement>
then, if there is a manually installed tool dependency at `tool_dependency_dir/numpy/1.7.1/env.sh`, it would take precedence over the tool shed installed version (which would be something like `tool_dependency_dir/numpy/1.7.1/owner/name/changeset/env.sh`?). Let me know if this is way off base.
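Roughly, something like the following - a sketch only, not the actual code in lib/galaxy/tools/deps/__init__.py, and the tool shed install layout used for the fallback is my guess:

import os

# Sketch of the proposed precedence: a manually installed env.sh wins,
# otherwise fall back to whatever the tool shed installed.
def find_env_script(tool_dependency_dir, name, version):
    manual = os.path.join(tool_dependency_dir, name, version, "env.sh")
    if os.path.exists(manual):
        return manual
    versioned_dir = os.path.join(tool_dependency_dir, name, version)
    if os.path.isdir(versioned_dir):
        # e.g. numpy/1.7.1/<owner>/<repo>/<changeset>/env.sh
        for root, _dirs, files in os.walk(versioned_dir):
            if "env.sh" in files:
                return os.path.join(root, "env.sh")
    return None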
There is a lot you could do to make this more complicated of course - an interface for mapping exact tool shed dependencies to manually installed ones, the ability to auto-compile tool shed dependencies against manually installed libraries, etc..., but I am not sure those complexities are buying you anything really.
Thoughts?
-John
Make the precedence a config option. Otherwise I agree.
In addition, I still like the idea I suggested earlier of dependency provider plugins. Then you could (for example) have one that uses 'modules' and skips env.sh entirely.
James, it seems I was answering at the same time you were, so to highlight my comments to John, I'm just wondering how this will work for repositories in the tool shed that do not contain any tools, but just tool dependency definitions or complex repository dependency definitions.
Thanks,
Greg Von Kuster
On Thu, Sep 26, 2013 at 9:10 PM, Greg Von Kuster greg@bx.psu.edu wrote:
James, it seems I was answering at the same time you were, so to highlight my comments to John, I'm just wondering how this will work for repositories in the tool shed that do not contain any tools, but just tool dependency definitions or complex repository dependency definitions.
I am certain our European colleagues will have use cases to contribute when they have had their coffee, but I would just as soon not get the tool shed involved. If you want to optimize a tool for Galaxy, the best practice would be to optimize it by replacing the package that the corresponding tool sources, not by manually compiling some library and having the tool shed compile additional tools against it. Does that answer your question?
-John
Hi John,
On Sep 26, 2013, at 9:15 PM, John Chilton chilton@msi.umn.edu wrote:
I was not even thinking we needed to modify the tool shed to implement this. I was hoping (?) you could just modify:
Nothing in the Tool Shed itself would be affected or require modification for this new feature as this is completely on the Galaxy side.
lib/galaxy/tools/deps/__init__.py
to implement this. If some tool contains the tag
<requirement type="package" version="1.7.1">numpy</requirement>
then, if there is a manually installed tool dependency at `tool_dependency_dir/numpy/1.7.1/env.sh`, it would take precedence over the tool shed installed version (which would be something like `tool_dependency_dir/numpy/1.7.1/owner/name/changeset/env.sh`?). Let me know if this is way off base.
This is perhaps a possibility, but there seems to be a potential weakness in that it doesn't require the ToolDependency object to exist, since the tool will function without an installed dependency from the Tool Shed. Or, if the installed dependency is required, then it is meaningless because it won't be used. In the former case, the tool dependency cannot be shared via the Tool Shed's dependency mechanism, because none of the relationships will be defined since nothing is installed. Wouldn't it be better to allow the Galaxy admin to point the ToolDependency object to a specified binary on disk? In this way, all relationships defined by Tool Shed installations will work as expected, with all contained tools that have that dependency pointing to the same shared location on disk.
There is a lot you could do to make this more complicated of course - an interface for mapping exact tool shed dependencies to manually installed ones, the ability to auto-compile tool shed dependencies against manually installed libraries, etc..., but I am not sure those complexities are buying you anything really.
This is certainly a debatable topic, but I'm not seeing how my approach creates more complexity. The Galaxy admin is required to manually compile the binary dependency in either case. I'm just providing an easy UI feature so that a ToolDependency object, which can be shared by any number of tools contained in any number of repositories installed from the Tool Shed, can locate it. Using this approach, the Galaxy admin can either choose to install the dependency from the Tool Shed or manually compile the dependency and have the ToolDependency object point to it. In either case, all Tool Shed dependency definitions (both repository and tool dependencies) would work as expected with additional repository installs.
Greg Von Kuster
Hi Greg, John,
What you are discussing is pretty close to what I am after, and what I am prepared to spend some time working on, if that would help. This is something I have posted about a couple of times previously. The option to use a pre-installed package rather than a Galaxy-installed one would, I think, be very useful in general, and I think it can be done in a way that doesn't break the tool dependency model.
I envisage providing access to existing programs on disk by loading the relevant environment module. Platforms will differ greatly in where they have programs installed (/usr/local/bwa-0.5.9, /usr/libexec/bwa-0.5.9, etc.). However, we could arrange to have a conventionally named environment module available, so Galaxy just has to be told somehow to do a "module load bwa/0.5.9", say, prior to trying to run that tool, which will then be found on the PATH.
Hooking this in without killing the dependency on named and versioned toolshed repos could work like this. In parallel with the repos named e.g. package_bwa_0_5_9, which download and build the package, we have a set named like native_package_bwa_0_5_9. Such a repo could have a tool_dependencies.xml file like this:
<?xml version="1.0"?>
<tool_dependency>
    <native_package name="bwa" version="0.5.9">
        <actions>
            <action type="load_module">
                <module name="bwa" version="0.5.9" />
            </action>
        </actions>
        <readme>
            Uses native BWA via environment module bwa/0.5.9
        </readme>
    </native_package>
</tool_dependency>
What this does is ensure that any repo which requires version 0.5.9 of bwa can have this dependency resolved by either package_bwa_0_5_9, or native_package_bwa_0_5_9.
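For illustration, resolving such a definition might boil down to a little helper like the one below - both the load_module action type and this function are hypothetical, nothing like them exists in Galaxy today:

from xml.etree import ElementTree

# Hypothetical sketch: turn each <action type="load_module"> found in a
# tool_dependencies.xml into the "module load" command Galaxy would need
# to run before invoking the tool.
def module_load_commands(tool_dependencies_xml):
    root = ElementTree.parse(tool_dependencies_xml).getroot()
    for action in root.iter("action"):
        if action.get("type") != "load_module":
            continue
        for module in action.findall("module"):
            yield "module load %s/%s" % (module.get("name"), module.get("version"))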
I envisage a Galaxy configuration setting that enables native package and/or Galaxy package dependency resolution, and that specifies which one should be preferred. Of course, if one of these has been manually installed, it will be used as is.
I would really like to be able to install tools from the toolshed, and have them resolve their dependencies automatically using my preinstalled application suite. I see that would also meet the needs being discussed here.
How does that sound?
cheers, Simon
Simon,
What is the advantage of putting that XML definition in the tool shed? It is not 100% true because of prior_install_required dependencies, but for the most part sourcing/loading the environment for tools is a Galaxy problem, not so much a tool shed one. What if we did this instead?
Add an option to Galaxy's universe_wsgi.ini with the following default:
tool_dependency_resolution_order = gx_package_manual, gx_package_toolshed
This essentially implements my idea above, with James' additional configuration option, but it can be overridden as:
tool_dependency_resolution_order = plugin_module, gx_package_manual, gx_package_toolshed
If set this way, then placing <requirement type="package" version="0.5.9">bwa</requirement> in a tool will result in the module bwa/0.5.9 being loaded if it is 'avail'able; otherwise it will check for a manually installed env.sh (which is where MSI is currently putting its module loads), and otherwise it will fall back to sourcing the tool shed installed dependency.
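To make that concrete, the resolution loop could be as simple as the sketch below - the resolver names and the callable interface are made up for illustration, not an existing Galaxy API:

DEFAULT_ORDER = "gx_package_manual, gx_package_toolshed"

# Sketch: try each configured resolver in turn and return the shell
# commands that set up the dependency (e.g. "module load bwa/0.5.9"
# or ". <tool_dependency_dir>/bwa/0.5.9/env.sh").
def resolve_requirement(name, version, config, resolvers):
    order = config.get("tool_dependency_resolution_order", DEFAULT_ORDER)
    for key in (part.strip() for part in order.split(",")):
        resolver = resolvers.get(key)
        if resolver is None:
            continue
        commands = resolver(name, version)
        if commands:
            return commands
    return None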
I feel like this will give you everything you want without any extra XML or configuration. Let me know if I am wrong.
-John
On Thu, Sep 26, 2013 at 11:39 PM, Guest, Simon Simon.Guest@agresearch.co.nz wrote:
At Thu, 26 Sep 2013 22:03:09 -0400, Greg Von Kuster wrote:
Hi John,
On Sep 26, 2013, at 9:15 PM, John Chilton chilton@msi.umn.edu wrote:
I was not even thinking we needed to modify the tool shed to implement this. I was hoping (?) you could just modify:
Nothing in the Tool Shed itself would be affected or require modification for this new feature as this is completely on the Galaxy side.
lib/galaxy/tools/deps/__init__.py
to implement this. If some tool contains the tag
<requirement type="package" version="1.7.1">numpy</requirement>
then if there is a manually installed tool_dependency in `tool_dependency_dir/numpy/1.7.1/env.sh` that would take precedence over the tool shed installed version (would that be something like `tool_dependency_dir/numpy/1.7.1/owner/name/changeset/env.sh`)? Let me know if this is way off base.
This is a possibility perhaps, but there seems to be a potential weakness in that it doesn't require the Tool Dependency object to exist since the tool will function without a installed dependency from the Tool Shed. Or, if the installed depndency s required, then it is meaningless because it won't be used. If the former case, then the tool dependency cannot be shared via the Tool Shed's dependency mechanism because none of the relationships will be defined since nothing is installed. Wouldn't it be better to allow the Galaxy admin to point the ToolDependency object to a specified binary on disk? In this way, all relationships defined by Tool Shed installations will work as expected, with all contained tools with that dependency point to that same shared location on disk.
Hi Greg, John,
What you are discussing is pretty close to what I am after, and what I am prepared to spend some time working on, if that can help. This is what I have posted about a couple of times previously. The option to use a pre-installed package rather than a Galaxy installed one I think would be very useful in general. I think it can be done in a way that doesn't break the tool dependency model.
I envisage providing access to existing programs on disk by loading the relevant environment module. Platforms will differ greatly on where they have programs installed (usr/local/bwa-0.5.9, /usr/libexec/bwa-0.5.9, etc.). However, we could arrange to have a conventionally named environment module available, so Galaxy just has to be told somehow to do a module load bwa/0.5.9, say, prior to trying to run that tool, which will then be found on the PATH.
Hooking this in without killing the dependency on named and versioned toolshed repos could work like this. In parallel with the repos named like package_bwa_0_5_9, which download and build the package, we would have a set named like native_package_bwa_0_5_9. This could have a tool_dependencies.xml file like this:
<?xml version="1.0"?>
<tool_dependency>
    <native_package name="bwa" version="0.5.9">
        <actions>
            <action type="load_module">
                <module name="bwa" version="0.5.9" />
            </action>
        </actions>
        <readme>
            Uses native BWA via environment module bwa/0.5.9
        </readme>
    </native_package>
</tool_dependency>
What this does is ensure that any repo which requires version 0.5.9 of bwa can have this dependency resolved by either package_bwa_0_5_9, or native_package_bwa_0_5_9.
I envisage a Galaxy configuration setting that enables native package and/or Galaxy package dependency resolution, and specifies which one should be preferred. Of course, if one of these has been manually installed, it will be used as is.
I would really like to be able to install tools from the toolshed, and have them resolve their dependencies automatically using my preinstalled application suite. I see that would also meet the needs being discussed here.
How does that sound?
cheers, Simon
I have issued a pull request with a specific implementation of these ideas:
https://bitbucket.org/galaxy/galaxy-central/pull-request/227/tool-dependency...
Please feel free to comment.
-John
At Fri, 27 Sep 2013 00:23:37 -0500, John Chilton wrote:
Simon,
What is the advantage of putting that XML definition in the tool shed? It is not 100% true because of prior_install_required dependencies, but for the most part sourcing/loading the environment for tools is a Galaxy problem, not so much a tool shed one. What if we did this instead?
Add an option to Galaxy's universe_wsgi.ini with the following default:
tool_dependency_resolution_order = gx_package_manual, gx_package_toolshed
This essentially implements my idea above, with James' additional configuration, but it can be overridden as:
tool_dependency_resolution_order = plugin_module, gx_package_manual, gx_package_toolshed
If set this way, then placing <requirement type="package" version="0.5.9">bwa</requirement> in a tool will result in the module bwa/0.5.9 being loaded if it is available (i.e. listed by module avail); otherwise it will check for a manually installed env.sh (which is where MSI is currently putting its module loads), and otherwise it will fall back to sourcing the tool shed installed dependency.
I feel like this will give you everything you want without any extra XML or configuration. Let me know if I am wrong.
Hi John,
I think you're right. Your scheme is neater than what I was proposing. The extra flexibility I was aiming at via some toolshed XML stuff appears not to be necessary upon further reflection. (I wanted to ensure a Galaxy admin could just install some RPMs, install a toolshed tool, and have everything resolve nicely. You seem to have achieved that with your scheme.)
I haven't had a chance to try your code yet, but as soon as I can I will do so, and get back to you.
If I leave out gx_package_toolshed altogether from tool_dependency_resolution_order, will the tool installation in Galaxy simply fail with a nice error message if the environment module and/or env.sh files are not found? (This is what I would like, as it would serve as a prompt to the Galaxy admin to install some extra RPMs or whatever.)
Will this also work for those toolshed packages which bundle their package definitions (to download, make and install the tool dependency) along with their wrappers?
Thanks for working on this.
cheers, Simon
On Sun, Sep 29, 2013 at 10:43 PM, Guest, Simon Simon.Guest@agresearch.co.nz wrote:
Hi John,
I think you're right. Your scheme is neater than what I was proposing. The extra flexibility I was aiming at via some toolshed XML stuff appears not to be necessary upon further reflection. (I wanted to ensure a Galaxy admin could just install some RPMs, install a toolshed tool, and have everything resolve nicely. You seem to have achieved that with your scheme.)
I haven't had a chance to try your code yet, but as soon as I can I will do so, and get back to you.
If I leave out gx_package_toolshed altogether from tool_dependency_resolution_order, will the tool installation in Galaxy simply fail with a nice error message if the environment module and/or env.sh files are not found? (This is what I would like, as it would serve as a prompt to the Galaxy admin to install some extra RPMs or whatever.)
First, tool_dependency_resolution_order is not how I ended up implementing it. Each "resolver" may need parameters, so I decided to go with an XML configuration, kind of like Nate's job_conf.xml stuff. So to leave the tool shed resolver off, you can just add a dependency_resolvers_conf.xml to your Galaxy root:
<dependency_resolvers>
    <galaxy_packages />
    <modules />
    <modules versionless="true" />
</dependency_resolvers>
You can also just leave galaxy_packages off if you are not using any manually installed env.sh files.
The first modules resolver will try to match each tag like this:
<requirement type="package" version="3.0.1">R</requirement>
with a module load like:
R/3.0.1
if that exact module is available. The second <modules> entry, with versionless="true", will result in Galaxy falling back to loading whatever the default R module is when the exact version specified (3.0.1) is not available.
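A rough illustration of that matching logic (a hypothetical helper, not the actual resolver code):

    def resolve_module(name, version, available_modules, versionless=False):
        # `available_modules` stands in for whatever `module avail` reports.
        exact = "%s/%s" % (name, version)
        if exact in available_modules:
            return exact
        if versionless and any(m == name or m.startswith(name + "/")
                               for m in available_modules):
            return name  # `module load R` then picks the site default
        return None

    # resolve_module("R", "3.0.1", {"R/3.0.1", "R/2.15"})          -> "R/3.0.1"
    # resolve_module("R", "3.0.2", {"R/3.0.1"}, versionless=True)  -> "R"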
This code doesn't touch the tool shed at all, so if you choose to install dependencies they will still be installed, and if you don't they will not be. This configuration just tells Galaxy how to use whatever dependencies happen to be installed. You will need to review installed tools and make sure you have matching modules. Additionally, some tools may use the set_environment requirement type; I am not sure how to implement this or how prevalent its use is.
-John
Will this also work for those toolshed packages which bundle their package definitions (to download, make and install the tool dependency) along with their wrappers?
Can you opt not to install packages for such repositories? Either way, I guess the answer is the same as above: the tool shed is unchanged; it's just how Galaxy utilizes the installed dependencies that is being modified here.
Thanks for working on this.
cheers, Simon
On Thu, Sep 26, 2013 at 10:27 PM, John Chilton chilton@msi.umn.edu wrote:
My recommendation would be to make the tool dependency install work on as many platforms as you can and not try to optimize in a way that is not going to work everywhere - i.e. favor reproducibility over performance.
Reproducibility versus speed is a particular issue with floating point libraries - NumPy using ATLAS vs OpenBLAS vs Intel MKL vs just plain NumPy will probably all give slightly different answers (on some tasks), and at different speeds.
In this case, for simplicity I would advocate plain NumPy, without worrying about needing ATLAS. For packages needing NumPy with ATLAS, perhaps a new Tool Shed entry could be created, package_numpy_1_7_with_atlas or similar (based on the current configuration)?
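If it helps anyone check what their NumPy build is actually linked against, numpy.show_config() reports the BLAS/LAPACK configuration detected at build time (this is standard NumPy, nothing Galaxy-specific); for example:

    import time
    import numpy as np

    # Report the BLAS/LAPACK NumPy was built with (e.g. atlas_blas_info,
    # openblas_info, or nothing beyond the defaults).
    np.show_config()

    # Crude speed probe - dense linear algebra is where the BLAS choice shows up.
    a = np.random.rand(1000, 1000)
    t0 = time.perf_counter()
    a.dot(a)
    print("1000x1000 matmul took %.2fs" % (time.perf_counter() - t0))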
Peter
Hi Peter and John,
thanks for your comments and thanks for working on the patch John.
On Thu, Sep 26, 2013 at 10:27 PM, John Chilton chilton@msi.umn.edu wrote:
My recommendation would be to make the tool dependency install work on as many platforms as you can and not try to optimize in a way that is not going to work everywhere - i.e. favor reproducibility over performance.
Reproducibility versus speed is a particular issue with floating point libraries - NumPy using ATLAS vs OpenBLAS vs Intel MKL vs just plain NumPy will probably all give slightly different answers (on some tasks), and at different speeds.
In this case, for simplicity I would advocate plain NumPy, without worrying about needing ATLAS. For packages needing NumPy with ATLAS, perhaps a new Tool Shed entry could be created, package_numpy_1_7_with_atlas or similar (based on the current configuration)?
I updated R, numpy, scipy and scikit and removed the atlas dependency. It seems to work fine for the ChemicalToolBox. I did not remove the lapack dependency, because I have not had any complaints about it so far. I also created new repositories *_with_atlas in my galaxytools repository, if anyone is interested in atlas-dependent packages.
Let's concentrate on reproducibility and leave out the speed improvements for now. I admit it was too ambitious/idealistic.
Have a nice weekend, Bjoern
On Sun, Sep 29, 2013 at 1:17 PM, Björn Grüning bjoern.gruening@pharmazie.uni-freiburg.de wrote:
Hi Peter and John,
thanks for your comments and thanks for working on the patch John.
Peter wrote:
In this case, for simplicity I would advocate plain NumPy, without worrying about needing ATLAS. For packages needing NumPy with ATLAS, perhaps a new Tool Shed entry could be created, package_numpy_1_7_with_atlas or similar (based on the current configuration)?
I updated R, numpy, scipy and scikit and removed the atlas dependency. It seems to work fine for the ChemicalToolBox. I did not remove the lapack dependency, because I have not had any complaints about it so far.
Great. Hopefully all my tools with a NumPy dependency via Biopython will pick up this change automatically when the next nightly Tool Shed tests are run.
(That's what I hope will happen; otherwise I'll need to bump the dependency revision in the Biopython package and all the tools calling it.)
I also created new repositories *_with_atlas in my galaxytools repository, if anyone is interested in atlas-dependent packages.
Sounds good :)
Let's concentrate on reproducibility and leave out the speed improvements for now. I admit it was too ambitious/idealistic.
It is nice to aim high, but I think here practicality wins.
Have a nice weekend, Bjoern
You too,
Peter