Tool Shed install best practice: Precompiled binaries vs local compile
Hi Greg,
I've retitled the thread, previously about a ToolShed nightly test failure.
A brief recap: we're talking about the Galaxy ToolShed XML installation recipes for the NCBI BLAST+ packages and my MIRA4 wrapper, in their tool_dependencies.xml files:
http://toolshed.g2.bx.psu.edu/view/iuc/package_blast_plus_2_2_29
http://testtoolshed.g2.bx.psu.edu/view/iuc/package_blast_plus_2_2_29
http://testtoolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
These use the pattern of having os/arch-specific <action> tags (which download and install the tool author's precompiled binaries) and a fall back default <action>, which reports an error naming the os/arch combination and stating that no ready-made binaries are available.
Greg is instead advocating that the fall back action should be to download the source code and do a local compile.
My reply is below...
On Thu, Mar 6, 2014 at 5:24 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Thu, Mar 6, 2014 at 4:53 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:
As we briefly discussed earlier, your mira4 recipe is not currently following best practices. Although you uncovered a problem in the framework which has now been corrected, your recipe's fall back <actions> tag set should be the recipe for installing mira4 from source (http://sourceforge.net/projects/mira-assembler/), since there are no licensing issues in doing so. This would be a better approach than echoing the error messages.
Thanks very much for helping us discover this problem though!
Greg Von Kuster
Hi Greg,
No problem - I'm good at discovering problems ;)
If the download approach failed, it is most likely due to a transient error (e.g. network issues with the download). Here I would much prefer that Galaxy abort and report this as an error (and not attempt the default action). Is that what you just fixed?
As to best practice for the fall back action, I think that needs a new thread.
Regards,
Peter
As to best practice, I do not agree that in cases like this (MIRA4, NCBI BLAST+), where binaries are provided for the major platforms, the fall back should be compiling from source.
The NCBI provide BLAST+ binaries for 32-bit and 64-bit Linux and Mac OS X (which I believe covers all the mainstream platforms Galaxy runs on).
Similarly, MIRA4 provides binaries for 64-bit Linux and Mac OS X. Note that 32-bit binaries are not provided, but they would be very restricted in terms of the datasets they could be used on anyway, and I doubt many of the systems Galaxy runs on these days are 32-bit.
If the os/arch combination is exotic enough that precompiled binaries are not available, then compilation is likely to be tricky anyway, or the platform may not be supported by that tool, or by Galaxy itself.
Essentially I am arguing that where the precompiled tool binaries cover any mainstream system Galaxy might be used on, a local compile fall back is not needed.
Also, these are both complex tools which are relatively slow to compile, and have quite a large compile-time dependency set (e.g. MIRA4 requires a fairly recent GCC, BOOST, flex and expat, and strongly recommends TCmalloc). At least some of the dependencies have been packaged for the ToolShed (probably by Bjoern?), but in the case of MIRA4 and BLAST+ this is still a lot of effort for no practical gain.
I also feel there is an argument that the Galaxy goal of reproducibility should favour using precompiled binaries where available: a locally compiled binary will generally mean a different compiler version, perhaps with different optimisation flags, and different library versions. It will not necessarily give the same results as the precompiled binary provided by the tool author.
(Wow, this ended up being a long email!)
Regards,
Peter
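The recipe pattern discussed in this thread (precompiled binaries chosen per os/arch, plus a default branch that only reports an error) can be sketched outside Galaxy's XML as plain Python. This is a minimal illustration under stated assumptions, not Galaxy's actual install code: the `BINARIES` table, its placeholder URLs, and the `UnsupportedPlatform`, `current_platform` and `install` names are all hypothetical. It also shows the behaviour Peter asks for when a download fails: the error propagates and aborts the install rather than triggering the default branch.

```python
import platform
from typing import Callable, Dict, Tuple

# Hypothetical table of precompiled binaries keyed by (os, arch).
# The URLs are placeholders, not the real BLAST+ or MIRA4 downloads.
BINARIES: Dict[Tuple[str, str], str] = {
    ("linux", "x86_64"): "https://example.org/tool-linux-x86_64.tar.gz",
    ("linux", "i686"): "https://example.org/tool-linux-i686.tar.gz",
    ("darwin", "x86_64"): "https://example.org/tool-darwin-x86_64.tar.gz",
}


class UnsupportedPlatform(Exception):
    """Raised by the default branch when no precompiled binary matches."""


def current_platform() -> Tuple[str, str]:
    # platform.system() gives e.g. "Linux" or "Darwin" (Mac OS X);
    # platform.machine() gives e.g. "x86_64".
    return platform.system().lower(), platform.machine()


def install(os_name: str, arch: str, download: Callable[[str], str]) -> str:
    """Install the matching binary, or fail loudly if there is none."""
    url = BINARIES.get((os_name.lower(), arch))
    if url is None:
        # Default branch: report the os/arch combination instead of compiling.
        raise UnsupportedPlatform(
            f"No precompiled binaries available for {os_name}/{arch}"
        )
    # Any error raised by download() (e.g. a network failure) is not caught
    # here, so it aborts the install instead of falling through to the
    # default branch.
    return download(url)
```

A caller would use something like `install(*current_platform(), download=fetch)`, where `fetch` is whatever download function the installer already has. Keeping "no binary for this platform" and "download failed" as distinct error paths is what lets the second one abort cleanly.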
Hi Peter,
I just wanted to add my $0.02 USD to say that I mostly agree with this - I have long used binaries precompiled by the tool author on Main, especially for cases where, as you say, the compile-time dependency list is large and painful. The only "gotcha" here is to make sure that binaries support the oldest possible version of glibc that might be installed on any of the Linux distributions and versions that we support, and that no non-standard preinstalled libraries have been pulled in during the build process.
--nate
On Thu, Mar 6, 2014 at 12:46 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
On Fri, Mar 14, 2014 at 4:41 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Peter,
I just wanted to add my $0.02 USD to say that I mostly agree with this - I have long used binaries precompiled by the tool author on Main, especially for cases where, as you say, the compile-time dependency list is large and painful. The only "gotcha" here is to make sure that binaries support the oldest possible version of glibc that might be installed on any of the Linux distributions and versions that we support, and that no non-standard preinstalled libraries have been pulled in during the build process.
--nate
Good point about potential problems with glibc, and I guess also with any run-time linked libraries which might be a different version or missing?
Peter
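Nate's glibc "gotcha" can be checked mechanically before publishing a binary. A rough sketch, assuming the binary's dynamic symbol table has been dumped as text (for example with `objdump -T`), in which versioned glibc symbols appear as `GLIBC_2.14` and similar: find the highest glibc version referenced and compare it with the oldest glibc on the distributions you intend to support. The function names are hypothetical, and the sketch deliberately ignores other versioned libraries such as libstdc++ (`GLIBCXX_*`).

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Matches e.g. "GLIBC_2.2.5" or "GLIBC_2.14", but not "GLIBCXX_3.4.9"
# (libstdc++) or "GLIBC_PRIVATE".
_GLIBC_SYMBOL = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def parse_version(text: str) -> Version:
    """Turn "2.14" into (2, 14) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def max_glibc_required(symbol_dump: str) -> Optional[Version]:
    """Highest GLIBC_x.y version referenced in a dynamic-symbol dump, or None."""
    versions = [parse_version(v) for v in _GLIBC_SYMBOL.findall(symbol_dump)]
    return max(versions) if versions else None


def runs_on(symbol_dump: str, host_glibc: str) -> bool:
    """True if the host's glibc is at least as new as the binary requires."""
    required = max_glibc_required(symbol_dump)
    return required is None or required <= parse_version(host_glibc)
```

On a target machine, `platform.libc_ver()` reports the running C library and its version as a tuple such as `('glibc', '2.17')`, and the version string can be passed as `host_glibc`. Comparing tuples rather than strings matters here, since "2.2.5" sorts after "2.14" as text.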
On Fri, Mar 14, 2014 at 12:47 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Fri, Mar 14, 2014 at 4:41 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Peter,
I just wanted to add my $0.02 USD to say that I mostly agree with this - I have long used binaries precompiled by the tool author on Main, especially for cases where, as you say, the compile-time dependency list is large and painful. The only "gotcha" here is to make sure that binaries support the oldest possible version of glibc that might be installed on any of the Linux distributions and versions that we support, and that no non-standard preinstalled libraries have been pulled in during the build process.
--nate
Good point about potential problems with glibc, and I guess also with any run-time linked libraries which might be a different version or missing?
Peter
Right, many things (especially those using autoconf) will automatically link against libraries if present and disable the corresponding functionality if not. Hopefully, if we return to mostly vanilla tool test VMs in the future, any such problems should be easily discoverable.
--nate
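One way to make the accidental linking Nate describes discoverable is to inspect a built binary's runtime library dependencies. A minimal sketch, assuming the text output of `ldd` on Linux (lines such as `libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)` or `libfoo.so.1 => not found`): it flags libraries that are missing or that fall outside an allowlist of expected system libraries. The allowlist and the helper names are illustrative assumptions, not an agreed ToolShed policy.

```python
import re
from typing import Dict, List, Optional, Sequence

# Illustrative allowlist of core system libraries; adjust per deployment.
DEFAULT_ALLOWED = (
    "libc.so", "libm.so", "libpthread.so", "libdl.so",
    "librt.so", "libgcc_s.so", "libstdc++.so",
)

# "name => /absolute/path (addr)" or "name => not found"
_LDD_LINE = re.compile(r"^(\S+)\s+=>\s+(not found|/\S+)")


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each library in ldd output to its resolved path, or None if missing.

    Lines without "=>" (the dynamic loader itself) and virtual entries
    such as linux-vdso.so.1 that have no path are skipped."""
    libs: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line.strip())
        if match:
            name, target = match.groups()
            libs[name] = None if target == "not found" else target
    return libs


def unexpected_libraries(
    output: str, allowed: Sequence[str] = DEFAULT_ALLOWED
) -> List[str]:
    """List dependencies that are missing or not covered by the allowlist."""
    problems: List[str] = []
    for name, path in parse_ldd(output).items():
        if path is None:
            problems.append(f"{name}: not found")
        elif not name.startswith(tuple(allowed)):
            problems.append(f"{name}: {path}")
    return problems
```

The `ldd` text itself could be captured with `subprocess.run(["ldd", binary_path], capture_output=True, text=True).stdout`. Run on a clean test VM, anything this reports is either a dependency the recipe must provide or a sign that the build picked up a library it should not have.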
Hi Greg and Peter,
On 06.03.2014 18:46, Peter Cock wrote:
As to best practice, I do not agree that in cases like this (MIRA4, NCBI BLAST+), where binaries are provided for the major platforms, the fall back should be compiling from source.
The NCBI provide BLAST+ binaries for 32-bit and 64-bit Linux and Mac OS X (which I believe covers all the mainstream platforms Galaxy runs on).
Similarly, MIRA4 provides binaries for 64-bit Linux and Mac OS X. Note that 32-bit binaries are not provided, but they would be very restricted in terms of the datasets they could be used on anyway, and I doubt many of the systems Galaxy runs on these days are 32-bit.
I also think that supporting 32-bit is not really needed, and in the case of a few libraries it is really troublesome.
If the os/arch combination is exotic enough that precompiled binaries are not available, then compilation is likely to be tricky anyway, or the platform may not be supported by that tool, or by Galaxy itself.
Essentially I am arguing that where the precompiled tool binaries cover any mainstream system Galaxy might be used on, a local compile fall back is not needed.
IMHO, that statement is too general. Some binaries may be built properly, but many of them still have some strange runtime dependencies. In these cases we need to have a compile-time fallback.
Also, these are both complex tools which are relatively slow to compile, and have quite a large compile-time dependency set (e.g. MIRA4 requires a fairly recent GCC, BOOST, flex and expat, and strongly recommends TCmalloc). At least some of the dependencies have been packaged for the ToolShed (probably by Bjoern?), but in the case of MIRA4 and BLAST+ this is still a lot of effort for no practical gain.
I don't think compile time really matters: you only need to compile them once, and I think most of us can wait an hour.
I also feel there is an argument that the Galaxy goal of reproducibility should favour using precompiled binaries where available: a locally compiled binary will generally mean a different compiler version, perhaps with different optimisation flags, and different library versions. It will not necessarily give the same results as the precompiled binary provided by the tool author.
Yes, that's a good point. On the other hand, we should not forget that binaries are not necessarily usable over many years. As a really bad example, take a look at the UCSC tools: you can't run the latest UCSC tools on an old Scientific Linux because its libc is too old, so you are totally lost. I'm not sure how good the MIRA binaries are, but I would like to point out that there are huge differences in how these binaries can be produced.
I'm in favour of having both options available wherever we can and letting the administrator choose the best way to install, maybe with a default universe_wsgi.ini setting (preferred_toolshed_install = "binary"). I would not call it a 'fallback'; it's really a different installation strategy, with different priorities. (There was/is a Trello card for it, wasn't there?)
That said, I totally understand that if you have binaries, you do not want to go through the trouble of compiling the tool with all its dependencies, but we should highlight that compiling is the 'best/ideal/preferred' way to install.
Ciao,
Bjoern
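Bjoern's idea of treating binary and source installs as two strategies chosen by the administrator, rather than as a primary path plus a fallback, could look roughly like the sketch below. `preferred_toolshed_install` is only the setting name proposed in this thread, not an existing Galaxy option, and `install_order` is a hypothetical helper: it puts the preferred strategy first and keeps the other one only if the recipe offers it for the current platform.

```python
from typing import Dict, List

STRATEGIES = ("binary", "source")


def install_order(preferred_toolshed_install: str, offered: Dict[str, bool]) -> List[str]:
    """Return the install strategies to consider, preferred one first.

    `offered` records which strategies the recipe provides for this os/arch,
    e.g. {"binary": True, "source": True}."""
    if preferred_toolshed_install not in STRATEGIES:
        raise ValueError(f"unknown strategy: {preferred_toolshed_install!r}")
    others = [s for s in STRATEGIES if s != preferred_toolshed_install]
    candidates = [preferred_toolshed_install] + others
    # Drop any strategy the recipe does not offer on this platform.
    return [s for s in candidates if offered.get(s, False)]
```

This only decides the order of preference. Whether a transient failure in the first strategy should move on to the second is the separate question Peter raises earlier in the thread.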
participants (3)
- Björn Grüning
- Nate Coraor
- Peter Cock