Please see my inline comments.  Thanks!

Greg Von Kuster


On Nov 7, 2013, at 1:33 PM, John Chilton <chilton@msi.umn.edu> wrote:

On Thu, Nov 7, 2013 at 1:46 AM, Björn Grüning
<bjoern.gruening@pharmazie.uni-freiburg.de> wrote:
On Thursday, 07.11.2013, at 00:25 -0600, John Chilton wrote:

My two cents below.

On Wed, Nov 6, 2013 at 4:20 PM, Björn Grüning
<bjoern.gruening@pharmazie.uni-freiburg.de> wrote:
Hi Dave,

We're thinking that the following approach makes the most sense:

<action type="setup_perl_environment"> OR <action type="setup_r_environment"> OR <action type="setup_ruby_environment"> OR <action type="setup_virtualenv">
    <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="perl" version="5.18.1" />
    </repository>
    <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="expat" version="2.1" />
    </repository>
</action>

For all <repository> tag sets contained within these setup_* tags, the
repository's env.sh would be pulled in for the setup of the specified
environment without requiring a set_environment_for_install action type.

Would this work for your use cases?

Yes, for the first one. But it's a little bit too verbose, isn't it?
Including the perl repository in a setup_perl_environment should be
implicit, shouldn't it? We can assume that it needs to be present.
Do you have an example of why sourcing every <repository> by default can
be harmful? It would make such an installation so much easier and less
complex.

I am not sure I understand this paragraph - I have a vague sense I
agree but is there any chance you could rephrase this or elaborate?

My first use case will be addressed by this suggestion. I had hoped that we
could create a less verbose syntax.
If I specify a package at the top of my xml file:


   <package name="expat" version="2.1.0">
       <repository name="package_expat_2_1" owner="iuc" prior_installation_required="True" />
   </package>

I need to repeat it either in an <action type="set_environment_for_install">
or in an <action type="setup_perl_environment">.
My hope was to get rid of this repetition. Once a package definition is
specified and built, every ENV var would be available in any downstream
<package>. But if there are any downsides or pitfalls, this more verbose
and explicit syntax will work for my use case.


The potential problem I see here is that environment variables are not namespaced in any way, so if all env.sh files are sourced no matter what, there is the potential for a certain environment variable to get set to a certain dependency version, and then later during the installation (assuming a hierarchy of repository dependencies), the same environment variable gets set to a different version of the same dependency.  I'm not sure how often (if ever) this could occur, but if it did, the installation would not be as expected.
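
The overwrite scenario described above can be sketched in a few lines of Python. This is only a toy model, not Galaxy code: each env.sh is reduced to a list of plain exports, and the repository name package_perl_5_16, the variable PERL_ROOT and the /deps paths are made up for illustration.

```python
# Toy model (assumption): each env.sh is reduced to a list of (name, value) exports.
def source_all(env_files):
    """Apply every env.sh in order, the way a shell would when sourcing each file."""
    env = {}
    overwritten = []
    for repo_name, exports in env_files:
        for name, value in exports:
            if name in env and env[name] != value:
                # Record that a later file silently replaced an earlier value.
                overwritten.append((name, env[name], value, repo_name))
            env[name] = value  # a plain export replaces the earlier value
    return env, overwritten


# Two versions of the same dependency show up in one dependency hierarchy.
# Repository names, variable name and paths are hypothetical.
env_files = [
    ("package_perl_5_18", [("PERL_ROOT", "/deps/perl/5.18.1")]),
    ("package_perl_5_16", [("PERL_ROOT", "/deps/perl/5.16.3")]),
]

env, overwritten = source_all(env_files)
print(env["PERL_ROOT"])
for name, old, new, repo in overwritten:
    print(f"{name}: {old} replaced by {new} (from {repo})")
```

Whichever file is sourced last decides the final value, which is why sourcing everything unconditionally could leave an installation pointing at a version it did not ask for.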




I see, this makes perfect sense to me now, thanks! I certainly agree
that it should not have to be spelled out twice unless there is a good
reason. I guess my preference would be to just see it inside of the
setup_perl_environment tag - why should it need to be at the top level
as well? There could be many implementation details that make this
difficult though, so obviously I defer to Greg/Dave on this.


Just so I'm clear on this, is this what you want implemented as an enhancement to the setup_* tag sets?

<action type="setup_perl_environment"> OR <action type="setup_r_environment"> OR <action type="setup_ruby_environment"> OR <action type="setup_virtualenv">
   <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
       <package name="perl" version="5.18.1" />
   </repository>
   <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
       <package name="expat" version="2.1" />
   </repository>
</action>

For all <repository> tag sets contained within these setup_* tags, the repository's env.sh would be pulled in for the setup of the specified environment without requiring a set_environment_for_install action type.
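
To make the proposed behaviour concrete, here is a minimal Python sketch of how the nested <repository> tags could be turned into a list of env.sh files to source. It is an illustration only: the function name, the /deps install root and the directory layout are assumptions, not the actual tool shed implementation.

```python
import xml.etree.ElementTree as ET

# One of the four action types from the proposal, with the two repositories above.
ACTION_XML = """
<action type="setup_perl_environment">
    <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="perl" version="5.18.1" />
    </repository>
    <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="expat" version="2.1" />
    </repository>
</action>
"""

SETUP_TYPES = {
    "setup_perl_environment",
    "setup_r_environment",
    "setup_ruby_environment",
    "setup_virtualenv",
}


def env_sh_to_source(action_xml, install_root="/deps"):
    """Return the env.sh path for every <repository> nested in a setup_* action."""
    action = ET.fromstring(action_xml)
    if action.get("type") not in SETUP_TYPES:
        return []
    paths = []
    for repo in action.findall("repository"):
        for pkg in repo.findall("package"):
            # Hypothetical on-disk layout, chosen only for this sketch.
            paths.append("/".join([
                install_root,
                pkg.get("name"),
                pkg.get("version"),
                repo.get("owner"),
                repo.get("name"),
                repo.get("changeset_revision"),
                "env.sh",
            ]))
    return paths


paths = env_sh_to_source(ACTION_XML)
for path in paths:
    print(path)
```

The point of the sketch is that the list of files to source comes only from the repositories named inside the action, so nothing outside that action is pulled in implicitly.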





Also, that does not solve the second use case: I have two <package> tags,
one that installs Perl libraries and a second one, a binary, that
checks for or needs these Perl libs.

We have discussed this off list in another thread. Just to summarize my
thoughts there - I think we should delay this or not make it a
priority if there are marginally acceptable workarounds that can be
found for the time being. Getting these four actions to work well as
sort of terminal endpoints, and allowing specification as tersely as
possible, should be the primary goal for the time being. You will see
Perl or Python packages depend on C libraries 10 times more frequently
than you will find makefiles and C programs depend on complex Perl or
Python environments (correct me if I am wrong). Given that there is
already years' worth of tool shed development outlined in existing
Trello cards - this is just how I would prioritize things (happy to be
overruled).

Ok, point taken. Let's focus on the real issue. That use case is just a
simplification / a more structured way to write tool dependencies;
it's not strictly needed to get my packages done.
John, just to make that use case clearer:
- You have a package (A) with a dependency (B)
- B is not worth putting in an extra repository (an extra
tool_dependencies.xml file)

Currently, you are forced to define both in one <package> tag, because if
you define them in two <package> tags, A will not see B.
Perl and Python were a bad example; you have that problem with every
dependency that is not worth putting in a separate repository.


To summarize:
I'm fine with that approach. It will address my current use case and it
would be great to have it as proposed by Dave!

Thanks a lot!
Bjoern



If so, can you confirm that this should be done for all four currently
supported setup_* action types?

I think it would be best to tackle setup_r_environment and
setup_ruby_environment first. setup_virtualenv cannot have nested
elements at this time - it is just assumed to be a bunch of text
(either a file containing the dependencies or a list of the
dependencies).

So setup_r_environment and setup_ruby_environment have the same structure:

<setup_ruby_environment>
 <repository .. />
 <package .. />
 <package .. />
</setup_ruby_environment>

... but setup_virtualenv is just

<setup_virtualenv>requests==1.20
pycurl==1.3</setup_virtualenv>
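
As a rough illustration of the "just a bunch of text" case, the following Python sketch reads the body of a setup_virtualenv element as one requirement per line. The function name is hypothetical, and this is not how Galaxy itself parses the tag; it only shows why a text-only body leaves no obvious place for nested <repository> elements.

```python
import xml.etree.ElementTree as ET


def virtualenv_requirements(element_xml):
    """Split the text body of a setup_virtualenv element into requirement lines."""
    elem = ET.fromstring(element_xml)
    text = elem.text or ""
    # Keep one requirement per non-empty line, stripped of surrounding whitespace.
    return [line.strip() for line in text.splitlines() if line.strip()]


reqs = virtualenv_requirements(
    "<setup_virtualenv>requests==1.20\npycurl==1.3</setup_virtualenv>"
)
print(reqs)
```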

I have created a Trello card for this: https://trello.com/c/NsLJv9la
(and some other related stuff).

Once that is tackled though, it will make sense to allow
setup_virtualenv to utilize the same functionality.

Thanks all,
-John


I think it will solve my current issues.

Based on your response, Greg or I will implement this as soon as
possible.

Thanks!
Bjoern

   --Dave B.

On 11/06/2013 03:05 AM, Björn Grüning wrote:
Hi John,

Perl complicates things, TPP complicates things greatly.

So true, so true ...

Bjoern, can I ask you if this hypothetical exhibits the same problem
and can be used to reason about these things more easily and drive a
test implementation.

Yes to both questions :)

So right now, Galaxy has setup_virtualenv which will build and install
Python packages in a virtual environment. However, some Python
packages have C library dependencies that could prevent them from
being installed.

As a specific example - take PyTables (install via "pip install
tables") - which is a package for managing hierarchical datasets. If
you try to install this with pip the way Galaxy will - it will fail if
you do not have libhdf5 installed. So at a high-level, it would be
nice if the tool shed had a libhdf5 definition and the dependencies
file had some mechanism for declaring libhdf5 should be installed
before a setup_virtualenv containing "tables" and its environment
configured properly so the pip install succeeds (maybe just
LD_LIBRARY_PATH needs to be set).
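
A minimal sketch of that ordering, assuming libhdf5 has already been installed into a made-up /deps/libhdf5/lib directory: the environment is prepared first and only then handed to pip. The helper name is hypothetical, and a real PyTables build may need more than LD_LIBRARY_PATH.

```python
import os


def build_pip_install(packages, lib_dirs, base_env=None):
    """Return the pip command and an environment with lib_dirs on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Put the dependency's library directories ahead of whatever was already set.
    parts = list(lib_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    cmd = ["pip", "install"] + list(packages)
    return cmd, env


cmd, env = build_pip_install(
    ["tables"],
    ["/deps/libhdf5/lib"],  # hypothetical install location of libhdf5
    base_env={"LD_LIBRARY_PATH": "/usr/local/lib"},
)
print(cmd)
print(env["LD_LIBRARY_PATH"])
# subprocess.run(cmd, env=env, check=True) would then perform the install.
```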

Indeed, same problem. I think we have this problem in every high-level
install method, because <set_environment_for_install> is not allowed as
the first action tag.

Can you think of any case where ENV vars can conflict with each other,
besides set_to, assuming that we source every env.sh file by default
for every specified <package>?

Cheers,
Bjoern

-John


On Tue, Nov 5, 2013 at 3:35 PM, Björn Grüning
<bjoern.gruening@pharmazie.uni-freiburg.de> wrote:
Hi Greg,


Hello Bjoern,


On Nov 5, 2013, at 12:13 PM, Bjoern Gruening
<bjoern.gruening@gmail.com>
wrote:

Hi Greg,

I'm right now implementing a setup_perl_environment and stumbled upon
a tricky problem (one that is not only related to Perl but also to
Ruby, Python and R).

The Problem:
Let's assume a Perl package (A) requires an XML parser written in
C/C++ (Z).
(Z) is a dependency that I can import, but as far as I can see there is
no way to call set_environment_for_install before
setup_perl_environment, because setup_perl_environment defines an
installation type.


The above is fairly difficult to understand - can you provide an actual
xml recipe that provides some context?

Attached, please see a detailed explanation at the bottom.




I would like to discuss that issue to get a few ideas. I can think of
these solutions:

- hackish solution:
I can call
install_environment.add_env_shell_file_paths( action_dict[ 'env_shell_file_paths' ] )
inside of the setup_*_environment path and remove it from the action
type afterwards

Again, it's difficult to provide good feedback on the above approach
without an example recipe.  However, your "hackish solution" term
probably means it is not ideal.  ;)

:)


- import all env.sh variables from every (package) definition,
regardless of whether set_environment_for_install is set or not.


I don't think the above approach would be ideal.  It seems that it could
fairly easily create conflicting environment variables within a single
installation, so the latest value for an environment variable may not be
what is expected.

What does "conflicting ENV vars" mean? I can only imagine multiple
set_to actions that overwrite each other. append_to and prepend_to
should be safe, shouldn't they?
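
The difference between the three actions can be sketched with a simplified Python model. It is not Galaxy's implementation, and the paths are invented. It suggests that set_to is the only action that discards an earlier value, while prepend_to and append_to keep every value but still let the order of application decide which entry is found first.

```python
SEP = ":"
ACTIONS = {"set_to", "prepend_to", "append_to"}


def apply_action(env, action, name, value):
    """Apply one environment action to a dict of variables (simplified model)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    current = env.get(name)
    if action == "set_to" or not current:
        env[name] = value  # set_to always replaces; others start a new list
    elif action == "prepend_to":
        env[name] = value + SEP + current
    else:  # append_to
        env[name] = current + SEP + value
    return env


env = {}
# Two hypothetical Perl installations both prepend their bin directory.
apply_action(env, "prepend_to", "PATH", "/deps/perl/5.18.1/bin")
apply_action(env, "prepend_to", "PATH", "/deps/perl/5.16.3/bin")
print(env["PATH"])

# A lookup along PATH stops at the first match, so the last prepend wins.
first_hit = env["PATH"].split(SEP)[0]
print(first_hit)
```

So nothing is lost with prepend_to or append_to, but two versions of the same dependency can still shadow each other depending on the order in which the env.sh files are sourced.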



I must admit, I do not understand why set_environment_for_install is
actually needed. I think we can assume that if I specify a

    <package name="R_3_0_1" version="3.0.1">
        <repository name="package_r_3_0_1" owner="iuc" prior_installation_required="True" />
    </package>

I want the ENV vars sourced.

Hmmm…so you are saying that you want to be able to define the above
<package> tag set inside of an <actions> tag set and have everything
work?


Oh no, I mean just have it as a package like it is, but source the
env.sh file for every other <package> set automatically. So you do not
need <set_environment_for_install>.

In the attached example:
    <package name="expat" version="2.1.0">
        <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" prior_installation_required="True" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    </package>

It is not imported with <set_environment_for_install>, so it's actually
useless (for now). But the env.sh needs to be sourced during the
"setup_perl_environment" part.

I think this may cause problems because I believe the
<set_environment_for_install> tag set restricts activity to only the
time when a dependent repository will be using the defined environment
from the required repository in order to compile one or more of its
dependencies. Eliminating this restriction may cause problems after
compilation, although I cannot state this as a definite fact.

Furthermore, that can solve another issue: namely, the need for ENV
vars from a package definition in the same file. Let's imagine package P
has dependency D and you want to download and compile both in one
tool_dependencies.xml file.
You can either do it in one <package> definition or you need to split
them up into 2 tool_dependencies.xml files, right?
Maybe we can just assume a strict order in a tool_dependencies.xml file,
where all ENV vars are sourced for the following packages? Does that
make sense?

It may make sense, but without an example it's difficult to answer this
for sure.  Can you provide some xml recipes that use your different
proposals?


Sure, attached.
It's quite complicated.

- TPP needs libgd to compile C code (no problem).
- TPP needs some Perl libs and Perl -> "setup_perl_environment" (at
runtime)
  - no problem until one of these Perl packages (here XML-Parser) needs
a C library (expat)
  - I don't see how to source expat during "setup_perl_environment"
- TPP needs Perl (at compile time) ... It would be more readable or
logical to separate this recipe into two parts: TPP and Perl libraries

Something like this:
    <package name="trans_proteomic_pipeline_perl_libs" version="4.6.3">
        .....
        set PERL5LIB
    </package>
    <package name="trans_proteomic_pipeline" version="4.6.3">
        ....
        Here I need the PERL5LIB
    </package>

- I don't see any way to get the PERL5LIB from the Perl libraries into a
separate <package> section which tries to compile TPP.
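
For this two-package recipe, the strict-order idea suggested earlier in this message could behave roughly like the Python sketch below: packages are processed in file order, and each one sees whatever the preceding ones exported. The function name and the install path are hypothetical, and PERL5LIB here is Perl's library search path variable.

```python
def install_in_order(packages):
    """Process packages in file order; each sees exports of all earlier ones."""
    visible = {}
    seen_by = {}
    for name, exports in packages:
        seen_by[name] = dict(visible)  # snapshot of the environment at install time
        visible.update(exports)        # later packages inherit these variables
    return seen_by


# The two <package> sections from the recipe; the path is made up.
packages = [
    ("trans_proteomic_pipeline_perl_libs",
     {"PERL5LIB": "/deps/tpp_perl_libs/4.6.3/lib/perl5"}),
    ("trans_proteomic_pipeline", {}),
]

seen_by = install_in_order(packages)
print(seen_by["trans_proteomic_pipeline_perl_libs"])
print(seen_by["trans_proteomic_pipeline"])
```

Under that assumption the TPP package would find PERL5LIB set when it compiles, while the Perl libs package itself starts from an empty environment.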

Any other ideas?

Not yet, but possibly after your next response.  ;)

:) Here we go!
Thanks Greg!
Bjoern



Thanks,
Bjoern

Thanks!

Greg Von Kuster






___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


