tool_dependencies inside tool_dependencies
Hi, is there a general rule for handling dependencies inside tool_dependencies.xml? Let's assume I write a matplotlib orphan tool_dependencies.xml file. matplotlib depends on numpy, and numpy already has an orphan definition. Is there a way to include numpy as a dependency inside the matplotlib definition, so that I do not need to fetch and compile numpy inside of matplotlib? I tried to specify it beforehand, but that did not work. Thanks! Bjoern
Hello Björn,

If numpy is not required for compiling matplotlib components (i.e., matplotlib components just use numpy after installation), then you should be able to make this work using a complex repository dependency for numpy in your tool_dependencies.xml definition for matplotlib. The discussion for doing this is at http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_reposit...

By the way, I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 repository on the test tool shed includes the following contents. Is this the repository you are working with? Strangely, the repository dependency should be invalid, because it should not be possible for a repository to define a dependency upon any revision of itself. You may have uncovered a way to do this using a tool dependency definition with a complex repository dependency. I'll look into this and make sure to provide a fix for the scenario you used.

Instead of the above approach, you should include only the tool_dependencies.xml definition file for installing numpy version 1.7.1 in a repository named package_numpy_1_7_1 (use the full version in naming the repository). You should create a separate repository named package_matplotlib_1_2_1 that similarly contains a single tool_dependencies.xml file that (in addition to defining how to install and compile matplotlib) defines a complex repository dependency on the package_numpy_1_7_1 repository, as described in the wiki at the link above.

This approach creates two separate orphan tool dependencies, the second of which (matplotlib) has a complex repository dependency on the first (numpy). When you install the package_matplotlib_1_2_1 repository and check the box for handling tool dependencies during the installation, it will install the package_numpy_1_7_1 repository and create a pointer to the numpy binary in the env.sh file within the package_matplotlib_1_2_1 repository environment. This enables matplotlib to locate the required version of numpy. (A rough sketch of this two-repository layout follows the quoted message below.)

I know this is a bit tricky, so please let me know if it still does not make sense.

Thanks very much,

Greg Von Kuster

On Apr 15, 2013, at 3:29 PM, Björn Grüning wrote:
Hi,
is there a general rule for handling dependencies inside tool_dependencies.xml?
Let's assume I write a matplotlib orphan tool_dependencies.xml file. matplotlib depends on numpy, and numpy already has an orphan definition.
Is there a way to include numpy as a dependency inside the matplotlib definition, so that I do not need to fetch and compile numpy inside of matplotlib?
I tried to specify it beforehand, but that did not work.
Thanks! Bjoern
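For illustration, the two-repository layout Greg describes might look roughly like the following. The owner, changeset revision, download URLs, and install commands are placeholders rather than values from this thread, and the sketch covers Greg's runtime-only scenario (numpy is not yet visible while matplotlib compiles).

package_numpy_1_7_1/tool_dependencies.xml:

<?xml version="1.0"?>
<tool_dependency>
    <package name="numpy" version="1.7.1">
        <install version="1.0">
            <actions>
                <action type="download_by_url">http://example.org/numpy-1.7.1.tar.gz</action>
                <action type="shell_command">python setup.py install --home $INSTALL_DIR</action>
                <action type="set_environment">
                    <environment_variable name="PYTHONPATH" action="prepend_to">$INSTALL_DIR/lib/python</environment_variable>
                </action>
            </actions>
        </install>
    </package>
</tool_dependency>

package_matplotlib_1_2_1/tool_dependencies.xml (complex repository dependency on the numpy repository):

<?xml version="1.0"?>
<tool_dependency>
    <package name="numpy" version="1.7.1">
        <!-- Complex repository dependency: numpy is installed by the referenced repository. -->
        <repository toolshed="http://testtoolshed.g2.bx.psu.edu" name="package_numpy_1_7_1" owner="some_owner" changeset_revision="0123456789ab" />
    </package>
    <package name="matplotlib" version="1.2.1">
        <install version="1.0">
            <actions>
                <action type="download_by_url">http://example.org/matplotlib-1.2.1.tar.gz</action>
                <action type="shell_command">python setup.py install --home $INSTALL_DIR</action>
            </actions>
        </install>
    </package>
</tool_dependency>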
Hi Greg,
If numpy is not required for compiling matplotlib components (i.e., matplotlib components just use numpy after installation), then you should be able to make this work using a complex repository dependency for numpy in your tool_dependencies.xml definition for matplotlib. The discussion for doing this is at http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_reposit...
Thanks! But it is required at compile time.
By the way,
I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 repository on the test tool shed includes the following contents. Is this the repository you are working with? Strangely, the repository dependency should be invalid because it should not be possible for a repository to define a dependency upon any revision of itself. You may have uncovered a way to do this using a tool dependency definition with a complex repository dependency. I'll look into this and make sure to provide a fix for the scenario you used.
Oh, OK. In revision 3 of both packages you should see what I was trying.
Instead of the above approach, you should include only the tool_dependencies.xml definition file for installing numpy version 1.7.1 in a repository named package_numpy_1_7_1 (use the full version in naming the repository). You should create a separate repository named package_matplotlib_1_2_1 that similarly contains a single tool_dependencies.xml file that (in addition to defining how to install and compile matplotlib) defines a complex repository dependency on the package_numpy_1_7_1 repository, as described in the wiki at the link above.
This approach creates 2 separate orphan tool dependencies, the second of which (matplotlib) has a complex repository dependency on the first (numpy). When you install the package_matplotlib_1_2_1 repository and check the box for handling tool dependencies during the installation, it will install the package_numpy_1_7_1 repository and create a pointer to the numpy binary in the env.sh file within the package_matplotlib_1_2_1 repository environment. This enables matplotlib to locate the required version of numpy.
I know this is a bit tricky, so please let me know if it still does not make sense.
Let's see if I got it right: repository_dependencies.xml will be parsed first. The defined repositories, and the system variables they include and populate, will be available in tool_dependencies.xml, which is parsed afterwards. Is that correct?

I will try that. Thanks! Bjoern
Thanks very much,
Greg Von Kuster
On Apr 15, 2013, at 3:29 PM, Björn Grüning wrote:
Hi,
is there a general rule for handling dependencies inside tool_dependencies.xml?
Let's assume I write a matplotlib orphan tool_dependencies.xml file. matplotlib depends on numpy, and numpy already has an orphan definition.
Is there a way to include numpy as a dependency inside the matplotlib definition, so that I do not need to fetch and compile numpy inside of matplotlib?
I tried to specify it beforehand, but that did not work.
Thanks! Bjoern
Hi Björn,

On Apr 15, 2013, at 6:31 PM, Björn Grüning wrote:
Hi Greg,
If numpy is not required for compiling matplotlib components (i.e., matplotlib components just use numpy after installation), then you should be able to make this work using a complex repository dependency for numpy in your tool_dependencies.xml definition for matplotlib. The discussion for doing this is at http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_reposit...
Thanks! But it is required at compile time.
Ok, we may need to do a bit of work to support this requirement, but I'm not quite sure. What I've described to you should still be your approach, but we'll need to ensure that the package_numpy_1_7_1 repository is installed before the package_matplotlib_1_2_1 repository is installed. Guaranteeing this is not currently possible, but this is a feature I am hoping to have available this week. This is a feature that Ira Cooke has needed for his repositories. When the feature is available, it will support an attribute named "prior_installation_required" in the <repository> tag, so this tag will look something like: <repository toolshed="www" name="xxx" owner="yyy" changeset_revision="zzz" prior_installation_required="True" />

What this will do is skip installation of the repository that contains this dependency until the repository associated with the "prior_installation_required" attribute is installed (unless that repository is not in the current list of repositories being installed). What I think still needs to be worked out is how to ensure that the tool_dependencies.xml definition that installs the matplotlib package will find the previously installed numpy binary during compilation of matplotlib. Currently, the numpy binary will only be available to the installed and compiled matplotlib binary. I'll create a Trello card for this and let you know an estimate of when it will be available.
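In the matplotlib definition, that tag would sit inside the numpy <package> element of tool_dependencies.xml, roughly like this (www/xxx/yyy/zzz are Greg's placeholders, and the surrounding structure is only a sketch):

<tool_dependency>
    <package name="numpy" version="1.7.1">
        <!-- Ask Galaxy to install the referenced numpy repository before this one. -->
        <repository toolshed="www" name="xxx" owner="yyy" changeset_revision="zzz" prior_installation_required="True" />
    </package>
    <!-- The matplotlib <package> definition with its own install actions would follow here. -->
</tool_dependency>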
By the way,
I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 repository on the test tool shed includes the following contents. Is this the repository you are working with? Strangely, the repository dependency should be invalid because it should not be possible for a repository to define a dependency upon any revision of itself. You may have uncovered a way to do this using a tool dependency definition with a complex repository dependency. I'll look into this and make sure to provide a fix for the scenario you used.
Oh, OK. In revision 3 of both packages you should see what I was trying.
Ok, revision 3 looks good as long as it correctly installs and compiles numpy.
Instead of the above approach, you should include only the tool_dependencies.xml definition file for installing numpy version 1.7.1 in a repository named package_numpy_1_7_1 (use the full version in naming the repository). You should create a separate repository named package_matplotlib_1_2_1 that similarly contains a single tool_dependencies.xml file that (in addition to defining how to install and compile matplotlib) defines a complex repository dependency on the package_numpy_1_7_1 repository, as described in the wiki at the link above.
This approach creates 2 separate orphan tool dependencies, the second of which (matplotlib) has a complex repository dependency on the first (numpy). When you install the package_matplotlib_1_2_1 repository and check the box for handling tool dependencies during the installation, it will install the package_numpy_1_7_1 repository and create a pointer to the numpy binary in the env.sh file within the package_matplotlib_1_2_1 repository environment. This enables matplotlib to locate the required version of numpy.
I know this is a bit tricky, so please let me know if it still does not make sense.
Let's see if I got it right.
repository_dependencies.xml will be parsed first. The defined repositories, and the system variables they include and populate, will be available in tool_dependencies.xml, which is parsed afterwards. Is that correct?
I'm not quite sure I understand your statements above, but I've looked at revision 3 of your package_matplotlib_1_2_1 repository and the tool_dependencies.xml definition looks good (with the exception of the currently unsupported "prior_installation_required" attribute), so I think you've successfully deciphered my documentation. I'll make sure to keep you informed as I make progress on the missing pieces that will support what you need this week.
I will try that. Thanks! Bjoern
Thanks very much,
Greg Von Kuster
On Apr 15, 2013, at 3:29 PM, Björn Grüning wrote:
Hi,
is there a general rule for handling dependencies inside tool_dependencies.xml?
Let's assume I write a matplotlib orphan tool_dependencies.xml file. matplotlib depends on numpy, and numpy already has an orphan definition.
Is there a way to include numpy as a dependency inside the matplotlib definition, so that I do not need to fetch and compile numpy inside of matplotlib?
I tried to specify it beforehand, but that did not work.
Thanks! Bjoern
Hi Greg.
If numpy is not required for compiling matplotlib components (i.e., matplotlib components just use numpy after installation), then you should be able to make this work using a complex repository dependency for numpy in your tool_dependencies.xml definition for matplotlib. The discussion for doing this is at http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_reposit...
Thanks! But it is required at compile time.
Ok, we may need to do a bit of work to support this requirement, but I'm not quite sure. What I've described to you should still be your approach, but we'll need to ensure that the package_numpy_1_7_1 repository is installed before the package_matplotlib_1_2_1 repository is installed. Guaranteeing this is not currently possible, but this is a feature I am hoping to have available this week. This is a feature that Ira Cooke has needed for his repositories. When the feature is available, it will support an attribute named "prior_installation_required" in the <repository> tag, so this tag will look something like: <repository toolshed="www" name="xxx" owner="yyy" changeset_revision="zzz" prior_installation_required="True" />
Such a tag would also ensure that we do not end up in a dependency loop, right?
What this will do is skip installation of the repository that contains this dependency until the repository that is associated with the "prior_installation_required" attribute is installed (unless that repository is not in the current list of repositories being installed).
What I think still needs to be worked out is how to ensure that the tool_dependencies.xml definition that installs the matplotlib package will find the previously installed numpy binary during compilation of matplotlib. Currently, the numpy binary will only be available to the installed and compiled matplotlib binary. I'll create a Trello card for this and let you know an estimate of when it will be available.
I already created a Trello card after talking with InitHello in IRC: https://trello.com/c/QTeSmNSs

My idea would be to source the env.sh scripts associated with all <repository toolshed="www" ...> tags during the execution of <action type="shell_command"></action> commands. The tool author could then add the ./lib/ folder to LD_LIBRARY_PATH and use it in any program that needs it at compile time, as long as <repository toolshed="www" name="dep_with_populated_LD_LIBRARY_PATH"> is included.
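As a purely hypothetical sketch of that proposal (the automatic sourcing of the dependencies' env.sh files and the $NUMPY_ROOT_DIR variable do not exist yet; both are assumptions used only for illustration), the matplotlib definition could then compile against the previously installed numpy like this:

<package name="matplotlib" version="1.2.1">
    <install version="1.0">
        <actions>
            <action type="download_by_url">http://example.org/matplotlib-1.2.1.tar.gz</action>
            <!-- Assumes the framework has already sourced the env.sh of each <repository>
                 dependency, exposing e.g. $NUMPY_ROOT_DIR for the numpy package. -->
            <action type="shell_command">
                export PYTHONPATH=$NUMPY_ROOT_DIR/lib/python:$PYTHONPATH &amp;&amp;
                export LD_LIBRARY_PATH=$NUMPY_ROOT_DIR/lib:$LD_LIBRARY_PATH &amp;&amp;
                python setup.py install --home $INSTALL_DIR
            </action>
        </actions>
    </install>
</package>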
By the way,
I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 repository on the test tool shed includes the following contents. Is this the repository you are working with? Strangely, the repository dependency should be invalid because it should not be possible for a repository to define a dependency upon any revision of itself. You may have uncovered a way to do this using a tool dependency definition with a complex repository dependency. I'll look into this and make sure to provide a fix for the scenario you used.
Oh, OK. In revision 3 of both packages you should see what I was trying.
Ok, revision 3 looks good as long as it correctly installs and compiles numpy.
Instead of the above approach, you should include only the tool_dependencies.xml definition file for installing numpy version 1.7.1 in a repository named package_numpy_1_7_1 (use the full version in naming the repository). You should create a separate repository named package_matplotlib_1_2_1 that similarly contains a single tool_dependencies.xml file that (in addition to defining how to install and compile matplotlib) defines a complex repository dependency on the package_numpy_1_7_1 repository, as described in the wiki at the link above.
This approach creates 2 separate orphan tool dependencies, the second of which (matplotlib) has a complex repository dependency on the first (numpy). When you install the package_matplotlib_1_2_1 repository and check the box for handling tool dependencies during the installation, it will install the package_numpy_1_7_1 repository and create a pointer to the numpy binary in the env.sh file within the package_matplotlib_1_2_1 repository environment. This enables matplotlib to locate the required version of numpy.
I know this is a bit tricky, so please let me know if it still does not make sense.
Let's see if I got it right.
repository_dependencies.xml will be parsed first. The defined repositories, and the system variables they include and populate, will be available in tool_dependencies.xml, which is parsed afterwards. Is that correct?
I'm not quite sure I understand your statements above, but I've looked at revision 3 of your package_matplotlib_1_2_1 repository and the tool_dependencies.xml definition looks good (with the exception of the currently unsupported "prior_installation_required" attribute), so I think you've successfully deciphered my documentation.
I'll make sure to keep you informed as I make progress on the missing pieces that will support what you need this week.
I will try that. Thanks! Bjoern
Thanks very much,
Greg Von Kuster
On Apr 15, 2013, at 3:29 PM, Björn Grüning wrote:
Hi,
is there a general rule for handling dependencies inside tool_dependencies.xml?
Let's assume I write a matplotlib orphan tool_dependencies.xml file. matplotlib depends on numpy, and numpy already has an orphan definition.
Is there a way to include numpy as a dependency inside the matplotlib definition, so that I do not need to fetch and compile numpy inside of matplotlib?
I tried to specify it beforehand, but that did not work.
Thanks! Bjoern
Stepping back a little, is this the right way to address Python dependencies? I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual Python packages might be going too far: my thought was that they were needed for big 100Mb programs and stuff like that. At the Java jar/Python library/Ruby gem level, I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.

Brad, Enis, and I came up with the idea of using virtualenv to automatically create environments for Galaxy tools in CloudBioLinux based on a requirements file, and then activating that environment in the tool's env.sh file. https://github.com/chapmanb/cloudbiolinux/commit/0e4489275bba2e8f77e1218e3cc...

It would be easier for tool authors if they could just say "here is a requirements.txt file" and have the Python environment automatically created, or "here is a Gemfile" and have rvm+bundler automatically configure a Ruby environment.

Thanks,
-John

On Tue, Apr 16, 2013 at 4:50 AM, Björn Grüning <bjoern.gruening@pharmazie.uni-freiburg.de> wrote:
Hi Greg.
If numpy is not required for compiling matplotlib components (i.e., matplotlib components just use numpy after installation), then you should be able to make this work using a complex repository dependency for numpy in your tool_dependencies.xml definition for matplotlib. The discussion for doing this is at http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_reposit...
Thanks! But it is required at compile time.
Ok, we may need to do a bit of work to support this requirement, but I'm not quite sure. What I've described to you should still be your approach, but we'll need to ensure that the package_numpy_1_7_1 repository is installed before the package_matplotlib_1_2_1 repository is installed. Guaranteeing this is not currently possible, but this is a feature I am hoping to have available this week. This is a feature that Ira Cooke has needed for his repositories. When the feature is available, it will support an attribute named "prior_installation_required" in the <repository> tag, so this tag will look something like: <repository toolshed="www" name="xxx" owner="yyy" changeset_revision="zzz" prior_installation_required="True" />
Such a tag would also ensure that we do not end up in a dependency loop, right?
What this will do is skip installation of the repository that contains this dependency until the repository that is associated with the "prior_installation_required" attribute is installed (unless that repository is not in the current list of repositories being installed).
What I think still needs to be worked out is how to ensure that the tool_dependencies.xml definition that installs the matplotlib package will find the previously installed numpy binary during compilation of matplotlib. Currently, the numpy binary will only be available to the installed and compiled matplotlib binary. I'll create a Trello card for this and let you know an estimate of when it will be available.
I already created a Trello card after talking with InitHello in IRC: https://trello.com/c/QTeSmNSs
My idea would be to source the env.sh scripts associated with all <repository toolshed="www" ...> tags during the execution of <action type="shell_command"></action> commands.
The tool author could then add the ./lib/ folder to LD_LIBRARY_PATH and use it in any program that needs it at compile time, as long as <repository toolshed="www" name="dep_with_populated_LD_LIBRARY_PATH"> is included.
By the way,
I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 repository on the test tool shed includes the following contents. Is this the repository you are working with? Strangely, the repository dependency should be invalid because it should not be possible for a repository to define a dependency upon any revision of itself. You may have uncovered a way to do this using a tool dependency definition with a complex repository dependency. I'll look into this and make sure to provide a fix for the scenario you used.
Oh, OK. In revision 3 of both packages you should see what I was trying.
Ok, revision 3 looks good as long as it correctly installs and compiles numpy.
Instead of the above approach, you should include only the tool_dependencies.xml definition file for installing numpy version 1.7.1 in a repository named package_numpy_1_7_1 (use the full version in naming the repository). You should create a separate repository named package_matplotlib_1_2_1 that similarly contains a single tool_dependencies.xml file that (in addition to defining how to install and compile matplotlib) defines a complex repository dependency on the package_numpy_1_7_1 repository, as described in the wiki at the link above.
This approach creates 2 separate orphan tool dependencies, the second of which (matplotlib) has a complex repository dependency on the first (numpy). When you install the package_matplotlib_1_2_1 repository and check the box for handling tool dependencies during the installation, it will install the package_numpy_1_7_1 repository and create a pointer to the numpy binary in the env.sh file within the package_matplotlib_1_2_1 repository environment. This enables matplotlib to locate the required version of numpy.
I know this is a bit tricky, so please let me know if it still does not make sense.
Let's see if I got it right.
repository_dependencies.xml will be parsed first. The defined repositories, and the system variables they include and populate, will be available in tool_dependencies.xml, which is parsed afterwards. Is that correct?
I'm not quite sure I understand your statements above, but I've looked at revision 3 of your package_matplotlib_1_2_1 repository and the tool_dependencies.xml definition looks good (with the exception of the currently unsupported "prior_installation_required" attribute), so I think you've successfully deciphered my documentation.
I'll make sure to keep you informed as I make progress on the missing pieces that will support what you need this week.
I will try that. Thanks! Bjoern
Thanks very much,
Greg Von Kuster
On Apr 15, 2013, at 3:29 PM, Björn Grüning wrote:
Hi,
is there a general rule for handling dependencies inside tool_dependencies.xml?
Let's assume I write a matplotlib orphan tool_dependencies.xml file. matplotlib depends on numpy, and numpy already has an orphan definition.
Is there a way to include numpy as a dependency inside the matplotlib definition, so that I do not need to fetch and compile numpy inside of matplotlib?
I tried to specify it beforehand, but that did not work.
Thanks! Bjoern
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chilton@msi.umn.edu> wrote:
Stepping back a little, is this the right way to address Python dependencies?
Looks like I missed this thread, hence: http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual python packages might be going too far - my thought was they were needed for big 100Mb programs and stuff like that.
It should work but it is a lot of boilerplate for something which should be more automated.
At the Java jar/Python library/Ruby gem level I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.
I agree, the best way forward isn't obvious here, and it may make sense to have tailored solutions for Python, Perl, Java, R, Ruby, etc. packages rather than the current Tool Shed package solution.

I'd like to be able to just continue to write this kind of thing in my tool XML files and have it actually taken care of (rather than ignored):

<requirements>
    <requirement type="python-module">numpy</requirement>
    <requirement type="python-module">Bio</requirement>
</requirements>

Adding a version key would be sensible, handling min/max etc. as per Python packaging norms.

Peter
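A hypothetical sketch of that version-key idea: the plain version attribute mirrors the existing type="package" requirement syntax, while the min/max form would need attributes that do not exist in the current schema.

<requirements>
    <requirement type="python-module" version="1.7.1">numpy</requirement>
    <!-- min_version is purely illustrative; no such attribute exists today. -->
    <requirement type="python-module" min_version="1.61">Bio</requirement>
</requirements>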
The proliferation of individual Python package install definitions has continued, and it has spread to some MSI-managed tools. I worry about the tedium I will have to endure in the future if that becomes an established best practice :) so I have implemented the Python version of what I had described in this thread:

As patch: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...
Pretty version: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...

I understand that there are going to be differing opinions as to whether this is the best way forward, but I thought I would give my position a better chance of succeeding by providing an implementation.

Thanks for your consideration,
-John

On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chilton@msi.umn.edu> wrote:
Stepping back a little, is this the right way to address Python dependencies?
Looks like I missed this thread, hence: http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual python packages might be going too far - my thought was they were needed for big 100Mb programs and stuff like that.
It should work but it is a lot of boilerplate for something which should be more automated.
At the Java jar/Python library/Ruby gem level I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.
I agree, the best way forward isn't obvious here, and it may make sense to have tailored solutions for Python, Perl, Java, R, Ruby, etc packages rather than the current Tool Shed package solution.
I'd like to be able to just continue to write this kind of thing in my tool XML files and have it actually taken care of (rather than ignored):
<requirements> <requirement type="python-module">numpy</requirement> <requirement type="python-module">Bio</requirement> </requirements>
Adding a version key would be sensible, handling min/max etc as per Python packaging norms.
Peter
Hi John,

A few of us in the lab here at Penn State actually discussed automatic creation of virtualenvs for dependency installations a couple of weeks ago. This was in the context of Bjoern's request for supporting compile-time dependencies. I think it's a great idea, but there's a limitation that we'd need to account for.

If you're going to have frequently used and expensive-to-build libraries (e.g. numpy, R + rpy) in dependency-only repositories and then have your tool(s) depend on those repositories, the activate method won't work: virtualenvs cannot depend on other virtualenvs or be active at the same time as other virtualenvs. We could work around it by setting PYTHONPATH in the dependencies' env.sh like we do now. But then, other than making installation a bit easier (e.g. by allowing the use of pip), we have not gained much.

--nate

On May 13, 2013, at 6:49 PM, John Chilton wrote:
The proliferation of individual python package install definitions has continued and it has spread to some MSI managed tools. I worry about the tedium I will have to endure in the future if that becomes an established best practice :) so I have implemented the python version of what I had described in this thread:
As patch: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b... Pretty version: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...
I understand that there are going to be differing opinions as to whether this is the best way forward but I thought I would give my position a better chance of succeeding by providing an implementation.
Thanks for your consideration, -John
On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chilton@msi.umn.edu> wrote:
Stepping back a little, is this the right way to address Python dependencies?
Looks like I missed this thread, hence: http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual python packages might be going too far - my thought was they were needed for big 100Mb programs and stuff like that.
It should work but it is a lot of boilerplate for something which should be more automated.
At the Java jar/Python library/Ruby gem level I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.
I agree, the best way forward isn't obvious here, and it may make sense to have tailored solutions for Python, Perl, Java, R, Ruby, etc packages rather than the current Tool Shed package solution.
I'd like to be able to just continue to write this kind of thing in my tool XML files and have it actually taken care of (rather than ignored):
<requirements> <requirement type="python-module">numpy</requirement> <requirement type="python-module">Bio</requirement> </requirements>
Adding a version key would be sensible, handling min/max etc as per Python packaging norms.
Peter
Hey Nate,

On Tue, May 14, 2013 at 8:40 AM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi John,
A few of us in the lab here at Penn State actually discussed automatic creation of virtualenvs for dependency installations a couple weeks ago. This was in the context of Bjoern's request for supporting compile-time dependencies. I think it's a great idea, but there's a limitation that we'd need to account for.
If you're going to have frequently used and expensive to build libraries (e.g. numpy, R + rpy) in dependency-only repositories and then have your tool(s) depend on those repositories, the activate method won't work. virtualenvs cannot depend on other virtualenvs or be active at the same time as other virtualenvs. We could work around it by setting PYTHONPATH in the dependencies' env.sh like we do now. But then, other than making installation a bit easier (e.g. by allowing the use of pip), we have not gained much.
I don't know what to make of your response. It seems like a no, but the word no doesn't appear anywhere.

I don't know the particulars of rpy, but numpy installs fine via this method, and I see no problem with each application having its own copy of numpy. I think relying on OS-managed Python packages, for instance, is something of a bad practice; when developing and distributing software I use virtualenvs for everything. I think that stand-alone Python package definitions in the tool shed are directly analogous to OS-managed packages.

I also disagree that we have not gained much. Setting up these repositories is an onerous, brittle process. This patch provides some high-level functionality for creating virtualenvs, which negates the need for creating a separate repository per package.

-John
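To give a flavour of what the patch enables, a tool dependency definition could declare its Python requirements directly and have an isolated virtualenv built for them. This is only a rough sketch; the action name and pinned versions are assumptions rather than the exact syntax of the linked commit.

<tool_dependency>
    <package name="matplotlib" version="1.2.1">
        <install version="1.0">
            <actions>
                <!-- Create a virtualenv in $INSTALL_DIR and pip-install the listed requirements. -->
                <action type="setup_virtualenv">
numpy==1.7.1
matplotlib==1.2.1
                </action>
            </actions>
        </install>
    </package>
</tool_dependency>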
--nate
On May 13, 2013, at 6:49 PM, John Chilton wrote:
The proliferation of individual python package install definitions has continued and it has spread to some MSI managed tools. I worry about the tedium I will have to endure in the future if that becomes an established best practice :) so I have implemented the python version of what I had described in this thread:
As patch: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b... Pretty version: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...
I understand that there are going to be differing opinions as to whether this is the best way forward but I thought I would give my position a better chance of succeeding by providing an implementation.
Thanks for your consideration, -John
On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chilton@msi.umn.edu> wrote:
Stepping back a little, is this the right way to address Python dependencies?
Looks like I missed this thread, hence: http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual python packages might be going too far - my thought was they were needed for big 100Mb programs and stuff like that.
It should work but it is a lot of boilerplate for something which should be more automated.
At the Java jar/Python library/Ruby gem level I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.
I agree, the best way forward isn't obvious here, and it may make sense to have tailored solutions for Python, Perl, Java, R, Ruby, etc packages rather than the current Tool Shed package solution.
I'd like to be able to just continue to write this kind of thing in my tool XML files and have it actually taken care of (rather than ignored):
<requirements> <requirement type="python-module">numpy</requirement> <requirement type="python-module">Bio</requirement> </requirements>
Adding a version key would be sensible, handling min/max etc as per Python packaging norms.
Peter
On May 14, 2013, at 10:58 AM, John Chilton wrote:
Hey Nate,
On Tue, May 14, 2013 at 8:40 AM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi John,
A few of us in the lab here at Penn State actually discussed automatic creation of virtualenvs for dependency installations a couple weeks ago. This was in the context of Bjoern's request for supporting compile-time dependencies. I think it's a great idea, but there's a limitation that we'd need to account for.
If you're going to have frequently used and expensive to build libraries (e.g. numpy, R + rpy) in dependency-only repositories and then have your tool(s) depend on those repositories, the activate method won't work. virtualenvs cannot depend on other virtualenvs or be active at the same time as other virtualenvs. We could work around it by setting PYTHONPATH in the dependencies' env.sh like we do now. But then, other than making installation a bit easier (e.g. by allowing the use of pip), we have not gained much.
I don't know what to make of your response. It seems like a no, but the word no doesn't appear anywhere.
Sorry about being wishy-washy. Unless anyone has any objections or can foresee other problems, I would say yes to this. But I believe it should not break the concept of common-dependency-only repositories. I'm pretty sure that as long as the process of creating a venv also adds the venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem should be automatically dealt with.
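Expressed in tool_dependencies.xml terms, the venv-creating definition would then also emit something like the following, so that dependent repositories which merely source this dependency's env.sh still see the packages (the venv location and the Python version in the path are assumptions):

<action type="set_environment">
    <!-- Expose the venv's packages to anything that sources this dependency's env.sh. -->
    <environment_variable name="PYTHONPATH" action="prepend_to">$INSTALL_DIR/venv/lib/python2.7/site-packages</environment_variable>
</action>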
I don't know the particulars of rpy, but numpy installs fine via this method and I see no problem with each application having its own copy of numpy. I think relying on OS managed python packages for instance is something of a bad practice, when developing and distributing software I use virtualenvs for everything. I think that stand-alone python defined packages in the tool shed are directly analogous to OS managed packages.
Completely agree that we want to avoid OS-managed python packages. I had, in the past, considered that for something like numpy, we ought to make it easy for an administrator to allow their own version of numpy to be used, since numpy can be linked against a number of optimized libraries for significant performance gains, and this generally won't happen for versions installed from the toolshed unless the system already has stuff like atlas-dev installed. But I think we still allow admins that possibility with reasonable ease, since dependency management in Galaxy is not a requirement.

What we do want to avoid is the situation where someone clones a new copy of Galaxy, wants to install 10 different tools that all depend on numpy, and has to wait an hour while 10 versions of numpy compile. Add that in with other tools that will have a similar process (installing R + packages + rpy), plus the hope that down the line you'll be able to automatically maintain separate builds for remote resources that are not the same (i.e. multiple clusters with differing operating systems), and this hopefully highlights why I think reducing duplication where possible will be important.
I also disagree we have not gained much. Setting up these repositories is a onerous, brittle process. This patch provides some high-level functionality for creating virtualenv's which negates the need for creating separate repositories per package.
This is a good point. I probably also sold short the benefit of being able to install with pip, since this does indeed remove a similarly brittle and tedious step of downloading and installing modules. --nate
-John
--nate
On May 13, 2013, at 6:49 PM, John Chilton wrote:
The proliferation of individual python package install definitions has continued and it has spread to some MSI managed tools. I worry about the tedium I will have to endure in the future if that becomes an established best practice :) so I have implemented the python version of what I had described in this thread:
As patch: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b... Pretty version: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...
I understand that there are going to be differing opinions as to whether this is the best way forward but I thought I would give my position a better chance of succeeding by providing an implementation.
Thanks for your consideration, -John
On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chilton@msi.umn.edu> wrote:
Stepping back a little, is this the right way to address Python dependencies?
Looks like I missed this thread, hence: http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual python packages might be going too far - my thought was they were needed for big 100Mb programs and stuff like that.
It should work but it is a lot of boilerplate for something which should be more automated.
At the Java jar/Python library/Ruby gem level I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.
I agree, the best way forward isn't obvious here, and it may make sense to have tailored solutions for Python, Perl, Java, R, Ruby, etc packages rather than the current Tool Shed package solution.
I'd like to be able to just continue to write this kind of thing in my tool XML files and have it actually taken care of (rather than ignored):
<requirements> <requirement type="python-module">numpy</requirement> <requirement type="python-module">Bio</requirement> </requirements>
Adding a version key would be sensible, handling min/max etc as per Python packaging norms.
Peter
Greg created the following card, and I'm working on a few changes to your commit:

https://trello.com/card/toolshed-consider-enhancing-tool-dependency-definiti...

Thanks,
--nate

On May 14, 2013, at 1:45 PM, Nate Coraor wrote:
On May 14, 2013, at 10:58 AM, John Chilton wrote:
Hey Nate,
On Tue, May 14, 2013 at 8:40 AM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi John,
A few of us in the lab here at Penn State actually discussed automatic creation of virtualenvs for dependency installations a couple weeks ago. This was in the context of Bjoern's request for supporting compile-time dependencies. I think it's a great idea, but there's a limitation that we'd need to account for.
If you're going to have frequently used and expensive to build libraries (e.g. numpy, R + rpy) in dependency-only repositories and then have your tool(s) depend on those repositories, the activate method won't work. virtualenvs cannot depend on other virtualenvs or be active at the same time as other virtualenvs. We could work around it by setting PYTHONPATH in the dependencies' env.sh like we do now. But then, other than making installation a bit easier (e.g. by allowing the use of pip), we have not gained much.
I don't know what to make of your response. It seems like a no, but the word no doesn't appear anywhere.
Sorry about being wishy-washy. Unless anyone has any objections or can foresee other problems, I would say yes to this. But I believe it should not break the concept of common-dependency-only repositories.
I'm pretty sure that as long as the process of creating a venv also adds the venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem should be automatically dealt with.
I don't know the particulars of rpy, but numpy installs fine via this method and I see no problem with each application having its own copy of numpy. I think relying on OS managed python packages for instance is something of a bad practice, when developing and distributing software I use virtualenvs for everything. I think that stand-alone python defined packages in the tool shed are directly analogous to OS managed packages.
Completely agree that we want to avoid OS-managed python packages. I had, in the past, considered that for something like numpy, we ought to make it easy for an administrator to allow their own version of numpy to be used, since numpy can be linked against a number of optimized libraries for significant performance gains, and this generally won't happen for versions installed from the toolshed unless the system already has stuff like atlas-dev installed. But I think we still allow admins that possibility with reasonable ease since dependency management in Galaxy is not a requirement.
What we do want to avoid is the situation where someone clones a new copy of Galaxy, wants to install 10 different tools that all depend on numpy, and has to wait an hour while 10 versions of numpy compile. Add that in with other tools that will have a similar process (installing R + packages + rpy) plus the hope that down the line you'll be able to automatically maintain separate builds for remote resources that are not the same (i.e. multiple clusters with differing operating systems) and this hopefully highlights why I think reducing duplication where possible will be important.
I also disagree we have not gained much. Setting up these repositories is a onerous, brittle process. This patch provides some high-level functionality for creating virtualenv's which negates the need for creating separate repositories per package.
This is a good point. I probably also sold short the benefit of being able to install with pip, since this does indeed remove a similarly brittle and tedious step of downloading and installing modules.
--nate
-John
--nate
On May 13, 2013, at 6:49 PM, John Chilton wrote:
The proliferation of individual python package install definitions has continued and it has spread to some MSI managed tools. I worry about the tedium I will have to endure in the future if that becomes an established best practice :) so I have implemented the python version of what I had described in this thread:
As patch: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b... Pretty version: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...
I understand that there are going to be differing opinions as to whether this is the best way forward but I thought I would give my position a better chance of succeeding by providing an implementation.
Thanks for your consideration, -John
On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chilton@msi.umn.edu> wrote:
Stepping back a little, is this the right way to address Python dependencies?
Looks like I missed this thread, hence: http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual python packages might be going too far - my thought was they were needed for big 100Mb programs and stuff like that.
It should work but it is a lot of boilerplate for something which should be more automated.
At the Java jar/Python library/Ruby gem level I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.
I agree, the best way forward isn't obvious here, and it may make sense to have tailored solutions for Python, Perl, Java, R, Ruby, etc packages rather than the current Tool Shed package solution.
I'd like to be able to just continue to write this kind of thing in my tool XML files and have it actually taken care of (rather than ignored):
<requirements> <requirement type="python-module">numpy</requirement> <requirement type="python-module">Bio</requirement> </requirements>
Adding a version key would be sensible, handling min/max etc as per Python packaging norms.
Peter
Hey All,

There was a long conversation about this topic in IRC yesterday (among people who don't actually use the tool shed all that frequently); I have posted it to the new unofficial Galaxy Google+ group if anyone would like to read and chime in: https://plus.google.com/111860405027053012444/posts/TkCFwA2jkDN

-John

On Tue, May 14, 2013 at 3:59 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Greg created the following card, and I'm working on a few changes to your commit:
https://trello.com/card/toolshed-consider-enhancing-tool-dependency-definiti...
Thanks, --nate
On May 14, 2013, at 1:45 PM, Nate Coraor wrote:
On May 14, 2013, at 10:58 AM, John Chilton wrote:
Hey Nate,
On Tue, May 14, 2013 at 8:40 AM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi John,
A few of us in the lab here at Penn State actually discussed automatic creation of virtualenvs for dependency installations a couple weeks ago. This was in the context of Bjoern's request for supporting compile-time dependencies. I think it's a great idea, but there's a limitation that we'd need to account for.
If you're going to have frequently used and expensive to build libraries (e.g. numpy, R + rpy) in dependency-only repositories and then have your tool(s) depend on those repositories, the activate method won't work. virtualenvs cannot depend on other virtualenvs or be active at the same time as other virtualenvs. We could work around it by setting PYTHONPATH in the dependencies' env.sh like we do now. But then, other than making installation a bit easier (e.g. by allowing the use of pip), we have not gained much.
I don't know what to make of your response. It seems like a no, but the word no doesn't appear anywhere.
Sorry about being wishy-washy. Unless anyone has any objections or can foresee other problems, I would say yes to this. But I believe it should not break the concept of common-dependency-only repositories.
I'm pretty sure that as long as the process of creating a venv also adds the venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem should be automatically dealt with.
I don't know the particulars of rpy, but numpy installs fine via this method and I see no problem with each application having its own copy of numpy. I think relying on OS managed python packages for instance is something of a bad practice, when developing and distributing software I use virtualenvs for everything. I think that stand-alone python defined packages in the tool shed are directly analogous to OS managed packages.
Completely agree that we want to avoid OS-managed python packages. I had, in the past, considered that for something like numpy, we ought to make it easy for an administrator to allow their own version of numpy to be used, since numpy can be linked against a number of optimized libraries for significant performance gains, and this generally won't happen for versions installed from the toolshed unless the system already has stuff like atlas-dev installed. But I think we still allow admins that possibility with reasonable ease since dependency management in Galaxy is not a requirement.
What we do want to avoid is the situation where someone clones a new copy of Galaxy, wants to install 10 different tools that all depend on numpy, and has to wait an hour while 10 versions of numpy compile. Add that in with other tools that will have a similar process (installing R + packages + rpy) plus the hope that down the line you'll be able to automatically maintain separate builds for remote resources that are not the same (i.e. multiple clusters with differing operating systems) and this hopefully highlights why I think reducing duplication where possible will be important.
I also disagree we have not gained much. Setting up these repositories is a onerous, brittle process. This patch provides some high-level functionality for creating virtualenv's which negates the need for creating separate repositories per package.
This is a good point. I probably also sold short the benefit of being able to install with pip, since this does indeed remove a similarly brittle and tedious step of downloading and installing modules.
--nate
-John
--nate
On May 13, 2013, at 6:49 PM, John Chilton wrote:
The proliferation of individual python package install definitions has continued and it has spread to some MSI managed tools. I worry about the tedium I will have to endure in the future if that becomes an established best practice :) so I have implemented the python version of what I had described in this thread:
As patch: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b... Pretty version: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...
I understand that there are going to be differing opinions as to whether this is the best way forward but I thought I would give my position a better chance of succeeding by providing an implementation.
Thanks for your consideration, -John
On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chilton@msi.umn.edu> wrote:
> Stepping back a little, is this the right way to address Python dependencies?
Looks like I missed this thread, hence: http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
> I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual python packages might be going too far - my thought was they were needed for big 100Mb programs and stuff like that.
It should work but it is a lot of boilerplate for something which should be more automated.
> At the Java jar/Python library/Ruby gem level I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.
I agree, the best way forward isn't obvious here, and it may make sense to have tailored solutions for Python, Perl, Java, R, Ruby, etc packages rather than the current Tool Shed package solution.
I'd like to be able to just continue to write this kind of thing in my tool XML files and have it actually taken care of (rather than ignored):
<requirements> <requirement type="python-module">numpy</requirement> <requirement type="python-module">Bio</requirement> </requirements>
Adding a version key would be sensible, handling min/max etc as per Python packaging norms.
Peter
John,

Could you create a pull request with your changes from the branch in GitHub? I'll accept them and then commit my additions and changes. Today is the "freeze", so I'd like to get this into the next release.

Thanks,
--nate

On May 17, 2013, at 11:21 AM, John Chilton wrote:
Hey All,
There was a long conversation about this topic in IRC yesterday (among people who don't actually use the tool shed all that frequently), I have posted it to the new unofficial Galaxy Google+ group if anyone would like to read and chime in.
https://plus.google.com/111860405027053012444/posts/TkCFwA2jkDN
-John
On Tue, May 14, 2013 at 3:59 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Greg created the following card, and I'm working on a few changes to your commit:
https://trello.com/card/toolshed-consider-enhancing-tool-dependency-definiti...
Thanks, --nate
On May 14, 2013, at 1:45 PM, Nate Coraor wrote:
On May 14, 2013, at 10:58 AM, John Chilton wrote:
Hey Nate,
On Tue, May 14, 2013 at 8:40 AM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi John,
A few of us in the lab here at Penn State actually discussed automatic creation of virtualenvs for dependency installations a couple weeks ago. This was in the context of Bjoern's request for supporting compile-time dependencies. I think it's a great idea, but there's a limitation that we'd need to account for.
If you're going to have frequently used and expensive to build libraries (e.g. numpy, R + rpy) in dependency-only repositories and then have your tool(s) depend on those repositories, the activate method won't work. virtualenvs cannot depend on other virtualenvs or be active at the same time as other virtualenvs. We could work around it by setting PYTHONPATH in the dependencies' env.sh like we do now. But then, other than making installation a bit easier (e.g. by allowing the use of pip), we have not gained much.
I don't know what to make of your response. It seems like a no, but the word no doesn't appear anywhere.
Sorry about being wishy-washy. Unless anyone has any objections or can foresee other problems, I would say yes to this. But I believe it should not break the concept of common-dependency-only repositories.
I'm pretty sure that as long as the process of creating a venv also adds the venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem should be automatically dealt with.
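As a rough sketch of that idea, with hypothetical install paths and package versions (an illustration of the approach, not the tool shed's actual install code):

    import os
    import subprocess

    # Hypothetical locations for a dependency-only numpy install.
    install_dir = "/galaxy/tool_deps/numpy/1.7.1/venv"
    env_sh = "/galaxy/tool_deps/numpy/1.7.1/env.sh"

    # Create an isolated virtualenv and install the package into it with pip.
    subprocess.check_call(["virtualenv", install_dir])
    subprocess.check_call([os.path.join(install_dir, "bin", "pip"),
                           "install", "numpy==1.7.1"])

    # Rather than relying on `activate` (virtualenvs cannot be stacked), export
    # this venv's site-packages on PYTHONPATH in the dependency's env.sh so a
    # dependent tool's own virtualenv can still import the package.
    # (The python2.7 path segment is version-specific; shown for illustration.)
    site_packages = os.path.join(install_dir, "lib", "python2.7", "site-packages")
    with open(env_sh, "a") as fh:
        fh.write("PYTHONPATH=%s:$PYTHONPATH; export PYTHONPATH\n" % site_packages)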
I don't know the particulars of rpy, but numpy installs fine via this method and I see no problem with each application having its own copy of numpy. I think relying on OS-managed Python packages, for instance, is something of a bad practice; when developing and distributing software I use virtualenvs for everything. I think that stand-alone Python-defined packages in the tool shed are directly analogous to OS-managed packages.
Completely agree that we want to avoid OS-managed python packages. I had, in the past, considered that for something like numpy, we ought to make it easy for an administrator to allow their own version of numpy to be used, since numpy can be linked against a number of optimized libraries for significant performance gains, and this generally won't happen for versions installed from the toolshed unless the system already has stuff like atlas-dev installed. But I think we still allow admins that possibility with reasonable ease since dependency management in Galaxy is not a requirement.
What we do want to avoid is the situation where someone clones a new copy of Galaxy, wants to install 10 different tools that all depend on numpy, and has to wait an hour while 10 versions of numpy compile. Add that in with other tools that will have a similar process (installing R + packages + rpy) plus the hope that down the line you'll be able to automatically maintain separate builds for remote resources that are not the same (i.e. multiple clusters with differing operating systems) and this hopefully highlights why I think reducing duplication where possible will be important.
I also disagree that we have not gained much. Setting up these repositories is an onerous, brittle process. This patch provides some high-level functionality for creating virtualenvs, which negates the need for creating separate repositories per package.
This is a good point. I probably also sold short the benefit of being able to install with pip, since this does indeed remove a similarly brittle and tedious step of downloading and installing modules.
--nate
-John
--nate
On May 13, 2013, at 6:49 PM, John Chilton wrote:
The proliferation of individual python package install definitions has continued and it has spread to some MSI managed tools. I worry about the tedium I will have to endure in the future if that becomes an established best practice :) so I have implemented the python version of what I had described in this thread:
As patch: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b... Pretty version: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...
I understand that there are going to be differing opinions as to whether this is the best way forward but I thought I would give my position a better chance of succeeding by providing an implementation.
Thanks for your consideration, -John
Done.

On the topic of the freeze: if Dannon's change requiring all metadata to be "set externally" is going to be included, I would suggest someone look at build_command_line in runners/__init__.py:

https://bitbucket.org/galaxy/galaxy-central/src/237209336f0337ea9f47df39548d...

I think there is a bug where, when metadata is set externally, it masks the return code of the tool (likewise if from_work_dir is used). I had just created a Trello card (https://trello.com/c/JfB2w1Br) with an idea for how to address it, but I think the problem is going to be more severe when everyone is setting metadata externally. I have only observed this for the from_work_dir case, but based on code inspection I don't know how setting metadata externally would be different.

Also, that same change broke the LWR, so it would be very much appreciated if pull request 166 could be accepted before the release is tagged :) or at least the first two changesets.

Thanks all,
-John

On Mon, May 20, 2013 at 8:17 AM, Nate Coraor <nate@bx.psu.edu> wrote:
John,
Could you create a pull request with your changes from the branch in github? I'll accept them and then commit my additions and changes. Today is the "freeze" so I'd like to get this in to the next release.
Thanks, ---nate
On May 17, 2013, at 11:21 AM, John Chilton wrote:
Hey All,
There was a long conversation about this topic in IRC yesterday (among people who don't actually use the tool shed all that frequently); I have posted it to the new unofficial Galaxy Google+ group if anyone would like to read and chime in.
https://plus.google.com/111860405027053012444/posts/TkCFwA2jkDN
-John
On Tue, May 14, 2013 at 3:59 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Greg created the following card, and I'm working on a few changes to your commit:
https://trello.com/card/toolshed-consider-enhancing-tool-dependency-definiti...
Thanks, --nate
On May 14, 2013, at 1:45 PM, Nate Coraor wrote:
On May 14, 2013, at 10:58 AM, John Chilton wrote:
Hey Nate,
On Tue, May 14, 2013 at 8:40 AM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi John,
A few of us in the lab here at Penn State actually discussed automatic creation of virtualenvs for dependency installations a couple weeks ago. This was in the context of Bjoern's request for supporting compile-time dependencies. I think it's a great idea, but there's a limitation that we'd need to account for.
If you're going to have frequently used and expensive to build libraries (e.g. numpy, R + rpy) in dependency-only repositories and then have your tool(s) depend on those repositories, the activate method won't work. virtualenvs cannot depend on other virtualenvs or be active at the same time as other virtualenvs. We could work around it by setting PYTHONPATH in the dependencies' env.sh like we do now. But then, other than making installation a bit easier (e.g. by allowing the use of pip), we have not gained much.
I don't know what to make of your response. It seems like a no, but the word no doesn't appear anywhere.
Sorry about being wishy-washy. Unless anyone has any objections or can foresee other problems, I would say yes to this. But I believe it should not break the concept of common-dependency-only repositories.
I'm pretty sure that as long as the process of creating a venv also adds the venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem should be automatically dealt with.
I don't know the particulars of rpy, but numpy installs fine via this method and I see no problem with each application having its own copy of numpy. I think relying on OS-managed Python packages, for instance, is something of a bad practice; when developing and distributing software I use virtualenvs for everything. I think that stand-alone Python-defined packages in the tool shed are directly analogous to OS-managed packages.
Completely agree that we want to avoid OS-managed python packages. I had, in the past, considered that for something like numpy, we ought to make it easy for an administrator to allow their own version of numpy to be used, since numpy can be linked against a number of optimized libraries for significant performance gains, and this generally won't happen for versions installed from the toolshed unless the system already has stuff like atlas-dev installed. But I think we still allow admins that possibility with reasonable ease since dependency management in Galaxy is not a requirement.
What we do want to avoid is the situation where someone clones a new copy of Galaxy, wants to install 10 different tools that all depend on numpy, and has to wait an hour while 10 versions of numpy compile. Add that in with other tools that will have a similar process (installing R + packages + rpy) plus the hope that down the line you'll be able to automatically maintain separate builds for remote resources that are not the same (i.e. multiple clusters with differing operating systems) and this hopefully highlights why I think reducing duplication where possible will be important.
I also disagree that we have not gained much. Setting up these repositories is an onerous, brittle process. This patch provides some high-level functionality for creating virtualenvs, which negates the need for creating separate repositories per package.
This is a good point. I probably also sold short the benefit of being able to install with pip, since this does indeed remove a similarly brittle and tedious step of downloading and installing modules.
--nate
-John
--nate
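To make concrete the return-code issue John describes above (extra commands appended for external metadata or from_work_dir outputs masking the tool's exit status), here is a minimal, hypothetical sketch -- not the actual build_command_line code -- of the failure mode and one way to preserve the status:

    # Hypothetical illustration only -- not Galaxy's runners/__init__.py.
    def naive_command(tool_cmd, extra_cmds):
        # Joined with ';', the job's exit status becomes that of the LAST
        # command (e.g. a from_work_dir copy), not the tool itself.
        return "; ".join([tool_cmd] + extra_cmds)

    def return_code_preserving_command(tool_cmd, extra_cmds):
        # Capture the tool's exit status immediately, then re-raise it at the
        # end so metadata/copy steps cannot mask a tool failure.
        return "; ".join([tool_cmd, "return_code=$?"] + extra_cmds + ["exit $return_code"])

    print(naive_command("python my_tool.py", ["cp working/out.dat output.dat"]))
    # python my_tool.py; cp working/out.dat output.dat
    print(return_code_preserving_command("python my_tool.py", ["cp working/out.dat output.dat"]))
    # python my_tool.py; return_code=$?; cp working/out.dat output.dat; exit $return_code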
Hello Nate,
On May 14, 2013, at 10:58 AM, John Chilton wrote:
Hey Nate,
On Tue, May 14, 2013 at 8:40 AM, Nate Coraor <nate@bx.psu.edu> wrote:
Hi John,
A few of us in the lab here at Penn State actually discussed automatic creation of virtualenvs for dependency installations a couple weeks ago. This was in the context of Bjoern's request for supporting compile-time dependencies. I think it's a great idea, but there's a limitation that we'd need to account for.
If you're going to have frequently used and expensive to build libraries (e.g. numpy, R + rpy) in dependency-only repositories and then have your tool(s) depend on those repositories, the activate method won't work. virtualenvs cannot depend on other virtualenvs or be active at the same time as other virtualenvs. We could work around it by setting PYTHONPATH in the dependencies' env.sh like we do now. But then, other than making installation a bit easier (e.g. by allowing the use of pip), we have not gained much.
I don't know what to make of your response. It seems like a no, but the word no doesn't appear anywhere.
Sorry about being wishy-washy. Unless anyone has any objections or can foresee other problems, I would say yes to this. But I believe it should not break the concept of common-dependency-only repositories.
I'm pretty sure that as long as the process of creating a venv also adds the venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem should be automatically dealt with.
I don't know the particulars of rpy, but numpy installs fine via this method and I see no problem with each application having its own copy of numpy. I think relying on OS-managed Python packages, for instance, is something of a bad practice; when developing and distributing software I use virtualenvs for everything. I think that stand-alone Python-defined packages in the tool shed are directly analogous to OS-managed packages.
Completely agree that we want to avoid OS-managed python packages. I had, in the past, considered that for something like numpy, we ought to make it easy for an administrator to allow their own version of numpy to be used, since numpy can be linked against a number of optimized libraries for significant performance gains, and this generally won't happen for versions installed from the toolshed unless the system already has stuff like atlas-dev installed. But I think we still allow admins that possibility with reasonable ease since dependency management in Galaxy is not a requirement.
The repository in the testtoolshed is now able to compile numpy against ATLAS and LAPACK. It is a little bit of work, but we can do such things now. (It still does not deactivate CPU scaling during compilation, but I hope that does not have a big impact on performance.)
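For context, a rough sketch of the kind of build step involved, with made-up paths (an illustration of pointing numpy's build at ATLAS/LAPACK via numpy.distutils environment variables, not the actual package_numpy recipe):

    import os
    import subprocess

    # numpy's distutils-based build consults these environment variables when
    # looking for optimized BLAS/LAPACK libraries; the paths are hypothetical.
    build_env = dict(os.environ)
    build_env["ATLAS"] = "/opt/atlas/lib"
    build_env["LAPACK"] = "/opt/atlas/lib"
    build_env["BLAS"] = "/opt/atlas/lib"

    # Build and install into an isolated prefix so env.sh can later put the
    # resulting site-packages directory on PYTHONPATH.
    subprocess.check_call(
        ["python", "setup.py", "install", "--prefix=/galaxy/tool_deps/numpy/1.7.1"],
        cwd="/tmp/numpy-1.7.1",  # hypothetical unpacked numpy source tree
        env=build_env,
    )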
What we do want to avoid is the situation where someone clones a new copy of Galaxy, wants to install 10 different tools that all depend on numpy, and has to wait an hour while 10 versions of numpy compile. Add that in with other tools that will have a similar process (installing R + packages + rpy) plus the hope that down the line you'll be able to automatically maintain separate builds for remote resources that are not the same (i.e. multiple clusters with differing operating systems) and this hopefully highlights why I think reducing duplication where possible will be important.
I also disagree that we have not gained much. Setting up these repositories is an onerous, brittle process. This patch provides some high-level functionality for creating virtualenvs, which negates the need for creating separate repositories per package.
This is a good point. I probably also sold short the benefit of being able to install with pip, since this does indeed remove a similarly brittle and tedious step of downloading and installing modules.
--nate
-John
--nate
On May 13, 2013, at 6:49 PM, John Chilton wrote:
The proliferation of individual python package install definitions has continued and it has spread to some MSI managed tools. I worry about the tedium I will have to endure in the future if that becomes an established best practice :) so I have implemented the python version of what I had described in this thread:
As patch: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b... Pretty version: https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b...
I understand that there are going to be differing opinions as to whether this is the best way forward but I thought I would give my position a better chance of succeeding by providing an implementation.
Thanks for your consideration, -John
On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chilton@msi.umn.edu> wrote:
Stepping back a little, is this the right way to address Python dependencies?
Looks like I missed this thread, hence: http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
I was a big advocate for inter-repository dependencies, but I think taking it to the level of individual python packages might be going too far - my thought was they were needed for big 100Mb programs and stuff like that.
It should work but it is a lot of boilerplate for something which should be more automated.
At the Java jar/Python library/Ruby gem level I think using some of the platform-specific packaging stuff to create isolated environments for each program might be a better way to go.
I agree, the best way forward isn't obvious here, and it may make sense to have tailored solutions for Python, Perl, Java, R, Ruby, etc. packages rather than the current Tool Shed package solution.
I'd like to be able to just continue to write this kind of thing in my tool XML files and have it actually taken care of (rather than ignored):

<requirements>
  <requirement type="python-module">numpy</requirement>
  <requirement type="python-module">Bio</requirement>
</requirements>

Adding a version key would be sensible, handling min/max etc. as per Python packaging norms.
Peter
participants (6)

- Björn Grüning
- Greg Von Kuster
- John Chilton
- John Chilton
- Nate Coraor
- Peter Cock