Repository tool dependencies and virtualenv install

Hi,
I have created a tool that depends on biopython. I don't like the idea of having several copies of biopython around, and since making workflows that use my tool reproducible is best practice, I chose to make my package repository [1] depend on package_biopython_1_62.
[1] http://testtoolshed.g2.bx.psu.edu/view/cjav/package_ngs_tools_0_1_6
I'm installing my package using the virtualenv option. As you can see below, I marked biopython as prior_installation_required="True". However, I can see that virtualenv is installing a copy of biopython in 'venv' under my tool install (see below). This doesn't sound right. I was expecting that, by making my repository depend on biopython's, that install would be the one used by my tool.
My plan is to create any missing "Tool dependency definition" repository for every package my tool requires. This way you can always count on the same version of each of these packages being used. I thought that was the point, right?
Full tool_dependencies.xml:
    <?xml version="1.0"?>
    <tool_dependency>
        <package name="biopython" version="1.62">
            <repository changeset_revision="ac9cc2992b69" name="package_biopython_1_62" owner="biopython" prior_installation_required="True" toolshed="http://testtoolshed.g2.bx.psu.edu" />
        </package>
        <package name="ngs-tools" version="0.1.6">
            <install version="1.0">
                <actions>
                    <action type="setup_virtualenv">ngs-tools==0.1.6</action>
                </actions>
            </install>
        </package>
    </tool_dependency>
Contents of site-packages in my tool install dir:
    $ ls -1 shed_tools_dependencies.central/ngs-tools/0.1.6/cjav/package_ngs_tools_0_1_6/3c646b8328bb/venv/lib/python2.7/site-packages/
    Bio
    biopython-1.62-py2.7.egg-info
    BioSQL
    docopt-0.6.1-py2.7.egg-info
    docopt.py
    docopt.pyc
    easy-install.pth
    Levenshtein.so
    ngs
    ngs_tools-0.1.6-py2.7.egg-info
    pip-1.3.1-py2.7.egg
    python_Levenshtein-0.10.2-py2.7.egg-info
    setuptools-0.6c11-py2.7.egg
    setuptools.pth
Thanks,
Carlos

Hi Carlos!
Hi,
I have created a tool that depends on biopython. I don't like the idea of having several copies of biopython around and also for best practice making workflows using my tool reproducible, I chose to make my package repository[1] dependent of package_biopython_1_62.
[1]http://testtoolshed.g2.bx.psu.edu/view/cjav/package_ngs_tools_0_1_6
Cool thanks! I think that is best practise!
I'm installing my package using the virtualenv option. As you can see below I marked biopython as prior_installation_required="True". However, I can see that virtualenv is installing a copy of biopython in 'venv' under my tool install(see below). This doesn't sound right. I was expecting that by making my repository dependent of biopython's one that source would be used in my tool.
Can you try inserting this (you may need to adapt it; it is untested):
    <action type="set_environment_for_install">
        <repository name="package_biopython_1_62" owner="biopython">
            <package name="biopython" version="1.62" />
        </repository>
    </action>
Only with that action is the PYTHONPATH populated.
My plan is to create any non-existing "Tool dependency definition" repository for every package my tool requires. This way you can always count the same version of any of these packages is being used. I thought that was the point, right?
Yes!
Thanks, Carlos ___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

On Mon, Sep 23, 2013 at 12:51 PM, Björn Grüning <bjoern.gruening@pharmazie.uni-freiburg.de> wrote:
Cool thanks! I think that is best practise!
I disagree strongly. This is not a good idea. My thought is that if you are using a virtualenv, you shouldn't be trying to compose it with other Python dependencies. To me the whole point of virtualenv is that it creates isolated little environments. I am not sure why duplicating the biopython install reduces reproducibility.
If you want to use individual Python packages through Galaxy's dependency mechanism, I think you should package them up one at a time and modify PYTHONPATH and PATH by hand, the way biopython is done.
Galaxy should have a separate action to make this easy, say install_pip or something like that, as outlined by James Taylor at some point.
-John

On Mon, Sep 23, 2013 at 1:57 PM, John Chilton <chilton@msi.umn.edu> wrote:
Hi John,
First let me tell you why I think duplicating the biopython install reduces reproducibility. I have this in my tool's setup.py:
    install_requires=[
        "docopt",
        "biopython",
        "python-levenshtein"
    ],
While I could specify versions here (e.g. biopython==1.62), I feel that is not a good thing outside of Galaxy. I think pip should be free to install the latest version of these packages unless I find there is an issue. I think this is the most common approach, though I might be wrong. This leaves me with the issue that when Galaxy installs my tool using virtualenv, it will grab the most up-to-date version of these packages, hence reducing reproducibility. Did I explain myself well enough? I'll be happy to debate any of this.
I agree with you about breaking the original intent of virtualenv, but as you mention, without 'install_pip' or similar I'm left without the option of making my life easy by simply installing my package with a PyPI-based approach.
Thanks,
Carlos
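Editor's sketch: one way to keep setup.py unpinned while still producing pinned requirements for a Galaxy install is to hold the pins in a separate mapping. The helper name `pin_requirements` is hypothetical, and the versions are simply the ones visible in the site-packages listing earlier in the thread; this is an illustration, not something any poster proposed.

```python
# Hypothetical helper (not part of Galaxy, pip or setuptools).
def pin_requirements(names, versions):
    """Return 'name==version' strings, failing loudly if a pin is missing."""
    missing = [name for name in names if name not in versions]
    if missing:
        raise KeyError("no pinned version for: " + ", ".join(missing))
    return ["%s==%s" % (name, versions[name]) for name in names]


# Unpinned names, as in the setup.py quoted above.
install_requires = ["docopt", "biopython", "python-levenshtein"]

# Versions taken from the site-packages listing in the first message.
pins = {"docopt": "0.6.1", "biopython": "1.62", "python-levenshtein": "0.10.2"}

pinned = pin_requirements(install_requires, pins)
print("\n".join(pinned))
```

The unpinned list stays usable outside Galaxy, while the pinned lines can be pasted wherever exact versions are wanted.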

On Mon, Sep 23, 2013 at 1:09 PM, Carlos Borroto <carlos.borroto@gmail.com> wrote:
I understand what you are saying and I sympathize with you here. Still, I think the better approach is to copy these requirements into the setup_virtualenv block and specify hard-coded versions. This way you get reproducibility across all packages, not just biopython. I think this slight duplication is a smaller problem than the mixing of dependency mechanisms in the approach you described. To me it is analogous to installing some Python dependencies via OS packages and others via sudo pip install into /usr: it is a recipe for confusion.

On Mon, Sep 23, 2013 at 2:22 PM, John Chilton <chilton@msi.umn.edu> wrote:
I understand what you are saying and I sympathize with you here. Still I think the better approach is going to be to copy these requirements into the setup_virtualenv block and specify hard-coded versions. This way you get reproducibility across all packages, not just biopython.
Hi John,
Could you go a little further with this recommendation? How can I specify versions for required packages in setup_virtualenv? I now have this:
    <install version="1.0">
        <actions>
            <action type="setup_virtualenv">ngs-tools==0.1.6</action>
        </actions>
    </install>
I tried these two without luck:
    <action type="setup_virtualenv">docopt==0.6.1 python-levenshtein==0.10.2 biopython==1.62 ngs-tools==0.1.6</action>
    <action type="setup_virtualenv">docopt==0.6.1, python-levenshtein==0.10.2, biopython==1.62, ngs-tools==0.1.6</action>
I think this slight duplication is a smaller problem than mixing dependency mechanisms you described in your approach. To me it is analogous to installing some python dependencies via os packages and other ones via sudo pip install into /usr, it is a recipe for confusion.
While I'm getting convinced that maybe some duplication is not that bad after all, please note that my plan is to install everything from the toolshed. I also don't like mixing install methods. In fact, I would like 'install_pip' to follow the best practice of always running 'pip install --no-deps'. This would force you to first upload everything your package needs to the toolshed.
Thanks,
Carlos

On Mon, Sep 23, 2013 at 2:30 PM, Carlos Borroto <carlos.borroto@gmail.com> wrote:
So the contents are treated like a requirements.txt file, which means the whitespace becomes important (I have a plan to improve this and sort of synchronize the syntax used for Ruby, Python, and R, but for now it's just a file). So you want this:
<action type="setup_virtualenv">docopt==0.6.1
python-levenshtein==0.10.2
biopython==1.62
ngs-tools==0.1.6
</action>
Newline between dependencies, and no whitespace to the left of each package.
Someday the syntax will be:
    <action type="setup_virtualenv">
        <package>docopt==0.6.1</package>
        <package>python-levenshtein==0.10.2</package>
        biopython==1.62 ngs-tools==0.1.6
    </action>

Oops, sent that last mail out before I meant to.
On Mon, Sep 23, 2013 at 2:30 PM, Carlos Borroto <carlos.borroto@gmail.com> wrote:
So the contents or text of the action is treated like a requirements.txt file, which means the whitespace becomes important (I have a plan to improve this and sort of synchronize the syntax used for Ruby, Python, and R, but for now it's just a file). So you want this:
<action type="setup_virtualenv">docopt==0.6.1
python-levenshtein==0.10.2
biopython==1.62
ngs-tools==0.1.6
</action>
Newline between dependencies, and no whitespace to the left of each package.
Someday I hope the syntax will be as follows (with the older version still supported for backward compatibility's sake), and that an even simpler YAML-based version will be supported as well:
    <action type="setup_virtualenv">
        <package>docopt==0.6.1</package>
        <package>python-levenshtein==0.10.2</package>
        <package>ngs-tools==0.1.6</package>
    </action>
Hope this helps.
-John
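Editor's sketch: the whitespace rule John describes (one requirement per line, nothing to the left of each package) is easy to get wrong by hand, so the action text could be generated instead. The function name `virtualenv_action` is hypothetical; only the standard library's ElementTree is used, and the package list is the one from the thread.

```python
import xml.etree.ElementTree as ET


def virtualenv_action(requirements):
    """Build a setup_virtualenv action whose text is newline separated.

    Each requirement is stripped so no line starts with whitespace,
    matching the requirements.txt-like format described above.
    """
    action = ET.Element("action", {"type": "setup_virtualenv"})
    action.text = "\n".join(req.strip() for req in requirements) + "\n"
    return ET.tostring(action, encoding="unicode")


xml_text = virtualenv_action([
    "docopt==0.6.1",
    "python-levenshtein==0.10.2",
    "biopython==1.62",
    "ngs-tools==0.1.6",
])
print(xml_text)
```

Generating the element rather than indenting it in an editor avoids the leading whitespace that pretty-printing would otherwise add to each package line.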

On Mon, Sep 23, 2013 at 3:41 PM, John Chilton <chilton@msi.umn.edu> wrote:
So you want this:
<action type="setup_virtualenv">docopt==0.6.1
python-levenshtein==0.10.2
biopython==1.62
ngs-tools==0.1.6
</action>
Newline between dependencies, and no whitespace to the left of each package.
Hi John,
I can confirm this did the trick. It also resolves my worries about reproducibility. Until now I didn't know you could specify more than one package in "setup_virtualenv".
Björn, I'll be happy to keep testing the Biopython package, and I will keep an eye on how the community decides to deal with dependencies going forward. For now I feel the virtualenv approach is the easier and less breakage-prone way to handle Python packages.
Thanks, I greatly appreciate the time you both spent helping me,
Carlos

On Monday, 23.09.2013, at 12:57 -0500, John Chilton wrote:
Hi John,
Interesting point. What is the specific difference between venv and the pip installation? In which case do you propose to use which method? I never thought about different 'isolation levels' (venv vs. the rest of the toolshed). If we have different methods for Python installations, we need to document the use cases somehow.
Ciao,
Bjoern

On Mon, Sep 23, 2013 at 1:25 PM, Björn Grüning <bjoern.gruening@pharmazie.uni-freiburg.de> wrote:
I would use the venv method for everything (unless there is some reason virtualenv cannot be used in a certain case; I would not be surprised if I were overestimating its power), and it wouldn't be composable. Let the broader Python community worry about Python dependencies, how to install them, how to package them, etc. Galaxy should just be concerned with figuring out how to enable the environment before running a job. Let me know if I am being a jackass who is oversimplifying this and there is some reason it cannot be done.
The use case for not using venv is not necessarily something I understand, but it's clear that you, Carlos and others want to have individual Python dependencies packaged in the tool shed the way Debian or CentOS do it. You will have to explain that use case to me :). While I don't like this approach, if it is something you guys want to do, you should have the tools to make it as easy as possible. If that is something like setup_pip (i.e. a wrapper for pip install) or setup_python (a wrapper for python setup.py install --prefix=$install_dir), cool beans: these should be pretty straightforward to implement and should save on some typing. I still think of them as anti-patterns though :).
-John
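Editor's sketch: for the non-venv route, a package installed under its own prefix only becomes importable once its site-packages directory is on PYTHONPATH. The snippet below shows that bookkeeping under stated assumptions: a POSIX-style `lib/pythonX.Y/site-packages` layout (as seen in the venv path earlier in the thread), a made-up install prefix, and hypothetical helper names. It is not how Galaxy itself does this.

```python
import os


def site_packages(prefix, python_version="2.7"):
    """Directory where a --prefix install is assumed to place packages."""
    return os.path.join(prefix, "lib", "python" + python_version, "site-packages")


def prepend_pythonpath(env, prefix, python_version="2.7"):
    """Return a copy of env with the prefix's site-packages first on PYTHONPATH."""
    new_env = dict(env)
    entry = site_packages(prefix, python_version)
    existing = new_env.get("PYTHONPATH")
    new_env["PYTHONPATH"] = entry + os.pathsep + existing if existing else entry
    return new_env


# "/deps/biopython/1.62" is an illustrative prefix, not a real Galaxy path.
env = prepend_pythonpath({}, "/deps/biopython/1.62")
chained = prepend_pythonpath(env, "/deps/docopt/0.6.1")
print(chained["PYTHONPATH"])
```

Each dependency adds one entry, which is the composition John argues a virtualenv avoids.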
Ciao, Bjoern
-John
I'm installing my package using the virtualenv option. As you can see below I marked biopython as prior_installation_required="True". However, I can see that virtualenv is installing a copy of biopython in 'venv' under my tool install(see below). This doesn't sound right. I was expecting that by making my repository dependent of biopython's one that source would be used in my tool.
Can you try to insert that (maybe adopt, its not checked):
<action type="set_environment_for_install"> <repository name="package_biopython_1_62" owner="biopython"> <package name="biopython" version="1.62" /> </repository> </action>
Only with that the PYTHONPATH is populated.
My plan is to create any non-existing "Tool dependency definition" repository for every package my tool requires. This way you can always count the same version of any of these packages is being used. I thought that was the point, right?
Yes!
Full ool_dependencies.xml: <?xml version="1.0"?> <tool_dependency> <package name="biopython" version="1.62"> <repository changeset_revision="ac9cc2992b69" name="package_biopython_1_62" owner="biopython" prior_installation_required="True" toolshed="http://testtoolshed.g2.bx.psu.edu" /> </package> <package name="ngs-tools" version="0.1.6"> <install version="1.0"> <actions> <action type="setup_virtualenv">ngs-tools==0.1.6</action> </actions> </install> </package> </tool_dependency>
Contents of site-packages in my tool install dir: $ ls -1 shed_tools_dependencies.central/ngs-tools/0.1.6/cjav/package_ngs_tools_0_1_6/3c646b8328bb/venv/lib/python2.7/site-packages/ Bio biopython-1.62-py2.7.egg-info BioSQL docopt-0.6.1-py2.7.egg-info docopt.py docopt.pyc easy-install.pth Levenshtein.so ngs ngs_tools-0.1.6-py2.7.egg-info pip-1.3.1-py2.7.egg python_Levenshtein-0.10.2-py2.7.egg-info setuptools-0.6c11-py2.7.egg setuptools.pth
Thanks, Carlos ___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

On Mon, Sep 23, 2013 at 1:51 PM, Björn Grüning <bjoern.gruening@pharmazie.uni-freiburg.de> wrote:
I'm installing my package using the virtualenv option. As you can see below, I marked biopython as prior_installation_required="True". However, I can see that virtualenv is installing a copy of biopython in 'venv' under my tool install (see below). This doesn't sound right. I was expecting that, by making my repository dependent on biopython's, that source would be used in my tool.
Can you try inserting this (you may need to adapt it, it's not checked):
<action type="set_environment_for_install">
    <repository name="package_biopython_1_62" owner="biopython">
        <package name="biopython" version="1.62" />
    </repository>
</action>
Only with that is the PYTHONPATH populated.
Hi Björn,
No luck with this recommendation. See my full tool_dependencies.xml below in case I misread you. Biopython still gets installed into this repository's install 'venv'. I will try to move to what biopython is doing. Hopefully, and probably better, as John mentioned, something like 'install_pip' will come along in the future, with support for automatically modifying PYTHONPATH based on the repository dependencies.
<?xml version="1.0"?>
<tool_dependency>
    <package name="biopython" version="1.62">
        <repository changeset_revision="ac9cc2992b69" name="package_biopython_1_62" owner="biopython" prior_installation_required="True" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    </package>
    <package name="ngs-tools" version="0.1.6">
        <install version="1.0">
            <action type="set_environment_for_install">
                <repository name="package_biopython_1_62" owner="biopython">
                    <package name="biopython" version="1.62" />
                </repository>
            </action>
            <actions>
                <action type="setup_virtualenv">ngs-tools==0.1.6</action>
            </actions>
        </install>
    </package>
</tool_dependency>
Thanks,
Carlos
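When it is unclear which copy of a package an interpreter would pick up (the one inside 'venv' or one exposed through PYTHONPATH), asking the import machinery directly can help. The following is only a sketch: the helper name `module_origin` is hypothetical, and it assumes Python 3.4 or later, whereas the install discussed in this thread uses Python 2.7, where importing the package and printing `Bio.__file__` would be the rough equivalent.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    Hypothetical helper for illustration; it only inspects the import
    path of the interpreter it runs in and does not import the module.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Run this with the interpreter the tool actually uses. A path under
# .../venv/lib/.../site-packages/ would point at the bundled copy.
print(module_origin("Bio"))
```

Running it once with the virtualenv's interpreter and once with the environment Galaxy sets up for the tool should show whether the two resolve to different copies.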

On Mon, Sep 23, 2013 at 2:24 PM, Carlos Borroto <carlos.borroto@gmail.com> wrote:
Can you try inserting this (you may need to adapt it, it's not checked):
<action type="set_environment_for_install">
    <repository name="package_biopython_1_62" owner="biopython">
        <package name="biopython" version="1.62" />
    </repository>
</action>
Only with that is the PYTHONPATH populated.
Hi Björn,
No luck with this recommendation. See my full tool_dependencies.xml below in case I misread you. Biopython still gets installed into this repository's install 'venv'. I will try to move to what biopython is doing. Hopefully, and probably better, as John mentioned, something like 'install_pip' will come along in the future, with support for automatically modifying PYTHONPATH based on the repository dependencies.
<?xml version="1.0"?>
<tool_dependency>
    <package name="biopython" version="1.62">
        <repository changeset_revision="ac9cc2992b69" name="package_biopython_1_62" owner="biopython" prior_installation_required="True" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    </package>
    <package name="ngs-tools" version="0.1.6">
        <install version="1.0">
            <action type="set_environment_for_install">
                <repository name="package_biopython_1_62" owner="biopython">
                    <package name="biopython" version="1.62" />
                </repository>
            </action>
            <actions>
                <action type="setup_virtualenv">ngs-tools==0.1.6</action>
            </actions>
        </install>
    </package>
</tool_dependency>
I made a mistake here: I placed the "set_environment_for_install" action outside of <actions>. After fixing that, it still didn't work. It should be:
<?xml version="1.0"?>
<tool_dependency>
    <package name="biopython" version="1.62">
        <repository changeset_revision="ac9cc2992b69" name="package_biopython_1_62" owner="biopython" prior_installation_required="True" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    </package>
    <package name="ngs-tools" version="0.1.6">
        <install version="1.0">
            <actions>
                <action type="set_environment_for_install">
                    <repository name="package_biopython_1_62" owner="biopython">
                        <package name="biopython" version="1.62" />
                    </repository>
                </action>
                <action type="setup_virtualenv">ngs-tools==0.1.6</action>
            </actions>
        </install>
    </package>
</tool_dependency>
I am now testing John's recommendation of specifying versions in setup_virtualenv, which means I won't be using package_biopython_1_62 and its dependencies. It would be nice to see how package_biopython_* could be used in my case. It would also be nice to know whether the consensus is to keep everything (dependencies included) inside the toolshed or not.
Thanks,
Carlos
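The placement mistake described above (an <action> sitting directly under <install> rather than inside <actions>) is easy to miss by eye. A minimal sketch of an automated check follows. The function name `misplaced_actions` and the trimmed-down sample documents are illustrative assumptions, and the check covers only this one layout rule as described in the thread, not full validation of a tool_dependencies.xml file.

```python
import xml.etree.ElementTree as ET


def misplaced_actions(xml_text):
    """Return the 'type' of every <action> whose parent is not <actions>."""
    root = ET.fromstring(xml_text)
    misplaced = []
    # Walk every element and inspect its direct children, because
    # ElementTree elements do not keep a reference to their parent.
    for parent in root.iter():
        for child in parent:
            if child.tag == "action" and parent.tag != "actions":
                misplaced.append(child.get("type"))
    return misplaced


# Trimmed-down samples for illustration (not the full files above).
BROKEN = """<tool_dependency>
  <package name="ngs-tools" version="0.1.6">
    <install version="1.0">
      <action type="set_environment_for_install">
        <repository name="package_biopython_1_62" owner="biopython">
          <package name="biopython" version="1.62" />
        </repository>
      </action>
      <actions>
        <action type="setup_virtualenv">ngs-tools==0.1.6</action>
      </actions>
    </install>
  </package>
</tool_dependency>"""

# Same document with the stray action moved inside <actions>.
FIXED = BROKEN.replace("<actions>\n", "").replace(
    '<install version="1.0">\n', '<install version="1.0">\n<actions>\n', 1
)

print(misplaced_actions(BROKEN))
print(misplaced_actions(FIXED))
```

An empty list means every action is nested where the corrected file puts it. As the thread shows, it does not mean the dependency is actually picked up at install time.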
participants (4)
- Björn Grüning
- Carlos Borroto
- John Chilton
- John Chilton