On Mon, Sep 23, 2013 at 1:09 PM, Carlos Borroto <carlos.borroto@gmail.com> wrote:
On Mon, Sep 23, 2013 at 1:57 PM, John Chilton <chilton@msi.umn.edu> wrote:
On Mon, Sep 23, 2013 at 12:51 PM, Björn Grüning <bjoern.gruening@pharmazie.uni-freiburg.de> wrote:
Hi Carlos!
Hi,
I have created a tool that depends on biopython. I don't like the idea of having several copies of biopython around, and as a best practice for keeping workflows that use my tool reproducible, I chose to make my package repository [1] depend on package_biopython_1_62.
[1] http://testtoolshed.g2.bx.psu.edu/view/cjav/package_ngs_tools_0_1_6
Cool, thanks! I think that is best practice!
I disagree strongly. This is not a good idea. My thought is that if you are using a virtualenv, you shouldn't be trying to compose it with other Python dependencies. To me the whole point of virtualenv is that it creates isolated little environments. I am not sure why duplicating the biopython install reduces reproducibility.
If you want to use individual Python packages through Galaxy's dependency mechanism, I think you should package them up one at a time and hand-modify PYTHONPATH and PATH, the way biopython is done.
Galaxy should have a separate action to make this easy, say install_pip or something like that, as outlined by James Taylor at some point.
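For illustration only, here is a minimal Python sketch of what hand-modifying PYTHONPATH and PATH for one separately packaged dependency amounts to. The `prepend_path` helper and the `deps/biopython/1.62` install location are hypothetical names made up for this example. They are not Galaxy's actual mechanism or directory layout.

```python
import os


def prepend_path(env, name, new_entry):
    """Return a copy of env with new_entry placed first in the path-like variable `name`."""
    updated = dict(env)
    current = updated.get(name, "")
    # Only add a separator when the variable already has a value.
    updated[name] = new_entry + os.pathsep + current if current else new_entry
    return updated


# Hypothetical install location for one individually packaged dependency.
install_dir = os.path.join("deps", "biopython", "1.62")

# Start from a small example environment instead of the real os.environ.
env = {"PATH": "/usr/bin"}
env = prepend_path(env, "PYTHONPATH", os.path.join(install_dir, "lib", "python"))
env = prepend_path(env, "PATH", os.path.join(install_dir, "bin"))
```

Each dependency packaged this way would contribute its own pair of entries, so a tool sees exactly the versions its repository depends on.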
Hi John,
First, let me tell you why I think duplicating the biopython install reduces reproducibility. I have this in my tool's setup.py: install_requires=[ "docopt", "biopython", "python-levenshtein" ],
While I could specify versions here (e.g. biopython==1.62), I feel that is not a good thing outside of Galaxy. I think pip should be free to install the latest version of these packages until I find that a newer version causes a problem. I think this is the most common approach, though I might be wrong. The issue is that when Galaxy installs my tool using virtualenv, it will grab the most up-to-date version of these packages, which reduces reproducibility. Did I explain myself well enough? I'll be happy to debate any of this.
I understand what you are saying and I sympathize with you here. Still, I think the better approach is to copy these requirements into the setup_virtualenv block and specify hard-coded versions. This way you get reproducibility across all packages, not just biopython. I think this slight duplication is a smaller problem than the mixing of dependency mechanisms in your approach. To me it is analogous to installing some Python dependencies via OS packages and others via sudo pip install into /usr: it is a recipe for confusion.
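As a rough sketch of the pinning idea above, the following Python turns the unpinned names from setup.py into hard-coded `name==version` lines and reports any name that still lacks a version. The helpers `pinned_lines` and `unpinned` are hypothetical. Only the biopython version (1.62) comes from this thread, so the other two packages are deliberately left unpinned here rather than guessed.

```python
def pinned_lines(requirements, pins):
    """Return 'name==version' for every requirement that has a pinned version."""
    return [f"{name}=={pins[name]}" for name in requirements if name in pins]


def unpinned(requirements, pins):
    """Return the requirement names that still float to the latest release."""
    return [name for name in requirements if name not in pins]


# Names as listed in the tool's setup.py (unpinned there on purpose).
install_requires = ["docopt", "biopython", "python-levenshtein"]

# Versions to hard-code for the Galaxy install; only biopython's is known here.
pins = {"biopython": "1.62"}

lines = pinned_lines(install_requires, pins)
missing = unpinned(install_requires, pins)
```

Anything reported in `missing` would still be resolved to whatever release is newest at install time, which is the reproducibility gap being discussed.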
I agree with you about breaking the original intent of virtualenv, but as you mention, without 'install_pip' or something similar I'm left without the option of making my life easy by just installing my package from PyPI.
Thanks, Carlos