Hi Rémy,
Hi,
OK, this looks great. We have had many problems with RPM, R, or Perl packages that are no longer maintained or downloadable. The only SPOF I see now is GitHub itself and the Cargo-Port source code ;) Perhaps you should try to add it to Python/pip?
Cargo-Port is already pip-installable. The problem is more about hosting and updating the big proxy table. For this you always want the latest table, so PyPI is not practical here. The code, the checks, and the gsl installer, sure, these can be made available on PyPI. Cheers, Bjoern
Best, Remy
2016-01-04 17:50 GMT+01:00 Eric Rasche <esr@tamu.edu>:
Hi Peter,
On 01/02/2016 11:54 PM, Peter Cock wrote:
On Friday, 1 January 2016, Björn Grüning <bjoern.gruening@gmail.com> wrote:
Hi Galaxy developers,
this is an RFC to get the implementation details right for a new action type in `tool_dependencies.xml`.
For years we have been trying to solve a very crucial sustainability problem: **non-sustainable links**!
A little bit of history ------------------------
At first we tried to [mirror tarballs](https://github.com/bgruening/download_store) from sources with questionable sustainability, like BioC or random FTP servers. But over time we encountered many more places that we cannot trust: Google Code, SourceForge, etc. We tried to mirror the entire BioC history by tracking down the SVN history and creating a tarball for every revision ... a Herculean task ... and still limited in scope, because there are so many other things that need to be archived to make Galaxy and all tools sustainable.
In the end we settled on the simplest solution: provide a community archive where everyone can drop tarballs that they want to be sustainable. The Galaxy Project has generously funded the storage, and we have plans to mirror and distribute the workload to universities and other institutes that want to help.
The biggest problem we needed to solve was access to the archive. Who can drop tarballs? How do we control access to prevent abuse of this system?
We went ahead and created the Cargo-Port: https://github.com/galaxyproject/cargo-port Access will be controlled by the community via PRs. Add your package, we will check the content (hopefully automatically), and the tarball will be mirrored to a storage server.
RFC ---
So far so good. This RFC is about the usage of the Cargo-Port inside of Galaxy. I would like to propose a new action type that uses the Cargo-Port directly. It should replace `<action type="download_by_url" sha256sum="6387238383883...">` and `<action type="download_file">` and offer a more transparent and user-friendly solution. The current state of the art is quite cumbersome, since we need to manually generate the checksum, provide the correct link, and get the same information into the Cargo-Port. I would like to streamline this a little bit and use it as a good opportunity to fix and work on https://github.com/galaxyproject/galaxy/issues/896.
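For context, the approach being replaced looks roughly like the following (the URL below is illustrative, not a real package location):

```xml
<!-- Current approach: the tool author maintains the URL and
     sha256 checksum by hand in tool_dependencies.xml. -->
<action type="download_by_url" sha256sum="6387238383883...">
    https://example.org/augustus-3.1.tar.gz
</action>
```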
Proposal `<action type="download_by_proxy">`:

* attributes for ID, version, platform, and architecture
* no URL, no checksum
* an attribute for the URL to cargo-port/urls.tsv
  * defaults to the current GitHub repo
  * configurable via galaxy.ini
* this action will more or less trigger this curl command: `$ curl https://raw.githubusercontent.com/galaxyproject/cargo-port/master/gsl.py | python - --package_id augustus_3_1`
* this gives us the freedom to change the API, columns, etc. in the Cargo-Port without updating Galaxy core
* the only API that needs to stay stable is `gsl`
* `gsl` will try to download from the original URL specified in the Cargo-Port; if this does not work, we will download our archived copy
* Changing the current working dir? Is this what we want, e.g. automatically uncompress and change the cwd like `download_by_url` does?
* We will need an attribute to skip uncompressing; a few tools need the tarball left as-is.
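A minimal sketch of what the proposed action could look like in `tool_dependencies.xml` (the attribute names here are illustrative, not final):

```xml
<!-- Hypothetical shape of the proposed action: the tool author
     supplies only the package identity; URL and checksum come
     from the Cargo-Port table. -->
<action type="download_by_proxy"
        id="augustus"
        version="3.1"
        platform="linux"
        architecture="x86_64" />
```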
Single Point of Failure - a small remark ----------------------------------------
Previously, Galaxy packages relied entirely on the kindness of upstream to maintain existing packages indefinitely. Obviously not a sustainable practice. Every time a tarball was moved, we had to hope one of us retained a copy so that we could ensure reproducibility. With the advent of the Cargo Port, we now maintain a complete, redundant copy of every upstream tarball used in IUC and devteam repositories, additionally adding sha256sums for every file to ensure download integrity. The community is welcome to request that files they use in their packages be added as well. We believe this will help combat the single point of failure by providing at least one level of duplication. The Cargo Port is considering plans to provide mirrors of itself to various universities and another layer of redundancy.
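The download behaviour described above (try the original upstream URL first, fall back to the archived copy, and verify the sha256sum to ensure integrity) can be sketched in Python. This is a hypothetical helper, not the actual `gsl` code:

```python
import hashlib
import urllib.request


def fetch_with_fallback(upstream_url, archive_url, expected_sha256):
    """Try the original upstream URL first; if it is unreachable or the
    checksum does not match, fall back to the archived copy."""
    for url in (upstream_url, archive_url):
        try:
            data = urllib.request.urlopen(url, timeout=30).read()
        except Exception:
            continue  # source unreachable, try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # checksum mismatch: treat as a corrupt download, try the next source
    raise RuntimeError("all download sources failed or were corrupt")
```

The checksum gate means a moved or silently replaced upstream tarball degrades gracefully to the archived copy instead of breaking the installation.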
Thanks for reading and we appreciate any comments.
Eric, Nitesh & Bjoern
Maybe a question for Nitesh,
Would this replace, or coexist with, the related but narrower-in-scope Bioarchive project?
Different scope, coexist.
Bioarchive
- Hosts only Bioconductor packages
- R-package-specific UI features
- May someday offer advanced features like dependency-tree building
The Cargo Port
- Packages from any upstream
- Pretty much feature complete
https://bioarchive.galaxyproject.org/
Peter
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
--
Eric Rasche
Programmer II
Center for Phage Technology
Rm 312A, BioBio
Texas A&M University
College Station, TX 77843
404-692-2048
esr@tamu.edu