Thanks Enis, just to elaborate on Pulsar: I suspect it would work right now with something like configuring Galaxy with the S3 object store, but it would do so by having Galaxy cache the data locally, after which Pulsar would negotiate the transfer with Galaxy (this could happen in many different ways depending on how things are mounted). Ideally, it wouldn't happen this way, though. I would love it if Galaxy could determine that the job is going to be run remotely, skip the local cache, and instead configure the remote Pulsar to cache the file directly from the object store abstraction. In addition to eliminating the extra cache and transfer, this could allow Pulsar and Galaxy to have different views of the underlying data sources (e.g. here the data is mounted as X and there it is mounted as Y, or here the data is directly available and there it is fetched via iRODS, etc.). There are some... initial grasps... at this sort of thing in Pulsar and Galaxy, but it is not fully (or even substantially) implemented currently.

-John

On Tue, Aug 26, 2014 at 11:18 AM, Enis Afgan <afgane@gmail.com> wrote:
Hi Inge, There is an implementation for using the AWS S3 object store as the data store for a given Galaxy instance. The implementation is located here https://bitbucket.org/galaxy/galaxy-central/src/3a51eaf209f2502bf32dbb421eca... and it offers several config options in universe_wsgi.ini.
The data stored in S3 is locally cached while it's being operated on but always synced with the back end object store.
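To picture the "cache locally, always sync with the back end" behaviour, here is a minimal Python sketch of a write-through cache. It assumes an in-memory dict standing in for the S3 bucket; `FakeBucket` and `CachedObjectStore` are hypothetical names for illustration only, not Galaxy's actual S3 object store classes or API.

```python
import os
import tempfile


class FakeBucket:
    """Hypothetical in-memory stand-in for an S3 bucket (key -> bytes)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class CachedObjectStore:
    """Hypothetical write-through cache in front of a bucket.

    Reads populate a local cache directory on demand; writes update the
    local copy and are pushed to the bucket immediately, so the bucket
    always holds the current data and the cache can be dropped at any time.
    """

    def __init__(self, bucket, cache_dir):
        self.bucket = bucket
        self.cache_dir = cache_dir

    def _cache_path(self, key):
        return os.path.join(self.cache_dir, key)

    def read(self, key):
        path = self._cache_path(key)
        if not os.path.exists(path):
            # Cache miss: fetch the object from the back end once.
            with open(path, "wb") as fh:
                fh.write(self.bucket.get(key))
        with open(path, "rb") as fh:
            return fh.read()

    def write(self, key, data):
        # Write-through: update the local copy, then sync to the back end.
        with open(self._cache_path(key), "wb") as fh:
            fh.write(data)
        self.bucket.put(key, data)

    def evict(self, key):
        # Safe to delete the local copy because the back end is in sync.
        path = self._cache_path(key)
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    store = CachedObjectStore(FakeBucket(), tempfile.mkdtemp())
    store.write("dataset_1.dat", b"ACGT")
    store.evict("dataset_1.dat")
    print(store.read("dataset_1.dat"))  # b'ACGT', re-fetched from the bucket
```

The point of write-through here is that the local copy is never the only copy, which is what makes it safe to treat the local disk as a disposable cache.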
Pulsar seems to have some support for S3 but, as the documentation in the implementation says, it's explicitly beta: https://github.com/galaxyproject/pulsar/blob/b32b7caafc6582a3a28e694e2dbb75e...
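To make the difference between the two data paths discussed in this thread concrete, the following hypothetical Python model plans how a job's inputs reach it and counts how much data crosses the Galaxy node. Everything here (`Dataset`, `plan_input_staging`, `galaxy_node_traffic_gb`) is an illustrative assumption, not Pulsar's or Galaxy's real staging API. The model also assumes Galaxy's cache is cold, so a staged input enters the Galaxy node once from the object store and leaves it once towards Pulsar.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    key: str        # hypothetical object store key
    size_gb: float


def plan_input_staging(inputs, job_is_remote, remote_can_read_store):
    """Return (where, action, dataset) steps for getting inputs to a job.

    Hypothetical model: "galaxy" steps move data through the Galaxy node,
    "pulsar" steps happen on the remote compute side only.
    """
    steps = []
    for d in inputs:
        if job_is_remote and remote_can_read_store:
            # Desired path: Pulsar pulls straight from the object store.
            steps.append(("pulsar", "fetch from object store", d))
        elif job_is_remote:
            # Current path: Galaxy caches the input, then ships it to Pulsar.
            steps.append(("galaxy", "cache from object store", d))
            steps.append(("galaxy->pulsar", "transfer to Pulsar", d))
        else:
            # Local job: Galaxy's own cache is all that is needed.
            steps.append(("galaxy", "cache from object store", d))
    return steps


def galaxy_node_traffic_gb(steps):
    """GB that crosses the Galaxy node's network link for a given plan."""
    return sum(d.size_gb for where, _, d in steps if where.startswith("galaxy"))


if __name__ == "__main__":
    inputs = [Dataset("dataset_1.dat", 30.0)]
    print(galaxy_node_traffic_gb(plan_input_staging(inputs, True, False)))  # 60.0
    print(galaxy_node_traffic_gb(plan_input_staging(inputs, True, True)))   # 0
```

Under these assumptions, letting the remote side read the object store directly removes the Galaxy node from the data path entirely, which is the scenario the original question is concerned with.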
As a side note, there are some planned enhancements to how the object store implementation is handled, and there will hopefully be quite a bit of activity on this topic in the near future (e.g., https://trello.com/c/YynQKq8m).
Hope this at least clarifies the state of object store support,
Enis
On Mon, Aug 25, 2014 at 10:24 AM, Raknes Inge Alexander <inge.a.raknes@uit.no> wrote:
I have a few questions about object stores in Galaxy:
1: Can all Galaxy data sets be stored in an object store?
2: If so, does Galaxy still need to maintain a local copy of the data?
3: Is LWR or Pulsar able to get the data directly from the object store, or does it still have to go through Galaxy?
We are planning to let users of our Galaxy installation handle large input/output files (~30G) and we expect that the VM containing our Galaxy installation will become a bottleneck if all data needs to travel through that node.
- Inge Alexander Raknes
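A rough back-of-the-envelope check of the bottleneck concern can be sketched in Python. The link speed (1 Gbit/s) and the number of concurrent jobs (4) are assumed figures for illustration only, not numbers from this thread, and the estimate ignores protocol overhead and disk I/O.

```python
def transfer_seconds(size_gb, link_gbit_per_s):
    """Idealised time to move size_gb (decimal GB) over one link, no overhead."""
    return size_gb * 8 / link_gbit_per_s


def galaxy_link_busy_seconds(size_gb, jobs, hops_through_galaxy, link_gbit_per_s=1.0):
    """Seconds the Galaxy node's link is busy if every job's input crosses it
    hops_through_galaxy times (e.g. 2 = in from the object store, out to Pulsar)."""
    return jobs * hops_through_galaxy * transfer_seconds(size_gb, link_gbit_per_s)


if __name__ == "__main__":
    # ASSUMED figures: 1 Gbit/s link, 4 concurrent jobs, one ~30 GB input each.
    print(transfer_seconds(30, 1.0))             # 240.0
    print(galaxy_link_busy_seconds(30, 4, 2))    # 1920.0 (32 minutes)
    print(galaxy_link_busy_seconds(30, 4, 0))    # 0.0 when compute nodes fetch directly
```

Even with these modest assumptions, routing every ~30 GB input through a single VM keeps its link saturated for tens of minutes per batch of jobs, whereas direct fetches from the object store scale with the compute nodes instead.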
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/