Hi Ryan,

What you're suggesting is still somewhat experimental, but we're continuing to work on it to make it more robust and better integrated into the Galaxy ecosystem. There are really three general approaches:

1. Run Galaxy via CloudMan 100% on AWS. This option is the most robust and basically ready for use but, over time and if you decide to modify various pieces of the puzzle, it will require an increased understanding of how CloudMan works. It's also the most expensive option.
2. Run the Galaxy UI locally and create a CloudMan cluster on demand with the Pulsar <http://pulsar.readthedocs.org/en/latest/index.html> service enabled to accept jobs from the local Galaxy. This paper describes that approach: http://onlinelibrary.wiley.com/doi/10.1002/cpe.3536/abstract
3. Run your Galaxy UI locally and create Ansible roles/tasks to dynamically acquire cloud instances and assemble those into a cluster. You will probably want to use Pulsar for job management again. This option gives you the most control but also means you'll need to build the system yourself.

Nate may also have more comments about this. I've also put some comments on your specific questions inline. Hope this helps clarify the situation at least. We're actively working on this scenario, so things should get easier in the future. Let us know if you have more questions and what you decide.

Cheers,
Enis

On Wed, Aug 19, 2015 at 8:53 AM, Ryan G <ngsbioinformatics@gmail.com> wrote:
Hi all - We are running a local instance of Galaxy on our internal infrastructure. It seems to be going well.
We've gotten to the point where we are ready to migrate our NGS data to Amazon for storage in S3. We are also looking at how Galaxy can be used in Amazon. Specifically, we are interested in understanding:
1) Should we run an instance of Galaxy in Amazon, or continue to run it locally (to minimize costs) but have it run analyses in Amazon?
The options above summarize this scenario.
2) Regardless of how we run it, data will be stored in S3. How will Galaxy interact with S3 for its Data Libraries?
Galaxy implements an Object Store interface that can use S3 as a back-end data store. It has been around for a number of years and has been demonstrated to work, but it has not been used in production, so I'd suggest testing it first. The configuration options for the object store are in Galaxy's config file: https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.ini.sample#L2...
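To give a rough idea of what an S3-backed object store configuration might look like, here is a minimal Python sketch that writes out such an XML fragment. The element and attribute names (auth, bucket, cache, extra_dir) follow my recollection of Galaxy's object_store_conf.xml.sample and should be checked against the sample file shipped with your release. The function name, bucket name, keys, and paths are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_s3_object_store_conf(bucket, access_key, secret_key,
                               cache_path="database/object_store_cache",
                               cache_size_gb=1000):
    """Return an S3 object store config as an XML string.

    Assumption: the layout mirrors Galaxy's object_store_conf.xml.sample;
    verify the element and attribute names against your Galaxy release.
    """
    root = ET.Element("object_store", {"type": "s3"})
    # Credentials for the account that owns the bucket (placeholders here).
    ET.SubElement(root, "auth",
                  {"access_key": access_key, "secret_key": secret_key})
    ET.SubElement(root, "bucket", {"name": bucket})
    # Local cache where datasets are staged while jobs use them.
    ET.SubElement(root, "cache",
                  {"path": cache_path, "size": str(cache_size_gb)})
    # Working and temp directories that stay on local disk (assumed paths).
    ET.SubElement(root, "extra_dir",
                  {"type": "job_work",
                   "path": "database/job_working_directory_s3"})
    ET.SubElement(root, "extra_dir",
                  {"type": "temp", "path": "database/tmp_s3"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_s3_object_store_conf("my-galaxy-bucket",
                                     "ACCESS_KEY_PLACEHOLDER",
                                     "SECRET_KEY_PLACEHOLDER"))
```

Since this has not been used in production, it would be worth trying the generated file on a throwaway Galaxy instance with a test bucket first.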
3) Is it even possible to separate the Galaxy web interface from the HPC cluster?
Yes; you need either a shared file system between the resources or Pulsar.
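For the Pulsar route, the local Galaxy needs a job runner plugin and a destination pointing at the remote Pulsar server. The sketch below generates such a job configuration from Python. It assumes the layout I remember from the Pulsar documentation (the PulsarRESTJobRunner plugin with url and private_token parameters), so please check it against the docs for your version. The function name, ids, URL, and token are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_pulsar_job_conf(pulsar_url, private_token):
    """Return a job_conf-style XML string that sends jobs to a Pulsar server.

    Assumption: plugin path and parameter names follow the Pulsar docs;
    the ids "pulsar_rest" and "remote_pulsar" are made up for illustration.
    """
    root = ET.Element("job_conf")

    plugins = ET.SubElement(root, "plugins")
    ET.SubElement(plugins, "plugin", {
        "id": "pulsar_rest",
        "type": "runner",
        "load": "galaxy.jobs.runners.pulsar:PulsarRESTJobRunner",
    })

    # Make the remote Pulsar destination the default for all jobs.
    destinations = ET.SubElement(root, "destinations",
                                 {"default": "remote_pulsar"})
    dest = ET.SubElement(destinations, "destination",
                         {"id": "remote_pulsar", "runner": "pulsar_rest"})

    url_param = ET.SubElement(dest, "param", {"id": "url"})
    url_param.text = pulsar_url
    token_param = ET.SubElement(dest, "param", {"id": "private_token"})
    token_param.text = private_token

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder host and token; substitute your own Pulsar endpoint.
    print(build_pulsar_job_conf("https://pulsar.example.org:8913/",
                                "TOKEN_PLACEHOLDER"))
```

With a setup along these lines, Pulsar stages inputs and outputs itself, so no shared file system is needed between Galaxy and the cluster.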
4) We understand Galaxy in Amazon uses CloudMan. Can we run this in our VPC with our own AMI?
Yes; you can build your own version of the system with the tools and whatever else you would like to configure. Docs on how to do this are available here: https://wiki.galaxyproject.org/CloudMan/Building
If anyone can provide insights into how they are using Galaxy in Amazon, I am very interested to hear your thoughts.
Ryan