Galaxy installation inside a secure environment
On Thu, Jan 15, 2015 at 10:05 AM, Abdulrahman Azab <azab@ifi.uio.no> wrote:
Hi Galaxy people,
At the University of Oslo, we have an infrastructure for storage and computation on sensitive data, called TSD (presented in the attached image). From the inside, the infrastructure is logically divided into projects. Each project has a set of VMs, shared among its users, which run over a file system (HNAS). The shared project VMs have access to a computational SLURM cluster which is installed on another file system (Colossus). For each project, directories on Colossus are mounted on HNAS so that the project's users can push files into Colossus and run jobs on the cluster. Users of a particular project access the project's main VMs through user VMs (one per user). Each project's VMs (both shared and user VMs) are in a separate subnet.
All of this is inside the TSD. To access the TSD from outside, there is a complex authentication mechanism through which a user can access his/her own VM, and transferring data into or out of the TSD is another complex story. The important thing is that there is NO Internet access in or out.
There are two issues here:
1 - What we need to do is to install one Galaxy VM inside each project area, so that it is accessible by all project users. But we cannot use Mercurial to access your distribution server. We could, though, install a Bitbucket server inside the TSD and host the code base there, so that it can be accessed by all project VMs, but I'm not sure what the procedure is here.
You can clone Mercurial (or, in the future, Git) repositories to some file system that is accessible both internal to the firewalls and external to them (I believe there has to be some file system like this - even if it is just a USB stick - in order to install new software :) ). You can then treat the repository in the shared location as the source of Galaxy and clone/update against it. At a very high level, the initial clone process might look like this:

  (desktop)     % ssh login_node
  (login_node)  % cd /shared_directory/
  (login_node)  % hg clone https://bitbucket.org/galaxy/galaxy-dist galaxy-dist
  (login_node)  % exit
  (desktop)     % ssh secure_node
  (secure_node) % cd /project_directory
  (secure_node) % hg clone /shared_directory/galaxy-dist galaxy-dist
  (secure_node) % cd galaxy-dist
  (secure_node) % hg update latest_2015.01.13
  (secure_node) % exit

Then doing updates might look like this (note that the secure node pulls from the shared clone, not from the Internet):

  (desktop)     % ssh login_node
  (login_node)  % cd /shared_directory/galaxy-dist
  (login_node)  % hg pull https://bitbucket.org/galaxy/galaxy-dist
  (login_node)  % exit
  (desktop)     % ssh secure_node
  (secure_node) % cd /project_directory/galaxy-dist
  (secure_node) % hg pull /shared_directory/galaxy-dist
  (secure_node) % hg update latest_future_tag_name
  (secure_node) % exit
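For completeness, the same mirror-through-a-shared-filesystem pattern works with Git, which the reply above anticipates. A minimal sketch, reusing the placeholder host and path names from the commands above (the tag name is a stand-in for whichever release you want):

```shell
# On the login node (has Internet access): create or refresh a bare mirror
# of the upstream repository on the shared file system.
cd /shared_directory
git clone --mirror https://bitbucket.org/galaxy/galaxy-dist galaxy-dist.git  # first time only
git -C galaxy-dist.git remote update                                         # every later refresh

# On the secure node (no Internet access): clone from, and update against,
# the shared mirror instead of the Internet.
cd /project_directory
git clone /shared_directory/galaxy-dist.git galaxy-dist   # first time only
git -C galaxy-dist fetch origin                           # every later update
git -C galaxy-dist checkout release_tag_name              # pick a release tag, as with hg update
```

A bare `--mirror` clone carries all branches and tags, so the secure side can check out any release the login node has fetched.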
2 - We are very concerned about the issue of regularly updating Galaxy instances in projects to the most recent release. In many cases this causes problems, e.g. tool versioning conflicts. So we have the idea of installing each of our tools, together with all of its dependencies, in a separate Docker container, and running those as images on each Galaxy project VM. Is this possible and tested? Would this permanently solve the upgrading problem, or do you suggest another alternative?
Tool dependencies can be installed in Docker containers - several people have tested it - and while there can be some problems getting everything set up (a lot of moving pieces), I think it works fine once configured. I really like Aaron Petkau's tutorial here:

https://github.com/apetkau/galaxy-hackathon-2014
https://github.com/apetkau/galaxy-hackathon-2014/smalt

Another blog post with more information was put together by the Galaxy User Group Grand Ouest here:

https://www.e-biogenouest.org/groups/guggo/wiki/FirstGenOuest

And Galaxy's wiki documentation can be found here:

https://wiki.galaxyproject.org/Admin/Tools/Docker

Kyle Ellrott is really getting Dockerized tools to run at scale - we have worked with him to scale things up across a large cluster, and I will try to update the wiki with some of that information.

Setting up a local Tool Shed might be another approach to this reproducibility/maintenance problem. I generally discourage the use of local Tool Sheds, but not being able to access the Internet might be a very good reason to set one up.

Hope this helps,

-John
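Since the TSD has no Internet access, the container images themselves would also have to travel over the shared file system, much like the repository above. A hedged sketch of that transfer - the image name and paths are illustrative placeholders, not a tested Galaxy configuration:

```shell
# On the login node (has Internet access): pull the tool's image and export
# it to the shared file system. The image name is made up for illustration.
docker pull example/smalt:0.7.6
docker save example/smalt:0.7.6 > /shared_directory/smalt-0.7.6.tar

# On the secure node (no Internet access): load the image from the shared copy.
docker load < /shared_directory/smalt-0.7.6.tar

# Each tool then runs against its pinned image tag, mounting only its job
# directory, so a tool's dependencies stay frozen even when the Galaxy
# code base itself is updated.
docker run --rm -v /project_directory/job_work:/work -w /work \
    example/smalt:0.7.6 smalt version
```

Pinning an explicit image tag per tool is what addresses the versioning-conflict concern: upgrading Galaxy, or another tool, never changes what is inside a given tool's container.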
Thank you,
Yours sincerely, Abdulrahman Azab
Head engineer, ELIXIR.NO / The Genomic HyperBrowser team
Department of Informatics, University of Oslo, Boks 1072 Blindern, NO-0316 OSLO, Norway
Email: azab@ifi.uio.no, Cell-phone: +47 46797339
----
Senior Lecturer in Computer Engineering
Faculty of Engineering, University of Mansoura, 35516-Mansoura, Egypt
Email: abdulrahman.azab@mans.edu.eg
participants (2): Abdulrahman Azab, John Chilton