Dear all,

This email is mostly to report a negative result, in the hope that it saves others from trying the same thing.

I looked into replacing Ubuntu with Alpine as the base of the Docker images. Alpine seems to be used more and more for Docker images, and plenty of official images are now based on it rather than on Debian.

The reason Alpine is used is that it produces very small images (a bare-bones one is below 10 MB). But because of Galaxy's large dependencies, the gain is small: maybe 200 MB or so. 20% is something, but not a revolution.

That said, for servers with fewer dependencies (mail, web, LDAP, DNS...) Alpine really does reduce the footprint of Docker containers.

Tiago

--
"While I may be sending this email outside my normal office hours, I have no expectation to receive a reply outside yours" - @tomstafford
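To put the numbers above in perspective, here is a minimal Python sketch of the arithmetic. The 200 MB saving and the 20% figure come from the email; the 1000 MB total is only what those two numbers imply, and the sizes used for the small service (a 190 MB Ubuntu-style base, a 10 MB Alpine base, 30 MB of dependencies) are illustrative assumptions, not measurements.

```python
def relative_saving(saved_mb: float, original_mb: float) -> float:
    """Return the fraction of the original image size that is saved."""
    if original_mb <= 0:
        raise ValueError("original_mb must be positive")
    return saved_mb / original_mb


# Figures from the email: about 200 MB saved, described as about 20%.
# The 1000 MB total is an assumption implied by those two numbers.
galaxy_saving = relative_saving(saved_mb=200, original_mb=1000)

# Hypothetical small service: the dependencies (assumed 30 MB) stay the same,
# only the base layer shrinks from an assumed 190 MB to an assumed 10 MB.
small_service_saving = relative_saving(saved_mb=190 - 10, original_mb=190 + 30)

print(f"Galaxy image: {galaxy_saving:.0%} smaller")
print(f"Small service image: {small_service_saving:.0%} smaller")
```

The same absolute saving in the base layer matters far more when the dependencies on top of it are small, which is the point made about mail, web, LDAP and DNS servers.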
Hi Tiago,

thanks for the heads-up. I also tried Alpine some time ago, but Galaxy needs some external dependencies, which Nate also builds via a PPA for Debian-based systems. So migrating is not that easy.

What we could do is orchestrate the containers and their dependencies:
https://github.com/bgruening/docker-galaxy-stable/issues/43

But this has the big disadvantage of not sharing the setup with other Galaxy installations, like the VM installation from planemo-machine.

So in the end I stopped making it more modular, and instead tried to share as much as possible with other Galaxy installations and to move more and more into the ansible-playbook.

Thanks, Tiago, for trying this,
Bjoern

On 23.02.2016 at 05:15, Tiago Antao wrote:
> Dear all,
> [...]
Just my 2 cents about this:

I feel your pain. Our connection to Docker Hub is horrible: it takes an hour to pull the GIE images (while it only takes seconds from the cloud).

The Galaxy Docker images are pretty fat because we use them VM-style; in principle there is nothing wrong with that. So the base image is about 1.2 GB. Add a few tools and you quickly reach 5 GB. If in addition you use interactive environments, add a few GB more.

I think that instead of trying to reduce the size of the base image, it might be a better effort to separate the components into a proxy image, a database image (perhaps two, one for tools and one for user data), a Galaxy image, a cluster image, and so on. This would allow you to update just the tools and Galaxy images regularly, plus you could do all the neat Docker stuff, like versioning, committing, rolling updates, streaming database replication, worker scaling... It's certainly something I would be interested in.

The other thing is to make sure that the tool dependencies are as slim as possible. Having many different R packages and their sources lying around makes for a lot of data. Hopefully conda can alleviate that situation.

Cheers,
Marius

On 23 February 2016 at 09:36, Björn Grüning <bjoern.gruening@gmail.com> wrote:
> Hi Tiago,
> [...]
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
On 23.02.2016 at 10:13, Marius van den Beek wrote:
> Just my 2 cents about this: I feel your pain. Our connection to Docker Hub is horrible: it takes an hour to pull the GIE images (while it only takes seconds from the cloud).

This is a Docker Hub related issue. We also offer the images from Quay, which is much faster for downloading and building:
https://quay.io/repository/bgruening/galaxy

> The Galaxy Docker images are pretty fat because we use them VM-style; in principle there is nothing wrong with that. So the base image is about 1.2 GB.

Hm, the base image is 400 MB, at least that is what Quay is telling me:
https://quay.io/repository/bgruening/galaxy?tab=tags

> Add a few tools and you quickly reach 5 GB.

Not if you use Docker or conda as the dependency resolution mechanism. Indeed, I'm working on switching many flavours to use conda and Docker as dependency resolvers, which postpones the download from pull time to the runtime of the tools (and makes it smaller).

> If in addition you use interactive environments, add a few GB more.

Only if you use them, and then only at first run.
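The idea of postponing dependency downloads from pull time to tool runtime can be illustrated with a small Python sketch. This is not Galaxy's resolver code: LazyResolver, fake_fetch and the /deps path are hypothetical names, samtools is just an example tool, and the download is simulated. It only shows a dependency being fetched on first use and reused afterwards.

```python
from typing import Callable, Dict, List


class LazyResolver:
    """Fetch a dependency the first time a tool needs it, then reuse it.

    Hypothetical illustration only; this is not Galaxy's resolver API.
    """

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                 # function that performs the download
        self._cache: Dict[str, str] = {}    # dependency name -> local location
        self.downloads: List[str] = []      # record of what was actually fetched

    def resolve(self, name: str) -> str:
        # Download only on first use; later calls hit the cache.
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
            self.downloads.append(name)
        return self._cache[name]


def fake_fetch(name: str) -> str:
    # Stand-in for pulling a conda package or a tool container.
    return f"/deps/{name}"


resolver = LazyResolver(fake_fetch)

# Nothing has been downloaded at "pull time".
print(resolver.downloads)  # prints []

# The first run of a tool triggers the download; the second run does not.
resolver.resolve("samtools")
resolver.resolve("samtools")
print(resolver.downloads)  # prints ['samtools']
```

The cost is paid per tool and only when that tool is actually run, rather than up front for every tool baked into the image.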
> I think that instead of trying to reduce the size of the base image, it might be a better effort to separate the components into a proxy image, a database image (perhaps two, one for tools and one for user data), a Galaxy image, a cluster image, and so on. This would allow you to update just the tools and Galaxy images regularly, plus you could do all the neat Docker stuff, like versioning, committing, rolling updates, streaming database replication, worker scaling... It's certainly something I would be interested in.

Here is the related issue:
https://github.com/bgruening/docker-galaxy-stable/issues/43

But going this route means not being in sync with the Ansible roles used for creating the VMs, cloud images... We do have plans to base CloudMan on top of Docker, though. If you can base VMs on top of Docker, we could possibly use Docker Compose, if we decide it's worth the effort, and switch our other deployments to it. I would very much like to keep all deployments in sync, as we do currently.

> The other thing is to make sure that the tool dependencies are as slim as possible. Having many different R packages and their sources lying around makes for a lot of data. Hopefully conda can alleviate that situation.

Yeah, it does!

Ciao,
Bjoern

> Cheers,
> Marius
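As a rough illustration of the component-splitting idea discussed in this thread, the following Python sketch compares how much data has to be pulled for an update when everything lives in one image versus when the components are separate images. The component names loosely follow Marius's list, but every size is a made-up placeholder, not a measurement of the real images, and the single-image case deliberately ignores Docker layer caching.

```python
from typing import Dict, Iterable

# Hypothetical component sizes in MB; placeholders, not measured values.
COMPONENT_SIZES_MB: Dict[str, int] = {
    "proxy": 50,
    "database": 150,
    "galaxy": 400,
    "tools": 600,
    "cluster": 100,
}


def monolithic_update_mb(sizes: Dict[str, int]) -> int:
    """Worst case for a single image: any change re-pulls everything.

    This ignores Docker layer caching, which can reduce the real cost.
    """
    return sum(sizes.values())


def split_update_mb(sizes: Dict[str, int], changed: Iterable[str]) -> int:
    """With one image per component, only the changed images are re-pulled."""
    return sum(sizes[name] for name in changed)


changed = ["galaxy", "tools"]  # the parts expected to be updated regularly
print("single image:", monolithic_update_mb(COMPONENT_SIZES_MB), "MB")
print("split images:", split_update_mb(COMPONENT_SIZES_MB, changed), "MB")
```

How much is really saved depends on how often each component changes and on how well the layers of a single image are ordered, so this is only a way of framing the trade-off, not a prediction.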
participants (3)
- Björn Grüning
- Marius van den Beek
- Tiago Antao