Just my 2 cents about this:
I feel your pain; our connection to Docker Hub is horrible, and it takes an hour to pull the GIE images
(while it only takes seconds from the cloud).
The Galaxy Docker images are pretty fat because we use them VM-style; in principle there's nothing wrong with that.
So the base image is about 1.2 GB, add a few tools and you quickly reach 5 GB,
and if you use interactive environments on top of that, add a few more GB.
I think that instead of trying to reduce the size of the base image, it would be a better effort to separate the components
into a proxy image, a database image (perhaps two: one for tool data, one for user data), a Galaxy image,
a cluster image, and so on. That would let you update just the tools and the Galaxy image regularly,
plus you could do all the neat Docker stuff: versioning, committing, rolling updates, streaming database replication, worker scaling ...
It's certainly something I would be interested in.
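To make the idea concrete, here is a rough docker-compose sketch of what such a split could look like. The service layout and the my-org/galaxy-app image name are placeholders I made up, not anything that exists on the Hub today:

```yaml
# Hypothetical split of the monolithic image into services.
version: "2"
services:
  proxy:
    image: nginx:stable
    ports:
      - "80:80"
    depends_on:
      - galaxy
  galaxy-db:
    image: postgres:9.6
    environment:
      POSTGRES_USER: galaxy
      POSTGRES_PASSWORD: change_me
    volumes:
      - galaxy-db-data:/var/lib/postgresql/data
  galaxy:
    image: my-org/galaxy-app   # placeholder: Galaxy web/job handlers only, no tools baked in
    depends_on:
      - galaxy-db
volumes:
  galaxy-db-data:
```

With a layout like that, a routine tool or Galaxy update only rebuilds and re-pulls the galaxy service, while the proxy and database layers stay cached locally.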
The other thing is to make sure that the tool dependencies are as slim as possible. Having many different R packages
and their sources lying around adds up to a lot of data. Hopefully conda can alleviate that situation.
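For example (just a sketch, and the package names are arbitrary), a conda-based tool layer can purge its caches and tarballs in the same Dockerfile layer, so the downloaded sources never end up in the image:

```dockerfile
FROM continuumio/miniconda3
# Install example tools from bioconda, then drop caches/tarballs
# in the same RUN so the layer stays small.
RUN conda install -y -c conda-forge -c bioconda samtools bwa && \
    conda clean -y --all
```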
Cheers,
Marius