Re: [galaxy-user] galaxy-user Digest, Vol 26, Issue 6

20 Aug 2008

Michael, I don't have any experience with Condor, but we're finding
that the Galaxy framework scales very well - mostly because it doesn't
do any of the computationally intense stuff itself - it hands that off
through the job runner.  Our internal Galaxy works fine with very
large datasets (6k subjects on Affy 6.0 SNP chips, 9.6k subjects on
Affy 5.0 SNP chips...). Tools take a while to run (!) but Galaxy itself
is more or less indifferent to the size of files because it only stores
references (e.g. paths) to the disk files in the database - not the
actual gigagobs of data. A collection of 100 GB files takes about the
same space in the Galaxy database tables as a collection of 1 KB ones,
as far as I can tell. A user's experience of running tools will
obviously be affected by the cost of physically shuffling large
data files around for the cluster backend when a tool is run, so I
suspect the cluster architecture, and the way datasets are made
available to cluster nodes for processing, is the key issue for very
large datasets.
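
To illustrate the point about references, here is a minimal sketch of a
table that records only a path and a size for each dataset. The schema,
the register_dataset helper and the file paths are all made up for the
example and are not Galaxy's actual tables or code; sqlite3 is used only
because it ships with Python.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT Galaxy's real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset ("
    "id INTEGER PRIMARY KEY, file_path TEXT, file_size INTEGER)"
)


def register_dataset(conn, file_path, file_size):
    """Record a reference to a file on disk; the file itself is never read."""
    cur = conn.execute(
        "INSERT INTO dataset (file_path, file_size) VALUES (?, ?)",
        (file_path, file_size),
    )
    return cur.lastrowid


# A 1 KB file and a 100 GB file (paths are invented for the example).
small_id = register_dataset(conn, "/data/files/dataset_1.dat", 1024)
large_id = register_dataset(conn, "/data/files/dataset_2.dat", 100 * 1024**3)

rows = conn.execute(
    "SELECT id, file_path, file_size FROM dataset ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Both rows hold a short path string and one integer, so the row for the
100 GB file is essentially the same size as the row for the 1 KB file.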

On backends, I believe the party line is that both PostgreSQL and
MySQL are fully supported. We've used MySQL as our backend for nearly
2 years without any problems with released Galaxy versions - all 3
database backends are now auto-tested before release AFAIK.
Arguably, PostgreSQL might be a better choice technically, and
operationally it's what runs the primary Galaxy site, so it is likely
to work! My group remain familiar and comfortable with MySQL and don't
have the energy to swap over. If you are going to swap, do it before
you build a large userbase, unless you have a bored DBA available to
unload and reload a set of Galaxy history and user tables mid-stream.

On Thu, Aug 21, 2008 at 2:00 AM,  <galaxy-user-request@bx.psu.edu> wrote:
...
1. newbie questions (Michael Rusch)
----------------------------------------------------------------------
Message: 1
Date: Tue, 19 Aug 2008 16:53:27 -0500
From: Michael Rusch <mcrusch@wisc.edu>
Subject: [galaxy-user] newbie questions
To: galaxy-user@bx.psu.edu
Message-ID: <8085BDD01E3A4A40A0F01E5C73BBA505@gel.local>
Content-Type: text/plain; charset="us-ascii"
We're strongly considering switching to Galaxy from a piece of home-built
software that we're in the process of developing.  So, I have a couple of
newbie questions to see what people's experience is.
How does Galaxy scale?  Does anybody have experience with scaling to
thousands of datasets, or working with datasets in the hundreds of
megabytes?
We have traditionally done most of our work using a MySQL backend.  I
haven't (yet) received the green light from our sysadmin to install
Postgres, and I'm wondering if anybody has any experience running on MySQL.
Is it possible?  Are there pitfalls?
Has anybody by any chance implemented support for condor as a job scheduler?
-- 
python -c "foo = map(None,'moc.liamg@surazal.ssor'); foo.reverse();
print ''.join(foo)"

Ross
