The minimum requirements for Galaxy are modest: pretty much any recent
computer will do. A basic install will load and run in very little
time, following the instructions at:
Advanced tools come from the Tool Shed; these are installed separately.
No programming is needed, but some very basic Unix skills are required,
as they would be for ongoing server maintenance. The Tool Shed is one of
the best-documented parts of Galaxy, so you should be able to find most
of the answers there; otherwise, this is the right list to use for
questions about local installs (galaxy-dev(a)bx.psu.edu):
However, running a production instance that is intended to run
compute-intensive tools (such as Tophat2), and where you have some
throughput goals, will require more substantial resources. This is
always a difficult question to answer since so much depends on the tools
used and the data volume. But in general, the minimum requirements are
roughly the same as what the underlying tools would require on their
own, if run from the command line. So for Tophat or Tophat2, and really
the entire Tuxedo RNA-seq tool suite, you might be able to get by on
8 GB of memory and 2 or 4 cores. But it will probably be slow, and if
you are running replicates through Cuffdiff at the end, you might run
out of memory if the files are large and the genome is large (such as
human). And if you are hosting a web Galaxy at the same time, with
visualizations and such, well, this is why systems are often set up with
clusters. Since all of these will be competing for the same resources,
going low can work, but it is something that will have to be managed and
tested, and it will change over time as tools are upgraded.
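Since resource competition is the crux, Galaxy's job configuration lets you
cap how many jobs run concurrently and route heavy tools to their own
destinations. A minimal sketch follows; the file and destination names are
illustrative, and the tool id is an assumption (check the
job_conf.xml.sample shipped with your Galaxy release and the installed
tool's own XML for the real ids):

```xml
<?xml version="1.0"?>
<!-- Sketch of a job_conf.xml that limits concurrent local jobs and gives
     a multi-threaded tool such as Tophat2 a fixed number of cores. -->
<job_conf>
    <plugins>
        <!-- run at most 2 jobs at a time on the Galaxy host itself -->
        <plugin id="local" type="runner"
                load="galaxy.jobs.runners.local:LocalJobRunner" workers="2"/>
    </plugins>
    <destinations default="local_default">
        <destination id="local_default" runner="local"/>
        <destination id="local_4slots" runner="local">
            <!-- allow this destination 4 threads/cores per job -->
            <param id="local_slots">4</param>
        </destination>
    </destinations>
    <tools>
        <!-- tool id is hypothetical here; map your heavy tools this way -->
        <tool id="tophat2" destination="local_4slots"/>
    </tools>
</job_conf>
```

The same destination mechanism is how jobs get routed to a cluster runner
later, so starting with an explicit job config makes that growth path easier.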
You could test this out by setting up a cloud instance with the hardware
specification you plan to use, loading some of your data, and running
your workflow to see how it benchmarks. Have users on the instance while
you are doing this, to judge performance under load. Cloud installations
come with many of the advanced tools and data indexes already installed
and configured, so this would be less of an investment than buying the
hardware first and finding out later that it was not enough.
And of course Slipstream Galaxy is an option. The whole intention there
is to provide a complete package with tools and data already configured,
on a system that has enough compute capability to do the work, for
scientists/labs who do not want to deal with administrative tasks or
work in the cloud (for whatever reason).
Good luck with your decision! Others are welcome to comment about how
they have set up their own systems.
On 7/15/13 7:14 AM, Zain A Alvi wrote:
I hope this reaches you well. I have a small question regarding setting
up a Galaxy server. My mentors and I are looking into buying a server
for doing NGS analysis through Galaxy. We saw that Slipstream's hardware
specifications are the following:
CPU: 2x Intel Xeon Processor E5-2690, 8 core (16 cores total)
RAM Memory: 12x 32GB RDIMM (384 GB) with option to upgrade it to 512 GB
Storage (Hard Drive space): 7x 3TB SAS 6 Gbps (16 TB usable) with 1 x
100GB Solid State Disk
Power: Dual Redundant Power Supplies
Network: Dual Gigabit Network Adapter
We are wondering what the minimum server hardware specifications are on
which Galaxy can be run.
Our second question: if we install Galaxy on the server, do all the
tools currently available on Galaxy come pre-installed, or do we have to
program (via Perl and Python) and install each of those tool sets
ourselves? If we have to install those tools ourselves, is there a guide
for doing so? Lastly, how can we upgrade tool sets such as Tophat 1.44
to Tophat 2 on this server? I am wondering about the last question
because Tophat 1.44 is available on the main Galaxy server, whereas
Tophat 2 is available on the test Galaxy server.
Sorry for so many questions. Thank you again for all the great help.
Galaxy Support and Training