Local Galaxy concept system: hardware spec questions
Hi all,

I have a couple of questions around the topic of "hardware requirements" for a server which is intended to be bought and used as a concept machine for NGS-related jobs. It should be used for the development of tools and workflows (using Galaxy, of course) as well as a platform for some "alpha" users, who are to learn to work on NGS data which they have just begun to generate. This concept phase is planned to last 1-2 years. During this time main memory and especially storage could be extended, the latter on a per-project basis.

We will start with a small team of 3 people for supporting and developing Galaxy and the system according to the users' requirements, and the first group of users will bring in data, scientific questions and hands-on work on their own data. The main task (regarding system load) will be sequence alignment (BLAST, mapping tools like BWA/Bowtie), and after that maybe some experimental sequence clustering/de novo assembly for exome data. Additionally, variant detection in whatever form is targeted. Only active projects will be stored locally; data no longer in use will be stored elsewhere in the network.

So much for the setting. Regarding the specs, the following is intended:

- dual-CPU mainboard
- 256 GB RAM
- 20-30 TB HDD @ RAID6 (data)
- SSDs @ RAID5 (system, tmp)

Due to funding limitations it may be the case that RAM has to be decreased to 128 GB. Also not yet settled is whether the budget will be enough for that SSD bundle in RAID5; maybe we have to go for only two of them in RAID1.

What we are trying to find out is where, in the tasks described above, the machine would run into bottlenecks. What is pretty clear is that I/O is everything, already from a theoretical point of view - and we also observed this on a comparable machine (2x 3.33 GHz Intel 6-core, 100 GB RAM, 450 MB/s R/W to the data RAID6).

The question of questions, right at the beginning of configuring a system, is whether one should go for an AMD or an Intel architecture. The former offers more cores (8-12) at a lower frequency (~2.4 GHz), the latter fewer cores (6) at a higher frequency (~3.3 GHz). According to the data sheets, the Intel CPUs are on a per-core basis ~30% faster with integer operations and ~50% faster with floating point. The risks we see with the AMDs are, on the one hand, that the number of cores per socket could saturate the memory controller, and on the other hand that jobs which cannot be parallelized (or only poorly) will need more time.

To bring all this down to some distinct questions (don't feel forced to answer all of them):

1. Using the described bioinformatics software: where are the potential system bottlenecks? (connections between CPUs, RAM, HDDs)
2. What is the expected relation of integer-based and floating point based calculations, which will be loading the CPU cores?
3. Regarding the architectural differences (strengths, weaknesses): Would an AMD- or an Intel-System be more suitable?
4. How much I/O (read and write) can be expected at the memory controllers? Which tasks are most I/O intensive (regarding RAM and/or HDDs)?
5. Roughly separated in mapping and clustering jobs: which amounts of main memory can be expected to be required by a single job (given e.g. Illumina exome data, 50x coverage)? As far as I know mapping should be around 4 GB, clustering much more (may reach high double digits).
6. HDD access (R/W) is mainly in bigger blocks instead of masses of short operations - correct?
All those questions are a bit rough and improvised (yes, it IS a bit chaotic at the moment - sorry for that), but any clue on even a single question would help. "Unfortunately" we got the money to place the order for our own hardware unexpectedly quickly, and we are now forced to act. We want to make as few cardinal errors as possible...

Thanks a lot in advance,
Sebastian

--
Sebastian Schaaf, M.Sc. Bioinformatics
Chair of Biometry and Bioinformatics
Department of Medical Information Sciences, Biometry and Epidemiology
University of Munich
On Mon, Aug 13, 2012 at 11:23 AM, Sebastian Schaaf <schaaf@ibe.med.uni-muenchen.de> wrote:
Hi all,
I have a couple of questions around the topic of "hardware requirements" for a server which is intended to be bought and used as a concept machine for NGS-related jobs. It should be used for the development of tools and workflows (using Galaxy, of course) as well as a platform for some "alpha" users, who are to learn to work on NGS data which they have just begun to generate. ...
Duplicate thread on the SEQanswers forum: http://seqanswers.com/forums/showthread.php?t=22456 Peter
Yes, thanks, I should have mentioned that. I posted in both the forum and the dev list, because I don't expect the forum members and the dev-list subscribers to be 100% identical... Sorry for any inconvenience...

Peter Cock wrote:
On Mon, Aug 13, 2012 at 11:23 AM, Sebastian Schaaf <schaaf@ibe.med.uni-muenchen.de> wrote:
Hi all,
I have a couple of questions around the topic of "hardware requirements" for a server which is intended to be bought and used as a concept machine for NGS-related jobs. It should be used for the development of tools and workflows (using Galaxy, of course) as well as a platform for some "alpha" users, who are to learn to work on NGS data which they have just begun to generate. ...

Duplicate thread on the SEQanswers forum: http://seqanswers.com/forums/showthread.php?t=22456
Peter
--
Sebastian Schaaf, M.Sc. Bioinformatics
Chair of Biometry and Bioinformatics
Department of Medical Information Sciences, Biometry and Epidemiology
University of Munich
Marchioninistr. 15, K U1 (postal)
Marchioninistr. 17, U 006 (office)
D-81377 Munich (Germany)
Tel: +49 89 2180-78178
Hey Sebastian-

It may help to consider other pieces aside from compute nodes that you will need, such as nodes for proxies and databases, networking gear (such as switches and cables), and so on. http://usegalaxy.org/production has some details, and there are high-level pieces explained at http://wiki.g2.bx.psu.edu/Events/GDC2010?action=AttachFile&do=get&target=GDC2010_building_scalable.pdf

You should also talk to your institution's IT folks about power requirements, how those costs are passed on, off-site backup storage (though it sounds like you're counting on RAID 5/6), etc.

It also may help if folks could share their experiences with benchmarking their own systems along with the tools that they've been using. The Galaxy Czars conference call could help - you could bring this up at the next meeting.

I've answered inline, but in general I think that the bottleneck for your planned architecture will be I/O with respect to disk. The next bottleneck may be with respect to the network - if you have a disk farm with a 1 Gbps (125 MBps) connection, then it doesn't matter if your disks can write 400+ MBps. (Nate also included this in his presentation.) You may want to consider Infiniband over Ethernet - I think the Galaxy Czars call would be really helpful in this respect.
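(A quick back-of-envelope sketch of that network-vs-disk point, in Python; the 450 MB/s figure is the sequential rate reported for the existing array elsewhere in this thread, and the link speeds are nominal line rates, ignoring protocol overhead:)

# Rough comparison of nominal network link speed vs. measured disk throughput.
def link_mb_per_s(gbit_per_s):
    """Convert a nominal link speed in Gbit/s to MB/s."""
    return gbit_per_s * 1000.0 / 8.0

disk_mb_per_s = 450.0   # measured sequential throughput of the RAID6 array
for gbit in (1, 10):
    net = link_mb_per_s(gbit)
    limiter = "network" if net < disk_mb_per_s else "disk"
    print("%2d Gbit/s link ~ %6.1f MB/s -> limited by: %s" % (gbit, net, limiter))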
1. Using the described bioinformatics software: where are the potential system bottlenecks? (connections between CPUs, RAM, HDDs)
One way to get a better idea is to start with existing resources, create a sample workflow or two, and measure performance. Again, the Galaxy czars call could be a good bet.
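(For a first rough measurement of a single step, even a small Python wrapper gives wall time and peak memory per job; the BWA command line below is only a hypothetical placeholder and should be replaced with your own tool, reference and reads:)

# Minimal per-job benchmark sketch: run one command and report wall-clock
# time and the child's peak resident memory.
import resource
import subprocess
import time

cmd = ["bwa", "aln", "-t", "6", "hg19.fa", "sample_R1.fastq"]  # hypothetical example

start = time.time()
with open("sample_R1.sai", "wb") as out:
    subprocess.check_call(cmd, stdout=out)
elapsed = time.time() - start

# On Linux, ru_maxrss is reported in kilobytes; RUSAGE_CHILDREN covers the
# finished child process(es).
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print("wall time: %.1f s, peak RSS: %.2f GB" % (elapsed, peak_kb / 1024.0 / 1024.0))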
2. What is the expected relation of integer-based and floating point based calculations, which will be loading the CPU cores?
This also depends on the tools being used. This might be more relevant if your architecture were to use more specialized hardware (such as GPUs or FPGAs), but this should be a secondary concern.
3. Regarding the architectural differences (strengths, weaknesses): Would an AMD- or an Intel-System be more suitable?
I really can't answer which processor line is more suitable, but I think that having enough RAM per core is more important. Nate shows that main.g2.bx.psu.edu has 4 GB RAM per core.
4. How much I/O (read and write) can be expected at the memory controllers? Which tasks are most I/O intensive (regarding RAM and/or HDDs)?
Workflows currently write all output to disk and read all input from disk. This gets back to previous questions on benchmarking.
5. Roughly separated in mapping and clustering jobs: which amounts of main memory can be expected to be required by a single job (given e.g. Illumina exome data, 50x coverage)? As far as I know mapping should be around 4 GB, clustering much more (may reach high double digits).
Nate's presentation shows that main.g2.bx.psu.edu has 24 to 48 GB per 8 core reservation, and as before it shows that there is 4 GB per core.
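(Put as plain arithmetic, and taking the dual-socket core counts under discussion as hypothetical options, that rule of thumb works out as follows:)

# RAM per core for 2x6-core Intel vs. 2x8- or 2x12-core AMD, with either
# 128 or 256 GB of RAM; anything around 4+ GB/core matches the
# main.g2.bx.psu.edu figure quoted above.
for ram_gb in (128, 256):
    for cores in (12, 16, 24):
        print("%3d GB / %2d cores = %4.1f GB per core" % (ram_gb, cores, ram_gb / float(cores)))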
6. HDD access (R/W) is mainly in bigger blocks instead of masses of short operations - correct?
Again, this all depends on the tool being used, and some benchmarks could help here. This question sounds like it's mostly related to choosing the filesystem - is that right? If so, then you may want to consider a compressing file system such as ZFS or BtrFS. You may also want to consider filesystems like Ceph or Gluster (now Red Hat). I know that Ceph can run on top of XFS and BtrFS, but you should look into BtrFS's churn rate - it might still be evolving quickly.

Again, a ping to the Galaxy Czars call may help on any and possibly all of these questions.

Good luck!

-Scott
Hey Scott,

First of all, thanks for the long reply - to keep it short I'll follow your structure and answer inline:

Scott McManus wrote:
Hey Sebastian-
It may help to consider other pieces aside from compute nodes that you will need, such as nodes for proxies and databases, networking gear (such as switches and cables), and so on. http://usegalaxy.org/production has some details, and there are high-level pieces explained at http://wiki.g2.bx.psu.edu/Events/GDC2010?action=AttachFile&do=get&target=GDC2010_building_scalable.pdf

Thanks, I read through it, that is some evidence.

You should also talk to your institution's IT folks about power requirements, how those costs are passed on, off-site backup storage (though it sounds like you're counting on RAID 5/6), etc.

One non-technical note regarding the organization (techies: skip this): this is exactly the point we are at currently - we had a first non-technical conversation ~1 month ago, and in the last days funding was suddenly released, which put us in "zugzwang" (as far as I know the term also describes, in English, being forced to (re)act). The structure is roughly as follows: there is the IT provider for the complete hospital campus (consisting of several clinics and some medical school institutions; we belong to the latter) and our own institute's IT, which internally serves science and research. We had hours of discussion inside our institute and agreed that we are neither able nor willing to manage everything on our own (the system is intended for everyone doing NGS research on the campus). The campus IT was not involved in the non-technical conversation mentioned above.
Regarding the technical environment, everything is under way; today we'll have another meeting (the "main" IT folks are bothered by the "custom" hardware we are targeting). Backup is also part of the conversations in September; we don't want to rely on RAID6 alone. This topic is additionally very politics-driven (who pays for what?)... Technically, the need is beyond question.
It also may help if folks could share their experiences with benchmarking their own systems along with the tools that they've been using. The Galaxy Czars conference call could help - you could bring this up at the next meeting.
Fortunately I have been part of the Czars group since the first meeting and also took part in the GCC2012 breakout session. You're absolutely right. Too bad there is so little time until we have to act - that's the reason why I included the whole list, hoping that someone has done some benchmarking. We planned to, but our first server behaved quite "moodily"... Sharing some experiences or "hard fact values" including system specs would also be great for other people who are at the point of ordering hardware and have to justify their choices.
I've answered inline, but in general I think that the bottleneck for your planned architecture will be I/O with respect to disk. The next bottleneck may be with respect to the network - if you have a disk farm with a 1 Gbps (125 MBps) connection, then it doesn't matter if your disks can write 400+ MBps. (Nate also included this in his presentation.) You may want to consider Infiniband over Ethernet - I think the Galaxy Czars call would be really helpful in this respect.
Planned for the HDD connection is a RAID controller offering 1 GB/s - the array on our first server, by the way, delivered 450 MB/s (measured). The network should not be the problem for the concept machine; it is intended to be relatively self-contained. Network load will only appear while loading data from an archive or from the sequencer itself. A 10 Gbit/s connection is available. InfiniBand was considered for a short time but would exceed the current funding. A cluster is available, but its connection speed is quite low (as it is used more for statistical analyses).
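(In case it helps anyone reproduce such numbers, a crude sequential-throughput check in Python could look like the sketch below; the path is a placeholder for the data array's mount point, and a dedicated tool such as fio or bonnie++ would of course be the more serious choice:)

# Crude sequential write/read throughput check for a target volume.
# Note: for the read pass to hit the disks rather than the page cache,
# the file size should exceed RAM (or caches should be dropped first).
import os
import time

TEST_FILE = "/data/iotest.bin"   # placeholder: path on the RAID6 volume
BLOCK_MB = 1
TOTAL_MB = 8 * 1024              # 8 GiB; increase beyond RAM for a fair read test

buf = os.urandom(BLOCK_MB * 1024 * 1024)

start = time.time()
with open(TEST_FILE, "wb") as f:
    for _ in range(TOTAL_MB // BLOCK_MB):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())
print("write: %.0f MB/s" % (TOTAL_MB / (time.time() - start)))

start = time.time()
with open(TEST_FILE, "rb") as f:
    while f.read(BLOCK_MB * 1024 * 1024):
        pass
print("read:  %.0f MB/s" % (TOTAL_MB / (time.time() - start)))

os.remove(TEST_FILE)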
1. Using the described bioinformatics software: where are the potential system bottlenecks? (connections between CPUs, RAM, HDDs)

One way to get a better idea is to start with existing resources, create a sample workflow or two, and measure performance. Again, the Galaxy Czars call could be a good bet.
This is what we wanted to do (see above), but we did not get that far due to the technical issues mentioned above (RAID controller, HDD crashes, etc.).
2. What is the expected relation of integer-based and floating point based calculations, which will be loading the CPU cores?

This also depends on the tools being used. This might be more relevant if your architecture were to use more specialized hardware (such as GPUs or FPGAs), but this should be a secondary concern.

From plain theory I would expect the Needleman-Wunsch algorithm to be of high relevance, which should be basically integer calculation in the case of pairwise sequence alignment; MSAs may be different (they may require floating point calculations). Unfortunately, GPU and/or FPGA usage is currently far out of range for this first concept, but it has been in the back of my mind for a longer time :). In the standard CPU setting I would also expect the I/O between CPU and RAM to be a bottleneck for tasks which are not that CPU-intensive (more details below).

3. Regarding the architectural differences (strengths, weaknesses): Would an AMD- or an Intel-System be more suitable?

I really can't answer which processor line is more suitable, but I think that having enough RAM per core is more important. Nate shows that main.g2.bx.psu.edu has 4 GB RAM per core.

4. How much I/O (read and write) can be expected at the memory controllers? Which tasks are most I/O intensive (regarding RAM and/or HDDs)?

Workflows currently write all output to disk and read all input from disk. This gets back to previous questions on benchmarking.

Inspired by a tool that calls BWA several times on one memory-mapped file ('Stampy'), our idea was that, due to the HDD bottleneck, as much RAM as possible should be available. Our medium-term idea would therefore be to optimize I/O-intensive workflows with wrappers that enable memory-mapping as far as possible. As far as I understand it now, the CPU<->RAM I/O would be the bottleneck for trimming, simple filter steps etc. - everything where masses of data/strings don't really challenge the CPU - as long as the source data is already in main memory and does not have to be loaded from disk.
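(To make the memory-mapping idea concrete, a minimal sketch, assuming a hypothetical reference file name; the point is that repeated passes, or several processes mapping the same file, are served from the page cache instead of the RAID array:)

# Minimal sketch of the memory-mapping idea: map a reference file once and
# scan it; repeated scans (or other processes mapping the same file) are
# served from the page cache as long as the pages stay resident.
# "hg19.fa" is a hypothetical placeholder.
import mmap

with open("hg19.fa", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    n_records = 0
    pos = mm.find(b">")
    while pos != -1:              # count FASTA headers without copying the file
        n_records += 1
        pos = mm.find(b">", pos + 1)
    print("sequences in reference: %d" % n_records)
    mm.close()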
5. Roughly separated in mapping and clustering jobs: which amounts of main memory can be expected to be required by a single job (given e.g. Illumina exome data, 50x coverage)? As far as I know mapping should be around 4 GB, clustering much more (may reach high double digits).

Nate's presentation shows that main.g2.bx.psu.edu has 24 to 48 GB per 8 core reservation, and as before it shows that there is 4 GB per core.
Then I should be quite safe with 128+ GB of RAM. Sure, if one core has to access RAM outside of its own local memory, via the neighbouring socket, speed goes down significantly. But this should happen rarely in our setting.
6. HDD access (R/W) is mainly in bigger blocks instead of masses of short operations - correct?

Again, this all depends on the tool being used, and some benchmarks could help here. This question sounds like it's mostly related to choosing the filesystem - is that right? If so, then you may want to consider a compressing file system such as ZFS or BtrFS. You may also want to consider filesystems like Ceph or Gluster (now Red Hat). I know that Ceph can run on top of XFS and BtrFS, but you should look into BtrFS's churn rate - it might still be evolving quickly. Again, a ping to the Galaxy Czars call may help on any and possibly all of these questions.

It was less about choosing the filesystem (we recently abandoned ZFS due to issues which afterwards turned out to be (maybe) related to some strange hardware behaviour); we are currently going with ext4. Gluster or Ceph may get interesting when we go for an extended system, after having built up a working stand-alone concept. I'll come back to you on that - and also to the Czars group, where this topic came up as one of the first ones.
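(One way to answer this question empirically on Linux rather than by guessing: sample /proc/<pid>/io while a representative job runs and look at the average bytes per read/write call. The command line below is again only a hypothetical placeholder:)

# Sample /proc/<pid>/io (Linux) while a job runs and report the average
# bytes per read/write syscall -- a rough indicator of whether a tool does
# few large I/O operations or masses of small ones.
import subprocess
import time

cmd = ["bwa", "aln", "-t", "6", "hg19.fa", "sample_R1.fastq"]  # hypothetical example

devnull = open("/dev/null", "wb")
proc = subprocess.Popen(cmd, stdout=devnull)
stats = {}
while proc.poll() is None:
    try:
        with open("/proc/%d/io" % proc.pid) as f:
            stats = dict(line.split(": ") for line in f.read().splitlines())
    except IOError:
        pass                      # the process may have just exited
    time.sleep(1)
devnull.close()

for label, calls_key, bytes_key in (("read", "syscr", "rchar"),
                                    ("write", "syscw", "wchar")):
    calls = int(stats.get(calls_key, 0))
    nbytes = int(stats.get(bytes_key, 0))
    if calls:
        print("%5s: %d calls, %.1f kB per call on average"
              % (label, calls, nbytes / float(calls) / 1024.0))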
Good luck!
-Scott

Thanks, I'll need some of that.
UPDATE: in the meantime the meeting has taken place and we have gained some more days - at least until the middle of next week. Anyway, if someone could offer some measured values or experiences I would be very glad.

Thanks,
Sebastian

--
Sebastian Schaaf, M.Sc. Bioinformatics
Chair of Biometry and Bioinformatics
Department of Medical Information Sciences, Biometry and Epidemiology
University of Munich
Marchioninistr. 15, K U1 (postal)
Marchioninistr. 17, U 006 (office)
D-81377 Munich (Germany)
Tel: +49 89 2180-78178
participants (3)
- Peter Cock
- Scott McManus
- Sebastian Schaaf