Thanks, Alex, for updating your lists.
As I am the one "floating" around :-), I thought there might be some interest in the xtremesystems community in a few words about the 2 systems I built over the last 4 months for a personal study project, and which I used for the runs here.
Traditionally, and especially in the last few years, PCs grew their computational capabilities faster than any other part of the overall system. Main memory bandwidth grew slower than MIPS and MFLOP/s, and I/O latency and bandwidth advanced significantly slower than sheer compute power. So, from an I/O perspective, systems became more and more "unbalanced" for data-intensive workloads, even though the amount of data to be processed grew at least as fast as CPU capabilities. Another imbalance developed inside the hard disks themselves: disk capacity grew faster than sustainable bandwidth, and even faster than the reduction in latency.
When I started the project to build an I/O-balanced workstation, I took advantage of - from my POV - a few significant developments of recent months, which in combination allow much improved data handling. I'd like to list just 4 components which contribute to the improvements:
- The new I/O architecture of the latest generation of Sandy Bridge CPUs, allowing a massive increase in I/O capabilities
- The latest generation of low-cost SAS/SATA host bus adapters, which do not impede the performance of SSDs operated in parallel
- The performance characteristics of individual SSDs are well known, but there is much less experience with parallel configurations in PCs at I/O speeds of over 20 GB/s
- The general availability of Windows Server 2012 with its much improved I/O and networking subsystems
This is not the place to go deeper into I/O, but I used the PCs to run Alex Yee's excellent y-cruncher application as a background task. (All in-memory runs were done on an otherwise idle machine.)
Not only is y-cruncher well optimized on the computational front, but its I/O subsystem is able to hit peak transfer rates of 12 GB/s and more.
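For anyone who wants to sanity-check sequential throughput numbers like these on their own array, a minimal Python sketch along these lines can help. The path, block size, and total size are arbitrary assumptions, it uses a single writer thread with plain unbuffered file I/O, and the total size should well exceed RAM or the OS cache will inflate the result:

import os, time

PATH = r"D:\throughput.tmp"   # assumption: a file on the striped SSD volume
BLOCK = 8 * 1024 * 1024       # write in 8 MiB chunks
TOTAL = 64 * 1024**3          # 64 GiB total; increase well beyond RAM size

buf = os.urandom(BLOCK)
start = time.time()
with open(PATH, "wb", buffering=0) as f:
    written = 0
    while written < TOTAL:
        f.write(buf)
        written += BLOCK
    os.fsync(f.fileno())      # force data to the disks before stopping the clock
elapsed = time.time() - start
os.remove(PATH)
print(f"~{TOTAL / elapsed / 1e9:.2f} GB/s sequential write")

A single-threaded test like this is a lower bound; applications that issue parallel I/O across many files and controllers will typically get closer to the hardware limit.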
I am currently writing a paper (mostly on weekends) describing the systems and the performance characteristics of the HW and SW environment in more detail, but here are a few words about the 2 systems, which are in a kind of constant reconfiguration state:
1) The single socket PC
CPU: i7-3960X
MB: Asus P9X79 WS
Mem: 8 x 8 GB Kingston DDR3-1600
Disk controllers: 4 x LSI 9207-8i (each with 8 x 6 Gbit/s SAS/SATA ports)
Data SSDs: 32 x Samsung 830 (128GB)
OS: SanDisk SSD 240 GB
2) The dual socket PC
CPU: 2 x E5-2687W
MB: Asus Z9PE-D16 (4 x Gbit LAN ports)
Mem: 16 x 16 GB Kingston ECC DDR3-1600
Disk controllers: 6 x LSI 9207-8i (each with 8 x 6 Gbit/s SAS/SATA ports)
Data SSDs: 48 x Samsung 830 (128GB)
OS: SanDisk SSD 480GB
Disk controllers and data SSDs are shared between the 2 PCs, depending on requirements.
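As a rough plausibility check of such configurations, a back-of-envelope calculation is useful. The per-drive figures below are approximate datasheet-class assumptions for the Samsung 830 128 GB, not measurements from my setup:

# Assumed sequential rates per Samsung 830 128 GB (approximate datasheet values)
READ_MBPS  = 520
WRITE_MBPS = 320

for drives in (32, 48):
    print(f"{drives} SSDs: ~{drives * READ_MBPS / 1000:.1f} GB/s read, "
          f"~{drives * WRITE_MBPS / 1000:.1f} GB/s write (theoretical ceiling)")

For 48 drives this gives a ceiling of roughly 25 GB/s read and 15 GB/s write, which is consistent with the > 20 GB/s and > 12 GB/s figures mentioned in this post, assuming the HBAs and PCIe lanes do not become the bottleneck.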
Some comments on the numbers and observations during the runs:
- "Small" sizes of Pi (below 100m) achieve better performance when HT is disabled
- Overall efficiency, expressed as % of peak, is harder to achieve on NUMA machines (vs. single-socket machines with one physical memory space)
- The 1-trillion-digit Pi run generated close to 500 TB of data transfer (avg. 725 MB/s over the total run time and > 12 GB/s peak; see the quick consistency check after this list)
- The Sandy Bridge architecture is an excellent platform for high-I/O apps (either dedicated I/O applications, or combined compute/I/O applications like y-cruncher)
- The new generation of low-cost SAS HBA controllers offers much better scaling than previous-generation controllers
- As said, the machines are in constant configuration flux; the runs were done with I/O system configurations ranging from 0 to 48 SSDs
- Long-running applications like y-cruncher, with its algorithmic error detection, show the value of ECC in RAM (a generic sketch of such a check follows after this list)
- To keep the CPUs safe with this computationally demanding application, I ran them below 60 degrees Celsius.
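As promised above, a quick consistency check of the 1-trillion-digit run numbers, as pure arithmetic on the figures quoted in the list:

total_bytes = 500e12   # ~500 TB of data transferred
avg_rate    = 725e6    # 725 MB/s average over the run

seconds = total_bytes / avg_rate
print(f"Implied run time: {seconds:,.0f} s = {seconds / 3600:.0f} h = {seconds / 86400:.1f} days")
# -> about 690,000 s, i.e. roughly 8 days of continuous transfer

So the two numbers are consistent with a run time on the order of eight days.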
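And since I mentioned algorithmic error detection: y-cruncher's actual checks are internal to the application, but the general idea can be illustrated with a generic modular redundancy check on a big multiplication. Everything below is an illustrative sketch, not y-cruncher's implementation:

import random

def checked_multiply(a, b, p=2**61 - 1):
    # Verify the product modulo a prime: a single flipped bit in the result
    # would almost certainly violate (a*b) mod p == ((a mod p)*(b mod p)) mod p.
    product = a * b   # stand-in for a large FFT-based multiplication
    if product % p != ((a % p) * (b % p)) % p:
        raise ArithmeticError("modular check failed - possible memory error")
    return product

x = random.getrandbits(1_000_000)   # two ~million-bit operands
y = random.getrandbits(1_000_000)
print(checked_multiply(x, y).bit_length())

A check like this catches silent corruption after the fact; ECC RAM prevents (or at least reports) much of it in the first place, which is why the two complement each other on long runs.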
I've tried to aggregate the data in the list below as accurately as possible; please let me know about any potential errors.
With that, thanks to Alex for his great application, and to all community members: enjoy the fascinating world of computing,
Andy
PS:
In the spirit of transparency, since I mentioned a product of my employer:
In my day job, I am currently working as Regional Technology Officer in Microsoft's field organisation in Western Europe.
Full-size image to download:
https://jzykpa.sn2.livefilestore.com...as%20Ebert.jpg
Depending on the state of the application, the 16 physical cores (plus HT) were quite busy:
http://www.pbase.com/andrease/image/...1/original.jpg
During I/O-intensive phases, the CPU graphs look different:
http://www.pbase.com/andrease/image/...8/original.jpg
One snapshot while the application was writing faster than 12 GB/sec
http://upload.pbase.com/image/146267297/original.jpg