A cost effective way to fill it up
http://www.servethehome.com/Server-d...ode-8-sockets/
rgds,
Andy
Yup, I am indeed. I was planning to provision the gear in a datacenter in Irvine, but the cost of bandwidth there and the lack of 10G and transit options swayed us away. We currently have a 10G uplink per rack for our use, and it is fun to play with. If only I could get my own DWDM gear set up from LA to Irvine =)
Andy, you're killing me man, I would love a bunch of those LOL :)
Dang, those are very nice prices tho!
Yeah, I am not sure how they do it connection-wise, but their server setup is very impressive. It's all theirs, no hosted space.
Bought one with 2 nodes, each holding dual L5520's and 48 GB of RAM. New ESXi nodes :D
As soon as I get the server I'll post pictures from my rack cabinet.
It's going to hold:
HP1800-24G
Dell2848
Dell C6100 XS23-TY3
Custom ZFS server in Supermicro 823
Dell PowerVault MD1000
HP Storageworks MSA70
5000VA APC Rack UPS
The last of the server trio is now done.
After a high-capacity storage server and a fast data server, the only thing left was a compute server.
Due to their excellent double-precision performance I waited for the availability of the GK110 Kepler GPUs.
The monitor is connected to a small VGA card to keep the 3 Titans crunching undisturbed.
http://www.pbase.com/andrease/image/...2/original.jpg
rgds,
Andy
^^ Can I borrow just one card? Just one measly one? :D
:-)
Sure, stop by tomorrow for a traditional Vienna coffee and Sachertorte ....
It was quite "hard" to get 4 cards. Only 8 Asus cards have been shipped to Austria in total so far.
4 cards 4 dealers. But they are fast. and silent. and cool. love it.
My son likes them as well. and borrowed one. only for an hour. yesterday :-)
Andy
5 pm. perfect .
Tomorrow is my birthday and the family dinner with friends starts at 6 pm. You are welcome.
Too bad I can't send Wienerschnitzel and Apfelstrudel as email attachments :)
Anyway, how is your Supermicro Hyperserver project doing?
I do have a (serious) question: the E5 Xeons are locked down by Intel on BCLK, frequencies and memory speed (max 1600 MHz). How is it possible that Supermicro can run them at 1866 MHz memory speed? Are these special chips shipped by Intel? Any secrets you can share? Just curious.
with kind regards,
Andy
Hey MM, where is your monster rig at? Thought you would have it here by now. Pure E-peen in this thread. Makes my dually :eek2: at what I see and drool at.
Last parts just got here today..Case and these new monster HS
Just to complete the story. The full family is together now.
10,752 CUDA cores
24 GB GDDR5 RAM
> 1 TB/sec aggregated memory bandwidth (in the cards)
ca. 6 TFlop/s (double precision), ca. 18 TFlop/s (single precision)
(This is roughly comparable to the #1 position on the Top500 list in 2000, the ASCI White machine, at approx. 110 million US$)
When PCIe 3.0 support is turned on, each card can read/write at about 11 GB/sec on the PCIe bus.
For full concurrent PCIe bandwidth of all 4 cards, a dual-socket SB machine is needed, with its 80 PCIe lanes and better main memory bandwidth
(With 1600 MHz DDR3, my dual-socket SB delivers ca. 80 GB/sec in the STREAM benchmark)
So, depending on the GPU workload, a single LGA 2011 system might be OK (when compute / device-memory bound), or a dual-SB board is needed when I/O bound.
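For anyone who wants to sanity-check the totals, here is a minimal back-of-the-envelope sketch. The per-card figures are assumed from NVIDIA's published GTX Titan specs (2688 cores, 6 GB, ~288 GB/sec, roughly 4.5 / 1.5 TFlop/s SP/DP); the PCIe and STREAM numbers are the ones quoted above.
Code:
#include <cstdio>

int main() {
    const int cards = 4;

    // Per-card GTX Titan (GK110) figures, assumed from NVIDIA's published specs
    const int    cuda_cores = 2688;
    const double mem_gb     = 6.0;    // GDDR5 per card
    const double mem_bw     = 288.4;  // GB/sec device memory bandwidth per card
    const double sp_tflops  = 4.5;    // single precision, roughly
    const double dp_tflops  = 1.5;    // ~1/3 of SP with full-speed DP enabled

    printf("CUDA cores           : %d\n",        cards * cuda_cores);        // 10,752
    printf("GDDR5                : %.0f GB\n",   cards * mem_gb);            // 24 GB
    printf("Device memory BW     : %.2f TB/s\n", cards * mem_bw / 1000.0);   // ~1.15 TB/s
    printf("SP / DP              : ~%.0f / ~%.0f TFlop/s\n",
           cards * sp_tflops, cards * dp_tflops);                            // ~18 / ~6

    // Host side: 4 cards at ~11 GB/sec per direction over PCIe 3.0 x16
    const double pcie_per_card = 11.0;
    printf("Aggregate PCIe demand: ~%.0f GB/s per direction\n",
           cards * pcie_per_card);   // ~44 GB/s -- more than a single LGA-2011
                                     // socket sustains with STREAM (~40 GB/s),
                                     // hence the dual-socket SB board when I/O bound
    return 0;
}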
http://www.pbase.com/andrease/image/...8/original.jpg
cheers,
Andy
There is a known issue with Sandy Bridge-E CPUs and NVIDIA cards.
When Intel released these CPUs they were PCIe 3.0 capable, but not yet certified for the 8 GT/s speed. NVIDIA claimed there were a lot of timing variations across the various chipsets and forced the Kepler cards on those CPUs and motherboards down to PCIe 2.0 speed. Later they released a little utility with which users could "switch" their systems to run in the faster PCIe 3.0 mode.
Here is the utility: http://nvidia.custhelp.com/app/answe...n-x79-platform
Just use the GPU-Z utility to check which speed your system is running at, then apply the utility.
Generally speaking:
The GTX Titan in its original mode (2.0) had 3.8 GB/sec write speed and 5.2 GB/sec read speed (tested with the bandwidth utility from CUDA SDK version 5.0).
After switching the system to 3.0, both read and write are now in the 11 GB/sec range.
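If you don't have the SDK handy, a minimal pinned-memory copy timer in the same spirit as bandwidthTest (a sketch, not the SDK tool itself; buffer size and repetition count are arbitrary choices) looks roughly like this, compiled with nvcc:
Code:
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256u << 20;          // 256 MB per transfer
    const int    reps  = 20;

    void *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);            // pinned host memory (needed for full speed)
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * reps / (ms / 1000.0) / 1e9;
    printf("Host -> device: %.2f GB/s\n", gbps);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
With the link at 2.0 you should land in the 4-5 GB/sec range; at 3.0 x16 roughly 11 GB/sec, as above.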
People often complain about the sub-linear scaling of SLI and Triple-SLI systems, sometimes with negative scaling when a fourth card is added.
If the application uses a lot of PCIe bandwidth, the memory bus quickly gets overloaded by the demands of the graphics cards and the CPU.
Some numbers:
Max theoretical memory bandwidth (max. theoretical = guaranteed not to be exceeded):
LGA-1155 socket with DDR3-1600 = 25.6 GB/sec (2 memory channels)
LGA-2011 socket with DDR3-1600 = 51.2 GB/sec (4 memory channels)
Dual LGA-2011 sockets with DDR3-1600 = 102.4 GB/sec (8 memory channels)
Practical limits are strongly affected by the memory access pattern and can range from 20% to 80% of the max speed.
With the STREAM benchmark, 80% seems to be the upper bound.
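As a quick sketch of where those theoretical numbers come from: one DDR3 channel is 64 bits wide, so per-channel bandwidth = transfer rate x 8 bytes, and the sockets just multiply the channel count (the 80% STREAM ceiling is the figure from above):
Code:
#include <cstdio>

int main() {
    const double mts         = 1600.0;             // DDR3-1600 transfer rate (MT/s)
    const double per_channel = mts * 8.0 / 1000.0; // 64-bit bus -> 12.8 GB/s per channel

    const int   channels[] = {2, 4, 8};
    const char* label[]    = {"LGA-1155", "LGA-2011", "dual LGA-2011"};
    for (int i = 0; i < 3; ++i)
        printf("%-14s %d channels: %.1f GB/s theoretical, ~%.1f GB/s at 80%%\n",
               label[i], channels[i],
               channels[i] * per_channel,
               channels[i] * per_channel * 0.8);
    return 0;
}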
PCIe speed:
Modern CPUs feature PCIe 3.0, with about 1 GB/sec read and (concurrently) 1 GB/sec write speed per PCIe lane. So an x16 PCIe 3.0 slot has a combined I/O speed of 32 GB/sec (16 read and 16 write), completely overwhelming the associated memory speed of an LGA-1155 system. If maximum I/O speed is to be achieved, the memory bus bottleneck has to be removed. This can be done with the LGA-2011 socket, which provides up to 40 GB/sec memory speed (measured with STREAM). "Unfortunately" the LGA-2011 has 40 PCIe lanes which, if used effectively, would saturate the 4 memory channels of this system as well. This is what happens when multiple cards capable of high I/O are used (i.e. graphics cards).
Even if the memory system were able to provide enough bandwidth to the PCIe subsystem, the CPU needs to compete for memory access as well. A further problem is the cache hierarchy. To maintain coherency between what the CPU thinks is stored in main memory and what the devices see, the caches need to be updated or flushed when an I/O card updates main memory. As a consequence, the CPU's access times to these memory addresses can increase significantly (up to 10-fold).
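For reference, a stripped-down triad loop in the spirit of STREAM (a sketch, not the official benchmark; array size is arbitrary, compile with something like -O2 and OpenMP enabled) gives a rough feel for the sustained numbers quoted here:
Code:
#include <cstdio>
#include <cstdlib>
#include <omp.h>

int main() {
    const long long n = 1LL << 26;    // 64M doubles per array, ~512 MB each
    double *a = (double*)malloc(n * sizeof(double));
    double *b = (double*)malloc(n * sizeof(double));
    double *c = (double*)malloc(n * sizeof(double));
    if (!a || !b || !c) return 1;

    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i)  // triad: 2 reads + 1 write per element
        a[i] = b[i] + 3.0 * c[i];
    double t1 = omp_get_wtime();

    // counts 3 arrays moved; write-allocate traffic is ignored here
    printf("Triad: %.1f GB/s\n", 3.0 * n * sizeof(double) / 1e9 / (t1 - t0));
    free(a); free(b); free(c);
    return 0;
}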
Some relief comes with dual-socket LGA 2011 systems: the combined memory bandwidth doubles. Great. If all 4 GTX Titans transmitted data at the same time, there would still be some memory bandwidth left for the 2 CPUs. To mitigate the aforementioned cache problem, Intel introduced in the dual-socket Xeon systems (Romley platform) a feature called Data Direct I/O. As in the single-socket LGA-2011 system, data from I/O cards is written to main memory via the cache. To avoid the cache getting completely flushed (easy when the cache is 20 MB and the transfer is, say, 1 GB), the hardware reserves only 20% of the cache capacity for the I/O operation, leaving enough valid cache content in place that the CPU can work effectively with the remaining capacity. Consequence: much better and more predictable CPU performance during high I/O loads.
One problem is currently not well addressed in these systems: NUMA and I/O affinity. It will take time until applications like games leverage the information they could obtain via the operating system about what the architecture of the system they run on really looks like.
Some examples:
1) If the program thread runs on core 0 (socket 0) and its main memory is allocated on its own socket, great. If the memory is allocated on the other socket, a performance hit sets in.
2) With Sandy/Ivy Bridge the PCIe root complex is on-die, which gives much better performance on dual-socket systems, but also creates dependencies. If your GPU is connected to a physical PCIe slot wired to socket 0 and the program in need of the data resides in memory on socket 0, things are great. If the target memory is on socket 1, the data from the GPU (connected to socket 0) somehow has to get to socket 1. This is where QPI (QuickPath Interconnect) comes in. If QPI is set to energy-efficient sleep modes in your BIOS, it always has to wake up first to transfer data. Keep it alive for max performance. A rough affinity sketch follows below.
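Here is a rough sketch of what such socket-aware pinning could look like on Linux with libnuma and the CUDA runtime (link with -lnuma; which GPU sits behind which socket is an assumption here, so check lspci -t or the board manual, and the buffer size is arbitrary):
Code:
#include <cstdio>
#include <cstring>
#include <numa.h>            // libnuma
#include <cuda_runtime.h>

int main() {
    if (numa_available() < 0) { printf("no NUMA support\n"); return 1; }

    const int node = 0;      // assumption: this GPU's slot is wired to socket 0
    const int gpu  = 0;      // assumed device index for the GPU on that socket

    numa_run_on_node(node);  // pin this thread to socket 0's cores
    cudaSetDevice(gpu);      // talk to the GPU behind socket 0's PCIe root

    const size_t bytes = 256u << 20;
    void* host = numa_alloc_onnode(bytes, node);  // host buffer in socket-local DRAM
    if (!host) return 1;
    memset(host, 0, bytes);                       // touch pages so they get placed
    cudaHostRegister(host, bytes, cudaHostRegisterDefault);  // pin for fast DMA

    void* dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);    // stays off the QPI link

    cudaFree(dev);
    cudaHostUnregister(host);
    numa_free(host, bytes);
    return 0;
}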
It is simple:
For compute-bound problems look for the right CPU. For data- or I/O-bound problems look at the memory and I/O architecture (and basically forget the CPU).
cheers,
Andy
http://www.pbase.com/andrease/image/...8/original.jpg
geek :banana::banana::banana::banana:! :rofl:
^^ You got that right..I see those four cards and think that one system would have heated my whole house this winter! :D
I was thinking,"Holy cannoli, this rig has more computing power with the GPU's than the early supercomputers".
and, "Man, I paid less for my first car than this rig probably cost".
and finally: "What the heck is he going to do with all of this computing power?"
Thanks for the facts regarding your system; it's mind-blowing to think of the power you can fit into a system a decade later, and it can only make you wonder what computers will be capable of in another 10 years. If only you could go back in time with your machine and sell it to the highest bidder!!!!
Hey guys.. just finished up the build for my Storage server...
It's nothing compared to the monsters that run loose here :p:
The config is pretty self-explanatory from the pictures :)
Basically it's the following:
Cosmos s2
i3-2120
Maximus V Formula
LSI 9261-8i RAID card (set up in RAID 5)
24 TB of WD Red HDDs
Plextor 128GB SSD for boot
AX850
16GB RAM
GTX 650
H80
The Pictures :)
http://i.imgur.com/A3fBrz0.jpg
http://i.imgur.com/rIkYUue.jpg
http://i.imgur.com/BlGni7X.jpg
http://i.imgur.com/Ts9Wuui.jpg
http://i.imgur.com/yRsXzGO.jpg
http://i.imgur.com/FKQhxCu.jpg
http://i.imgur.com/0BrvZYJ.jpg
http://i.imgur.com/zcB0cme.jpg
http://i.imgur.com/59HlPSD.jpg
http://i.imgur.com/rHZNgQs.jpg
http://i.imgur.com/pseAUK8.jpg
http://i.imgur.com/nxmXPWt.jpg
http://i.imgur.com/zNPo1al.jpg
Cheers and Kind Regards Always ! :up: