I need a system with 2TB of working memory. Before you laugh, let me explain. We are simulating an entire rhesus macaque brain. To do this fast we may use the Darwin cluster at Cambridge (128 NVIDIA C1060 GPUs, 7TB of RAM cluster-wide, Mellanox InfiniBand). We might also try something like GPUGrid using BOINC, either with volunteers or on our campus using CPUs.

Unfortunately, brain activity is not easily parcelled. Everything is related to everything else and happens at the same time. That's sort of why it's different from a computer, certainly a von Neumann computer at any rate. I digress.

What I'd like to do is build the cheapest system I can to test on. Clearly, getting 2TB of RAM together, even across multiple machines, is very costly. So I was thinking of the following strategy:

Tyan make a barebones that can take 8 GPUs at full x16 speed, 2 Xeons (one per bus) and, I think, 148GB of RAM.

Now, bear in mind that the GPUs will:

a) be very busy computing things
b) move data to and from host RAM across the PCIe bus at only x8 speed (x16 if data moves in one direction only, but it will be moving in both)
c) be moving data constantly (they can compute at the same time, it's asynchronous, but they cannot compute with data they don't have, and they get it over PCIe)

PCIe is a huge bottleneck (and NVIDIA, Intel and co. are working hard to try to get parallel co-processing devices like this out from behind that bus. The Intel guy wouldn't say when. His device was cool though, by the way: basically 32 or 64 or so PIII chips on a PCIe card with some onboard RAM. I digress again.)

So:

What I thought was: replace two of the eight GPUs with two really good PCIe x8 RAID cards. Do x16 ones exist?

To those cards, attach many small (60GB or so) SSDs.

RAID0 them together with a stripe size of 4k (Linux's memory page size).

The idea being that each RAID card could then send data at PCIe x8 speed and hold 1TB.
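As a rough sizing check on that, here is a minimal sketch of how many drives each card would have to carry. It assumes decimal units (1TB = 1000GB) and that every SSD contributes its full 60GB to the stripe; the function name is just for illustration.

```python
import math

def ssds_needed(target_gb: float, ssd_gb: float) -> int:
    """Smallest number of equal-sized SSDs whose RAID0 capacity reaches target_gb."""
    return math.ceil(target_gb / ssd_gb)

# Figures from the post: 1TB per RAID card, built from ~60GB SSDs.
per_card = ssds_needed(1000, 60)
print(per_card, "SSDs per card,", 2 * per_card, "in total")
```

So each card would have to drive roughly 17 SSDs, about 34 across both, which is worth checking against the port count of any card under consideration.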

Format both RAID devices as swap (one on each side of the machine, i.e. one per bus, one per CPU, one per bank of DIMMs).

Linux can use swap space intelligently across multiple devices.
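For what it's worth, the kernel allocates swap pages round-robin between swap areas that share the same (highest) priority, so the two RAID devices would want equal priorities. Below is a small sketch for checking that on a running box by grouping devices by priority from text in the /proc/swaps format. The device names and sizes in the sample are hypothetical, and the helper name is made up for illustration.

```python
from collections import defaultdict

def swap_priority_groups(proc_swaps_text: str) -> dict:
    """Group swap device names by priority, given text in /proc/swaps format."""
    groups = defaultdict(list)
    for line in proc_swaps_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore malformed or short lines
        # Columns are: Filename, Type, Size, Used, Priority
        groups[int(fields[-1])].append(fields[0])
    return dict(groups)

# Hypothetical sample: two RAID volumes registered with the same priority.
sample = """Filename Type Size Used Priority
/dev/sda1 partition 976562500 0 10
/dev/sdb1 partition 976562500 0 10
"""
print(swap_priority_groups(sample))
```

On the real machine you would pass in `open("/proc/swaps").read()` instead of the sample. If both devices show up under one priority, paging should be spread across both cards rather than filling one first.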

Then basically just pretend to have 2TB of RAM, while in fact having a PCIe x8 speed paging system and 148GB of RAM.

Like a cache hierarchy: GPULOCAL<->GPUGLOBAL<->HOSTRAM<->RAIDSWAP
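To put rough numbers on the bottom two tiers of that hierarchy, here is a back-of-envelope sketch using only the figures above. It assumes decimal units (2TB = 2000GB) and ignores whatever the kernel and other processes need, so treat the result as an upper bound rather than a prediction.

```python
WORKING_SET_GB = 2000   # 2TB target working set (decimal units assumed)
HOST_RAM_GB = 148       # RAM ceiling quoted for the barebones
SWAP_GB = 2 * 1000      # two RAID0 swap devices of 1TB each

def resident_fraction(ram_gb, working_set_gb):
    """Fraction of the working set that can sit in host RAM at any one time."""
    return min(1.0, ram_gb / working_set_gb)

def fits(ram_gb, swap_gb, working_set_gb):
    """True if RAM plus swap together can back the whole working set."""
    return ram_gb + swap_gb >= working_set_gb

print(f"{resident_fraction(HOST_RAM_GB, WORKING_SET_GB):.1%} resident in RAM")
print("backed by RAM + swap:", fits(HOST_RAM_GB, SWAP_GB, WORKING_SET_GB))
```

In other words, only about 7% of the data can be in host RAM at once, so more than 90% of it lives on the RAID cards at any moment and the paging rate will depend heavily on how local the simulation's memory accesses are.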

Does that sound viable? Would the RAID system hold together? What would be a good RAID card to look at?

Cheers,

C