# Thread: what kind of cooling would I need to OC 3930K to 5-5.5 GHz?

1. ## what kind of cooling would I need to OC 3930K to 5-5.5 GHz?

what kind of cooling would I need to OC 3930K to 5-5.5 GHz?

I'm going to guess that the Corsair H80/H100 probably won't be sufficient for running a 3930K at upwards of 5.5 GHz 24/7 (it'll be doing a mix of CFD/FEA), so I'm looking for suggestions for suitable alternatives.

Board is likely going to be the Asus X79 Sabertooth. It'll also likely have 64 GB of DDR3-1600 as well (I forget the brand, but I think that it's either Geil or G.Skill or something like that. I have to go look through the other threads to fish that info out.)

My engine combustion CFD is taking wayyy too long. (Current runs are sitting at a minimum of 5 days apiece.)

2. A phase change unit.
You won't get to 5.5 GHz and maintain stability unless you have a very good chip and you don't care about its life. 5 GHz is doable. You'd also have to worry about condensation, and say goodbye to using a case properly.

http://www.frozencpu.com/products/62...E-48-S-1C.html

Here is a list of overclocking results on HWBOT.org specifying cooling method etc. for the wPrime 1024M benchmark; it's roughly a 100-second, 100% load on all cores. The benchmark doesn't need to be completely stable, but for your application it would probably be safe to take 300 MHz off of those results.
http://www.hwbot.org/benchmark/wprim...0#coolingType=

I want to note that these CPUs have "coldbugs" where they won't operate below a certain temperature, usually somewhere in the -30 to -80C range. The good (or bad) news is that most of them don't scale clockspeed beyond about -30C because of a multiplier limit, which can also differ from CPU to CPU. Very few do 57x; most are in the 51-53x range.

Break a leg.

3. With the Corsair coolers you're looking at closer to the 4.5 GHz range than 5.5 GHz. You'll need to be well sub-zero with a really good chip (some top out at around 5.2 GHz regardless of cooling) to reach 5.5 GHz, and even then stability might be an issue.

4. I vote vapor phase change.

5. I agree with everyone else; 5 GHz+ will require phase change.

As BeepBeep said, with phase you will have to worry about condensation. It can work, but you will need to do a very good insulation job, and also keep the humidity low in the room.

6. You'd want a phase change unit with the ability to handle a large load; SB-E over 5 GHz does not sip power...

If a phase unit is tuned to X watts, asking it to dissipate more does not result in higher temperatures, the unit will *probably* cut out until the pressures drop. A phase builder can add a lot more info.

7. Stupid question - can I run phase change units 24/7/365 @ 100% load?

How much power do phase change units consume?

8. Originally Posted by alpha754293
Stupid question - can I run phase change units 24/7/365 @ 100% load?

How much power do phase change units consume?
Well they work just like a refrigerator...or window air conditioner... etc.
You can run them 24/7, but I don't think anybody would recommend them for a mission critical setup

IMO you are better off just building a second computer like the one you already built and running two of the simulations at a time(?)

9. For what you're wanting to do, 5 GHz is the only sane option, and it's doable with a small custom water setup; even the XSPC RX360 Rasa kit would get you there. Really, the extra 500 MHz is not going to do a whole lot for you, unless you're prepared to spend a lot more on power for the cooling (let alone the CPU's usage; it's a hungry beast) and on replacing hardware when it degrades. I'd bet on 6 months at 5.5 GHz, maybe less, and making up lost time when it crashes will hold you up more than anything. I'd say it's pretty hard to get rock-solid stability at 5.5 GHz 24/7.

I'd say Beep has it right: just get a 'farm' going, or throw a couple of Xeons in an SR-X; two Xeons at whatever their max turbo is will destroy most sims. I can't remember if you're running multiple sims or just the one, though. If it's the same one over and over, can you set it up so that you do one iteration on one machine, then make some changes and run it on the other? That halves your time in one stroke. Your simulation isn't single-threaded, is it? I remember we were talking about single-threaded stuff in another forum thread.

What's the likelihood that an i7 3820 will run at 5375 MHz on 2 cores at -10C? Not 24/7, but stable for when it's needed in single-threaded apps. Do most chips scale at -10C, or is that not cold enough yet? Do you have to be lucky just to get a chip that will do 5375 MHz even with SS or cascade? That would be 43 x 100 (x 1.25 gear ratio). Is it possible to get that kind of OC with offset voltage and P-states enabled? This would be with a TEC chiller setup that would get me to -10C with the heat load an i7 3820 puts out; the TECs would take about 400 watts of power, though.

Is there much difference in max stable OC between 2 and 4 cores enabled? Power and heat should be way less with 2, I suppose; that's enough of a benefit. Can you select which cores are enabled, or is it always cores 0 and 1 in 2-core mode? Do these chips usually have pretty even cores in terms of OC, or is there considerable variation between the 4?

10. Originally Posted by Liam_G
...
Right now, it's the same sim over and over again because I'm debugging the run. I'm simulating internal engine combustion ("87 pump gas", spark ignition; compression, ignition, and expansion cycles only for now), and I've been having a whole slew of problems that I think pertain to the physics of the problem. It's hard to tell, though, because the sims take so long to run; I'm trying to cut down on the number of times I have to run it.

I'm having problems getting the ignition/combustion to ignite properly, so I added an additional 360 crank angle degrees to give the air-fuel mixture more time to mix before trying to light it again. With that, I'm able to get the ignition going, but as it progresses, the flame propagation goes nuts (my maximum Mach number is 609 MILLION, and considering that a deorbiting space shuttle re-enters at Mach 21, there's definitely something wrong). I've post-processed the results from that run, and it looks like it's actually "pinching" the air-fuel mixture/flame, which is causing a very local node to go faster than the speed of light.

I've got a number of ideas to try to see if I can iron this thing out, but at 5 days per run (it bombs out after 5 days), my progress is hampered by runtime.

I'm trying to see if there's an alternative to Xeon setups because of cost. (The Xeon systems that I've priced out start at $6,500 and go up to $22,000.) I'm debating whether I want to cluster systems or go with a monolithic install, as there are pros and cons to both.

No, this simulation is highly parallelized. (The single threaded stuff was for the CAD design and geometry preparation that feeds into this).

And this isn't like a computer graphics animation, where you can start from, say, frame 500 and render 501-1000 (while the other system starts at 0 and goes to 499). The solution of the preceding timestep impacts the current solution.
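The sequential-timestep point can be sketched in a few lines: each new state is a function of the previous one, so the time loop can't be split across machines the way render frames can. This is a toy 1-D advection update, not the actual solver; it just has the same dependency structure.

```python
# Toy illustration of why timesteps can't be farmed out like render frames:
# state t+1 is a function of state t, so the time loop is strictly serial.
# (Simple periodic upwind advection update; a real CFD solver is far more
# complex, but has the same step-to-step dependency.)

def step(u, c=0.5):
    """Advance the field one timestep; u[i-1] wraps around (periodic)."""
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]

u = [0.0, 0.0, 1.0, 0.0, 0.0]  # initial blob of "fluid"
for t in range(100):           # each iteration needs the previous result
    u = step(u)
```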

My biggest concern with clustering is that with highly parallel runs like this, I don't have a good way of measuring inter-node traffic. Using GbE as the network backbone will work if I know there's not going to be that much traffic going through; otherwise I'd have to bite the bullet and get myself QDR or FDR IB. And I can't say that with any degree of certainty, because I've never found a way to measure it during the course of a run.
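Even without measuring, the question can be bounded with a back-of-envelope estimate of boundary-exchange traffic per timestep. Every figure below is a made-up placeholder (boundary cell counts, variables per cell, and timestep rate vary enormously between models); the point is the arithmetic, not the numbers.

```python
# Back-of-envelope estimate of partition-boundary traffic for one node.
# All of these inputs are placeholder assumptions -- substitute values
# from your own mesh and solver.

face_cells    = 50_000   # cells on a partition boundary (assumed)
vars_per_cell = 8        # e.g. density, velocity, energy, species (assumed)
bytes_per_var = 8        # double precision
steps_per_sec = 10       # solver timesteps per wall-clock second (assumed)

bytes_per_step = face_cells * vars_per_cell * bytes_per_var
mbit_per_sec = bytes_per_step * steps_per_sec * 8 / 1e6
print(f"{mbit_per_sec:.0f} Mbit/s per boundary vs ~1000 Mbit/s for GbE")
```

With these (assumed) numbers a single boundary eats about a quarter of a GbE link, which is the kind of margin that tells you whether 100 Mbps is hopeless or IB is overkill.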

But clustering does give me the flexibility to throw more hardware at it on an as-needed basis, while the other nodes can also be working on other stuff that may or may not be related to this. (Of all the engineering involved in designing a car, I'm sure I can keep the other nodes busy with all the other stuff that's needed to engineer a car.)

And no, as it stands, the CFD still cannot run on GPGPUs. I wish it could, but it can't right now.

(Besides, having a smaller number of really, really fast cores means there's significantly less intercore traffic; I think the number of connections is either O(n^2) or O(n^3).)
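For the all-pairs case the count is easy to pin down: with n partitions that can all exchange data, the number of distinct pairwise links is n(n-1)/2, i.e. O(n^2). A quick sketch:

```python
# Why fewer, faster cores cut communication: with n partitions that can
# all exchange boundary data, the number of distinct pairwise links is
# n*(n-1)/2, which grows as O(n^2).

def links(n):
    return n * (n - 1) // 2

for n in (6, 12, 48):
    print(n, "cores ->", links(n), "links")  # 6 -> 15, 12 -> 66, 48 -> 1128
```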

And I've done it once before: at my last job, my 6-core 990X OC'd to 4 GHz (on the stock air cooler) with 24 GB of RAM was enough to beat a 48-core AMD Opteron system with 128 GB of RAM. (I think the Opterons were either 2.2 or 2.4 GHz; no OC available.)

Originally Posted by BeepBeep2
Well they work just like a refrigerator...or window air conditioner... etc.
You can run them 24/7, but I don't think anybody would recommend them for a mission critical setup

IMO you are better off just building a second computer like the one you already built and running two of the simulations at a time(?)
I think my residential AC (I wanna say it's a 5-ton unit, but don't quote me on that) draws about 5 kW.

And my last portable AC was about 1 kW at max. I have no idea what the power consumption of my fridge is like.
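For comparing these, the running-cost arithmetic is simple. The 400 W draw (a figure mentioned earlier in the thread for a TEC chiller) and the $0.12/kWh rate are both assumptions; plug in your own numbers.

```python
# Rough 24/7 running-cost math for a compressor- or TEC-based cooler.
# The 400 W draw and $0.12/kWh electricity rate are assumptions.

def monthly_cost(watts, rate_per_kwh=0.12, hours=24 * 30):
    kwh = watts / 1000 * hours   # energy used in a 30-day month
    return kwh * rate_per_kwh    # dollars

cost = monthly_cost(400)         # 400 W around the clock
```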

11. Originally Posted by alpha754293
...
EVGA SR-X is an overclocking/enthusiast motherboard for Xeon processors...no need to buy a pre-built workstation/server, and no need to go with those types of motherboards from SuperMicro etc.
http://www.evga.com/products/moreInf...herboards&sw=5
...however overclocking will be limited compared to your 3930K due to a locked multiplier and such, and the cheaper Xeon E5s are 2-2.8 GHz, which unfortunately would not provide a very large speed boost for you even with the dual socket setup.

If you don't mind me asking, why exactly do you need to monitor inter-core data traffic?

My guess is that a phase change cooler would use several hundred watts, but nowhere near 1 kW; probably in the range of 200-500 W, and the compressor wouldn't be running all the time. The guys in the phase change section would be able to give you a better answer for sure.

12. You could always wait for this and just drop it into the system you have: http://www.brightsideofnews.com/news...processor.aspx Not sure how well it would scale on that, though, if you're worried about per-core usage.

13. Originally Posted by BeepBeep2
...
The Supermicro workstations are barebones systems (board, chassis, power supply). The rest (CPU, RAM, hard drives, add-in cards) is up to you. They do it that way because how else are you supposed to get a chassis that'll take a half-width, 16.64"-long board?

The other reason for going with a quad-node, 2U Supermicro system: power efficiency. Fully decked out, I can run eight 8-core processors (64 physical cores, 128 logical) on 1-1.2 kW of power. If I get the systems with IB, I'll have up to a 56 Gbps link in addition to a dedicated IPMI connection. These are all things the EVGA SR-X lacks. Plus, the Supermicro systems have a higher RAM limit than the SR-X (256 GB vs. 96 GB max). And since I won't be running GPGPUs, extra network cards, crypto accelerators, or an LP PCIe SSD, the 7 slots the SR-X has won't be used at all. Well, I might need one for a video card, because it doesn't have an onboard integrated GPU either (unlike the Supermicros).

And I'm also pretty sure that when it's all said and done, the cost of getting the SR-X up and running won't be too far off from the Supermicro system, except that with the SR-X I can't bring additional nodes online as funding/budget becomes available. In other words, with the SR-X, each additional node costs the same as the first; with the Supermicro, the three additional nodes cost less than the first one (because I've already got the chassis).

I need to monitor the intercore traffic because that will tell me how much information is transferred during the course of a multiprocessor run.

If I have two boxes that represent cells of fluid flow, and let's say the fluid (air, water, whatever) is flowing from left to right, the flow properties are going to transfer from one cell to the next.

Divide the cells in half (lengthwise), and what happens to the fluid in the bottom cell is going to influence what happens to the fluid in the top cell; all of that then propagates through the flow field.

If I were writing my own MPI code, I would be able to explicitly measure how much data is sent/received by the MPI calls themselves (or at least calculate it). But since I don't have any control over COTS codes, I'll have to rely on other methods, like an intercore version of netstat, to get the network statistics.
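With the solver as a black box, the closest thing to a "netstat for the run" at the node level is sampling the OS's per-interface byte counters while the job is in flight. A minimal Linux-only sketch (the `eth0` default is a placeholder interface name; use whatever NIC carries the cluster traffic):

```python
# A poor man's interconnect monitor: sample the kernel's per-interface
# byte counters (Linux /proc/net/dev) before and after an interval while
# a run is going. This sees node-to-node traffic only, not traffic
# between cores inside one box.

import time

def read_counters(iface):
    """Return (rx_bytes, tx_bytes) for `iface` from /proc/net/dev."""
    with open("/proc/net/dev") as f:
        for line in f:
            name, sep, rest = line.partition(":")
            if sep and name.strip() == iface:
                fields = rest.split()
                return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {iface!r} not found")

def average_rate(iface="eth0", interval=10.0):
    """Average (rx, tx) throughput in MB/s over `interval` seconds."""
    rx0, tx0 = read_counters(iface)
    time.sleep(interval)
    rx1, tx1 = read_counters(iface)
    return (rx1 - rx0) / interval / 1e6, (tx1 - tx0) / interval / 1e6
```

Run it on each node mid-solve and compare the observed MB/s against the link's capacity; that's a crude but direct answer to the "will GbE saturate?" question.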

Why is it important?

Because if I use four dual-socket nodes and they're talking to each other via a GbE network link, I want to know whether GbE will be enough, or whether the sheer volume of data is going to saturate the link, and thus the switch. That will also tell me whether I need to invest in something like IB QDR or FDR. (An 18-port IB QDR switch is another $6k.) But if the amount of data transfer is low, even a 100 Mbps switch might work. I don't know.

The other option (at the expense of expandability) would be to go with a four-socket monolithic install. It'll have 32 cores talking to each other through the QPI link, which has a peak theoretical bandwidth of 204.8 Gbps (substantially faster than a 56 Gbps IB FDR link). But that means I'm going to have thread/process lock/contention issues (or I might), and if I'm running two separate and distinct jobs on it, they're going to be fighting for resources (memory address space, and QPI bandwidth, since it can't carry multiple threads simultaneously). (Whereas if I bring a second node online, what's running on node0 could be completely independent from what's running on node1.)

Originally Posted by Liam_G
You could always wait for this and just drop it in the system you have http://www.brightsideofnews.com/news...processor.aspx not sure how well it would scale on that though if you're worried about usage per core.
More cores isn't ALWAYS better. And I'm guessing that I'll probably have more/better luck with them porting the program to GPGPU before going to that.

14. Interesting post. However, I was talking about the SR-X motherboard vs. a SuperMicro or Tyan motherboard, and the potential to overclock Xeon CPUs with it vs. only stock speeds on most server/workstation boards. I wasn't talking about the barebones systems that SuperMicro produces. However, you do make good points about how those would better suit your needs.
http://www.newegg.com/Store/SubCateg...r-Motherboards

15.
Maybe chilled liquid might be a good logical route. If you had 1 or 2 cooling units taking the heat out of a large reservoir, then I would imagine you could cool to -20 to -30C, 24/7/365, with less power than a dedicated phase change unit. The large reservoir adds the desired capacity, while an over-built cooling unit would be able to pull heat out of the reservoir quickly enough to cycle, much like a refrigerator, rather than run continuously. Once again, a phase builder would have to chime in here, but it doesn't take much to build a simple temp controller; one could probably even be salvaged from an AC unit (that might go into the build itself).

If you feel like building a couple of nodes to accomplish your desired task, a chilled liquid system is friendlier to expand upon than a single-stage vapor phase change system. Traditional vapor phase change will have the compressor running at all times; constant on/off cycling without allowing some time to equalize the internal pressures will drastically shorten the life of the compressor.

16. Originally Posted by BeepBeep2
Interesting post. However, I was talking about the SR-X motherboard vs. a SuperMicro or Tyan motherboard, and the potential to overclock Xeon CPUs with it vs. only stock speeds on most server/workstation boards. I wasn't talking about the barebones systems that SuperMicro produces. However, you do make good points about how those would better suit your needs.
http://www.newegg.com/Store/SubCateg...r-Motherboards
Well, boards... barebones -- the barebones comes with the board. And for some of the boards, it's quite difficult to find a chassis otherwise because of the custom form factor.

But from the OCing perspective, yes, you're absolutely correct. I'm just guessing at the speeds I would need to match sixteen 2.6 GHz cores (factoring in some advantage of NOT having to spread the load over so many cores when using a 3930K -- non-linear scalability).
Besides, I also have to analyze it from a systems perspective: what would it take for me to get one (or four) systems up and running? For most people, even ONE of these systems is a lot. But I'm not most people. (How many people do you know who talk about implementing 4x IB QDR at home? Which, BTW, if I do, means the switch alone is about $6k, the NICs are about $1k a pop, and each cable is about $100.)

It's SWaP and then some.

17. CFD work is heavily numerical and not terribly difficult to distribute. I wouldn't push clocks to get more output; you should be looking to distribute the work. Invest in building a Beowulf cluster, or buy one; there are decent OEM solutions as well. I do a lot of sim work on something called a PowerWulf made by these guys http://www.pssclabs.com/ (I've not actually physically seen it since it was booted). There are a good number of CFD codes written ready to distribute.

EDIT: I've always wanted to build a more up to date version of this: http://www.calvin.edu/~adams/research/microwulf/

18. Originally Posted by meanmoe
...
Well, the problem with increased parallelization is that you run the risk of having more data that needs to be passed back and forth between the partitions. (And yes, there are a whole slew of partitioning methods and algorithms available; I'll be honest, I haven't spent a great deal of time investigating each one and quantifying its impact on the analyses. I'm still more focused on getting the physics of the simulation/modelling correct before getting into performance optimizations.)

I'm using MeTiS right now (although for my FEA runs, sometimes I'm using RCB).

Again, that's why I'd be interested in measuring the volume or rate of intercore traffic: to put some numbers behind that statement. (It's been said that while you can always throw more processors/hardware at it, at some point the benefit of adding processors tapers off and asymptotically approaches a limit, yielding only marginal gains.)

More hardware isn't always necessarily better.
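The taper described above has a standard name: Amdahl's law. If a fraction p of the run parallelizes, speedup on n cores is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p) no matter how much hardware you add. The p = 0.95 below is an illustrative figure, not a measured one.

```python
# Amdahl's law: the speedup ceiling from the serial fraction of a job.
# p = 0.95 (95% of the work parallelizes) is an assumed, illustrative value.

def amdahl(p, n):
    """Speedup on n cores when fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

for n in (4, 16, 64, 1024):
    print(f"{n:5d} cores -> {amdahl(0.95, n):5.2f}x")
# even at 95% parallel, the ceiling is 1 / 0.05 = 20x
```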

19. Sure, but even when you're throughput-saturated, your output is likely going to be greater than what you could get from a single board. Otherwise, companies and universities wouldn't be investing so much capital in HPC centers.

And I disagree, more hardware is always better... unless it's crushing you I guess.

20. They have phase change units built for 24/7 use; there are companies that make them now. One of them uses military-grade compressors.

IMO, look for one that can handle something like a 250 W load at -40C-ish. R402A is good; that's what mine had.

21. With X79 chips you have to find the temp they like. For my 3930K to run the 5.7 GHz+ CPU test in 3DMark11, phase change fails (-45C) but LN2 at -20C passes.
I would recommend finding the temp your chip likes with LN2 and then re-evaluating. In my case, chilled liquid reaches a higher OC than my single stage.

22. Originally Posted by Splave
with x79 chips you have to find the temp they like. for my 3930k to run 5.7ghz+ cpu test in 3dmark11 phase change fails (-45c) but ln2 at -20c passes.
I would recommend finding the temp your chip likes with ln2 and then re evaluating. In my case chilled liquid reaches a higher OC then my single stage.
What temps is your liquid chiller at? -10C? Lower or higher? I'm hoping 0 to -10C is the sweet spot for a juicy OC on SB-E.

Do you think 5.3 to 5.5 GHz is doable on a 3820 with a liquid chiller at 0C, 100% stable but not necessarily 24/7? Possibly only 2 cores at up to 5.5, and all 4 at 5.0?

Do you guys see any better scaling with only 2 cores OC'd, or does it lower the required vcore at all versus all 4 cores OC'd?

Just trying to establish some characteristics of SB-E OCing.

23. Here is what you can do during the winter in Chicago with cold air and cold water.

24. What were your ambient temps? What was the delta between ambient air temp and full-load core temp? 1.616 V+ seems high for 5.3 GHz; I was hoping the lower temps would lower the required vcore, but I guess it's just chip by chip. Did you try to optimize your vcore for that OC at those temps, or was it just set high enough to get through? I don't want to go past 1.55 vcore for extended periods of time, although this is for the 3820, so maybe the characteristics are different from your 3930K, especially if I only use 2 cores up to 5.5 GHz.

Do you guys see any difference in required vcore if you use a higher bclock, i.e. 104.8 bclock x 1.25 ratio x 42 multi, compared to 100 bclock x 55 multi?
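For reference, the two routes compared here land within a few MHz of each other; on SB-E the core clock is bclock x strap x multiplier, so 104.8 x 1.25 gives an effective ~131 MHz base. Worked out:

```python
# The two routes to ~5.5 GHz compared in the post. On SB-E the core clock
# is bclock x strap (gear ratio) x multiplier.

def core_mhz(bclock, strap, multi):
    return bclock * strap * multi

high_strap = core_mhz(104.8, 1.25, 42)  # ~5502 MHz via the 1.25 strap
high_multi = core_mhz(100.0, 1.00, 55)  # 5500 MHz via raw multiplier
```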

25. The ambient temp was 0C, or 32F.

The voltage was set to 1.6v in order to get 3Dmark06 and Vantage to not crash during the CPU tests @ 5.3GHz.

However, it could run Super Pi at 5.3GHz at 1.56v and 5.5GHz at 1.60v

CPU was run with 6c/12t.