Big Trouble: Rendering in Maya with Skulltrail


R124
07-13-2008, 08:52 AM
I have a Skulltrail MB with two QX9775 cpus.

When I launch a render within Maya or a batch render, the system produces a BSOD after being under load. Sometimes it happens within 5 minutes of being under load, sometimes after 4 hours.

The BSOD is the generic:

Hardware Failure
Contact Vendor
System Halt

I've noticed a few things:

1. I've tested each cpu individually in both sockets and they both work fine (1 cpu renders my 2K image in about 50 seconds, and in the task manager the cpu holds steadily at 100%). When rendering with both cpus installed I get times of about 32 seconds, and in the task manager the 8-core CPU load chart jumps around between 100, 80 and 60% throughout the render of a single image.

2. I've tried Kingston 5300 FB-Dimms and Crucial 6400 FB-Dimms. No change.

3. I've tried different video cards, no change.

4. I bought a good-sized northbridge cooler (holds around 34c). No change.

5. My PSU is a Corsair 1000W. I tested it. Nothing wrong with it.

6. System runs fine when it's not rendering or doing stressful operations.
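
A rough way to read the render times in point 1: going from one cpu (about 50 seconds) to two (about 32 seconds) is a speedup of roughly 1.56x rather than 2x, or about 78% efficiency, which would fit with the load chart dipping below 100% on the 8-core run. A minimal Python sketch of that arithmetic, using only the two timings quoted above (the function names are illustrative, not from any tool mentioned in this thread):

```python
def speedup(t_single: float, t_dual: float) -> float:
    """How many times faster the dual-CPU render is than the single-CPU one."""
    return t_single / t_dual


def parallel_efficiency(t_single: float, t_dual: float, cpu_ratio: float = 2.0) -> float:
    """Fraction of the ideal speedup actually achieved (1.0 means perfect scaling)."""
    return speedup(t_single, t_dual) / cpu_ratio


if __name__ == "__main__":
    # Timings from point 1: ~50 s with one QX9775, ~32 s with both installed.
    s = speedup(50.0, 32.0)
    e = parallel_efficiency(50.0, 32.0)
    print(f"speedup: {s:.2f}x, efficiency: {e:.0%}")
```

Imperfect scaling like this is common for renders and does not by itself point to faulty hardware.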

So my questions:

1. Is water cooling absolutely necessary?
2. Is anyone else having this problem...is anyone else building this system for rendering?

I've spent so much trying to solve this, I'm about to give up and request a refund or just sell off the parts.

Somebody stop me from going Mac Pro over here!

R124
07-13-2008, 09:01 AM
Also, lots of people point to the RAM right away. I've tested two kinds of RAM. Intel said the Kingston should work, and the Crucial is the kind used in Mac Pros (same 240-pin fully buffered 800MHz ECC).

If anyone has a solid (stable) skulltrail system that you use for rendering...please send me a link to the RAM you are using. Also, if anyone has a stable skulltrail system in general that they regularly put under full load for days at a time...please let me know what kind of hardware you are using and any key bios settings.

Thanks

Speederlander
07-13-2008, 09:25 AM
That ECC RAM will prevent most basic RAM errors but not all. Could be the DIMM connector. Have you run memtest? Do that for a couple days and see what shakes out.

R124
07-15-2008, 11:21 AM
I did the memtest. No errors.

Intel tech support believes the processors need to be replaced.

Here we go...

FUGGER
07-15-2008, 11:39 AM
Hmm, use coretemp and check the cores.

http://www.xtremesystems.org/forums/showthread.php?t=188282

Stock speed?

Sounds like the water loop is set up wrong for handling the loaded CPUs.

R124
07-16-2008, 11:37 AM
I actually just mailed the processors back to Intel for replacement.

However I was monitoring core temps during all the troubleshooting and I remember the numbers. Basically all the cores idle around 32-34c using the Cooler Master Hyper 212 heatsinks with 120mm fans. Under load 8 cores average around 58c. This is using SpeedFan to monitor. The Ram under load would reach 70c and idle at 50c. The GPU ran pretty hot usually...65c - Nvidia 7300 GS. I think the northbridge was holding steady around 34c (I doubt that though).

Yes, everything was tested using default bios settings, stock speed.

I don't believe it's a heat issue. (though I've been thinking about the Coolit Pure ST solution - but that doesn't relate). Intel said that the system should run stable using heat sinks and fans at stock speeds...

It's either the motherboard or the processors. I tested each proc individually in both sockets without any problems. It's only when they're both in and rendering on all 8 cores, that I get a problem. In my mind I think the motherboard must have issues, but Intel says it's the processors.

I haven't had a stable situation since I bought all this stuff 1.5 months ago.

R124
07-17-2008, 01:51 PM
I got some new information:

I got my new processors today and they run hot. My previous processors idled at 34c on Hyper 212 heatsinks and fans. With these new processors, cpu00 idles in the high 60s and cpu01 idles at 100c. Yeah, for real.

I called Intel, and now they declare that you must use liquid cooling. In fact it looks like the fact that my old CPUs idled at 34c indicated that something was wrong with them. They seem to think that there is no way these processors could idle at 34c using copper/aluminum heatsinks and fans. Interesting.

I'm placing an order for the Coolit Pure ST today. This system is a total BOAT. I had no idea going in that it would be like this. Wow.

BiFfMaN
07-19-2008, 11:35 AM
Actually, one thing I noticed with my Skulltrail system while testing it was that the mems got really hot, so that's one more thing to check.

But yeah, I'm doing the same thing you're doing, except I went with dual Harpers and Mushkin mems. Watercooled and a huge load of fans.

R124
07-26-2008, 11:38 PM
Yes my ram was running hot as well. I haven't had a chance to see what the temps are with those thermaltake ram coolers that I recently purchased (the ones with the heat pipe and fins), but I'm getting my coolit pure st on Monday so I'll post how it goes.

Tonucci
07-27-2008, 12:04 AM
Actively cool your RAM with a fan and see how it goes.

Your new CPU temps may be high due to a bad mount. Check whether the new CPUs are concave; if they are, add a little more thermal paste at the center to fill the gap. Make sure you get good pressure and contact with the heatsink base.

If nothing works, it looks like a mobo problem.

R124
07-28-2008, 02:57 PM
I installed the Coolit Pure ST - which oddly enough came in a regular Coolit Pure box...apparently it's the same unit, just with an added cpu block and a northbridge block.

I'm approaching hour 3 of rendering and so far I haven't crashed or received any BSOD system halt messages.

With the new cooler I noticed that the cpu cores average about 44-50c at idle. When rendering, the cores reach 76-80c.

I'm currently testing the system in the case, on the desk, lying on its side with the side panel off (so the top is open). I don't have any case fans running...just the liquid cooler.

Also noted that with the thermaltake ram coolers ($19 each)...I see that under load my ram holds at 53c instead of 70c with the stock heatsinks.

I'm going to keep running tests...see if I can reproduce my old errors.

Tonucci
07-28-2008, 05:43 PM
Temps are horrible mate, you should check your mount or RMA the CPU's imo. My guess is, it was your ram causing problems....If it was the CPU's you would still be having problems, temps are too high.

Calmatory
07-28-2008, 05:51 PM
Quote (Tonucci): "Temps are horrible mate, you should check your mount or RMA the CPU's imo. My guess is, it was your ram causing problems....If it was the CPU's you would still be having problems, temps are too high."

Please define "too high"? You mean 80C under load is too high? :confused:

The first temps (50C idle and 100C idle, wtf?) couldn't be the real temps, unless the mounting was very bad.

Tonucci
07-28-2008, 06:31 PM
Yes, 80c is too high imo, what part was hard to understand?

I've seen my fair share of similar CPUs having problems around those temps. Anything above 70c with these procs is too high imo, especially if it will run fully loaded for long stretches of time.
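
To put the load numbers from this thread next to that 70c rule of thumb, here is a small Python sketch. The readings are the ones posted above; the 70c limit is just the opinion stated in this post, not an Intel specification, and all the names are made up for illustration:

```python
# Rule of thumb quoted in this thread; an assumption, not a vendor spec.
LOAD_LIMIT_C = 70

# Under-load temperatures as reported earlier in the thread (degrees C).
reported_load_temps_c = {
    "original CPUs, Hyper 212 air cooling (8-core average)": 58,
    "replacement CPUs, Coolit Pure ST (upper end of 76-80c)": 80,
    "RAM, stock heatsinks": 70,
    "RAM, Thermaltake coolers": 53,
}


def over_limit(readings, limit=LOAD_LIMIT_C):
    """Return the names of readings strictly above the limit, sorted by name."""
    return sorted(name for name, temp in readings.items() if temp > limit)


if __name__ == "__main__":
    for name in over_limit(reported_load_temps_c):
        print(f"above {LOAD_LIMIT_C}c under load: {name}")
```

By this yardstick only the replacement CPUs under the liquid cooler are over the line; the RAM on stock heatsinks sits right at it.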

R124
10-21-2008, 10:22 PM
Here we are in late October and the system is still running fine. Temps remain around 60s idle, and high 70s under load.

I don't understand how the Coolit systems people managed to overclock this setup to 4+ghz. I would have a meltdown in my case. Either way, work is getting through the pipeline at stock speeds, and I've still got over 2 years warranty on the CPUs....over and out.