So, you guys with all the experience... what are my chances of getting better than stock on a dual-socket Tyan, similar to if not exactly this one: http://www.tyan.com/product_SKU_spec...&SKU=600000042
I should have this within the next week or so.
My Biggest Fear Is When I Die, My Wife Sells All My Stuff For What I Told Her I Paid For It.
79 SB threads and 32 IB threads across 4 rigs: 111 threads crunching!!
Well, you can expect 220-230 MHz HTT... the final frequency then depends on the CPU's multiplier (e.g., a 10x mult at 230 MHz HTT would be 2.3 GHz).
BTW, it turns out the Opty hates HFCC WUs... it takes 24h for one, claims 250 credits per WU, but only gets <100 on average. So I've put it on HCC only for now.
Can you run Cinebench please? You beat my wPrime by 8 seconds... that's a helluva PC you have there.
Cinebench would probably only get a ten-times speedup, because it doesn't scale well with cores.
Cinebench is totally messed up on this one, for some reason... 1577 single, 14500 multi CPU.
And if you think the Opty is fast in wPrime... check this out:
Turns out I do have a faster Intel rig for wPrime!
cyberduid, how about this?
http://www.hwbot.org/result.do?resultId=874727
No $1000 a pop... just around $200, as said before: http://cgi.ebay.com/AMD-1-9GHZ-Opter...d=p3286.c0.m14
What does the 700 stand for? There are no 700-series Opties, at least not in Socket F, as far as I can recall.
"Study hard my young friend"[/B].
---------------------------------------
Woody: It's not a laser! It's a... [sighs in frustration]
jcool
did you figure out your memory problem?
Nope.. still there. No idea what I can still try at this point. Except for contacting Supermicro.
Some news on the matter:
Thanks to 06F150fx4, who runs the same CPUs on a different motherboard and is getting the same bad latency, my suspicion that it's due to the CPUs being B2 stepping (hello, TLB bug) seems confirmed. I will ask Supermicro if there is a workaround for this issue, but they'll probably just answer that Quads aren't supported on my Rev 1.01 board anyway, because it has no split power planes etc.
No wonder those CPUs didn't cost as much as the other Opties on the Egg.
Yeah. Ever wonder why they are so cheap? Now you know!
Here's why it's so damn slow: http://en.wikipedia.org/wiki/Transla...okaside_Buffer. It's similar to branch prediction in the P4, but not quite as bad.
Miss penalty: 10 - 30 clock cycles
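If anyone wants to see that penalty first-hand, here's a minimal sketch of a microbenchmark (all names and sizes here are my own illustration, not from the article): touching one byte per 4 KiB page across a buffer far bigger than the TLB can map forces a page-table walk on nearly every access, while the same number of touches packed into a couple hundred pages stays mostly TLB-resident. The gap between the two timings also includes ordinary cache effects, so treat it as a rough illustration, not a clean measurement.

```c
/* Rough TLB-miss illustration; compile e.g. gcc -O2 tlbtest.c
 * (add -lrt for clock_gettime on older glibc). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define PAGE   4096          /* 4 KiB pages                          */
#define PAGES  16384         /* 64 MiB buffer, far beyond TLB reach  */
#define ROUNDS 64

/* Time ROUNDS passes of 'count' reads spaced 'stride' bytes apart. */
static double touch(volatile char *buf, size_t stride, size_t count)
{
    struct timespec t0, t1;
    volatile char sink = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < ROUNDS; r++)
        for (size_t i = 0; i < count; i++)
            sink += buf[i * stride];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
    char *buf = malloc((size_t)PAGES * PAGE);
    if (!buf) return 1;
    for (size_t i = 0; i < (size_t)PAGES * PAGE; i++)
        buf[i] = 1;                          /* fault every page in */

    /* One read per page: a fresh TLB entry needed almost every access. */
    double wide   = touch(buf, PAGE, PAGES);
    /* One read per cache line within ~1 MiB: mostly TLB-resident. */
    double narrow = touch(buf, 64, PAGES);

    printf("per access: %.1f ns (page stride) vs %.1f ns (line stride)\n",
           wide   / ((double)ROUNDS * PAGES),
           narrow / ((double)ROUNDS * PAGES));
    free(buf);
    return 0;
}
```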
Got a reply from SM tech support:
Hi Sir,
For the memory speed question, please disable the “CPU Page Translation Table” option in the BIOS. Go to BIOS, Advanced / CPU Configuration / CPU Page Translation Table
Neat idea, I was getting all excited when I read it earlier today, but now I tried it and... well. Same result, nothing has changed, at least not in Everest/Sandra.
Hmmm, so you're certain it's due to the TLB bug and not how it's dealing with NUMA? I'm sort of curious how a software-managed TLB would do, tho I don't know what the normal hit is for one... you could mess with Linux if you want to find out.
An aside:
Chumbucket, a better comparison would be to the L1 cache: if there's a miss, it'll be looking at a rather similar number of cycles to fetch from the L2 [, L3] or main memory. The reason it's a better comparison is that the page table is also resident in memory*; it's just segregated to its own spot and specialized (whereas L1 caches hold general program instruction/data words, which are used in a different context). So if the entry [from the page table] being requested is not cached in the TLB, then it has to take the slow route and go find it. Now, just like the general caches, there should be a fairly low miss rate, so the delay shouldn't make too much of an impact in the grand scheme of things... unless there's a bug/workaround involved.
*The reason I bring this up is that branch prediction is just that: it predicts what's going to happen based on accumulated history, whereas a page table is just a bunch of entries saying each effective [virtual] address really points to this real [physical] address. The former deals with guessing where a computation (branch evaluation) will lead, and pushes instructions into the pipeline ahead of time based on that, whereas E-A translation (page-table lookup) is a strict correlation and must be looked up.
EDIT: hopefully this explanation reads through a little better...
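To make that aside concrete, here's a toy model (all names and sizes are hypothetical, purely for illustration): the page table is a plain in-memory array of virtual-to-physical entries, and the TLB is a tiny cache sitting in front of it; a hit is one cheap lookup, a miss takes the slow route through the table. Incidentally, that refill path is also what the software-managed-TLB question above comes down to: on something like MIPS, a miss traps to an OS handler that performs this same walk in software instead of hardware.

```c
/* Toy TLB-in-front-of-a-page-table model; everything here is made up
 * for illustration, not taken from any real chip's parameters. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_BITS 12                 /* 4 KiB pages                    */
#define NPAGES    1024               /* size of the toy page table     */
#define TLB_SIZE  16                 /* tiny direct-mapped TLB         */

static uint32_t page_table[NPAGES];  /* VPN -> PFN, resident in memory */

struct tlb_entry { uint32_t vpn, pfn; int valid; };
static struct tlb_entry tlb[TLB_SIZE];
static long hits, misses;

/* Translate a virtual address: fast TLB path, or slow table "walk". */
static uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_BITS;
    struct tlb_entry *e = &tlb[vpn % TLB_SIZE];
    if (e->valid && e->vpn == vpn) {
        hits++;                        /* TLB hit: effectively free      */
    } else {
        misses++;                      /* miss: 10-30+ cycles on real HW */
        e->vpn   = vpn;
        e->pfn   = page_table[vpn];    /* the refill a miss handler does */
        e->valid = 1;
    }
    return (e->pfn << PAGE_BITS) | (vaddr & ((1u << PAGE_BITS) - 1));
}

int main(void)
{
    for (uint32_t i = 0; i < NPAGES; i++)
        page_table[i] = NPAGES - 1 - i;        /* arbitrary mapping */

    /* Looping inside one page: after the first access, all hits. */
    for (int i = 0; i < 100000; i++)
        (void)translate(0x1000u + (uint32_t)(i % 4096));

    /* A new page every access: the 16 entries thrash, all misses. */
    for (int i = 0; i < 100000; i++)
        (void)translate((uint32_t)(i % NPAGES) << PAGE_BITS);

    printf("hits=%ld misses=%ld\n", hits, misses);
    return 0;
}
```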
Back to the issue:
I guess I never delved too deeply into the Barcelona TLB bug, but I thought it was: either you ran without the BIOS fix and it went pretty much full bore (for the architecture/implementation) with the risk of failure under high load (freezing is what I remember hearing), or you ran with the fix and took a 10-20% hit. I could be completely wrong on this, so if anyone knows, please correct me.
I assume the BIOS feature you flipped puts you in the former situation (or the latter, since you're disabling it??? It confuses me now), hence I'm curious whether it has to do with NUMA or some other part...
I am not certain about anything here, except for the fact that this rig has a HUGE problem with memory latency, causing it to suck ass in some apps. Fortunately, it runs HCC WUs decently.
Unfortunately, I have neither the time nor the nerve to start wrestling with Linux...
Erm.. wut?
No idea really, it definitely doesn't freeze tho (unless I overclock it too high).
No idea if switching that one setting made any impact on performance; I'll find out soon, I guess. But since it changed absolutely nothing in the synthetic benchies, I'm guessing there won't be any real-world difference either.
Maybe SM enabled the TLB fix permanently in their BIOS, who knows.
lol, I know BP and TLB are very different things. I was comparing the miss penalty (even though the penalty can be much worse on the P4). Wouldn't a full pipeline flush be worse than a cache miss, though?
The fix should already be enabled in the BIOS. Here is an article on the patch: http://techreport.com/articles.x/13741/. Latency is actually worse with it on, but it's better than a system hang.
Does the board have the option of turning the TLB workaround off? Because if so, I'd do it. For crunching it isn't an issue, but the BIOS workaround for the TLB bug hinders performance greatly. I had a 9600BE crunching, made sure the TLB thing wasn't enabled, and it crunched fine.
I've heard rumors that certain Windows OSes on certain service packs automatically force the TLB thing (though maybe that was just a rumor).
The Cardboard Master
Crunch with us, the XS WCG team
Intel Core i7 2600k @ 4.5GHz, 16GB DDR3-1600, Radeon 7950 @ 1000/1250, Win 10 Pro x64