I am confident that you and Particle will solve it quickly... if possible, keep us updated :)
2128mhz
pretty cool josh
So, you guys with all the experience... what are my chances of doing better than stock on a dual-socket Tyan, similar to if not exactly this: http://www.tyan.com/product_SKU_spec...&SKU=600000042
I should have this within the next week or so.
Well, you can expect 220-230MHz HTT... final frequency then depends on the CPUs' multiplier.
BTW, it turns out the Opty hates HFCC WUs... takes 24h for one; it claims 250 credits per WU but only gets <100 on average. So I put it on HCC only for now.
Can you run CineBench please? You beat my Wprime by 8 seconds...that's a helluva PC you have there.
Cinebench would probably only get a ten-times speedup because it doesn't scale well with cores.
Cinebench is totally :banana::banana::banana::banana:ed up on this one, for some reason.. 1577 single 14500 multi CPU :rofl:
And if you think the Opty is fast in wprime.. check out this ;)
Turns out I do have a faster Intel rig for wprime :wasntme:
cyberduid how about this?
http://www.hwbot.org/result.do?resultId=874727
No $1000 a pop... just around 200, as said before: http://cgi.ebay.com/AMD-1-9GHZ-Opter...d=p3286.c0.m14
What does 700 stand for? No 700 Opties, at least in Socket F, as far as I can recall.
jcool
did you figure out your memory problem?
Nope.. still there. No idea what I can still try at this point. Except for contacting Supermicro.
Some news on the matter,
thanks to 06F150fx4, who runs the same CPUs on a different motherboard and is getting the same bad latency, my suspicion that it may be due to the CPUs being B2 stepping (hello, TLB bug) seems confirmed. I will ask Supermicro if there is a workaround for this issue, but they'll probably just answer that Quads aren't supported on my Rev. 1.01 board anyway because it has no split power planes etc. :rolleyes:
No wonder those CPUs didn't cost as much as the other Optys on the egg.
Yeah. Ever wonder why they are so cheap? Now you know :rolleyes:
Here's why it's so damn slow: http://en.wikipedia.org/wiki/Transla...okaside_Buffer Similar to branch prediction in the P4, but not quite as bad.
Miss penalty: 10-30 clock cycles
Got a reply from SM tech support:
Quote:
Hi Sir,
For the memory speed question, please disable the “CPU Page Translation Table” option in the BIOS. Go to BIOS, Advanced / CPU Configuration / CPU Page Translation Table
Neat idea, I was getting all excited when I read it earlier today, but now I tried it and... well. Same result, nothing has changed, at least not in Everest/Sandra.
Hmmm, so you're certain it's due to the TLB bug and not how it's dealing with NUMA? Sort of curious how a software managed TLB would do, tho I don't know what the normal hit is for such... you could mess with Linux if you want to find out :D
An aside:
Chumbucket, a better comparison would be to the L1 cache: if there's a miss, it'll be looking at a rather similar number of cycles to fetch from either L2 [, L3] or main memory. The reason it's a better comparison is that the page table is also resident in memory*; it's just segregated to its own spot and specialized (whereas L1 caches hold general program instruction/data words, used in a different context). So if the entry [from the page table] being requested is not cached in the TLB, it has to take the slow route and go find it. Now, just like the general caches, there should be a fairly low miss rate, so the delay shouldn't make too much of an impact in the grand scheme of things... unless there's a bug/workaround involved :p:
*The reason I bring this up is because branch-prediction is just that, it predicts what's going to happen based on accumulated history, whereas a page table is just a bunch of entries saying each Effective[Virtual] address really points to this Real[Physical] address. The former deals with guessing where a computation(branch evaluation) will lead and pushes instructions into the pipeline ahead of time based on that, whereas E-A translation (page table lookup) is a strict correlation and must be looked up.
EDIT: hopefully this explanation reads through a little better...
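The hit-vs-miss flow described above can be sketched as a toy model. This is purely illustrative Python (the `ToyTLB` class is hypothetical and the cycle counts are made-up placeholders, not measured Barcelona numbers):

```python
# Toy model of address translation with a TLB in front of the page table.
# Latencies are illustrative placeholders, not real hardware numbers.

PAGE_SIZE = 4096
TLB_HIT_CYCLES = 1        # translation already cached in the TLB
TLB_MISS_CYCLES = 30      # have to walk the in-memory page table instead

class ToyTLB:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page -> physical page
        self.cache = {}               # the TLB itself (tiny in real HW)

    def translate(self, vaddr):
        """Return (physical address, cycles spent on translation)."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.cache:                 # TLB hit: fast path
            return self.cache[vpage] * PAGE_SIZE + offset, TLB_HIT_CYCLES
        ppage = self.page_table[vpage]          # TLB miss: slow table walk
        self.cache[vpage] = ppage               # fill the TLB for next time
        return ppage * PAGE_SIZE + offset, TLB_MISS_CYCLES

tlb = ToyTLB({0: 7, 1: 3})
paddr, cost = tlb.translate(0x10)      # first touch of page 0: miss
paddr2, cost2 = tlb.translate(0x20)    # same page again: hit
```

Note there's no guessing involved anywhere, just like the post says: a lookup either hits the cached translation or pays the walk penalty.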
Back to the issue:
I guess I never delved too deep into the Barcelona TLB bug, but I thought it was this: either you ran without the BIOS fix and the chip went pretty much full bore (for the architecture/implementation) with the risk of failure (freezing is what I remember hearing) under high load, or you ran with the fix and took a 10-20% hit. I could be completely wrong on this, so if anyone knows, please correct me.
I assume the BIOS feature you flipped puts you in the former situation (or latter, since you're disabling it??? confuses me now), hence I'm curious if it has to do with NUMA or some other part...
I am not certain about anything here, except for the fact that this rig has a HUGE problem with memory latency causing it to suck ass in some apps. Fortunately, it runs HCC WUs decently.
Unfortunately I neither have the time or nerve to start wrestling with Linux...
Erm.. wut? :confused: :shrug: :rofl: :p:
No idea really, it definitely doesn't freeze tho (unless I overclock it too high :cool: )
No idea if switching that one setting made any impact on performance, will find out about that soon I guess. But since it changed absolutely nothing in the synthetic benchies, I am guessing there won't be any real world difference here.
Maybe SM enabled the TLB fix permanently in their bios, who knows.
lol, I know BP and TLB are very different things. I was comparing the miss penalty (even though the penalty can be much worse for the P4). Wouldn't a full pipeline flush be worse than a cache miss, though?
The fix should already be enabled in the BIOS. Here is an article on the patch: http://techreport.com/articles.x/13741/ Latency is actually worse with it on, but it's better than a system hang.
Does the board have the option of turning the TLB workaround off? Because if so, I'd do it. For crunching, it isn't an issue, but the workaround in the bios for the TLB does hinder performance greatly. I had a 9600BE crunching and made sure the TLB thing wasn't enabled, and it crunched fine.
I've heard rumors that certain windows OSes on certain service packs automatically force the TLB thing (though maybe that was just rumors).
that's not a rumor ... it's a fact :mad:
IF AMD uses the same register settings as for the Phenom, then you might be able to use this tool http://xtreview.com/images/TLB_ver1.04.rar (Phenom TLB fix disable tool - for certain OSes :yepp:)
It works very well on my Phenom.
Vista with service pack 1 and later. I don't know about XP, though :shrug:
He can test it very easily: run the benchmark in WinRAR (Alt+B). More than 1000 kb/s (can also be a little lower depending on what else is running) means the TLB bug fix is disabled. If he only gets around 300-400 kb/s, the fix is enabled (and then it's time for the tool I mentioned above).
Got it. Thanks :D
Sorry jcool if you feel I'm detracting too much from the thread, it's the WCG forum after all :eek:
I would say try that program mreuter80 linked to, since the registers SHOULD be the same - except there's the fact that you're running 4 physical CPUs, so hopefully it knows how to address/configure all of them. I don't know the specific registers involved; a registry dump might shed some light on the matter as well, but the WinRAR trick is probably much easier.
The only reason I bring up NUMA is that if your tests are pulling from the wrong bank with the wrong CPU, then you'll definitely feel a performance hit (CPU0 -> HT -> CPU1's MemCtrlr/RAM -> HT -> CPU0). I've never had the chance to use a NUMA machine myself tho, so I don't know if it would be a problem by default or not.
K, I just found the analogy a little bit on the far side at the time, so I had to say my 2cents :shakes: :shrug: :) :p:
(was bored at work waiting for a simulation to finish ;) )
As for pipeline flush vs cache misses, that all depends on the pipeline length and memory subsystem design, quite situation/implementation dependent. Also, I assume you're mainly referring to flushes caused by branch mispredicts, though quite a few other things can cause them as well.
You could say a branch mispredict caused pipeline flush is often (but not always, as below) cycle bounded by the pipeline's length (baaad for P4), whereas a cache miss could potentially cause a pipeline flush (since the subsequent instructions issued might depend on that load/save hit), plus the cache miss will have to wait for the retrieval from L2/Ram/Storage, which can also vary based on outstanding requests (transaction could take tens to hundreds-of-thousands of cycles, so probably much longer than the pipeline flush).
Note that I'm mainly referring to data cache misses, since if there's an instruction cache miss then you just plain have to wait for it to load from memory before you can fill the pipeline again, which could happen on the mispredict if the prefetcher didn't do its job well enough :)
Now if you throw in SMT, trying to compare things gets even more fun - everything pulls from the same caches/etc., except pipeline flushes can now be marginalized by being thread-specific. The kickback is that total throughput/efficiency gets a good boost, something we can see with having all the WCG workunit threads going, where we care about the total output :up:
Damn, I forgot this is the 4x4 thread :brick: I guess this program will only run for one CPU.
You also need to install Crystal CPU ID before running the program (it was such a long time ago when I installed it that I forgot -- getting older ... who is Dave :rofl:).
Oooh, the benchmark numbers above are for a Phenom. I'm sure your Opteron system will show different numbers.
OK, with Crystal CPU ID comes the MSR Editor, where you can access the register settings directly (manually). Here is the link to the download page: http://crystalmark.info/download/ind...l#CrystalCPUID
Then apply the same steps for each "CORE" ... which is for your True 4x4 quite a bit :rofl:
Here is the link to our own XtremeSystems guide, with some nice pictures: http://www.xtremesystems.org/forums/...d.php?t=171105
Quote:
Select the Core in the main window.
Enter C0010015 in the MSR Number field and hit RDMSR.
Change the last hex digit. Bit Nr. 3 (8h) must be unset: if the last digit is 8h, use 0h; if it's 9h, use 1h. Hit WRMSR to apply the changes.
Now enter C0011023 in the MSR Number field and hit RDMSR.
Change the last hex digit. Bit Nr. 1 (2h) must be unset. If the last digit is 2h change it to 0h. Hit WRMSR to apply the changes.
Close the MSR Editor, select the next core, start the MSR Editor again, and change the registers the same way as described above.
There was also a way to do this as a batch in this guide. So if it works you might want to look into it to have a batch running when you start the machine.
I hope that will help. Good luck man ... I keep my fingers crossed :up:
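Mechanically, the bit-unset steps from the guide boil down to simple bitmasking. A minimal Python sketch (the helper names are mine; the register numbers and values are the ones quoted in the guide and read out later in the thread):

```python
# Apply the guide's TLB-fix-disable edits to the low 32 bits (EAX) of
# the two MSRs: bit 3 of MSR C0010015 and bit 1 of MSR C0011023 must
# be cleared; everything else is left untouched.

def clear_bit(value, bit):
    """Return value with the given bit forced to 0."""
    return value & ~(1 << bit)

def patched_msr_values(eax_c0010015, eax_c0011023):
    return (clear_bit(eax_c0010015, 3),   # e.g. ...18h -> ...10h
            clear_bit(eax_c0011023, 1))   # e.g. ...22h -> ...20h

# The values jcool read out of his Opterons:
new15, new23 = patched_msr_values(0x01000018, 0x00A00022)
print(hex(new15), hex(new23))  # 0x1000010 0xa00020
```

This also shows why only the last hex digit changes in the guide's instructions: both bits live in the lowest nibble.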
Thanks for that, unfortunately the program doesn't load. It says "vcl60.bpl missing" if I fire up the TLB disable exe and "unable to load dll" if I try the enable. Server 08 x64 SP2.
Winrar sucks ass BTW, 230 kb/s... :rolleyes:
And I haven't seen a bios option directly advertising a TLB fix, but I guess it's the Translation table thingy, why else would SM support tell me to disable it?
Maybe the motherboard disabled it but windooze won't.. argh!
Hey mreuter,
thanks, this seems to fire up at least.
But I don't really understand what I need to change the values to. The guide says:
"Change the last hex digit. Bit Nr. 3 (8h) must be unset. If the last digit is 8h use 0h if it's 9h use 1h. Hit WRMSR to apply the changes."
Which field are they talking about? And how do I convert the hex code to the actual numbers that I have to enter? :shrug:
Entering MSR number 0xC0010015 gives me 0x01000018 for EAX.
Entering MSR number 0xC0011023 gives me 0x00A00022 for EAX.
So, into what do I change them?
And by the way.. damn. Doing that 16 times will be tedious, I need to use that batch file if it works :rofl:
jcool... long shot, but is the MCP55 cooled enough? Did you remove its HS to make sure it's all nicely TIMed? I was thinking a bit of thermal throttling on the chipsets... :shrug:
Holy :banana::banana::banana::banana: Jeesus, it worked :D
I just followed this:
"Hi,
Guess i found a way to disable the tlb fix if aod does not work and there is no option in the bios.
The latest bios for my M2A-VM included the tlb-fix. My everest memory read bandwidth dropped around 20%.
I expected bit nr 3 in the MSR register C0010015 to be responsible for the fix. So I compared the values between the two bios versions.
The old version showed 0x00000000 0x01000010 the new one 0x00000000 0x01000018 (bit nr 3 set)."
Since it showed 0x01000018 for mine as well, I just changed all the 16 entries to 0x01000010 and bam...
1630kb/s in winrar instead of 270kb/s :D
Quite the performance increase..
mreuter, do you think I need to change that 2nd entry too? The 0xC0011023 register, that is.
Edit: Memory latency improved from 280ns to 151ns. Still crappy, but better.
Ok guys,
I got both fixes to work now, using this batch (and extending it until CPU 16)
To give you an idea of what changed - first up, stock (well, not really stock :D ) Quad Opteron 8347HE:
Quote:
Originally Posted by mibo
http://database.he-computer.de/Bilde...tencyissue.jpg
Yeah, it sucks. Big time.
Next up: Changing the 0xC0010015 register from 0x01000018 to 0x01000010 on all cores:
http://database.he-computer.de/Bilde...erest_1fix.jpg
Yay! Latency still sucks, but overall a big improvement.
And, one step further: Changing 0xC0011023 register from 0x00A00022 to 0x00200020 (not sure if I should change the A in there? oh well it works)
http://database.he-computer.de/Bilde...erest_2fix.jpg
Now that's even better. Note how it improves the L3 cache latency.
Some real world number improvements:
1. Winrar: No fix: 270KB/s - Fix 1: 1630KB/s - Fix 2: 1660KB/s
2. Cinebench: No fix: 14600 xCPU - Fix 2: 19000 xCPU
Will try more :)
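For the record, the gains work out as follows - plain arithmetic on the posted figures, nothing assumed beyond them:

```python
# Speedup factors from the before/after numbers posted above.
winrar_no_fix, winrar_fix1, winrar_fix2 = 270, 1630, 1660  # kb/s
cine_no_fix, cine_fix2 = 14600, 19000                      # Cinebench xCPU

print(round(winrar_fix2 / winrar_no_fix, 1))  # roughly 6x in WinRAR
print(round(cine_fix2 / cine_no_fix, 2))      # roughly 1.3x in Cinebench
```

So the memory-latency-bound WinRAR benchmark gains far more than the mostly compute-bound Cinebench, which fits the TLB-workaround explanation.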
A HUGE thank you goes out to mreuter80 for being spot-on with his analysis and pointing me in the right direction! :toast:
Glad you got it working :clap: :up:
Sweet! Nice to see things are going in the right direction!
{Coffee sipping} MOIN
Great to see it works.
Don't change the A. I checked on my Phenom and the value should be 0x00A00020
:shocked: ... :woot:
Thanks for the flowers, but I didn't do the analysis. I just gave you the hint with the software.
I'm very glad it works and the numbers are pretty cool. I'm curious whether it will work fine with all cores crunching. Most of the processors can do it, but it is a bug and might have an effect.
Now I wonder whether PoppaGeek's opterons might have that issue as well and he can increase his numbers. I will send him a PM to check.
Awesome.
Stupid MS and forcing that TLB crap!
I just changed the file to write 0x00A00020 instead of 0x00200020 for the 2nd register. Performance decreased slightly, about on par with Fix 1 (without writing anything to the 2nd register).
So you should maybe try 0x00200020, it seems faster for me. No stability issues so far, been running benches and crunching for a while now.
Re-enabled HFCC WUs for the machine as well. Shouldn't take 25h per WU now! :rolleyes:
@Riptide: Yeah the MCP is getting pretty toasty, that's probably the reason why this damned mobo won't go any higher than 211 HTT ATM for stable operation.
I already removed the stock fans (those 2 tiny HSFs are actually 1 piece, cooling the MCP and an AMD PCI-X bridge chip). Unfortunately they use an extremely thick thermal pad for the MCP, like 5mm :rofl:
Explains the :banana::banana::banana::banana:ty temps, but due to the HSF sitting on the PCI-X bridge as well I can't just remove it and put real TIM on there. I'll have to find new, individual heatsinks (thinking something real big for the MCP ^^ )
Right now I'd love to put my phase head on the MCP and see how it clocks at -45C :p: :rofl:
Vmods for the chipset, anyone?
According to this, the 1354 does not have the TLB bug.
Poppa, all B2 step Opterons suffer from the TLB Bug, regardless of their model number. B3 and newer procs don't.
By the way, the 4x4 has crunched through the night just fine, and it seems that even the stoically RAM-ignoring HCC project has gained a little: 16 WUs now complete in 7:20h instead of 8h (at 2GHz CPU speed)
Ah, so there are no 1354s with B2 step? That's fine then :)
2090mhz vs 2004mhz
unfair!!!
:p
Too hot for 2,1 :p: