
Thread: Project "True 4x4"

  1. #126
    Xtreme Cruncher
    Join Date
    May 2008
    Location
    Roswell
    Posts
    479
    Quote Originally Posted by SparkyJJO View Post
    I've heard rumors that certain windows OSes on certain service packs automatically force the TLB thing (though maybe that was just rumors).
    that's not a rumor ... it's a fact
    If AMD uses the same register settings as on the Phenom, then you might be able to use this tool: http://xtreview.com/images/TLB_ver1.04.rar (Phenom TLB fix disable tool, for certain OSes)
    It works very well on my Phenom.

  2. #127
    Xtremely High Voltage Sparky's Avatar
    Join Date
    Mar 2006
    Location
    Ohio, USA
    Posts
    16,040
    Quote Originally Posted by mreuter80 View Post
    that's not a rumor ... it's a fact
    If AMD uses the same register settings as on the Phenom, then you might be able to use this tool: http://xtreview.com/images/TLB_ver1.04.rar (Phenom TLB fix disable tool, for certain OSes)
    It works very well on my Phenom.
    Bah. Stupid MS.

    What OSes/service packs are to blame, do you know? Just for future reference... My bro has a 9600BE running XP SP3, wondering if he's getting affected by this or not.
    The Cardboard Master
    Crunch with us, the XS WCG team
    Intel Core i7 2600k @ 4.5GHz, 16GB DDR3-1600, Radeon 7950 @ 1000/1250, Win 10 Pro x64

  3. #128
    Xtreme Cruncher
    Join Date
    May 2008
    Location
    Roswell
    Posts
    479
    Quote Originally Posted by SparkyJJO View Post
    Bah. Stupid MS.

    What OSes/service packs are to blame, do you know? Just for future reference... My bro has a 9600BE running XP SP3, wondering if he's getting affected by this or not.
    Vista with Service Pack 1 and later. I don't know about XP, though.

    He can test it very easily. If he runs the benchmark in WinRAR (ALT+B) and gets more than 1000 KB/s (it can be a little lower depending on what else is running), then the TLB bug fix is disabled. If he only gets around 300 to 400 KB/s, then the fix is enabled (and it's time for the tool I mentioned above).

  4. #129
    Xtremely High Voltage Sparky's Avatar
    Join Date
    Mar 2006
    Location
    Ohio, USA
    Posts
    16,040
    Got it. Thanks

  5. #130
    Xtreme Cruncher
    Join Date
    Apr 2005
    Location
    TX, USA
    Posts
    898
    Sorry jcool if you feel I'm detracting too much from the thread, it's the WCG forum after all

    I would say try the program mreuter80 linked to, since the registers SHOULD be the same, except that you're running 4 physical CPUs, so hopefully it knows how to address/configure all of them. I don't know the specific registers involved; a register dump might shed some light on the matter, but the WinRAR trick is probably much easier.

    The only reason I bring up NUMA is that if your tests are pulling from the wrong bank with the wrong CPU, you'll definitely feel a performance hit (CPU0 -> HT -> CPU1's MemCtrlr/RAM -> HT -> CPU0). I've never had the chance to use a NUMA machine myself though, so I don't know whether it would be a problem by default or not.


    Quote Originally Posted by Chumbucket843 View Post
    lol i know BP and TLB are very different things. i was comparing the miss penalty(even though the penalty can be much worse for p4). wouldnt a full pipeline flush be worse than a cache miss though?
    the fix should already be enabled in the bios. here is an article for the patch: http://techreport.com/articles.x/13741/ latency is actually worse with it on but its better than a system hang.
    K, I just found the analogy a little bit on the far side at the time, so I had to say my 2cents
    (was bored at work waiting for a simulation to finish )

    As for pipeline flush vs cache misses, that all depends on the pipeline length and memory subsystem design, quite situation/implementation dependent. Also, I assume you're mainly referring to flushes caused by branch mispredicts, though quite a few other things can cause them as well.
    You could say a pipeline flush caused by a branch mispredict is often (but not always, see below) bounded in cycles by the pipeline's length (baaad for P4), whereas a cache miss could potentially cause a pipeline flush as well (since the instructions issued after it might depend on that load/store), and on top of that the miss has to wait for retrieval from L2/RAM/storage, which also varies with outstanding requests (a transaction could take anywhere from tens to hundreds of thousands of cycles, so probably much longer than the pipeline flush).
    Note that I'm mainly referring to data cache misses. With an instruction cache miss you just plain have to wait for it to load from memory before you can fill the pipeline again, which could happen on a mispredict if the prefetcher didn't do its job well enough.

    Now if you throw in SMT, comparing things gets even more fun: everything pulls from the same caches etc., except that pipeline flushes can now be marginalized by being thread specific. The payoff is that total throughput/efficiency gets a good boost, which we can see when all the WCG work unit threads are running, where we care about the total output.
    Last edited by rcofell; 08-24-2009 at 09:33 PM.



  6. #131
    Xtreme Cruncher
    Join Date
    May 2008
    Location
    Roswell
    Posts
    479
    Quote Originally Posted by rcofell View Post

    I would say try the program mreuter80 linked to, since the registers SHOULD be the same, except that you're running 4 physical CPUs, so hopefully it knows how to address/configure all of them. I don't know the specific registers involved; a register dump might shed some light on the matter, but the WinRAR trick is probably much easier.
    Damn, I forgot this is the 4x4 thread. I guess this program will only run for one CPU.
    You also need to install CrystalCPUID before running the program (it's been so long since I installed it that I forgot -- getting older ... who is Dave).

    Oooh, the benchmark numbers above are for a Phenom. I'm sure your Opteron system will show different numbers.

    OK, CrystalCPUID comes with an MSR editor that lets you access the register settings directly (manually). Here is the link to the download page: http://crystalmark.info/download/ind...l#CrystalCPUID

    Then apply the same steps for each "CORE" ... which on your True 4x4 is quite a few.

    Select the Core in the main window.

    Enter C0010015 in the MSR Number field and hit RDMSR.
    Change the last hex digit. Bit Nr. 3 (8h) must be unset. If the last digit is 8h, use 0h; if it's 9h, use 1h. Hit WRMSR to apply the changes.


    Now enter C0011023 in the MSR Number field and hit RDMSR.
    Change the last hex digit. Bit Nr. 1 (2h) must be unset. If the last digit is 2h, change it to 0h. Hit WRMSR to apply the changes.


    Close the MSR Editor, select the next core, start the MSR Editor again, and change the registers the same way as described above.
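
    The bit arithmetic behind those steps can be sketched in a few lines of Python. This only illustrates the masking and does not touch any MSR; the helper name `clear_bit` is made up for this example, and the sample digits are the ones from the steps above.

```python
def clear_bit(value: int, bit: int) -> int:
    """Return value with the given bit number cleared (bit 0 is the lowest)."""
    return value & ~(1 << bit)


# MSR C0010015: bit 3 (mask 8h) must be unset.
assert clear_bit(0x8, 3) == 0x0  # last digit 8h -> 0h
assert clear_bit(0x9, 3) == 0x1  # last digit 9h -> 1h

# MSR C0011023: bit 1 (mask 2h) must be unset.
assert clear_bit(0x2, 1) == 0x0  # last digit 2h -> 0h

# Clearing a bit that is already 0 leaves the value unchanged.
assert clear_bit(0x0, 3) == 0x0
```

    Only the one bit is touched; every other digit of the value you read with RDMSR stays as it was.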
    here is the link to our own xtremesystems guide with some nice pictures http://www.xtremesystems.org/forums/...d.php?t=171105

    There was also a way to do this as a batch in this guide. So if it works you might want to look into it to have a batch running when you start the machine.

    I hope that will help. Good luck man ... I keep my fingers crossed
    Last edited by mreuter80; 08-25-2009 at 01:36 AM.

  7. #132
    Back from the Dead
    Join Date
    Oct 2007
    Location
    Stuttgart, Germany
    Posts
    6,602
    Quote Originally Posted by mreuter80 View Post
    that's not a rumor ... it's a fact
    If AMD uses the same register settings as on the Phenom, then you might be able to use this tool: http://xtreview.com/images/TLB_ver1.04.rar (Phenom TLB fix disable tool, for certain OSes)
    It works very well on my Phenom.
    Thanks for that, unfortunately the program doesn't load. It says "vcl60.bpl missing" if I fire up the TLB disable exe and "unable to load dll" if I try the enable. Server 08 x64 SP2.

    Winrar sucks ass BTW, 230 kb/s...

    And I haven't seen a bios option directly advertising a TLB fix, but I guess it's the Translation table thingy, why else would SM support tell me to disable it?

    Maybe the motherboard disabled it but windooze won't.. argh!
    World Community Grid - come join a great team and help us fight for a better tomorrow!


  8. #133
    Xtreme Cruncher
    Join Date
    May 2008
    Location
    Roswell
    Posts
    479
    Quote Originally Posted by jcool View Post
    Thanks for that, unfortunately the program doesn't load. It says "vcl60.bpl missing" if I fire up the TLB disable exe and "unable to load dll" if I try the enable. Server 08 x64 SP2.
    see my last post.

    Quote Originally Posted by jcool View Post
    Winrar sucks ass BTW, 230 kb/s...

  9. #134
    Back from the Dead
    Join Date
    Oct 2007
    Location
    Stuttgart, Germany
    Posts
    6,602
    Quote Originally Posted by mreuter80 View Post
    Damn, I forgot this is the 4x4 thread. I guess this program will only run for one CPU.
    You also need to install CrystalCPUID before running the program (it's been so long since I installed it that I forgot -- getting older ... who is Dave).

    Oooh, the benchmark numbers above are for a Phenom. I'm sure your Opteron system will show different numbers.

    OK, CrystalCPUID comes with an MSR editor that lets you access the register settings directly (manually). Here is the link to the download page: http://crystalmark.info/download/ind...l#CrystalCPUID

    Then apply the same steps for each "CORE" ... which on your True 4x4 is quite a few.


    here is the link to our own xtremesystems guide with some nice pictures http://www.xtremesystems.org/forums/...d.php?t=171105

    There was also a way to do this as a batch in this guide. So if it works you might want to look into it to have a batch running when you start the machine.

    I hope that will help. Good luck man ... I keep my fingers crossed
    Hey mreuter,

    thanks, this seems to fire up at least.

    But I don't really understand what I need to change the values to. The guide says:

    "Change the last hex digit. Bit Nr. 3 (8h) must be unset. If the last digit is 8h use 0h if it's 9h use 1h. Hit WRMSR to apply the changes."

    Which field are they talking about? And how do I convert hex code to the actual numbers that I have to enter?

    Entering MSR number 0xC0010015 gives me 0x01000018 for EAX.
    Entering MSR number 0xC0011023 gives me 0x00A00022 for EAX.

    So, into what do I change them?
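
    For anyone else puzzled by the hex: the EAX values are just 32-bit bit fields, and each hex digit covers four bits. A quick Python sketch (the helper `set_bits` is a hypothetical name used only for illustration) lists which bit numbers are set in the two readings above.

```python
def set_bits(value: int) -> list:
    """List the bit numbers (0 = lowest) that are 1 in value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(set_bits(0x01000018))  # reading from MSR 0xC0010015
print(set_bits(0x00A00022))  # reading from MSR 0xC0011023
```

    Bit 3 appears in the first list and bit 1 in the second, which are the two bits the guide says must be unset.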

    And by the way.. damn. Doing that 16 times will be tedious, I need to use that batch file if it works


  10. #135
    Moderator
    Join Date
    Mar 2006
    Posts
    8,556
    jcool.... long shot, but is the MCP55 cooled enough? Did you remove its HS to make sure it's all nicely TIMed? I was thinking a bit of thermal throttling on the chipsets....

  11. #136
    Back from the Dead
    Join Date
    Oct 2007
    Location
    Stuttgart, Germany
    Posts
    6,602
    Holy Jeesus, it worked

    I just followed this:

    "Hi,

    Guess i found a way to disable the tlb fix if aod does not work and there is no option in the bios.

    The latest bios for my M2A-VM included the tlb-fix. My everest memory read bandwidth dropped around 20%.

    I expected bit nr 3 in the MSR register C0010015 to be responsible for the fix. So I compared the values between the two bios versions.

    The old version showed 0x00000000 0x01000010 the new one 0x00000000 0x01000018 (bit nr 3 set)."

    Since it showed 0x01000018 for mine as well, I just changed all the 16 entries to 0x01000010 and bam...

    1630 KB/s in WinRAR instead of 270 KB/s

    Quite the performance increase..

    mreuter, do you think I need to change that 2nd entry too? The 0xC0011023 register, that is.

    Edit: Memory latency improved from 280ns to 151ns. Still crappy, but better.
    Last edited by jcool; 08-25-2009 at 02:55 AM.


  12. #137
    Back from the Dead
    Join Date
    Oct 2007
    Location
    Stuttgart, Germany
    Posts
    6,602
    Ok guys,

    I got both fixes to work now, using this batch (and extending it until CPU 16)

    Quote Originally Posted by mibo
    >cpu 1
    >wrmsr 0xc0010015 0 0x01000010
    >wrmsr 0xc0011023 0 0x00200020
    >cpu 2
    >wrmsr 0xc0010015 0 0x01000010
    >wrmsr 0xc0011023 0 0x00200020
    >cpu 3
    >wrmsr 0xc0010015 0 0x01000010
    >wrmsr 0xc0011023 0 0x00200020
    >cpu 4
    >wrmsr 0xc0010015 0 0x01000010
    >wrmsr 0xc0011023 0 0x00200020
    >rwexit
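
    Typing that block out for 16 cores gets tedious, so here is a small Python sketch that generates the same text. It assumes the command syntax is exactly what the quoted batch shows (including the leading `>`), and that cores are numbered from 1 as in the quote; the function name `build_batch` is made up for this example. Check the values against your own RDMSR readings before feeding the output to any tool.

```python
def build_batch(cores: int, writes: list) -> str:
    """Build per-core command text in the same shape as the quoted batch."""
    lines = []
    for core in range(1, cores + 1):
        lines.append(f">cpu {core}")
        for msr, high, low in writes:
            # Same field order as the quote: register, middle value, new value.
            lines.append(f">wrmsr 0x{msr:08x} {high} 0x{low:08x}")
    lines.append(">rwexit")
    return "\n".join(lines) + "\n"


# Values copied from the quoted batch, not verified independently.
WRITES = [
    (0xC0010015, 0, 0x01000010),
    (0xC0011023, 0, 0x00200020),
]

if __name__ == "__main__":
    # 16 cores for the quad-socket, quad-core box in this thread.
    print(build_batch(16, WRITES), end="")
```

    Redirect the output to a file and it can be used the same way as the hand-written version.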
    To give you an idea of what changed - first up, stock (well not really stock ) Quad Opteron 8347HE:



    Yeah, it sucks. Big time.

    Next up: Changing the 0xC0010015 register from 0x01000018 to 0x01000010 on all cores:




    Yay! Latency still sucks, but overall a big improvement.

    And, one step further: Changing 0xC0011023 register from 0x00A00022 to 0x00200020 (not sure if I should change the A in there? oh well it works)



    Now that's even better. Note how it improves the L3 cache latency.


    Some real world number improvements:

    1. Winrar: No fix: 270KB/s - Fix 1: 1630KB/s - Fix 2: 1660KB/s
    2. Cinebench: No fix: 14600 xCPU - Fix 2 - 19000 xCPU
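
    To put those numbers in proportion, here is a quick Python sketch of the ratios. The figures are the ones listed above; the helper name `speedup` is just for illustration.

```python
def speedup(before: float, after: float) -> float:
    """How many times higher the 'after' score is than the 'before' score."""
    return after / before


results = {
    "WinRAR, fix 1 (KB/s)": (270, 1630),
    "WinRAR, fix 2 (KB/s)": (270, 1660),
    "Cinebench xCPU, fix 2": (14600, 19000),
}

for name, (before, after) in results.items():
    print(f"{name}: {speedup(before, after):.2f}x")
```

    That works out to roughly 6x in WinRAR and about 1.3x in Cinebench.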

    Will try more

    A HUGE thank you goes out to mreuter80 for being spot-on with his analysis and pointing me in the right direction!


  13. #138
    Wuf
    Join Date
    Jul 2007
    Location
    Finland/Tampere
    Posts
    2,400
    Glad you got it working
    You use IRC and Crunch in Xs WCG team? Join #xs.wcg @ Quakenet
    [22:53:09] [@Jaco-XS] i'm gonna overclock this damn box!
    Ze gear:
    Main rig: W3520 + 12GB ddr3 + Gigabyte X58A-UD3R rev2.0! + HD7970 + HD6350 DMS59 + HX520 + 2x X25-E 32gig R0 + Bunch of HDDs.
    ESXI: Dell C6100 XS23-TY3 Node - 1x L5630 + 24GB ECC REG + Brocade 1020 10GbE
    ZFS Server: Supermicro 826E1 + Supermicro X8DAH+-F + 1x L5630 + 24GB ECC REG + 10x 3TB HDDs + Brocade 1020 10GbE
    Lappy!: Lenovo Thinkpad W500: T9600 + 8GB + FireGL v5700 + 128GB Samsung 830 + 320GB 2.5" in ze dvd slot + 1920x1200 @ 15.4"


  14. #139
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    5,152
    Sweet! Nice to see things are going in the right direction!

  15. #140
    Xtreme Cruncher
    Join Date
    May 2008
    Location
    Roswell
    Posts
    479
    {Coffee sipping} MOIN

    Quote Originally Posted by jcool View Post
    Ok guys,

    I got both fixes to work now, using this batch (and extending it until CPU 16)
    Great to see it works.

    Quote Originally Posted by jcool View Post
    And, one step further: Changing 0xC0011023 register from 0x00A00022 to 0x00200020 (not sure if I should change the A in there? oh well it works)
    Don't change the A. I checked on my Phenom and the value should be 0x00A00020
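
    A quick way to see what "changing the A" actually does is to XOR the values and list the bits that differ. This is a minimal Python sketch; `changed_bits` is a hypothetical helper name, and the three values are the ones posted in this thread.

```python
def changed_bits(old: int, new: int) -> list:
    """Bit numbers (0 = lowest) that differ between two register values."""
    diff = old ^ new
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


original = 0x00A00022  # value read from MSR 0xC0011023 before any change

print(changed_bits(original, 0x00200020))  # value written by the batch
print(changed_bits(original, 0x00A00020))  # value seen on the Phenom
```

    The batch value flips bits 1 and 23, while 0x00A00020 flips only bit 1. What bit 23 controls is not covered in this thread.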

    Quote Originally Posted by jcool View Post
    1. Winrar: No fix: 270KB/s - Fix 1: 1630KB/s - Fix 2: 1660KB/s
    2. Cinebench: No fix: 14600 xCPU - Fix 2 - 19000 xCPU
    ...

    Quote Originally Posted by jcool View Post
    A HUGE thank you goes out to mreuter80 for being spot-on with his analysis and pointing me in the right direction!
    Thanks for the flowers, but I didn't do the analysis. I just gave you the hint with the software.
    I'm very glad it works and the numbers are pretty cool. I'm curious whether it will stay stable with all cores crunching. Most processors run fine with the fix disabled, but it is a real bug and might still have an effect.

    Now I wonder whether PoppaGeek's Opterons have that issue as well and whether he can increase his numbers. I will send him a PM to check.

  16. #141
    Xtremely High Voltage Sparky's Avatar
    Join Date
    Mar 2006
    Location
    Ohio, USA
    Posts
    16,040
    Awesome.

    Stupid MS and forcing that TLB crap!

  17. #142
    Back from the Dead
    Join Date
    Oct 2007
    Location
    Stuttgart, Germany
    Posts
    6,602
    Quote Originally Posted by mreuter80 View Post

    Don't change the A. I checked on my Phenom and the value should be 0x00A00020
    I just changed the file to write 0x00A00020 instead of 0x00200020 for the 2nd register. Performance decreased slightly, to about on par with Fix 1 (without writing anything to the 2nd register).

    So you should maybe try 0x00200020, it seems faster for me. No stability issues so far, been running benches and crunching for a while now.


  18. #143
    Xtreme Cruncher
    Join Date
    May 2008
    Location
    Roswell
    Posts
    479
    Quote Originally Posted by jcool View Post
    I just changed the file to write 0x00A00020 instead of 0x00200020 for the 2nd register. Performance decreased slightly, to about on par with Fix 1 (without writing anything to the 2nd register).

    So you should maybe try 0x00200020, it seems faster for me. No stability issues so far, been running benches and crunching for a while now.
    Hmm, I will check and give it a try. But I actually use the tool and not CrystalCPUID.

  19. #144
    Back from the Dead
    Join Date
    Oct 2007
    Location
    Stuttgart, Germany
    Posts
    6,602
    Re-enabled for the machine to run HFCC WUs as well. Shouldn't take 25h per WU now!

    @Riptide: Yeah the MCP is getting pretty toasty, that's probably the reason why this damned mobo won't go any higher than 211 HTT ATM for stable operation.

    I already removed the stock fans (those 2 tiny HSFs are actually 1 piece, cooling the MCP and an AMD PCI-X bridge chip). Unfortunately they use an extremely thick thermal pad for the MCP, like 5mm

    Explains the high temps, but due to the HSF sitting on the PCI-X bridge as well I can't just remove it and put real TIM on there. I'll have to find new, individual heatsinks (thinking something real big for the MCP ^^ )

    Right now I'd love to put my phase head on the MCP and see how it clocks at -45C

    Vmods for the chipset, anyone?


  20. #145
    Xtreme Cruncher
    Join Date
    May 2008
    Location
    Roswell
    Posts
    479
    Quote Originally Posted by jcool View Post
    I just changed the file to write 0x00A00020 instead of 0x00200020 for the 2nd register. Performance decreased slightly, to about on par with Fix 1 (without writing anything to the 2nd register).

    So you should maybe try 0x00200020, it seems faster for me. No stability issues so far, been running benches and crunching for a while now.
    I tried it and WinRAR reports faster numbers. I'll let it run for a while and check whether there is a performance increase with the WCG WUs.

  21. #146
    Xtreme Cruncher
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    4,162
    According to this 1354 does not have tlb-bug.

  22. #147
    Back from the Dead
    Join Date
    Oct 2007
    Location
    Stuttgart, Germany
    Posts
    6,602
    Poppa, all B2 step Opterons suffer from the TLB Bug, regardless of their model number. B3 and newer procs don't.

    By the way, the 4x4 has passed the night crunching just fine and it seems that even the stoically ram-ignoring HCC project has gained a little, 16 WUs now complete in 7:20h instead of 8h (at 2Ghz CPU speed)
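
    For a rough sense of scale, the HCC improvement can be worked out in a couple of lines of Python. The two durations are the ones quoted above; `to_minutes` is a made-up helper for this sketch.

```python
def to_minutes(duration: str) -> int:
    """Convert 'H' or 'H:MM' into minutes."""
    hours, _, minutes = duration.partition(":")
    return int(hours) * 60 + int(minutes or 0)


before = to_minutes("8")     # 16 WUs with the TLB fix active
after = to_minutes("7:20")   # 16 WUs with the fix disabled
saved = before - after

print(f"{saved} min saved, {saved / before:.1%} less time")
```

    That is 40 minutes per batch of 16 WUs, a bit over 8% less time.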
    Last edited by jcool; 08-26-2009 at 01:49 AM.


  23. #148
    Xtreme Cruncher
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    4,162
    Quote Originally Posted by jcool View Post
    Poppa, all B2 step Opterons suffer from the TLB Bug, regardless of their model number. B3 and newer procs don't.

    By the way, the 4x4 has passed the night crunching just fine and it seems that even the stoically ram-ignoring HCC project has gained a little, 16 WUs now complete in 7:20h instead of 8h (at 2Ghz CPU speed)
    1354 is B3. 1352 was B2.

    And another.

  24. #149
    Back from the Dead
    Join Date
    Oct 2007
    Location
    Stuttgart, Germany
    Posts
    6,602
    Ah, so there are no 1354's with B2 step? That's fine then


  25. #150
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    2090mhz vs 2004mhz

    unfair!!!


    :p

    All along the watchtower the watchmen watch the eternal return.
