Page 2 of 5
Results 26 to 50 of 111

Thread: The Book of Bulldozer - Revelations: Episode 2 (SuperPI / x87)

  1. #26
    Xtreme Enthusiast
    Join Date
    Mar 2005
    Location
    Buenos Aires, Argentina
    Posts
    644
    I would suggest getting an ASM programmer to write an x87 benchmarking program. I recall seeing several mini programs that did things like that. Here is one.

  2. #27
    Xtreme Legend
    Join Date
    Nov 2003
    Location
    Helsinki, Finland
    Posts
    1,692
    Gentlemen...

    We have a Ridgeback down!



    A few "goosfrabas" were required, for obvious reasons, at the end of the video...

  3. #28
    Registered User
    Join Date
    May 2008
    Location
    USA
    Posts
    36
    Stilt you are the man!

    I have some questions and some (maybe) helpful information for everyone.

    I have been playing with recompiling software used in benchmarks in Gentoo and comparing them to my Windows scores.

    So far I have tested:

    x264: about 5% to 10% increase in Gentoo
    LAME: about 60% increase in Gentoo
    Blender: about twice as fast in Gentoo.

    Can you perhaps give LAME a shot with this? And maybe Blender?

    Looking forward to the release. Is it possible to make a Linux version? Perhaps GCC with the bdver2 flags compiles around this issue, and on Windows we simply don't get this sort of optimization. Or perhaps I could even add more to it.

    If you are looking for data I would gladly play with this patch and compile open source benchmarks with whatever settings and share the data.

    I have a theory that the benchmarks AMD does really poorly on (LAME, SuperPI, etc.) are being artificially limited by software or whatever. Recompiling with GCC and seeing gains of 60%, 100% or more is pretty much unheard of in the Gentoo user world.

    FX 8350 @ 5.11ghz | Gigabyte 990FXA UD5 | 16GB Mushkin Blackline | 7970 @ 1.2ghz
    core i7 920 @ 4.05ghz | asus p6t deluxe | 6GB G. Skill @ ~1.6ghz | 7970 @ 1.2ghz - 6ghz - 1.2v
    Opteron 165 @ 2.7Ghz | 1gb G. Skill @ ~520mhz |4870 1GB | asus a8n32 sli

  4. #29
    Xtreme Legend
    Join Date
    Nov 2003
    Location
    Helsinki, Finland
    Posts
    1,692
    I've wrapped up my tests on LN2 with Richland and everything else too, so I can put some hours into this matter over the weekend.
    I'll try the fix with games in which AMD performs especially poorly.

    After the fix has been released, I'll start fine-tuning the prefetchers and predictors to see if there are any general or application-specific improvements to be juiced out of Bulldozer.

  5. #30
    I am Xtreme (FlanK3r)
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    Man, you are the MAN! Awesome, you've made my day! Wow. If you still have some LN2, can you try SuperPI 32M for a "WR" with Vishera?
    ROG Power PCs - Intel and AMD
    CPUs: i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350, 1x AMD FX-8320, 1x AMD FX-8300, 2x AMD FX-6300, 2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K, AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K, 2x i7-4770K, Intel i7-3930K
    AMD Cinebench R10 challenge | AMD Cinebench R15 thread | Intel Cinebench R15 thread

  6. #31
    Xtreme Addict
    Join Date
    Dec 2002
    Location
    Sweden
    Posts
    1,261
    Awesome to see tuning at such an advanced level. I'm impressed!

    I wonder if there are other Pi programs that use the newest instructions, and how they compare to Intel's?
    Vishera 8320@ 5ghz | Gigabyte UD3 | 8gb TridentX 2400 c10| Powercolor 6850 | Thermalight Silver Arrow (bench Super KAZE 3k) | Samsung 830 128gbx2 Raid 0| Fractal case

  7. #32
    Xtreme Member
    Join Date
    Aug 2004
    Posts
    210
    Hihi,

    I just remembered another application. There is a BOINC project called "simap" (http://boincsimap.org/boincsimap/); they have not changed their program for ages and it is still x87-based. The core algorithm they use already exists in an SSE2 version, which is ~8 times faster according to a scientific paper by the programmer, but they don't use it. Maybe your patch can help Bulldozer a bit. All work units are calculated twice, for verification on another machine, so you should be able to detect potential calculation bugs. Just check whether your patch has a positive effect on the speed and then let it run for a few days ;-)

    cheers
    Last edited by Opteron146; 06-17-2013 at 03:07 AM.

  8. #33
    Registered User
    Join Date
    Nov 2010
    Posts
    26
    I'm going to try to write a program that calculates the digits of Pi and then compile it with GCC using different optimization flags.
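    A minimal sketch of the digit-generation part, written in Python for readability: Gibbons' unbounded spigot algorithm, which emits decimal digits of Pi one at a time using only integer arithmetic. This is not SuperPI's algorithm, and because it is integer-only it would not exercise the x87 unit at all; to compare GCC flags it would need porting to C, and a floating-point-heavy kernel would be needed to say anything about x87. The function name `pi_digits` is illustrative.

```python
def pi_digits(n):
    """Return the first n decimal digits of Pi as a list: [3, 1, 4, 1, 5, ...].

    Gibbons' unbounded spigot: the state (q, r, t) is a linear fractional
    transformation; m is the candidate next digit, k and x drive the series.
    """
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < n:
        if 4 * q + r - t < m * t:
            # The next digit m is now certain: emit it and shift the state.
            digits.append(m)
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            # Not certain yet: absorb one more term of the series.
            q, r, t, k, m, x = (
                q * k,
                (2 * q + r) * x,
                t * x,
                k + 1,
                (q * (7 * k + 2) + r * x) // (t * x),
                x + 2,
            )
    return digits
```

    For example, `"".join(map(str, pi_digits(10)))` gives `"3141592653"`. Python integers are arbitrary-precision, so the sketch is correct for any `n`, just increasingly slow.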

  9. #34
    Xtreme Member
    Join Date
    Dec 2012
    Location
    Buenos Aires
    Posts
    306
    Wow, so much knowledge here...I'm really interested to see how this pans out. Major props!

  10. #35
    Xtreme Legend
    Join Date
    Nov 2003
    Location
    Helsinki, Finland
    Posts
    1,692
    There are two kinds of news: bad and good.

    Let's get rid of the bad ones first:

    Originally I tested this fix on three different CPU/APUs (Richland, Trinity and Vishera).
    When I went to verify the effects of the fix on Zambezi the system crashed immediately once the necessary changes were written.

    After some research I noticed that these registers do not respond on Zambezi-based CPUs.
    When read, all of them return null values and crash the system unless a special method is used.
    At first it appeared that these registers do not exist on Zambezi; however, after digging a bit deeper I found indications that the registers are there... But for some reason AMD seems to have protected them with an ESI/EDI password on Zambezi.

    They do not require any passwords on any Piledriver based APU/CPU.

    So the fix will not be available for Zambezi users.
    Sorry for the massive let-down.

    Then the good news:

    The software is pretty much finished.
    It should be available for download within this week.

    After the let-down on Zambezi I felt that something had to be done for Zambezi too.
    While it does not give as massive a boost as the original fix does, it still gives something:

    SuperPI 1M: > 1 second improvement
    SuperPI 8M: > 10 second improvement
    SuperPI 32M: > 35 second improvement

    It is called "Zambezi Stack Special (PD)".
    Note: Some applications might also show a performance regression when it is enabled (the Zambezi vs. Vishera effect).

    Zambezi is significantly faster than Vishera in SuperPI by default so the difference between a "fixed" Vishera and a tuned Zambezi won't be that massive after the "Zambezi Stack Special" configuration.

    Last edited by The Stilt; 06-17-2013 at 10:06 PM.

  11. #36
    Xtreme Member
    Join Date
    Aug 2004
    Posts
    210
    Quote Originally Posted by The Stilt
    Zambezi is significantly faster than Vishera in SuperPI by default
    You surely mean it the other way around, don't you?

  12. #37
    Xtreme Legend
    Join Date
    Nov 2003
    Location
    Helsinki, Finland
    Posts
    1,692
    Quote Originally Posted by Opteron146
    You surely mean it the other way around, don't you?
    No.

    Zambezi is faster than Vishera by:

    SuperPI 1M: 4.84%
    SuperPI 8M: 11.93%
    SuperPI 32M: 14.61%
    Last edited by The Stilt; 06-18-2013 at 01:11 AM.

  13. #38
    Registered User
    Join Date
    Sep 2010
    Location
    Poland
    Posts
    47
    Quote Originally Posted by The Stilt
    After some research I noticed that these registers do not respond on Zambezi based CPUs.
    Upon reading all of them return null values and crash the system unless a special method is used.
    At first it appeared that these registers do not exist on Zambezi, however after digging a bit deeper I found indication that the registers are there... But for some reason AMD seem to have protected them with a ESI/EDI password on Zambezi.
    Did you try the old magic value 0x9c5a203a/9C5A203A? I wonder if Opteron CPUs behave the same, or whether that hidden x87 setting is common to all AMD CPUs.
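    For context on how such registers are read from user space on Linux: the `msr` driver exposes each CPU's model-specific registers as `/dev/cpu/N/msr`, where the file offset is the MSR index and each read returns 8 bytes (little-endian). The sketch below assumes Linux, root privileges and a loaded `msr` module; `read_msr` and its `dev_path` parameter are hypothetical names. Note that this plain read path cannot load a value such as 0x9C5A203A into EDI, so it cannot reproduce the password method asked about above; a register that faults on read typically comes back as an I/O error rather than a value.

```python
import os
import struct

def read_msr(msr, cpu=0, dev_path=None):
    """Read one 64-bit MSR via the Linux msr driver (/dev/cpu/N/msr).

    The driver maps the file offset to the MSR index, so we pread 8 bytes
    at offset `msr`. `dev_path` (hypothetical) lets the caller point at
    another file, e.g. a fake register file for testing.
    """
    path = dev_path or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read of MSR {msr:#x} on CPU {cpu}")
    # MSR contents are returned as a little-endian unsigned 64-bit value.
    return struct.unpack("<Q", data)[0]
```

    On a real system this would be called as, for example, `read_msr(0xC0010015, cpu=0)` with the MSR index of interest; the thread does not name the registers involved, so no specific index is implied here.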

  14. #39
    Champion
    Join Date
    Mar 2005
    Location
    Warsaw, Poland
    Posts
    476
    As I stated on HWBOT - you are a magician
    MSI MOA 2009 POLAND #3
    Gigabyte GOOC 2010 POLAND #2
    MSI MOA 2010 POLAND - NOT ORGANISED
    ASUS Polish Overclocking Championship 2010 #1
    MSI MOA 2011 POLAND - NOT INVITED
    HWBOT Country Cup 2011 - POLAND! #1
    MSI MOA 2012 EMEA - #1
    MSI MOA 2012 WW FINALS - #7
    ASUS Open Overclocking Cup AOOC 2012 - #1
    HWBOT Country Cup 2012 - POLAND! #2
    MSI MOA 2013 EMEA Qualifier - #2
    ASUS Open Overclocking Cup AOOC 2013 - #?
    MSI MOA 2013 WW FINALS - #?

  15. #40
    Moderator
    Join Date
    Oct 2007
    Location
    Oregon - USA
    Posts
    830
    My wife's 8350 waits in anticipation of your hard work.
    I'm very enthusiastic about this, and about AMD at the moment.
    Asus Rampage IV Extreme
    4930k @4.875
    G.Skill Trident X 2666 Cl10
    Gtx 780 SC
    1600w Lepa Gold
    Samsung 840 Pro 256GB


  16. #41
    Xtremely Bad Overclocker
    Join Date
    Jan 2005
    Location
    East Blue
    Posts
    3,596
    Awesome research and dev!
    | '12 IvyBridge - "ticks different"... | AwardFabrik IvyBridge round I by SoF | AwardFabrik IvyBridge round II by angoholic & stummerwinter
    | '11 The SandyBridge madness... | AwardFabrik / Team LDK OC-Season 2011/2012 Opening Event
    | '10 Gulftown LaunchDay OC round up @ASUS RIIE | 3DM05 2x GPU WR LIVE @Cebit 2010 @ASUS MIIIE | SandyBridge arrived @ASUS P8P67

    | '09 Foxconn Avenger | E8600 | Foxconn A79A-S | Phenom II 940 BE | LaunchDay Phenom II OC round up
    | '08 7.438s 1m LN2 | AMD 1m WR LN2 | 2nd AOCM | Phenom II teasing
    | '07 100% E2140 | 106.5% E2160 | 100% E4500 | 103% E4400 | 5508 MHZ E6850 | 7250 MHZ P4 641 126.5% by SoF and AwardFabrik Crew all on Gigabyte DS3P c? and LN2...
    | '06 3800+ X2 Manchester 0531TPEW noHS 3201MHZ c? | 3200+ Venice noHS 3279MHZ c? | Opteron 148 0536CABYE 3405MHZ c? all on Gigabyte K8NXP-SLI compressorcooled

    | '05 3500+[NC], 3000+[W], 2x 3200+[W], 3500+[NC], 3200+[V] 0516GPDW

    Quote Originally Posted by saaya
    sof pulled a fermi on all of us !!!

  17. #42
    Xtreme Legend
    Join Date
    Nov 2003
    Location
    Helsinki, Finland
    Posts
    1,692
    So, it is Friday today, isn't it?

    Bulldozer Conditioner R1.00B

    The checksum (MD5) for the zip file is: 418522A93F241CF14EB1D775839AB083
    If the checksum does not match, the package has been tampered with: delete it and re-download from another location.
    The checksum can be calculated online if you don't have suitable software on your computer.
    http://onlinemd5.com/
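    As an offline alternative to the website, the MD5 of the downloaded zip can be computed locally. A minimal Python sketch using the standard `hashlib` module; the function names and the file name in the usage line are hypothetical. Windows also ships `certutil -hashfile <file> MD5`, and Linux has `md5sum`.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()

def checksum_matches(path, expected):
    """Compare against a published checksum, ignoring case and surrounding whitespace."""
    return md5_of_file(path) == expected.strip().upper()
```

    Usage: `checksum_matches("BulldozerConditioner.zip", "418522A93F241CF14EB1D775839AB083")`. MD5 only detects corruption or tampering if the published checksum itself comes from a trusted source.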

    There is not a single bit of malicious code either in the driver or the software itself.
    If you are unsure, please check the contents with https://www.virustotal.com

    Supported OS: Windows XP / Windows Vista / Windows 7 / Windows 8 (32 & 64-bit)

    The x86 version works in both 32 & 64-bit operating systems, while the x64 version is 64-bit only.
    The functionality itself is identical between the versions.

    Known limitations: up to 16 CUs (32 cores) are supported at the moment. Support for 32 CUs (64 cores) will be added in the next version.

    Also, the R1.00B (Beta) version does not include the microcode block patching feature, as I could not make it work stably enough.

    The "Errata Fix" button will fix the major errata that can be patched without updating the microcode.
    This feature should not be used as a permanent solution; a BIOS update (updated AGESA + microcode) should still be the primary method.

    Note: Enabling the "Zambezi Stack Special (PD)" feature might cause undefined behavior, so each user should test its functionality on their own. Some applications might show a minor performance regression; SuperPI, however, receives a nice boost.

    Note: "x87 instruction (NRAC) block" -> Enabled means that the instruction is blocked (the default on all Family 15h APUs/CPUs/NPUs). Disabling it makes SuperPI "a bit" faster.
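    Switches like the NRAC block are presumably single bits in a model-specific register, and changing one safely means a read-modify-write: read the whole register, flip only the target bit, and write the full value back so every other bit is preserved. The Python sketch below shows only that bit manipulation. The bit index `NRAC_BLOCK_BIT` and the `read`/`write` callables are hypothetical placeholders, since the thread names neither the register nor the bit.

```python
# Hypothetical: the thread does not say which MSR or bit holds the
# "x87 instruction (NRAC) block" switch, so this bit index is a placeholder.
NRAC_BLOCK_BIT = 5
MASK64 = (1 << 64) - 1

def set_bit(value, bit, enabled):
    """Return `value` with `bit` set (enabled=True) or cleared, other bits untouched."""
    if not 0 <= value <= MASK64:
        raise ValueError("value must fit in 64 bits")
    mask = 1 << bit
    return (value | mask) if enabled else (value & ~mask)

def update_register(read, write, bit, enabled):
    """Read-modify-write one bit of a register.

    `read` and `write` are caller-supplied callables (e.g. wrappers around an
    MSR driver); they are placeholders here, not a real driver API.
    Returns (old_value, new_value) and skips the write if nothing changes.
    """
    old = read()
    new = set_bit(old, bit, enabled)
    if new != old:
        write(new)
    return old, new
```

    Writing back the whole value, rather than a constant, is what keeps unrelated settings in the same register intact.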

    There are most certainly some bugs, so if you come across one, please report it in this thread.
    Experience reports are very welcome too.

    Now it is time for the midsummer parties, so I might be away for a day or two.
    Depending on how epic the headache shall be

    Update on 06/22/2013: Bulldozer Conditioner R1.01B
    Last edited by The Stilt; 06-22-2013 at 06:47 AM.

  18. #43
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264
    First, thanks very much for sharing this with us. It's very interesting.

    Some quick dirty tests on an FX 8320 @ 4Ghz, before I go to bed.

    Definitely an improvement in SuperPI. Despite its irrelevance as an overall performance metric these days, it's still amazing to see the effect of a simple register change.

    It also shows how little AMD cares about this bench, despite the odd website still persisting with it when reviewing against the competition. Perhaps AMD's PR should be more vocal about why their CPUs are so slow in this piece of software, because there's a surprising number of pseudo-enthusiasts out there who still judge by it.




  19. #44
    Xtreme Enthusiast
    Join Date
    Jun 2008
    Location
    Hong Kong
    Posts
    967
    I just tested it with my FX-6300 (at stock, with Turbo Core enabled),
    running SuperPI mod 1.5 XS edition.

    Before the patch,
    1M took around 24s.

    After applying the patch,
    1M takes 20s.

    Really cool and great job!
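    Since SuperPI reports elapsed time, a gain like this can be quoted two ways. Using the FX-6300 figures above (about 24 s before, 20 s after), the run takes roughly 16.7% less time, which is the same as roughly 20% more speed. A small Python sketch with hypothetical helper names:

```python
def time_reduction_pct(before_s, after_s):
    """Share of run time saved, in percent: (before - after) / before * 100."""
    return (before_s - after_s) / before_s * 100.0

def speedup_pct(before_s, after_s):
    """Increase in speed (throughput), in percent: (before / after - 1) * 100."""
    return (before_s / after_s - 1.0) * 100.0
```

    For 24 s -> 20 s, `time_reduction_pct` gives about 16.7 and `speedup_pct` about 20.0, so it matters which convention a "% faster" figure uses.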

    Gaming Rig
    CPU : AMD Ryzen 7 3700X (45W ECO mode)
    HSF : Noctua C14S
    MB : ASRock X470 Taichi Ultimate
    RAM : G.Skill F4-3000C14-16GTZR x4 @ DDR4-3000 CL14
    VGA : MSI RTX2070
    PSU : Antec NeoECO Gold 650W
    Case : Corsair 100R ATX
    SSD : Samsung PM981a 1TB + Corsair MP510 1.9GB M.2 SSD

  20. #45
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Awesome work, Stilt. I tried it on my 750K (Piledriver) and indeed got a 2.5-second lower time.
    I tried some other (non-x87-dominated) workloads and the difference is within the margin of error. Whatever the limitation does, it doesn't affect other commercial benchmarks and software as much as it affects SuperPI.
    Congrats on beating the Fam. 10h record too, Stilt.

    PS: What I find amazing is that users on another forum ("inteltech") have already found "reasons" why this is a "fail". Instead of rooting for people like Stilt, they do the opposite. Very sad. Thankfully that forum has turned into a niche for Intel trolls and shills (the mod section too). Here we can show real appreciation for your work.

  21. #46
    I am Xtreme (FlanK3r)
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    I can't download it... I only get pop-ups :-/. Can you help me? Thank you.

    //edit//: now it's OK, one of the links worked on my side. Awesome, more than 30s better in 4M!
    Last edited by FlanK3r; 06-21-2013 at 02:07 PM.

  22. #47
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Location
    Hawaii
    Posts
    611
    Has anyone tried using this with other applications yet? Photoshop/Lightroom, Cinebench, etc.?
    Xeon E3-1245 @ Stock | Gigabyte H87N-Wifi | 16GB Crucial Ballistix LP @ 1600Mhz | R7 260x | Much and varied storage

  23. #48
    NooB MOD
    Join Date
    Jan 2006
    Location
    South Africa
    Posts
    5,799
    I don't know about you guys, but I hate waiting for downloads. I've mirrored the file here.

    Many many many thanks to The Stilt for adding some Xtreme to Xtreme Systems for the first time in ages
    Xtreme SUPERCOMPUTER
    Nov 1 - Nov 8 Join Now!


    Quote Originally Posted by Jowy Atreides
    Intel is about to get athlon'd
    Athlon64 3700+ KACAE 0605APAW @ 3455MHz 314x11 1.92v/Vapochill || Core 2 Duo E8500 Q807 @ 6060MHz 638x9.5 1.95v LN2 @ -120'c || Athlon64 FX-55 CABCE 0516WPMW @ 3916MHz 261x15 1.802v/LN2 @ -40c || DFI LP UT CFX3200-DR || DFI LP UT NF4 SLI-DR || DFI LP UT NF4 Ultra D || Sapphire X1950XT || 2x256MB Kingston HyperX BH-5 @ 290MHz 2-2-2-5 3.94v || 2x256MB G.Skill TCCD @ 350MHz 3-4-4-8 3.1v || 2x256MB Kingston HyperX BH-5 @ 294MHz 2-2-2-5 3.94v

  24. #49
    Registered User
    Join Date
    May 2007
    Posts
    20
    Thank you The Stilt, you have made such a great tool to the OC community!

    I tried it with my A10-6800K; it scores 5 seconds lower on my rig.


    Last edited by Tommi_Vercetti; 06-21-2013 at 11:16 PM.

  25. #50
    Xtreme Legend
    Join Date
    Nov 2003
    Location
    Helsinki, Finland
    Posts
    1,692
    Back again.

    Some users have been asking why the "Errata Fix" feature doesn't work (i.e. "Fix required" is still shown even after the Fix button has been pressed). The feature itself works fine; I just forgot to add a check in the GUI. There have also been some claims that the software is wrong when it states that the microcode is outdated.

    So:

    A small update: Bulldozer Conditioner R1.01B

    Original package checksum (MD5): C3C4E3492B3FBFE1079AE5D57C25172B

    Changes:

    - Added a hardware flag to indicate that the errata has been fixed.
    - Changed the way the software accesses the cores; tasks now complete faster than before
    - An APU specific bug fixed
    - Added information about the most recent microcode and AGESA versions under Info menu.
    - Some small changes to the GUI
    Last edited by The Stilt; 06-22-2013 at 06:44 AM.

