I would suggest getting an ASM programmer to write an x87 benchmarking program. I recall seeing several mini programs that did things like that. Here is one.
Gentlemen...
We have a Ridgeback down!
A few "goosfrabas" were required, for the obvious reason at the end of the video...
Stilt, you are the man!
I have some questions and some (maybe) helpful information for everyone.
I have been playing with recompiling software used in benchmarks in Gentoo and comparing them to my Windows scores.
So far I have tested:
x264: about 5% to 10% increase in Gentoo
LAME: about 60% increase in Gentoo
Blender: about twice as fast in Gentoo.
Can you perhaps give LAME a shot with this? And maybe Blender?
Looking forward to the release. Is it possible to make a Linux version? Perhaps GCC with bdver2 flags can compile around this issue and in Windows we don't get this sort of optimization. Or perhaps I could even add more to it.
If you are looking for data I would gladly play with this patch and compile open source benchmarks with whatever settings and share the data.
I have a theory that the benchmarks where AMD does really poorly (LAME, SuperPI, etc.) are being artificially limited by software or something similar. Seeing gains of 60% or 100% or better from recompiling with GCC is pretty much unheard of in the Gentoo user world.
FX 8350 @ 5.11ghz | Gigabyte 990FXA UD5 | 16GB Mushkin Blackline | 7970 @ 1.2ghz
core i7 920 @ 4.05ghz | asus p6t deluxe | 6GB G. Skill @ ~1.6ghz | 7970 @ 1.2ghz - 6ghz - 1.2v
Opteron 165 @ 2.7Ghz | 1gb G. Skill @ ~520mhz |4870 1GB | asus a8n32 sli
I've wrapped up my tests on LN2 with Richland and everything else too, so I can put some hours into this matter over the weekend.
I'll try the fix with games in which AMD performs especially poorly.
After the fix has been released, I'll start fine-tuning prefetchers and predictors to see if there are any general or application-specific improvements to be squeezed out of Bulldozer.
Man, you are the MAN! Awesome, you have made my day! Wow. If you still have some LN2, can you test SuperPI 32M for a "WR" with Vishera?
ROG Power PCs - Intel and AMD
CPUs:i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350,1x AMD FX-8320,1x AMD FX-8300, 2x AMD FX-6300,2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K,AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K,2x i7-4770K, Intel i7-3930KAMD Cinebench R10 challenge AMD Cinebench R15 thread Intel Cinebench R15 thread
Awesome to see tuning at such an advanced level. I'm impressed!
I wonder if there are other Pi programs that use the newest instructions, and how they compare to Intel's results?
Vishera 8320@ 5ghz | Gigabyte UD3 | 8gb TridentX 2400 c10| Powercolor 6850 | Thermalight Silver Arrow (bench Super KAZE 3k) | Samsung 830 128gbx2 Raid 0| Fractal case
Hihi,
I just remembered another application. There is a BOINC project called "simap" (http://boincsimap.org/boincsimap/). They have not changed their program in ages, and it is still x87 based. The core algorithm they use already exists in an SSE2 version, which is ~8 times faster according to a scientific paper by the programmer, but they don't use it. Maybe your patch can help Bulldozer a bit. All work units are calculated a second time on another machine for verification, so you should be able to detect potential calculation bugs. Just check if your patch has a positive effect on the speed and then let it run for a few days ;-)
cheers
Last edited by Opteron146; 06-17-2013 at 03:07 AM.
I'm going to try to write a program that calculates the digits of Pi and then compile it with GCC using different optimization flags.
Wow, so much knowledge here...I'm really interested to see how this pans out. Major props!
There are two kinds of news: bad and good.
Let's get rid of the bad news first:
Originally I tested this fix on three different CPU/APUs (Richland, Trinity and Vishera).
When I went to verify the effects of the fix on Zambezi, the system crashed immediately once the necessary changes were written.
After some research I noticed that these registers do not respond on Zambezi-based CPUs.
Upon reading, all of them return null values and crash the system unless a special method is used.
At first it appeared that these registers do not exist on Zambezi. However, after digging a bit deeper I found indications that the registers are there... But for some reason AMD seem to have protected them with an ESI/EDI password on Zambezi.
They do not require any passwords on any Piledriver based APU/CPU.
So the fix will not be available for Zambezi users.
Sorry for the massive let-down.
Then the good news:
The software is pretty much finished.
It should be available for download within this week.
After the let-down on Zambezi I felt that something had to be done for Zambezi too.
While it does not give as massive a boost as the original fix does, it still gives something:
SuperPI 1M: > 1 second improvement
SuperPI 8M: > 10 second improvement
SuperPI 32M: > 35 second improvement
It is called "Zambezi Stack Special (PD)".
Note: There might also be some performance regression in some applications when it is enabled (Zambezi vs. Vishera effect).
Zambezi is significantly faster than Vishera in SuperPI by default so the difference between a "fixed" Vishera and a tuned Zambezi won't be that massive after the "Zambezi Stack Special" configuration.
Last edited by The Stilt; 06-17-2013 at 10:06 PM.
As I stated on HWBOT - you are a magician
MSI MOA 2009 POLAND #3
Gigabyte GOOC 2010 POLAND #2
MSI MOA 2010 POLAND - NOT ORGANISED
ASUS Polish Overclocking Championship 2010 #1
MSI MOA 2011 POLAND - NOT INVITED
HWBOT Country Cup 2011 - POLAND! #1
MSI MOA 2012 EMEA - #1
MSI MOA 2012 WW FINALS - #7
ASUS Open Overclocking Cup AOOC 2012 - #1
HWBOT Country Cup 2012 - POLAND! #2
MSI MOA 2013 EMEA Qualifier - #2
ASUS Open Overclocking Cup AOOC 2013 - #?
MSI MOA 2013 WW FINALS - #?
Awesome research and dev!
| '12 IvyBridge - "ticks different"... | AwardFabrik IvyBridge round I by SoF | AwardFabrik IvyBridge round II by angoholic & stummerwinter
| '11 The SandyBridge madness... | AwardFabrik / Team LDK OC-Season 2011/2012 Opening Event
| '10 Gulftown LaunchDay OC round up @ASUS RIIE | 3DM05 2x GPU WR LIVE @Cebit 2010 @ASUS MIIIE | SandyBridge arrived @ASUS P8P67
| '09 Foxconn Avenger | E8600 | Foxconn A79A-S | Phenom II 940 BE | LaunchDay Phenom II OC round up
| '08 7.438s 1m LN2 | AMD 1m WR LN2 | 2nd AOCM | Phenom II teasing
| '07 100% E2140 | 106.5% E2160 | 100% E4500 | 103% E4400 | 5508 MHZ E6850 | 7250 MHZ P4 641 126.5% by SoF and AwardFabrik Crew all on Gigabyte DS3P c? and LN2...
| '06 3800+ X2 Manchester 0531TPEW noHS 3201MHZ c? | 3200+ Venice noHS 3279MHZ c? | Opteron 148 0536CABYE 3405MHZ c? all on Gigabyte K8NXP-SLI compressorcooled
| '05 3500+[NC], 3000+[W], 2x 3200+[W], 3500+[NC], 3200+[V] 0516GPDW
Originally Posted by saaya
So, it is Friday today, isn't it?
Bulldozer Conditioner R1.00B
The checksum (MD5) for the zip file is: 418522A93F241CF14EB1D775839AB083
If the checksum does not match, the package has been corrupted or tampered with: delete it and re-download from another location.
The checksum can be calculated online if you don't have suitable software on your computer.
http://onlinemd5.com/
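For anyone who would rather check the MD5 locally than upload the file to a website, here is a minimal sketch using only Python's standard `hashlib`. The function names and any file name you pass in are assumptions for illustration; only the checksum value comes from the post above.

```python
import hashlib

# MD5 published in the post above for the R1.00B zip file.
PUBLISHED_MD5 = "418522A93F241CF14EB1D775839AB083"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large download does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_published(path, published=PUBLISHED_MD5):
    """True if the file's MD5 equals the published value (case-insensitive)."""
    return md5_of_file(path) == published.strip().upper()
```

Calling `matches_published("your_download.zip")` (a hypothetical file name) returns `True` only when the digest equals the published value. For the R1.01B package, pass its own checksum as the second argument.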
There is not a single bit of malicious code either in the driver or the software itself.
If you are unsure, please check the contents with https://www.virustotal.com
Supported OS: Windows XP / Windows Vista / Windows 7 / Windows 8 (32 & 64-bit)
The x86 version works in both 32 & 64-bit operating systems, while the x64 version is 64-bit only.
The functionality itself is identical between the versions.
Known limitations: Up to 16 CUs (32 cores) supported at the moment. Support for 32CUs (64 cores) will be added in the next version.
Also, the R1.00B (Beta) version does not contain the feature to patch the microcode block, as I could not make it work stably enough.
The "Errata Fix" button will fix the major errata which can be patched without updating the microcode.
This feature should not be used as a permanent solution; a BIOS update (updated AGESA + microcode) should still be the primary method.
Note: Enabling the "Zambezi Stack Special (PD)" feature might cause undefined behavior, so each user should test its functionality on their own. Some applications might show a minor performance regression, while SuperPI, for example, receives a nice boost.
Note: "x87 instruction (NRAC) block" -> Enabled means that the instruction is blocked (default on all 15h family APU/CPU/NPUs). Disabling it makes SuperPI "a bit" faster.
There are most certainly some bugs, so in case you come across one, please report it in this thread.
Reports of your experiences are very welcome too.
Now it is time for the midsummer parties, so I might be away for a day or two.
Depending on how epic the headache will be.
Update on 06/22/2013: Bulldozer Conditioner R1.01B
Last edited by The Stilt; 06-22-2013 at 06:47 AM.
First, thanks very much for sharing this with us. It's very interesting.
Some quick and dirty tests on an FX 8320 @ 4GHz, before I go to bed.
...Definitely an improvement in SuperPI. Despite its irrelevance now as an overall performance metric, it is still amazing to see the effect of a simple register change.
It also shows how little AMD care about this bench, despite the odd website still persisting with it when reviewing against the competition. Perhaps AMD's PR should be more vocal about why their CPUs are so slow at this piece of software, because there is a surprising number of pseudo-enthusiasts out there who still judge by it.
I just tested it with my FX-6300 (at stock with Turbo Core enabled)
Running SuperPi mod 1.5 XS edition
Before the patch
1M takes around 24s
and after the patch is applied
1M takes 20s now
Really cool and great job!
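To put before/after times like these on a common scale, here is a small sketch of the arithmetic. The helper names are made up for illustration, and the 24 s figure is only approximate in the post, so treat the result as a rough estimate.

```python
def time_saved_percent(before_s, after_s):
    """Percentage of the original run time that the patch removed."""
    return (before_s - after_s) / before_s * 100.0


def speedup_factor(before_s, after_s):
    """How many times faster the patched run is than the original run."""
    return before_s / after_s


# SuperPI 1M on the FX-6300 above: roughly 24 s before, 20 s after.
print(f"{time_saved_percent(24, 20):.1f}% less time")
print(f"{speedup_factor(24, 20):.2f}x speedup")
```

With those two numbers the run takes about a sixth less time, which works out to a 1.2x speedup.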
Awesome work, Stilt. I tried it on my 750K Piledriver and indeed I get a 2.5 second lower score.
I tried some other (non-x87 dominated) workloads and the difference is within the margin of error. Whatever the limitation was doing, it is not affecting other commercial benchmarks and software as much as it affects SuperPI.
Congrats on beating the Fam. 10h record too, Stilt.
PS: What I find amazing is that users on another forum ("inteltech") have already found "reasons" why this is a "fail". Instead of rooting for people like Stilt, they do the opposite. Very sad. Thankfully that forum has turned into a niche for Intel trolls and shills (mod section too). Here we can show real appreciation for your work.
I can't download it... I only get pop-ups :-/. Can you help me? Thank you.
//edit//: Now it is OK, one of the links worked on my side. Awesome, in 4M more than 30s better!
Last edited by FlanK3r; 06-21-2013 at 02:07 PM.
Has anyone tried using this with other applications yet? Photoshop/Lightroom, Cinebench, etc.?
Xeon E3-1245 @ Stock | Gigabyte H87N-Wifi | 16GB Crucial Ballistix LP @ 1600Mhz | R7 260x | Much and varied storage
I don't know about you guys, but I hate waiting for downloads. I've mirrored the file here.
Many, many, many thanks to The Stilt for adding some Xtreme to Xtreme Systems for the first time in ages.
Athlon64 3700+ KACAE 0605APAW @ 3455MHz 314x11 1.92v/Vapochill || Core 2 Duo E8500 Q807 @ 6060MHz 638x9.5 1.95v LN2 @ -120'c || Athlon64 FX-55 CABCE 0516WPMW @ 3916MHz 261x15 1.802v/LN2 @ -40c || DFI LP UT CFX3200-DR || DFI LP UT NF4 SLI-DR || DFI LP UT NF4 Ultra D || Sapphire X1950XT || 2x256MB Kingston HyperX BH-5 @ 290MHz 2-2-2-5 3.94v || 2x256MB G.Skill TCCD @ 350MHz 3-4-4-8 3.1v || 2x256MB Kingston HyperX BH-5 @ 294MHz 2-2-2-5 3.94v
Back again.
Some users have been asking why the "Errata Fix" feature doesn't work (i.e. "Fix required" is still shown even after the Fix button has been pressed). The feature itself is working fine; however, I forgot to add a check in the GUI. Some claims have also emerged that the software is wrong when it states that the microcode is outdated.
So:
A small update: Bulldozer Conditioner R1.01B
Original package checksum (MD5): C3C4E3492B3FBFE1079AE5D57C25172B
Changes:
- Added a hardware flag to indicate that the errata has been fixed.
- Changed the way the software accesses the cores, so the tasks complete quicker than before
- Fixed an APU-specific bug
- Added information about the most recent microcode and AGESA versions under Info menu.
- Some small changes to the GUI
Last edited by The Stilt; 06-22-2013 at 06:44 AM.