I would suggest getting an ASM programmer to write an x87 benchmarking program. I recall seeing several mini programs that did things like that. Here is one.
Gentlemen...
We have a Ridgeback down!
http://www.youtube.com/watch?v=mElg7ioNmIM
A few "goosfrabas" were required, for the obvious reason at the end of the video...
Stilt you are the man!
I have some questions and some (maybe) helpful information for everyone.
I have been recompiling software used in benchmarks on Gentoo and comparing the results to my Windows scores.
So far I have tested:
x264: about 5% to 10% increase in Gentoo
LAME: about 60% increase in Gentoo
Blender: about twice as fast in Gentoo.
Can you perhaps give LAME a shot with this? And maybe Blender?
Looking forward to the release. Is it possible to make a Linux version? Perhaps GCC with bdver2 flags can compile around this issue, while in Windows we don't get this sort of optimization. Or perhaps I could even add more to it.
If you are looking for data I would gladly play with this patch and compile open source benchmarks with whatever settings and share the data.
I have a theory that the benchmarks in which AMD does really poorly (LAME, SuperPI, etc.) are being artificially limited by software or something similar. Recompiling with GCC and seeing gains of 60%, 100% or better is pretty much unheard of in the Gentoo user world.
I've wrapped up my tests on LN2 with Richland and everything else too, so I can put some hours into this matter over the weekend.
I'll try the fix with games in which AMD performs especially poorly.
After the fix has been released, I'll start fine-tuning prefetchers and predictors to see if there are any general or application-specific improvements to be juiced out of Bulldozer :)
Man, you are the MAN! Awesome, you have made my day :)! Wow. If you still have some LN2, can you test SuperPI 32M for a "WR" with Vishera?
Awesome, tuning at such an advanced level. I'm impressed! :up:
I wonder if there are other PI versions that use the newest instructions, and how they compare to Intel's?
Hihi,
Just remembered another application. There is a BOINC project called "simap" (http://boincsimap.org/boincsimap/); they have not changed their program in ages, and it is still x87 based. The core algorithm they use already exists in an SSE2 version, which is ~8 times faster according to a scientific paper by the programmer, but they don't use it. Maybe your patch can help Bulldozer a bit. All work units are calculated twice, the second time on another machine for verification, so you should be able to detect potential calculation bugs. Just check whether your patch has a positive effect on the speed and then let it run for a few days ;-)
cheers
I'm going to try to write a program that calculates the digits of Pi and then compile it with GCC using different optimization flags.
Wow, so much knowledge here...I'm really interested to see how this pans out. Major props!
There are two kinds of news: bad and good.
Let's get the bad news out of the way first:
Originally I tested this fix on three different CPU/APUs (Richland, Trinity and Vishera).
When I went to verify the effects of the fix on Zambezi the system crashed immediately once the necessary changes were written.
After some research I noticed that these registers do not respond on Zambezi based CPUs.
When read, all of them return null values, and they crash the system unless a special method is used.
At first it appeared that these registers do not exist on Zambezi. However, after digging a bit deeper I found indications that the registers are there... But for some reason AMD seems to have protected them with an ESI/EDI password on Zambezi.
They do not require any passwords on any Piledriver based APU/CPU.
So the fix will not be available for Zambezi users.
Sorry for the massive let-down :(
Then the good news:
The software is pretty much finished.
It should be available for download within this week.
After the let-down on Zambezi I felt that something had to be done for Zambezi too.
While it does not give as massive a boost as the original fix, it still gives something:
SuperPI 1M: > 1 second improvement
SuperPI 8M: > 10 second improvement
SuperPI 32M: > 35 second improvement
It is called "Zambezi Stack Special (PD)".
Note: Some applications might also lose some performance when it is enabled (Zambezi vs. Vishera effect).
Zambezi is significantly faster than Vishera in SuperPI by default, so the difference between a "fixed" Vishera and a tuned Zambezi won't be that massive after the "Zambezi Stack Special" configuration.
http://imageshack.us/a/img526/3776/k7ez.jpg
As I stated on HWBOT - you are a magician :)
My wife's 8350 waits in anticipation of your hard work.
I'm very enthusiastic about this, and for amd at the moment.
Awesome research and dev! :clap:
So, it is Friday today, isn't it ;)
Bulldozer Conditioner R1.00B
The checksum (MD5) for the zip file is: 418522A93F241CF14EB1D775839AB083
If the checksum does not match, the package has been tampered with: delete it and re-download from another location.
The checksum can be calculated online if you don't have suitable software on your computer.
http://onlinemd5.com/
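For anyone who would rather check the download locally than use an online tool, here is a minimal sketch in Python using only the standard library. The function names and the file name in the usage comment are my own placeholders (the post does not name the zip file); the idea is simply to hash the downloaded file and compare the result with the MD5 published above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 of the file at `path` as an uppercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_published(path, published):
    """Compare the file's MD5 with a published checksum, ignoring case and spaces."""
    return md5_of_file(path) == published.strip().upper()


# Usage (the file name is hypothetical, substitute whatever you downloaded):
# matches_published("BulldozerConditioner.zip", "418522A93F241CF14EB1D775839AB083")
```

If `matches_published` returns False, treat the package as described above: delete it and download it again from another location.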
There is not a single bit of malicious code either in the driver or the software itself.
If you are unsure, please check the contents with https://www.virustotal.com
Supported OS: Windows XP / Windows Vista / Windows 7 / Windows 8 (32 & 64-bit)
The x86 version works in both 32 & 64-bit operating systems, while the x64 version is 64-bit only.
The functionality itself is identical between the versions.
Known limitations: Up to 16 CUs (32 cores) are supported at the moment. Support for 32 CUs (64 cores) will be added in the next version.
Also, the R1.00B (Beta) version does not contain the feature to patch the microcode block, as I could not make it work stably enough.
The "Errata Fix" button will fix the major errata which can be patched without updating the microcode.
This feature should not be used as a permanent solution; a BIOS update (updated AGESA + microcode) should still be the primary method.
Note: Enabling the "Zambezi Stack Special (PD)" feature might cause undefined behavior, so each user should test its functionality on their own. Some applications might show a minor drop in performance, however SuperPI for example receives a nice boost.
Note: "x87 instruction (NRAC) block" -> Enabled means that the instruction is blocked (default on all 15h family APU/CPU/NPUs). Disabling it makes SuperPI "a bit" faster.
There are most certainly some bugs, so in case you come across one, please report it in this thread.
Reports of your experiences are also very welcome.
Now it is time for the midsummer parties, so I might be away for a day or two.
Depending on how epic the headache shall be ;)
Update on 06/22/2013: Bulldozer Conditioner R1.01B
First, thanks very much for sharing this with us. It's very interesting.
Some quick and dirty tests on an FX 8320 @ 4 GHz, before I go to bed.
Definitely an improvement in SuperPI. Despite its irrelevance now as an overall performance metric, it is still amazing to see the effect of a simple register change.
It also shows how little AMD cares about this bench, despite the odd website still persisting with it when reviewing against the competition. Perhaps AMD's PR should be more vocal about why their CPUs are so slow at this piece of software, because there's a surprising number of pseudo-enthusiasts out there who still judge by it.
http://i1149.photobucket.com/albums/...ps3bf16e4b.png
http://i1149.photobucket.com/albums/...ps2572d329.png
I just tested it with my FX-6300 (at stock with Turbo Core enabled)
Running SuperPi mod 1.5 XS edition
Before the patching
1M takes around 24s~
and after the patch applied
1M takes 20s now
Really cool and great job! :up:
Awesome work Stilt :). I tried it on my 750K Piledriver and indeed my time is 2.5 seconds lower :).
I tried some other (non-x87 dominated) workloads and the difference is within the margin of error. Whatever the limitation was doing, it is not affecting other commercial benchmarks and software as much as it affects SuperPI.
Congrats on beating Fam. 10h record too Stilt :up:
PS What I find amazing is that users on another ("inteltech" forum ;)) have already found "reasons" why this is a "fail". Instead of rooting for people like Stilt, they do the opposite. Very sad. Thankfully that forum has turned into a niche for Intel trolls and shills (mod section too). Here we can show real appreciation for your work.
I can't download it :(... I only get pop-ups :-/. Can you help me? Thank you.
//edit//: it is OK now, one of the links worked on my side :). Awesome, in 4M more than 30s better!
Has anyone tried using this with other applications yet? Photoshop, Lightroom, Cinebench, etc.?
I don't know about you guys, but I hate waiting for downloads. I've mirrored the file here.
Many many many thanks to The Stilt for adding some Xtreme to Xtreme Systems for the first time in ages :up:
Thank you The Stilt, you have made such a great tool for the OC community!
I have tried it with my A10-6800K, and the time is 5 seconds lower on my rig.
http://i265.photobucket.com/albums/i...abled_nrac.png
http://i265.photobucket.com/albums/i...abled_nrac.png
Back again.
Some users have been asking why the "Errata Fix" feature doesn't work (i.e. "Fix required" is still stated even after the Fix button has been pressed). The feature itself is working fine, however I forgot to add a check in the GUI. Some claims have also emerged that the software is wrong when it states that the microcode is outdated.
So:
A small update: Bulldozer Conditioner R1.01B
Original package checksum (MD5): C3C4E3492B3FBFE1079AE5D57C25172B
Changes:
- Added a hardware flag to indicate that the errata has been fixed.
- Changed the way the software accesses the cores; the tasks are completed quicker than before.
- Fixed an APU-specific bug.
- Added information about the most recent microcode and AGESA versions under Info menu.
- Some small changes to the GUI