
Thread: Bulldozer benchmarks across review sites

  1. #1

  2. #2
    Registered User
    Join Date
    Oct 2011
    Location
    Romania
    Posts
    6
    Indeed...

    Main RIG - [I5 2500K + HR-02 Macho][MSI Z68A GD80 G3][8 GB G.Skill RipJaws 1600+][Intel HD3000 (for now)][1TB F3 +750 GB WD HDD +60 GB F60][SEASONIC X-660][A4Tech X7+MS NK4000][BenQ 2420HDBL+DELL 22"][BluRay LG BH10LS30][NZXT Phantom Black]
    Media RIG - [E8400@3600 + Ninja][ASUS P5K][4 GB ADATA EE 800+][8800 GT GS][640 GB WD HDD][Enermax 370W][Dell Keyboard][Horizon 22"][Pioneer DVD-RW][Antec PlusView II]
    Old RIG - [AMD Athlon XP 2200+ ][ASROCK K880 Upgrade][1 GB DDR400][GeForce 2 MX400][20 GB + 2x80 GB RAID 0][No Name PSU][Chicony KB][MS Intellimouse 1.0][LG DVD-R][Clio 2 Case]

  3. #3
    Xtreme Addict
    Join Date
    May 2005
    Posts
    1,341
    I am sure you are delighted to post these here. It is very clear that something is really hurting single-threaded performance, and that is the reason for the rather low BD results....
    Quote Originally Posted by Movieman View Post
    Fanboyitis..
    Comes in two variations and both deadly.
    There's the green strain and the blue strain on CPU.. There's the red strain and the green strain on GPU..

  4. #4
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by duploxxx View Post
    I am sure you are delighted to post these here. It is very clear that something is really hurting single-threaded performance, and that is the reason for the rather low BD results....
    Delighted? If I'm delighted, it is at having interpreted things correctly and made an accurate assessment of BD's performance over a year ago. I can't be delighted at AMD's misery or the sorrow experienced by its fans.
    What I'm pissed at are the numerous fanboys who constantly booed, insulted and mocked me and others and have now disappeared without a trace.


    And the results are "rather low" only for those who didn't follow the whole saga.

    - the uarch was simplified: fewer INT execution units, a shared FPU, longer instruction latencies, losing 5% IPC for 20% more frequency (this was the plan)
    - the implementation sucked because it was done with synthesis tools (20% larger blocks and 20% slower than hand crafted)
    - the process is borked: variability is high, yields are abysmal and power is out of control (frequency is lower than expected and parts are more power hungry)

    These things were known for a year. Only someone who did not want to see the truth could be surprised at the performance now.

    They expected a slight loss in IPC on existing code, compensated by increased frequency. The frequency gains were reduced in the implementation phase by heavy use of automated place and route tools, the end result being a part 20% larger and 20% slower than it could have been had it been carefully optimized. On top of that, the process failed to live up to expectations. It's a whole chain of events that killed BD.

    Had everything worked right, taking a base Thuban of 3.3GHz, a 32nm BD should have lost 5% on IPC, gained 20% frequency from the uarch (4GHz) and another 20% from the process (45->32nm). We would have had a 4.5-4.8GHz BD with slightly less IPC than a Thuban, in other words a part performing 30-100% (in AVX/FMA cases) better than Thuban. What we ended up with is a part that is within +/- 10% of Thuban while being hotter.
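
    The arithmetic in that projection can be checked with a quick sketch. The helper name `project_bd` and its parameters are made up for illustration; the inputs are just the figures from this post (3.3GHz base, -5% IPC, two +20% frequency steps), and it assumes performance scales as IPC times clock and nothing else.

```python
def project_bd(base_ghz, ipc_change, freq_gains):
    """Project clock and relative performance (hypothetical helper).

    Assumption: performance scales as IPC x clock, ignoring memory,
    power limits and everything else.
    """
    clock = base_ghz
    for gain in freq_gains:
        # Each frequency gain compounds on top of the previous one.
        clock *= 1.0 + gain
    rel_perf = (1.0 + ipc_change) * clock / base_ghz
    return clock, rel_perf


# Figures from the post: Thuban at 3.3GHz, -5% IPC,
# +20% clock from the uarch and +20% from the 45->32nm process.
clock, rel_perf = project_bd(3.3, -0.05, [0.20, 0.20])
print(f"projected clock: {clock:.2f} GHz")
print(f"relative to Thuban: {rel_perf:.2f}x")
```

    Under those assumptions the result lands at about 4.75GHz and roughly 1.37x Thuban on existing code, which sits inside the 4.5-4.8GHz and "30%" end of the range quoted above. The 100% end depends on AVX/FMA code and is not modelled here.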
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  5. #5
    Registered User
    Join Date
    Jul 2009
    Posts
    62
    This is NetBurst all over again. Hopefully future revisions fix this, because at present it isn't even worth its price compared to a 1100T, let alone a 2500K, which is still cheaper.

  6. #6
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    http://www.kitguru.net/components/cp...te-990fxa-ud7/

    ^ Gigabyte UD7 F5 bios. was a nice read at least for me :/
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  7. #7
    Xtreme Member
    Join Date
    Jul 2010
    Location
    EU, USA
    Posts
    150
    Just wait for C3

  8. #8
    Registered User
    Join Date
    Aug 2011
    Posts
    73
    As an AMD consumer I'm pretty sad about these results, but let's face it: those who call themselves CPU experts (Hans de Vries & the rest of the fun factory) were the ones who made the most wrong predictions about BD performance.
    JF-AMD / Hans de Vries / informal posting: IPC increases!!!!!!! How many times did I tell you!!!

    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    .....}
    until (12th October 2011)

  9. #9
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by savantu View Post
    And the results are "rather low" only for those who didn't follow the whole saga.

    - the uarch was simplified: fewer INT execution units, a shared FPU, longer instruction latencies, losing 5% IPC for 20% more frequency (this was the plan)
    - the implementation sucked because it was done with synthesis tools (20% larger blocks and 20% slower than hand crafted)
    - the process is borked: variability is high, yields are abysmal and power is out of control (frequency is lower than expected and parts are more power hungry)

    These things were known for a year. Only someone who did not want to see the truth could be surprised at the performance now.

    They expected a slight loss in IPC on existing code, compensated by increased frequency. The frequency gains were reduced in the implementation phase by heavy use of automated place and route tools, the end result being a part 20% larger and 20% slower than it could have been had it been carefully optimized. On top of that, the process failed to live up to expectations. It's a whole chain of events that killed BD.
    If these are your points, then you got a seemingly right result by using some false input.

    I agree with you on the process thing.

    But

    1. The execution hasn't been hit as much as one might think. I've shown that here:
    http://translate.google.com/translat...#content_start
    The performance problems are very likely a result of legacy code "enjoying" many pitfalls, which in turn cause bottlenecks, e.g. right in the decoding stage. But the engineers didn't develop a revolutionary architecture to improve execution of K8/K10/NHLM/SB code. If they had wanted to do that, they would have further improved 10h at lower cost.

    2. You are mixing up the synthesized design of Bobcat with Bulldozer. And even in the case of Bobcat they can still use hand-crafted macros to mitigate that effect. Same for ARM.

    Quote Originally Posted by Shadowized View Post
    This is NetBurst all over again. Hopefully future revisions fix this, because at present it isn't even worth its price compared to a 1100T, let alone a 2500K, which is still cheaper.
    I agree with you that this is a "Pentium-4-Launch"-like situation. But that wasn't fixed via hardware (Northwood after Willamette was still a NetBurst design), but via software (more recent software was optimized for that design).
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  10. #10
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by Brice MJ View Post
    As an AMD consumer I'm pretty sad about these results, but let's face it: those who call themselves CPU experts (Hans de Vries & the rest of the fun factory) were the ones who made the most wrong predictions about BD performance.
    I'm not aware of Bulldozer performance predictions by Hans de Vries. Can you point me to such predictions?
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  11. #11
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Dresdenboy View Post
    If these are your points, then you got a seemingly right result by using some false input.
    Mine? Two of them are from ex-AMD senior architects, and the third, on GF, is a compilation of available data. Given the input, you are surprised that I came to the right result? Shocking.
    But

    1. The execution hasn't been hit as much as one might think. I've shown that here:
    http://translate.google.com/translat...#content_start
    The performance problems are very likely a result of legacy code "enjoying" many pitfalls, which in turn cause bottlenecks, e.g. right in the decoding stage. But the engineers didn't develop a revolutionary architecture to improve execution of K8/K10/NHLM/SB code. If they had wanted to do that, they would have further improved 10h at lower cost.
    Legacy code? What exactly is legacy? Everything not compiled with the BD uarch, XOP and FMA in mind? I'm not mentioning AVX since the implementation seems extremely fragile and causes performance degradation when used (see the thread at RWT).

    In x86 you need to maintain performance on old, legacy software (that is, SW from 5-10 years ago), on current software and on future software (AVX, FMA3). There is only one culprit to blame if BD does poorly on legacy and current SW: AMD management.

    They will learn the Pentium 4 lesson the hard way: even Intel couldn't get code optimized fast enough (and back then there weren't so many ISVs and standards, OpenCL, Java, uarch-independent source code, etc.). By the time SSEx SW and TLP started to catch on, Pentium 4 was on the chopping block and Core was on the horizon.

    AMD launches BD now. By the time FMA code is common, BD will be what P4 is today: a distant, ugly memory. As for XOP and BD-optimized code paths, given its lackluster reception, most developers will avoid investing too much time and resources.

    Btw, BD reinforces the point that optimizing for product families, as Intel's compiler does, and not just checking feature flags, is the right way with BD. BD has AVX, for example, but its implementation is suboptimal and should be avoided. If the Intel compiler only checked the feature flags, it would churn out auto-vectorized 256-bit AVX code, which would perform terribly on BD. Eric Bron mentioned this interesting twist.

    http://wwww.realworldtech.com/beta/f...22980&roomid=2
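
    To make the family-versus-feature-flag point concrete, here is a minimal sketch of the two dispatch styles. It is not how ICC or any real compiler implements dispatch; the function names and the path labels ("sse2", "avx128", "avx256") are hypothetical. The only hard facts used are that Bulldozer reports the vendor string "AuthenticAMD" and CPU family 15h.

```python
def pick_by_flags_only(features):
    """Feature-flag-only dispatch: any CPU advertising AVX gets 256-bit code."""
    return "avx256" if "avx" in features else "sse2"


def pick_by_family(vendor, family, features):
    """Family-aware dispatch (hypothetical labels, illustration only)."""
    if "avx" not in features:
        return "sse2"
    # AMD family 15h is Bulldozer. The claim in this thread is that
    # 256-bit AVX should be avoided there, so steer it to a 128-bit path.
    if vendor == "AuthenticAMD" and family == 0x15:
        return "avx128"
    return "avx256"


bulldozer = ("AuthenticAMD", 0x15, {"sse2", "avx", "xop", "fma4"})
sandy_bridge = ("GenuineIntel", 0x06, {"sse2", "avx"})

for vendor, family, features in (bulldozer, sandy_bridge):
    print(vendor, pick_by_flags_only(features),
          pick_by_family(vendor, family, features))
```

    The flag-only version hands both CPUs the same 256-bit path; the family-aware one is the only place where a BD-specific choice can be made.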
    2. You are mixing up the synthesized design of Bobcat with Bulldozer. And even in the case of Bobcat they can still use hand-crafted macros to mitigate that effect. Same for ARM.
    You're basically arguing with an AMD senior architect, Cliff Maier, the guy formerly in charge of the design flow.

    I don't know. It happened before I left, and there was very little cross-engineering going on. What did happen is that management decided there SHOULD BE such cross-engineering, which meant we had to stop hand-crafting our CPU designs and switch to an SoC design style. This results in giving up a lot of performance, chip area, and efficiency. The reason DEC Alphas were always much faster than anything else is they designed each transistor by hand. Intel and AMD had always done so at least for the critical parts of the chip. That changed before I left - they started to rely on synthesis tools, automatic place and route tools, etc. I had been in charge of our design flow in the years before I left, and I had tested these tools by asking the companies who sold them to design blocks (adders, multipliers, etc.) using their tools. I let them take as long as they wanted. They always came back to me with designs that were 20% bigger, and 20% slower than our hand-crafted designs, and which suffered from electromigration and other problems.
    His comments are directly related to BD.

    On paper bulldozer is a lovely chip. Bulldozer was on the drawing board (people were even working on it) even back when I was there. All I can say is that by the time you see silicon for sale, it will be a lot less impressive, both in its own terms and when compared to what Intel will be offering. I don't really want to reveal what I know about Bulldozer from my time at AMD, nor do I want to go into details of what my friends are telling me.
    He was spot on 2 years ago.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  12. #12
    Xtreme Mentor
    Join Date
    Feb 2009
    Location
    Bangkok,Thailand (DamHot)
    Posts
    2,693
    Many people won't rush to buy an FX after reading those reviews.
    Intel Core i5 6600K + ASRock Z170 OC Formula + Galax HOF 4000 (8GBx2) + Antec 1200W OC Version
    EK SupremeHF + BlackIce GTX360 + Swiftech 655 + XSPC ResTop
    Macbook Pro 15" Late 2011 (i7 2760QM + HD 6770M)
    Samsung Galaxy Note 10.1 (2014) , Huawei Nexus 6P
    [history system]80286 80386 80486 Cyrix K5 Pentium133 Pentium II Duron1G Athlon1G E2180 E3300 E5300 E7200 E8200 E8400 E8500 E8600 Q9550 QX6800 X3-720BE i7-920 i3-530 i5-750 Semp140@x2 955BE X4-B55 Q6600 i5-2500K i7-2600K X4-B60 X6-1055T FX-8120 i7-4790K

  13. #13
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by savantu View Post
    Mine? Two of them are from ex-AMD senior architects, and the third, on GF, is a compilation of available data. Given the input, you are surprised that I came to the right result? Shocking.
    OK. An article citing AMD's design methodology for Bobcat (incl. the 20% remark):
    http://tech.icrontic.com/news/amd-sa...tests-ontario/
    And cmaier citing how AMD management asked engineers to use new methods and how this costs 20%:
    http://forums.macrumors.com/showpost...&postcount=559

    But do you also know what synthesized logic looks like? Bobcat clearly shows this. I can assure you it doesn't look like the regular logic in a Bulldozer module. OTOH I know that there are tools to generate a multiplier etc. (still looking regular); this has for example been used by one of AMD's FPU architects for his dissertation research.


    Quote Originally Posted by savantu View Post
    Legacy code? What exactly is legacy? Everything not compiled with the BD uarch, XOP and FMA in mind? I'm not mentioning AVX since the implementation seems extremely fragile and causes performance degradation when used (see the thread at RWT).

    In x86 you need to maintain performance on old, legacy software (that is, SW from 5-10 years ago), on current software and on future software (AVX, FMA3). There is only one culprit to blame if BD does poorly on legacy and current SW: AMD management.

    They will learn the Pentium 4 lesson the hard way: even Intel couldn't get code optimized fast enough (and back then there weren't so many ISVs and standards, OpenCL, Java, uarch-independent source code, etc.). By the time SSEx SW and TLP started to catch on, Pentium 4 was on the chopping block and Core was on the horizon.

    AMD launches BD now. By the time FMA code is common, BD will be what P4 is today: a distant, ugly memory. As for XOP and BD-optimized code paths, given its lackluster reception, most developers will avoid investing too much time and resources.

    Btw, BD reinforces the point that optimizing for product families, as Intel's compiler does, and not just checking feature flags, is the right way with BD. BD has AVX, for example, but its implementation is suboptimal and should be avoided. If the Intel compiler only checked the feature flags, it would churn out auto-vectorized 256-bit AVX code, which would perform terribly on BD. Eric Bron mentioned this interesting twist.
    http://wwww.realworldtech.com/beta/f...22980&roomid=2
    With "legacy software" I mean the existing code base. And unless there is a lot of ASM code in it (as in x264 or Prime95), it shouldn't be too difficult to adapt. It's similar to AMD's Fusion approach: they depend heavily on software supporting it. And they have been telling us so on slides for 2 or more years.

    That 256b AVX might perform worse than 128b AVX is not new to me, and you surely know that. SB needs AVX to access additional throughput. BD does not. On BD, 256b AVX instead just adds overhead and some complexity. This is why there was a GCC patch to prefer AVX128, which gains a few percent in performance. 128b AVX still has the advantage of the 3-operand form.
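
    A back-of-the-envelope model of the 128b versus 256b point, not a measurement. It assumes a Bulldozer module has two 128-bit FP pipes and executes a 256-bit op as two 128-bit halves, and it models a single 256-bit-wide Sandy Bridge FP port. Decode and scheduling overhead are ignored, and the function name is made up for illustration.

```python
def fp32_lanes_per_cycle(pipes, unit_bits, op_bits):
    """Peak 32-bit lanes processed per cycle in a simplified model."""
    # An op wider than the execution unit is split into unit-width pieces,
    # so it occupies more than one pipe slot.
    pieces = max(1, op_bits // unit_bits)
    ops_per_cycle = pipes / pieces
    return ops_per_cycle * op_bits / 32


# (label, number of pipes, width of each pipe) are modelling assumptions.
configs = [
    ("BD module (assumed 2 x 128b pipes)", 2, 128),
    ("SB port (assumed 1 x 256b pipe)", 1, 256),
]

for name, pipes, unit_bits in configs:
    for op_bits in (128, 256):
        lanes = fp32_lanes_per_cycle(pipes, unit_bits, op_bits)
        print(f"{name}, {op_bits}b ops: {lanes:.0f} lanes/cycle")
```

    In this model the SB port doubles its peak when moving from 128b to 256b ops, while the BD module stays flat, so any extra cost of 256b ops on BD is pure loss.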

    I already discussed Dark_Shikari's findings in several places. And even AMD's compiler guide suggests avoiding the AVX flag when using the Intel compiler. The problem here is very likely that AVX for ICC means using 256b if possible.

    As I wrote on P3D, some managed code environments might benefit early since they contain JIT compilers, which optimize byte code or p-code on the fly. Adapting the compilers is enough to improve performance of existing apps.

    And regarding P4:
    IIRC it didn't take that long for software updates. Software still has its regular upgrade cycles, and often the necessary update came with the next scheduled cycle.

    OTOH it seems that the "Bulldozer" we are currently looking at is just the first working incarnation. You know the roadmap, and I also posted early enough about BDv2, Steamroller and other bits. Further, there are things which were mentioned in the patents but haven't shown up so far, while 80% or so became real silicon.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  14. #14
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Using my Pentium D 945 as an example, it works friggin great in Vista/7 64bit. We all know P4s were terrible with XP for the most part. It seems software caught up with the P4's design. Unfortunately software needs to catch up to Bulldozer as well, but AMD isn't Intel. Software engineers likely aren't as motivated to code for this radical new design.
    Core i7 2600K@4.6Ghz| 16GB G.Skill@2133Mhz 9-11-10-28-38 1.65v| ASUS P8Z77-V PRO | Corsair 750i PSU | ASUS GTX 980 OC | Xonar DSX | Samsung 840 Pro 128GB |A bunch of HDDs and terabytes | Oculus Rift w/ touch | ASUS 24" 144Hz G-sync monitor

    Quote Originally Posted by phelan1777 View Post
    Hail fellow warrior albeit a surat Mercenary. I Hail to you from the Clans, Ghost Bear that is (Yes freebirth we still do and shall always view mercenaries with great disdain!) I have long been an honorable warrior of the mighty Warden Clan Ghost Bear the honorable Bekker surname. I salute your tenacity to show your freebirth sibkin their ignorance!

  15. #15
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by Mechromancer View Post
    Using my Pentium D 945 as an example, it works friggin great in Vista/7 64bit. We all know P4s were terrible with XP for the most part. It seems software caught up with the P4's design. Unfortunately software needs to catch up to Bulldozer as well, but AMD isn't Intel. Software engineers likely aren't as motivated to code for this radical new design.
    As long as they don't use assembler most of the time, they might just have to update their compiler and set the correct flags. I see the problem in the instruction scheduling, something that happens in the compiler back end, invisible to the user.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  16. #16
    Registered User
    Join Date
    Oct 2006
    Posts
    12
    Remember back when AMD used to release drivers for the CPU in the Windows XP days? Why can't they do the same thing now with Windows Vista or Windows 7? Wouldn't this help to resolve these single-threaded performance issues?
    Antec 1200 Case
    Thermaltake TR2-850watt Black Widow PSU
    AMD Phenom II 965 Revision C2
    Asus Crosshair IV Formula
    16gb Kingston HyperX DDR3 1600mhz RAM
    2 x 120gb Kingston HyperX SSD in Raid-0
    WD VelociRaptor 300gb 10000rpm HDD x1
    WD 320gb 7200rpm HDD x2
    Seagate 250gb 7200rpm HDD x1
    Windows 7 Ultimate
    HP LP2475w LCD
    XFX ATI Radeon HD 6970

  17. #17
    Xtreme Cruncher
    Join Date
    Apr 2008
    Location
    Ohio
    Posts
    3,119
    anyone see this review? Thought it was pretty good.

    http://www.kitguru.net/components/cp...te-990fxa-ud7/
    ~1~
    AMD Ryzen 9 3900X
    GigaByte X570 AORUS LITE
    Trident-Z 3200 CL14 16GB
    AMD Radeon VII
    ~2~
    AMD Ryzen ThreadRipper 2950x
    Asus Prime X399-A
    GSkill Flare-X 3200mhz, CAS14, 64GB
    AMD RX 5700 XT

  18. #18
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by demonkevy666 View Post
    http://www.kitguru.net/components/cp...te-990fxa-ud7/

    ^ Gigabyte UD7 F5 bios. was a nice read at least for me :/
    Quote Originally Posted by charged3800z24 View Post
    anyone see this review? Thought it was pretty good.

    http://www.kitguru.net/components/cp...te-990fxa-ud7/
    ME at least! LOL at me being ignored here
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  19. #19
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    It's kitguru...
    Donate to XS forums
    Quote Originally Posted by jayhall0315 View Post
    If you are really extreme, you never let informed facts or the scientific method hold you back from your journey to the wrong answer.

  20. #20
    Xtreme Cruncher
    Join Date
    Apr 2008
    Location
    Ohio
    Posts
    3,119
    Quote Originally Posted by demonkevy666 View Post
    ME at least! LOL at me being ignored here
    Must have missed it, lol... sorry
    ~1~
    AMD Ryzen 9 3900X
    GigaByte X570 AORUS LITE
    Trident-Z 3200 CL14 16GB
    AMD Radeon VII
    ~2~
    AMD Ryzen ThreadRipper 2950x
    Asus Prime X399-A
    GSkill Flare-X 3200mhz, CAS14, 64GB
    AMD RX 5700 XT

  21. #21
    Xtremely High Voltage
    Join Date
    Mar 2006
    Location
    Ohio, USA
    Posts
    16,040
    I didn't think kitguru was all that reliable.
    The Cardboard Master
    Crunch with us, the XS WCG team
    Intel Core i7 2600k @ 4.5GHz, 16GB DDR3-1600, Radeon 7950 @ 1000/1250, Win 10 Pro x64

  22. #22
    Xtreme Member
    Join Date
    Jul 2009
    Posts
    146
    http://www.hardwareheaven.com/review...roduction.html

    I can't believe no one mentioned this..

    It looks really good in some games.
    MBO: ASRock 990FX Fatal1ty Professional
    CPU: AMD FX-8120 @ 4GHz 1.2V
    Cooling: Thermalright HR-02 w/ Zalman ZM-SF3 120mm
    RAM: 2x 4GB 2133MHz - Patriot Viper Xtreme Division 2
    GPU: XFX R7770 Core Edition
    SSD: Kingston V200+ 90GB
    HDD: 2x WD 2TB Green + WD 3TB Green
    Drive: LiteON iHAS122 @ iHAS324
    PSU: Corsair VX550
    Case: LanCool Dragonlord K-62

    Display/TV: Samsung LE32D550 32''
    Sound System: ASUS Xonar D2X + Logitech Z-906

    Mouse A: Micro$oft Natural Ergonomic Wifi 6000
    Mouse B: Logitech MX518
    Mousepad: XFX Warpad XXL
    Keyboard: Micro$oft Natural Ergonomic Wired 4000
    Joystick: Logitech Freedom 2.4GHz Cordless

  23. #23
    Registered User
    Join Date
    Oct 2008
    Posts
    23
    What I've noticed is that BD performs decently (not a world beater by any means, but in the 2500K class) as long as you're not using an ASUS board. No surprise there.
