Thread: Bobcat Core performance analysis - Bobcat vs K10 vs K8

  1. #1
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264

    Bobcat Core performance analysis - Bobcat vs K10 vs K8

    Hi guys

    This should have been posted some time ago, but I was intending to include more data. Now that the platforms are set up again (minus K8 at the moment) I can add to this, but I should get the thread rolling regardless. Before I say any more: in order to increase the amount of data, I'm after suggestions on benchmarks to run, as long as they're free.


    The idea behind this little write-up is to look at the strengths and weaknesses of Bobcat next to its only current (and superseded) cousins in the AMD lineup, K10 and K8.
    As we know, Bobcat is an interesting design that remains mostly two-issue throughout, with a very "lightweight" FPU. A lot of details, such as buffer sizes, are missing from what has been disclosed, but they are likely to be quite small next to both K8 and K10 given the size and power targets.

    It offsets some of these power- and size-saving features with architectural enhancements missing from K10, and more so from K8. So I thought it would be interesting to see where the bottlenecks lie in all three uArchs. K8 was brought in simply because of its age: this was once a high-power, high-performance uArch considered to have high IPC prior to Core 2.

    I've done up a very basic high level overview of the core next to its desktop cousins, minus the details we don't know:







    For this comparo I've chosen the following CPUs and configurations, to compare the architectures as best as possible on a clk/clk basis.

    Bobcat:

    * E-350 'Zacate' @ 1.6GHz, 512KB L2 Cache (Gigabyte E350N mb)
    * DDR3 1066-7-7-7-18

    K8:

    * AMD Athlon X2 5200+ Brisbane @ 1.6GHz, 2x 512KB L2 Cache
    * DDR2 800 5-5-5-15

    K10:

    * AMD Athlon II X4 630 @ 1.6GHz, 2 cores disabled in BIOS, 2x 512KB L2 Cache
    * DDR3 1066-7-7-7-18
    * CPU-NB 1.6GHz

    All these platforms use a discrete HD5570, so there is no sharing of memory bandwidth with the GPU / IGPs.

    Performance: Clk/Clk

    The benchmarks chosen were mostly at hand and commonplace. Lots of synthetics, but hopefully the more software-orientated people can shed light on the type of code some of these would generally be pushing. I've segregated them as best I can, to my knowledge, into Integer, FPU, and SIMD.


    Now, unfortunately, whilst I have an Atom netbook, it's not running Windows 7, has little RAM and is only single core, so I cannot include meaningful data to throw it into the mix, but maybe down the line I will.


    Anyway, on with performance results first up.

    Now, this is not perfect. I probably should have limited the benchmarks to a single core (affinity) to take multi-core scaling out of the equation, since we're looking purely at core performance (as opposed to the CPU's performance as a whole), but when I realised this it was a bit late. I'd already moved to the other platform, so I carried on. The differences should be very minor, as none of these CPUs share cache between cores. On follow-up benches I'll do this though.
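For the single-core follow-up runs, one option on Windows is to launch each benchmark through cmd's `start /affinity`, which takes a hexadecimal core mask. Below is a minimal sketch of building that mask. The helper names and `bench.exe` are placeholders for illustration, not something used for the posted results.

```python
def affinity_mask(cores):
    """Return the hex affinity mask (no 0x prefix) for zero-based core indices."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core  # one bit per logical core, core 0 is the lowest bit
    return format(mask, "X")


def start_command(exe, cores):
    """Build a cmd.exe line that launches exe pinned to the given cores."""
    # The empty "" is the window title that `start` expects as its first quoted argument.
    return f'start "" /affinity {affinity_mask(cores)} {exe}'


if __name__ == "__main__":
    # Pin a hypothetical benchmark to core 0 only.
    print(start_command("bench.exe", [0]))
```

Pinning to core 0 gives a mask of 1, and pinning to both cores of a dual core gives 3.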





    Plenty to discuss. What I'd like is suggestions PLEASE on some more integer-heavy benchmarks in particular. The small selection so far isn't enough to make the percentage totals meaningful.
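To illustrate why a small selection makes the percentage totals shaky, here is a rough sketch of one common way to total relative scores per category: a geometric mean of per-benchmark ratios against a baseline CPU. This is an assumption for illustration and not necessarily how the results sheet computes its totals. The benchmark names and scores are made-up placeholders, and it assumes higher scores are better (time-based results such as Super Pi would need the ratio inverted).

```python
import math


def relative_scores(scores, baseline):
    """Per-benchmark ratio of a CPU's score to the baseline CPU's score."""
    return {name: scores[name] / baseline[name] for name in scores}


def category_total(ratios):
    """Geometric mean of the ratios, expressed as a percentage of the baseline."""
    logs = [math.log(r) for r in ratios.values()]
    return 100.0 * math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Placeholder numbers only, not measured results.
    baseline = {"bench_a": 100.0, "bench_b": 100.0}
    candidate = {"bench_a": 80.0, "bench_b": 50.0}
    total = category_total(relative_scores(candidate, baseline))
    print(f"{total:.1f}% of baseline")
```

With only two or three benchmarks in a category, a single outlier moves the total a long way, which is why more integer tests are needed before the totals mean much.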

    The highlights are:

    Super Pi 1M: Faster than K8, and almost as fast as K10.

    Now we know, as I've always suspected, that cache, decoder width, and even raw execution resources / FPU size are not bottlenecks for the very old Super Pi.

    SSE performance: Very low. Clearly this is the biggest compromise here. Rendering and encoding both perform poorly. It is evident that this architecture is aimed at providing strong general performance rather than at encoding, rendering, and other SSE-heavy apps. Even K8 has a sizable lead in some cases.

    Quite a few interesting outliers in both directions.


    Power consumption:


    Now for this part, it gets a bit complicated. We have to throw out the K8, as it's built on 65nm, and also the Propus, as its two disabled cores draw excess power. The discrete card also gets the flick, and we look at IGP consumption on both platforms.

    Instead, our nearest competitor is the Athlon II X2 with 1MB L2 per core, sitting in the lowest-consumption IGP board I had available: the Gigabyte GA-MA78-LMT mainboard with the 760G chipset.

    Powering everything is a 150XT Pico-PSU running off 12V. Power consumption of the system (mainboard and HDD) is measured at the input to the Pico-PSU using a calibrated multimeter. Because the 12V rail is passed straight through from my power supply, it's effectively 100% efficient; only the 5V and 3.3V rails are regulated, at ~90% efficiency. So our end power consumption measurement should be pretty damn accurate.

    E-350 motherboard choice.

    In order to cater for the different goals, I've had to use two different motherboards. Why? Because the Gigabyte E350N uses a 3-phase PWM with a 4-pin 12V supply, plus it contains a USB3 chip. These pump up the platform consumption next to the Asrock E350M, which uses what appears to be a very simple single- or 2-phase PWM and has no USB3 chip. The resulting power consumption (as you'll see below) is quite a bit lower than the Gigabyte's, so it's the better choice for looking at the APU's power consumption on its own.

    However, the Asrock board doesn't allow any undervolting, so my comparisons to an undervolted Athlon II have to be done on the Gigabyte. That board's contribution to power consumption is probably more like that of the AM3 board, so it's better off anyway.



    HDD power subtraction:

    For the posted results, HDD power consumption was subtracted. This was measured with the HDD isolated, spinning at idle. What we're left with, then, is a fairly accurate measure of board + CPU power only.
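The arithmetic behind each posted figure is simple enough to sketch: input power is the supply voltage times the current read off the multimeter, and the isolated HDD idle figure is then subtracted. The function names and all the numbers below are placeholders for illustration, not my actual readings.

```python
def input_power_w(volts, amps):
    """Power drawn at the Pico-PSU input, in watts (P = V * I)."""
    return volts * amps


def board_cpu_power_w(volts, amps, hdd_idle_w):
    """Input power minus the separately measured HDD idle power, in watts."""
    total = input_power_w(volts, amps)
    if hdd_idle_w > total:
        raise ValueError("HDD power cannot exceed total input power")
    return total - hdd_idle_w


if __name__ == "__main__":
    # Placeholder readings: 12.0 V supply, 2.0 A on the meter, 4.0 W HDD idle.
    print(board_cpu_power_w(12.0, 2.0, 4.0), "W for board + CPU")
```

Since the reading is taken at the input, whatever is lost in regulating the 5V and 3.3V rails is already included in the board figure.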



    Power distribution - CPU + GPU (Asrock MB)

    First up, let's see how balanced the CPU and GPU are in their share of the TDP.

    To do this I've loaded up both cores with Prime95 for the CPU, and used OCCT's GPU stress test for the GPU. The OCCT test is very harsh on the GPU, possibly drawing more than a typical load would, but it uses virtually 0% CPU.

    Finally, I ran both programs at once.






    Talk about well balanced. With each component of the APU stressed to the max, they draw almost identical amounts of power under load. The combined power consumption is essentially the sum of the two minus the memory controller and NB power consumption, which is only counted once (since the MC would be stressed in both the GPU and the CPU load).



    Platform power consumption comparison - Zacate vs AM3 undervolted.
    Now we know Zacate walks all over any energy-efficient AM3 off the shelf, but what about when the AM3 chip is underclocked to equal clock speeds and undervolted? A 1.6GHz Athlon II is, after all, a lot faster.

    At 1.6GHz, the Athlon was still stable right down to an impressive 0.875V. Below this it was getting flaky.

    Under the same conditions, the E-350 could be taken down to 1.15V.

    These being different processes, I think this is fairer than matching Vcores. Clearly TSMC's HP process requires this sort of voltage, and I think it would also be fair to say AMD/GF's 45nm SOI is actually a lot better. It would be interesting to see Bobcat on a GF process!

    Anyway, the results show that the AM3 board with its older 760G IGP uses quite a bit of juice: even with NO CPU fitted at all, it draws just over 16W when powered up.

    Of course we can't remove the APU (or even just the CPU portion!) from the Zacate boards, but I can tell you the idle power of the APU itself is very, very low. From years of experience in the electronics industry, I'd say probably 'a watt or two', judging by the heat output of the tiny APU heatsink on the Asrock board when no fan is fitted.




    With the above in mind, you can see the Athlon II's idle power is quite high even at this low voltage.

    Under load, though, it's quite impressive. Power shoots up by only 7.4W for the Athlon II, compared to 7.2W for the undervolted Gigabyte E350.

    Again though, as posted further up, the Asrock trumps everything even when NOT undervolted (so at its default 1.25-1.3V Vcore).
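The load figures quoted here are deltas, i.e. load reading minus idle reading on the same platform. A quick sketch of that subtraction follows. The idle and load values in it are placeholders chosen only so the deltas match the 7.4W and 7.2W above; they are not the measured absolute readings.

```python
def load_delta_w(idle_w, load_w):
    """Increase in platform power from idle to full CPU load, in watts."""
    return round(load_w - idle_w, 1)


if __name__ == "__main__":
    # Placeholder absolute readings; only the resulting deltas come from the post.
    platforms = {
        "Athlon II (undervolted)": (30.0, 37.4),
        "Gigabyte E350 (undervolted)": (20.0, 27.2),
    }
    for name, (idle_w, load_w) in platforms.items():
        print(f"{name}: +{load_delta_w(idle_w, load_w)} W under load")
```

Two platforms can show near-identical deltas while sitting at very different idle levels, which is why the idle figures matter as much as the load ones here.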


    Conclusion:

    Well, it's not over yet. I'd like to post more benchmark results, so those interested, bring on the requests.


    Clearly though, the results show that for the sheer size of this core it does quite well, especially in integer / legacy type code. Power consumption at first glance is not brilliant when compared to a downclocked desktop chip, even an old core like K10, but consider its generic TSMC process and you'd have to think twice about that one. These are NOT 'ULV' chips, binned for low leakage and undervolted to hell; instead they run a fairly high Vcore (which is actually a good thing, as it means cheaper, lower-current VRMs can be utilized for a given power consumption), get pumped out on a generic process at a cheap price, and yet still offer the perf/watt of heavily undervolted desktop platforms. For their intended purpose as a general-purpose chip bridging the gap between netbook and notebook/desktop, they have certainly succeeded.
    Last edited by mAJORD; 06-26-2011 at 04:36 AM.

  2. #2
    Xtreme Member
    Join Date
    Jul 2009
    Location
    England
    Posts
    406
    Nice! Very informative
    My pot is bigger than your pot

    WHAU!!!!

  3. #3
    Xtreme Member
    Join Date
    Jan 2005
    Location
    Vancouver, Canada
    Posts
    168
    Great writeup. Thanks for this!
    Laptop

  4. #4
    Xtreme Mentor
    Join Date
    Dec 2007
    Location
    State of Confusion, USA
    Posts
    2,513
    Nice work mAJORD...

    I didn't realize Bobcat was done on bulk silicon (or is it?).

    Thanks for the testing!
    AMD FX-8350 (1237 PGN) | Asus Crosshair V Formula (bios 1703) | G.Skill 2133 CL9 @ 2230 9-11-10 | Sapphire HD 6870 | Samsung 830 128Gb SSD / 2 WD 1Tb Black SATA3 storage | Corsair TX750 PSU
    Watercooled ST 120.3 & TC 120.1 / MCP35X XSPC Top / Apogee HD Block | WIN7 64 Bit HP | Corsair 800D Obsidian Case








    First Computer: Commodore Vic 20 (circa 1981).

  5. #5
    Xtreme Addict
    Join Date
    Feb 2005
    Location
    OZtralia
    Posts
    2,051
    Thanks for the thread, interesting
    lots and lots of cores and lots and lots of tuners,HTPC's boards,cases,HDD's,vga's,DDR1&2&3 etc etc all powered by Corsair PSU's

  6. #6
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    As always top notch analysis mAJORD!
    From the integer benches it looks like branch prediction in Bobcat is a lot better than on older AMD architectures. Obviously FP is so weak that no decoder magic would help close the gap even to K8. But the idea is for the fGPU to take over most FP-heavy tasks like encoding video, and with the help of OpenCL even portions of rendering. This is one of the future paths programmers will need to take for Fusion to shine.

    As for additional benchmarks, I would like to see:
    AIDA64 - selection of integer and FP tests, most interested in PhotoWorxx and ZLib
    AraunaBench - if only to see how slow Bobcat can be

    Thank you very much for this thread
    RiG1: Ryzen 7 1700 @4.0GHz 1.39V, Asus X370 Prime, G.Skill RipJaws 2x8GB 3200MHz CL14 Samsung B-die, TuL Vega 56 Stock, Samsung SS805 100GB SLC SDD (OS Drive) + 512GB Evo 850 SSD (2nd OS Drive) + 3TB Seagate + 1TB Seagate, BeQuiet PowerZone 1000W

    RiG2: HTPC AMD A10-7850K APU, 2x8GB Kingstone HyperX 2400C12, AsRock FM2A88M Extreme4+, 128GB SSD + 640GB Samsung 7200, LG Blu-ray Recorder, Thermaltake BACH, Hiper 4M880 880W PSU

    SmartPhone Samsung Galaxy S7 EDGE
    XBONE paired with 55'' Samsung LED 3D TV

  7. #7
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    From a first glance: Good work!

    One first note: The Bobcat int schedulers don't feed ALU/AGU pairs, but one feeds 2 ALUs and the other feeds a Load AGU (LAGU) and a store AGU (SAGU): http://pcper.com/images/reviews/1036/bob_over.jpg
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  8. #8
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264
    Thanks for the correction Dresdenboy, I'll fix that up!

    Lightman, I was going to buy an AIDA licence anyway since my old Everest licence is out, so stay tuned for that one.

  9. #9
    Xtreme Mentor
    Join Date
    Feb 2009
    Location
    Bangkok,Thailand (DamHot)
    Posts
    2,693
    wowwwwwww
    Intel Core i5 6600K + ASRock Z170 OC Formula + Galax HOF 4000 (8GBx2) + Antec 1200W OC Version
    EK SupremeHF + BlackIce GTX360 + Swiftech 655 + XSPC ResTop
    Macbook Pro 15" Late 2011 (i7 2760QM + HD 6770M)
    Samsung Galaxy Note 10.1 (2014) , Huawei Nexus 6P
    [history system]80286 80386 80486 Cyrix K5 Pentium133 Pentium II Duron1G Athlon1G E2180 E3300 E5300 E7200 E8200 E8400 E8500 E8600 Q9550 QX6800 X3-720BE i7-920 i3-530 i5-750 Semp140@x2 955BE X4-B55 Q6600 i5-2500K i7-2600K X4-B60 X6-1055T FX-8120 i7-4790K

  10. #10
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Great work as always man!
    What is amazing is the general integer performance of this small chip vs K10 and K8. It does remarkably well in pure integer workloads for a 2-way design. In FP there is a compromise, but all of this bodes well for Bulldozer, which has a unified integer scheduler, beefier prefetchers and branch prediction logic, and a full-speed L2 cache that is 4x the size of Bobcat's.

    Power draw is the main thing in the whole Bobcat story, since for the little power it draws it has an amazing performance level.

  11. #11
    Registered User
    Join Date
    Sep 2007
    Posts
    58
    I bet moving to DDR3-1333 would give Bobcat a solid performance boost.

  12. #12
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    Quote Originally Posted by mAJORD View Post
    Thanks for the correction Dresdenboy, I'll fix that up!

    Lightman, I was going to buy an AIDA licence anyway since my old Everest licence is out, so stay tuned for that one.
    Fritz Chess and CPU Mark99 are pure integer benchmarks. Please correct the sheet. Sandra MM Int uses the FP units for SIMD execution, and x264 contains a lot of integer SIMD code.
    BF2 is memory bound.
    "That which does not kill you only makes you stronger." ---Friedrich Nietzsche
    PCAXE
