Thread: Bobcat Core performance analysis - Bobcat vs K10 vs K8

    Bobcat Core performance analysis - Bobcat vs K10 vs K8

    Hi guys

    This should have been posted some time ago, but I was intending to include more data. Now that the platforms are set up again (minus K8 at the moment) I can add to this, but I should get the thread rolling regardless. Before I say any more: to increase the amount of data, I'm after suggestions on benchmarks to run, as long as they're free.


    The idea behind this little write-up is to look at the strengths and weaknesses of Bobcat next to its only current (and superseded) cousins in the AMD lineup, K10 and K8.
    As we know, Bobcat is an interesting design that remains mostly two-issue throughout, with a very "lightweight" FPU. A lot of details are missing from what has been disclosed, such as buffer sizes, but these are likely to be quite small next to both K8 and K10 given the size and power targets.

    It offsets some of these power- and size-saving features with architectural enhancements missing from K10, and more so from K8. So I thought it would be interesting to see where the bottlenecks lie in all three uArchs. K8 was brought in simply because of its age: this was once a high-power, high-performance uArch considered to have high IPC prior to Core 2.

    I've done up a very basic high level overview of the core next to its desktop cousins, minus the details we don't know:







    To do this comparo I've chosen the following CPUs and configurations, to compare the architectures as best as possible on a clk/clk basis.

    Bobcat:

    * E-350 'Zacate' @ 1.6GHz, 512KB L2 cache (Gigabyte E350N mainboard)
    * DDR3-1066 7-7-7-18

    K8:

    * AMD Athlon X2 5200+ 'Brisbane' @ 1.6GHz, 2x 512KB L2 cache
    * DDR2-800 5-5-5-15

    K10:

    * AMD Athlon II X4 630 @ 1.6GHz, 2 cores disabled in BIOS, 2x 512KB L2 cache
    * DDR3-1066 7-7-7-18
    * CPU-NB @ 1.6GHz

    All these platforms use a discrete HD5570, so there's no sharing of memory bandwidth with the GPU / IGP.

    Performance: Clk/Clk

    The benchmarks chosen were mostly at hand and commonplace. Lots of synthetics, but hopefully the more software-oriented people can shed light on the type of code some of these would generally be pushing. I've segregated them as best I can, to my knowledge, into Integer, FPU and SIMD.


    Now, unfortunately, whilst I have an Atom netbook, it's not running Windows 7, has little RAM and is only single core, so I can't include meaningful data to throw it into the mix, but maybe down the line I will.


    Anyway, on with performance results first up.

    Now, this is not perfect. I probably should have limited the benchmarks to a single core (affinity) to take multi-core scaling out of the equation, since we're looking purely at core performance (as opposed to CPU performance as a whole). But when I realised this it was a bit late, as I'd already moved to the other platform, so I carried on. The differences should be very minor, as none of these CPUs share cache between cores. On follow-up benches I'll do this though.
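
    For anyone wanting to repeat this with single-core affinity, here is a rough sketch of how the pinning could be scripted on Windows. It only builds the command line for cmd's `start /affinity`, which takes a hexadecimal bitmask of allowed cores. The name `benchmark.exe` and both helper functions are placeholders of mine, not anything used for the results in this thread.

```python
def affinity_mask(cores):
    """Return a bitmask with one bit set per logical core index (core 0 -> bit 0)."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def pinned_command(exe, cores):
    """Build a cmd.exe command line that launches `exe` restricted to `cores`.

    `start /affinity` expects the mask in hex without a 0x prefix.
    The empty string is the window title that `start` otherwise tries
    to read from the first quoted argument.
    """
    mask_hex = format(affinity_mask(cores), "X")
    return ["cmd", "/c", "start", "", "/affinity", mask_hex, "/wait", exe]


if __name__ == "__main__":
    # "benchmark.exe" is a hypothetical executable name.
    cmd = pinned_command("benchmark.exe", [0])
    print(cmd)
    # On a Windows box this list could be handed to subprocess.run(cmd).
```

    Pinning to core 0 only gives a mask of 1; allowing cores 0 and 1 gives 3. The same thing can be done by hand through Task Manager's "Set Affinity" once the benchmark is running.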





    Plenty to discuss. What I'd like is suggestions, PLEASE, on some more integer-heavy benchmarks in particular. The small selection so far isn't enough to make the percentage totals meaningful.
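
    On the percentage totals: below is a minimal sketch of one way per-category totals could be worked out from raw scores. It is not necessarily how the chart numbers were produced. I'm assuming a geometric mean here, since it is less easily skewed by a single outlier than a plain average, and every score in the example is made up purely for illustration.

```python
from math import prod


def relative_pct(score, baseline, lower_is_better=False):
    """Performance of `score` relative to `baseline`, as a percentage.

    For timed tests (e.g. seconds to finish) a lower score is better,
    so the ratio is inverted.
    """
    ratio = baseline / score if lower_is_better else score / baseline
    return 100.0 * ratio


def category_total_pct(percentages):
    """Geometric mean of a list of relative percentages."""
    if not percentages:
        raise ValueError("need at least one result")
    return prod(percentages) ** (1.0 / len(percentages))


if __name__ == "__main__":
    # Hypothetical numbers only: (core under test, baseline core, lower_is_better)
    integer_tests = [
        (40.0, 32.0, True),       # a timed test, seconds
        (1500.0, 1800.0, False),  # a points-based test
    ]
    pcts = [relative_pct(s, b, low) for s, b, low in integer_tests]
    print(round(category_total_pct(pcts), 1))
```

    With only two or three tests in a category, one odd result still moves the total a lot, which is exactly why more integer benchmarks are needed.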

    The highlights are:

    Super Pi 1M: Faster than K8, and almost as fast as K10

    Now we know, as I've always suspected, that cache, decoder width, and even raw execution resources / FPU size are not bottlenecks for the very old Super Pi.

    SSE performance: Very low. Clearly this is the biggest compromise here. Rendering and encoding all perform poorly. It's evident this architecture is aimed at providing strong general performance rather than at encoding, rendering and other SSE-heavy apps. Even K8 has a sizable lead in some cases.

    Quite a few interesting outliers in both directions.


    Power consumption:


    Now for this part, it gets a bit complicated. We have to throw out the K8, as it's built on 65nm, and also the Propus, as its two disabled cores draw excess power. The discrete card also gets the flick, and we look at IGP consumption on both platforms.

    Instead, our nearest competitor is the Athlon II X2, with 1MB L2 per core, sitting in the lowest-consumption IGP board I had available, the Gigabyte GA-MA78-LMT mainboard with the 760G chipset.

    Powering everything is a 150XT Pico-PSU running off 12V. Power consumption of the system (mainboard and HDD) is measured at the input to the Pico-PSU using a calibrated multimeter. Because the 12V rail is passed straight through from my power supply, it's effectively 100% efficient; only the 5V and 3.3V rails are regulated, at ~90% efficiency. So our end power consumption measurement should be pretty damn accurate.
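
    To put rough numbers on why the regulator losses barely matter, here is a small sketch of the arithmetic. The voltage, current and rail load in the example are invented, not my actual readings; only the ~90% efficiency figure comes from the description above.

```python
def input_power_w(volts, amps):
    """DC power at the PSU input: P = V * I."""
    return volts * amps


def regulator_loss_w(rail_load_w, efficiency=0.90):
    """Power burned in a regulator that delivers `rail_load_w` to its rail.

    The input needed is load / efficiency, so the loss is the difference.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return rail_load_w / efficiency - rail_load_w


if __name__ == "__main__":
    # Hypothetical multimeter readings at the 12V input.
    total = input_power_w(12.0, 1.5)
    # Assume (for illustration) 4.5 W of that ends up on the 5V/3.3V rails.
    loss = regulator_loss_w(4.5)
    print(f"input {total:.1f} W, regulator loss about {loss:.2f} W")
```

    Since only the part of the load sitting on the regulated rails pays the roughly 10% penalty, the error on the total stays small.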

    E-350 motherboard choice.

    In order to cater for the different goals, I've had to use two different motherboards. Why? Well, because the Gigabyte E350N uses a 3-phase PWM with a 4-pin 12V supply, plus it carries a USB3 chip. These pump up the platform consumption next to the Asrock E350M, which uses what appears to be a very simple single- or 2-phase PWM and has no USB3 chip. The resulting power consumption (as you'll see below) is quite a bit lower than the Gigabyte's, so it's the better choice for looking at the APU's power consumption on its own.

    However, this board doesn't allow any undervolting, so my comparisons to an undervolted Athlon II have to be done on the Gigabyte. This board's contribution to power consumption is probably more like that of the AM3 board, so it's better off anyway.



    HDD power subtraction:

    For the posted results, HDD power consumption was subtracted. This was measured with the HDD isolated, spinning at idle. What we're left with, then, is a fairly accurate measure of board + CPU power only.
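
    The subtraction itself is trivial, but for completeness here is a sketch of it. The figures are placeholders, not my measurements.

```python
def board_cpu_power_w(system_w, hdd_idle_w):
    """Board + CPU power: total measured at the PSU input minus the idle HDD draw."""
    if hdd_idle_w > system_w:
        raise ValueError("HDD draw cannot exceed the total system draw")
    return system_w - hdd_idle_w


if __name__ == "__main__":
    # Hypothetical readings in watts.
    print(board_cpu_power_w(system_w=18.0, hdd_idle_w=3.5))
```

    One caveat: this treats the HDD as sitting at its idle-spinning draw during every test, so any disk activity during a benchmark would show up as board + CPU power.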



    Power distribution - CPU + GPU (Asrock MB)

    First up, let's see how balanced the CPU and GPU are in their share of the TDP.

    To do this I've loaded up both cores with Prime95 for the CPU, and used OCCT's GPU stress test for the GPU. OCCT is very harsh on the GPU, possibly drawing more than is typical, but it uses virtually 0% CPU.

    Finally I run both of these programs at once.






    Talk about well balanced. With each component of the APU stressed to the max, they draw almost identical amounts of power under load. The combined power consumption is essentially the sum of the two minus the memory controller and NB power consumption (since the MC would be stressed under both GPU and CPU load).
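
    Here is a rough sketch of how that shared memory controller / NB portion could be estimated from the three load readings plus idle. It assumes the shared part is fully loaded in both single-component tests, which is a simplification on my part, and the wattages in the example are made up.

```python
def load_delta_w(load_w, idle_w):
    """Extra power drawn under load compared to idle."""
    return load_w - idle_w


def shared_overlap_w(idle_w, cpu_load_w, gpu_load_w, both_load_w):
    """Estimate the power counted twice when adding the CPU-only and
    GPU-only deltas (e.g. memory controller / NB).

    overlap = (CPU delta + GPU delta) - combined delta
    """
    cpu = load_delta_w(cpu_load_w, idle_w)
    gpu = load_delta_w(gpu_load_w, idle_w)
    both = load_delta_w(both_load_w, idle_w)
    return cpu + gpu - both


if __name__ == "__main__":
    # Hypothetical board readings in watts.
    print(shared_overlap_w(idle_w=10.0, cpu_load_w=17.0,
                           gpu_load_w=17.5, both_load_w=22.5))
```

    If the result comes out near zero, the two loads barely share anything; a larger value suggests more of the power is common to both.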



    Platform power consumption comparison - Zacate vs AM3 undervolted.
    Now we know Zacate walks all over any energy-efficient AM3 off the shelf, but what about when the AM3 chip is underclocked to equal clock speeds and undervolted? A 1.6GHz Athlon II is, after all, a lot faster.

    At 1.6GHz, the Athlon was still stable right down to an impressive 0.875V. Below this it was getting flaky.

    Under the same conditions, the E-350 could be taken down to 1.15V.

    Being different processes, I think this is fairer than matching Vcores. Clearly TSMC's HP process requires this sort of voltage, and I think it would also be fair to say AMD/GF's 45nm SOI is actually a lot better. It would be interesting to see Bobcat on a GF process!

    Anyway, the results show the AM3 board with its older 760G IGP uses quite a bit of juice: even with NO CPU fitted at all, it draws just over 16W when powered up.

    Of course we can't remove the APU (or even just the CPU portion!) from the Zacate boards, but I can tell you idle power of the APU itself is very, very low. From years of experience in the electronics industry, I'd say probably 'a watt or two', judging by the heat output of the tiny APU heatsink on the Asrock board when no fan is fitted.




    With the above in mind, you can see the Athlon II's idle power is quite high even at this low voltage.

    Under load though it's quite impressive. Power shoots up by only 7.4W for the Athlon II, compared to 7.2W for the undervolted Gigabyte E350.

    Again though, as posted further up, the Asrock trumps everything even when NOT undervolted (so at its default 1.25-1.3V Vcore).


    Conclusion:

    Well, it's not over yet. I'd like to post more benchmark results, so those interested, bring on the requests.


    Clearly though, the results show that for the sheer size of this core it does quite well, especially in integer / legacy-type code. Power consumption at first glance is not brilliant when compared to a downclocked desktop chip, even an old core like K10, but consider its generic TSMC process and you'd have to think twice about that one. These are NOT 'ULV' chips, binned for low leakage and undervolted to hell; instead they run a fairly high Vcore (which is actually a good thing, as it means cheaper, lower-current VRMs can be utilized for a given power consumption), get pumped out on a generic process at a cheap price, yet still offer the perf/watt of heavily undervolted desktop platforms. For their intended purpose, as a general-purpose chip bridging the gap between netbook and notebook/desktop, they certainly succeeded.
    Last edited by mAJORD; 06-26-2011 at 04:36 AM.
