Thread: K10 Scores starting to surface

  1. #1
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by SEA View Post
    If L2 is disabled but L3 is enabled, this will change things significantly from the slideshow.
    Oh sure, the magic 38-cycle-latency L3 will come to the rescue.

    Sorry, but only an idiot could believe any of the caches or execution units are disabled when the chip performs within +/- 10% of C2D.

    Reality is this: the chip is perfectly fine, but its "bugs" prevent it from scaling, that is, reaching higher clock speeds while maintaining data integrity. In other words, you're fine at 2GHz, but at 2.3GHz* you get silent data corruption or other nasty surprises due to speedpath problems.

    * Example.

  2. #2
    Xtremely High Voltage Sparky's Avatar
    Join Date
    Mar 2006
    Location
    Ohio, USA
    Posts
    16,040
    Quote Originally Posted by savantu View Post
    Oh sure, the magic 38-cycle-latency L3 will come to the rescue.

    Sorry, but only an idiot could believe any of the caches or execution units are disabled when the chip performs within +/- 10% of C2D.

    Reality is this: the chip is perfectly fine, but its "bugs" prevent it from scaling, that is, reaching higher clock speeds while maintaining data integrity. In other words, you're fine at 2GHz, but at 2.3GHz* you get silent data corruption or other nasty surprises due to speedpath problems.

    * Example.
    Reality is this:
    Neither you, nor I, nor most people here know enough to say for certain how the chip is going to perform. We have conflicting benchmarks from different sources. Just wait until after Sept. 10th and then we'll know for sure.

    And there's no reason to be calling people idiots; it's just gonna raise tempers and not do any good. Geez, why can't people be respectful enough of each other to refrain from name-calling....
    The Cardboard Master
    Crunch with us, the XS WCG team
    Intel Core i7 2600k @ 4.5GHz, 16GB DDR3-1600, Radeon 7950 @ 1000/1250, Win 10 Pro x64

  3. #3
    Xtreme Member
    Join Date
    Nov 2006
    Posts
    324
    Quote Originally Posted by savantu View Post
    Oh sure, the magic 38-cycle-latency L3 will come to the rescue.

    Sorry, but only an idiot could believe any of the caches or execution units are disabled when the chip performs within +/- 10% of C2D.

    Reality is this: the chip is perfectly fine, but its "bugs" prevent it from scaling, that is, reaching higher clock speeds while maintaining data integrity. In other words, you're fine at 2GHz, but at 2.3GHz* you get silent data corruption or other nasty surprises due to speedpath problems.

    * Example.
    L2 is still twice as fast as memory.
    Also, it has some smart prefetch.

    Is there any test showing how much performance would degrade with only L1 and L3 enabled on other chips that are already on the market?

    And finally, why does K10, with lots of improvements over K8, show exactly the same performance per core? Where is the 15% over K8???
    Windows 8.1
    Asus M4A87TD EVO + Phenom II X6 1055T @ 3900MHz + HD3850
    APUs

  4. #4
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by SEA View Post
    And finally, why does K10, with lots of improvements over K8, show exactly the same performance per core?
    Actually it is faster than K8 per core (if these tests are true).

  5. #5
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by kl0012 View Post
    Actually it is faster than K8 per core (if these tests are true).
    Yeah the whole 3-8%.

  6. #6
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by informal View Post
    Yeah the whole 3-8%.
    I doubt we can expect more in non-SSE2-vectorized applications. As was said before, the IPC improvement is only up to 15%.

  7. #7
    Xtreme Member
    Join Date
    Nov 2006
    Posts
    324
    Quote Originally Posted by kl0012 View Post
    Actually it is faster than K8 per core (if these tests are true).
    Actually it is within 4% difference for me.
    Need screeny?
    Last edited by SEA; 08-30-2007 at 11:12 AM. Reason: my mistake

  8. #8
    Xtreme Member
    Join Date
    Feb 2004
    Posts
    381
    Quote Originally Posted by kl0012 View Post
    Actually it is faster than K8 per core (if these tests are true).
    Not really. In Cinebench it is exactly the same, which doesn't make any sense, keeping in mind K10's improvements.

  9. #9
    Registered User
    Join Date
    Jan 2007
    Posts
    60
    Quote Originally Posted by PetNorth View Post
    Not really. In Cinebench it is exactly the same, which doesn't make any sense, keeping in mind K10's improvements.
    Somehow the massive core improvements amount to a 3-4% increase against K8.

    Doesn't make any sense at all.

  10. #10
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by jabway View Post
    Somehow the massive core improvements amount to a 3-4% increase against K8.

    Doesn't make any sense at all.
    Yes, it does.

    The K8 core is already strong to begin with. The law of diminishing returns is in full force here.

    When you already start high, the improvements you make will bring little advantage.

    Compare that with the P4 core, which was far more fragile (code quality was vital, while the K8 eats just about anything). Core brought massive improvements over the P4, and the scores jumped a lot. Even compared to the K8, Core has a huge number of improvements, yet in the end it is only 20% better on average.

    There are a lot of situations where a K8 core + 1MB L2 will be as fast as or faster than a K10 core + 512KB L2 + 2MB L3, especially where latency counts. In multithreaded apps this will be more pronounced, as the L3 will get thrashed by different threads.

  11. #11
    Xtreme Mentor
    Join Date
    Aug 2005
    Location
    Boston, MA, USA
    Posts
    2,883
    Quote Originally Posted by PetNorth View Post
    Not really. In Cinebench it is exactly the same, which doesn't make any sense, keeping in mind K10's improvements.
    The numbers are about 7% better per-core with the clockspeed normalized.
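For anyone who wants to redo the normalization themselves, here is a minimal sketch. It assumes the benchmark score scales linearly with both core count and clock speed, which is only roughly true. The function name and the scores are made-up placeholders, not numbers from the leaked runs.

```python
def per_core_per_ghz(score, cores, clock_ghz):
    """Benchmark score divided by core count and clock (assumes linear scaling)."""
    return score / (cores * clock_ghz)

# Hypothetical placeholder results, NOT taken from any posted benchmark.
old_chip = per_core_per_ghz(score=1000, cores=2, clock_ghz=2.5)
new_chip = per_core_per_ghz(score=1712, cores=4, clock_ghz=2.0)

# Relative per-core, per-clock gain of the newer chip over the older one.
gain = new_chip / old_chip - 1
print(f"normalized per-core gain: {gain:.1%}")
```

Plug in the real scores, core counts, and clocks from whichever screenshots you trust.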
    Last edited by uOpt; 08-30-2007 at 11:29 AM.

  12. #12
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by savantu View Post
    Oh sure, the magic 38-cycle-latency L3 will come to the rescue.

    Sorry, but only an idiot could believe any of the caches or execution units are disabled when the chip performs within +/- 10% of C2D.

    Reality is this: the chip is perfectly fine, but its "bugs" prevent it from scaling, that is, reaching higher clock speeds while maintaining data integrity. In other words, you're fine at 2GHz, but at 2.3GHz* you get silent data corruption or other nasty surprises due to speedpath problems.

    * Example.
    +1
    I don't understand why people think that it must be faster than C2D/C2Q in non-bandwidth-dependent applications.

  13. #13
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    Quote Originally Posted by savantu View Post
    Oh sure, the magic 38-cycle-latency L3 will come to the rescue.
    The AMD-stated working latency is "less than 38 cycles and depends on the clock speed of the northbridge". Higher clock speeds offset the latency, as in all processors. The L3 cache is just the shared victim cache for the L2 cache, nothing more. It works very well to reduce latency between RAM<->CPU for the K10, as the larger L2 does in Core 2.
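To put the "clock speed offsets latency" point in numbers, here is a rough sketch that converts a cycle count into wall-clock time. The 38-cycle figure is the upper bound quoted above; the clock values are assumptions picked for illustration, and `cycles_to_ns` is just a name made up for this example.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Time in nanoseconds for a given number of cycles at a given clock."""
    # 1 GHz means one cycle per nanosecond, so ns = cycles / GHz.
    return cycles / clock_ghz

# Assumed clocks for whichever domain the L3 runs in (illustrative only).
for clock_ghz in (1.6, 2.0, 2.4):
    print(f"38 cycles at {clock_ghz} GHz = {cycles_to_ns(38, clock_ghz):.2f} ns")
```

The same cycle count costs less real time at a higher clock, which is why a latency quoted in cycles says little on its own.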

    K8 had a 12 stage pipeline, Barcelona a 12 stage, and Core 2 a 14 stage.

    K8 L2 latency is 12 clock cycles, Core 2 is 14 and Barcelona is 12.

    K8 L2 cache bus width is 128-bit, Core 2 is 256-bit, Barcelona is 128-bit.

    SSE engine width of K8 was 64-bit (2 per clock), Core 2 was 128-bit (3 per clock) and Barcelona is 128-bit (2 per clock).

    L1+L2 cache latency is 15 cycles for the K8, 17 cycles for Core 2, and 15 for Barcelona IIRC.

    Correction: L1+L2 cache access combined latency is median 13 cycles for Core 2 and Barcelona. That's twice as much data in the same time frame accessed by Core 2 due to the double bus width between L2<->Core.
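The "twice as much data" claim is just the bus-width ratio. A quick sketch, assuming both buses complete one transfer per cycle and the access latency is equal, as stated above (the helper name is made up for this example):

```python
def bytes_per_cycle(bus_width_bits):
    """Bytes moved per cycle, assuming one full-width transfer every cycle."""
    return bus_width_bits // 8

core2 = bytes_per_cycle(256)      # L2<->core width quoted for Core 2
barcelona = bytes_per_cycle(128)  # L2<->core width quoted for Barcelona

print(core2, barcelona, core2 / barcelona)
```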

    There are many more improvements; of these, the larger stack load and the reordering of loads/stores have the potential to make the most difference. Many of the improvements are identical to what was done with Yonah -> Core 2. Many are more specific, and some are even more advanced.

    Based on the technicalities, the improvement seems like this:

    K8 -> K10 is like Yonah -> Core 2. Like I mentioned before, I think it's a retail clock speed yield race, nothing more. We'll wait and see how it pans out in reality.
    Last edited by KTE; 08-30-2007 at 01:53 PM.
