
Thread: Macro/Micro architectural improvements of K8L(K10) over K8

  1. #1
    Banned
    Join Date
    May 2006
    Location
    Skopje, Macedonia
    Posts
    1,716

    Macro/Micro architectural improvements of K8L(K10) over K8

    Quad-core
    - Native quad-core design
    - Redesigned and improved crossbar(northbridge)
    - Improved power management
    - New level of cache added, L3 VICTIM
    Power management - DICE(Dynamic Independent Core Engagement)
    - Supports separate CPU core and memory controller power planes to allow CPU to lower its power state while the memory controller is running full bore
    - Enhanced AMD's PowerNow allows individual core frequencies to lower while other cores may be running full bore
    Virtualization improvements
    - Nested Paging(NP):
    * Guest and Host page tables both exist in memory.(The CPU walks both page tables)
    * Nested walk can have up to 24 memory accesses! (Hardware caching accelerates the walk)
    * "Wire-to-wire" translations are cached in TLBs
    * NP eliminates Hypervisor cycles spent managing shadow pages(As much as 75% Hypervisor time)
    - Reduced world-switch time by 25%:
    * World-switch time: round-trip to Hypervisor and back
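
A quick sanity check on the "up to 24 memory accesses" figure. This is only a sketch, assuming both the guest and the host use 4-level page tables (the usual x86-64 long-mode layout) and that nothing is cached. The function name and parameters are made up for illustration.

```python
def nested_walk_accesses(guest_levels=4, host_levels=4):
    """Worst-case memory accesses for one nested (two-dimensional) page walk.

    Assumption: no TLB or walk-cache hits at all.
    """
    # Each guest page-table pointer is a guest-physical address, so it needs
    # a full host walk (host_levels reads) before the guest entry itself
    # can be read (1 more read).
    per_guest_level = host_levels + 1
    # The final guest-physical address of the data also needs a host walk.
    final_translation = host_levels
    return guest_levels * per_guest_level + final_translation


print(nested_walk_accesses())  # 4 * (4 + 1) + 4 = 24
```

With a single level on each side the same formula gives 3 accesses, which is why the hardware caching of intermediate translations mentioned above matters so much for the 4-level case.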
    Dedicated L1 cache
    - 256bit 64kB (32kB instruction/32kB data)
    - 2 x 128bit loads/cycle
    - lowest latency
    Dedicated L2 cache
    - 128bit 512kB
    - 128bit bus to northbridge
    - reduced latency
    - eliminates conflicts common in shared caches - better for virtualization
    Shared L3 cache
    - 128bit 2MB
    - Victim-cache architecture maximizes efficiency of cache hierarchy
    - Fills from L3 leave likely shared lines in the L3
    - Sharing-aware replacement policy
    - Expandable
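
To make the victim-cache bullets a bit more concrete, here is a toy model. It is not AMD's actual replacement policy: it assumes plain LRU, a single private L2, and a `shared` hint supplied by the caller, all of which are simplifications invented for this sketch.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy model: one private L2 backed by a victim L3 (capacities in lines)."""

    def __init__(self, l2_lines, l3_lines):
        self.l2 = OrderedDict()  # address -> None, ordered oldest first
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    @staticmethod
    def _insert(cache, capacity, addr):
        """Insert addr as most recently used; return the evicted address or None."""
        cache[addr] = None
        cache.move_to_end(addr)
        if len(cache) > capacity:
            victim, _ = cache.popitem(last=False)
            return victim
        return None

    def access(self, addr, shared=False):
        """Return where the line was found: 'L2', 'L3' or 'MEM'."""
        if addr in self.l2:
            self.l2.move_to_end(addr)
            return "L2"
        if addr in self.l3:
            where = "L3"
            if not shared:
                del self.l3[addr]  # exclusive behaviour: the line moves up
            # a line flagged as shared keeps its copy in the L3
        else:
            where = "MEM"  # fills from memory bypass the L3
        victim = self._insert(self.l2, self.l2_lines, addr)
        if victim is not None:
            # the L3 is only ever filled with lines evicted from the L2
            self._insert(self.l3, self.l3_lines, victim)
        return where


h = VictimHierarchy(l2_lines=2, l3_lines=4)
print([h.access(a) for a in (1, 2, 3)])  # all three come from memory
print(h.access(1))                       # line 1 was evicted to L3 earlier
```

The point of the `shared` flag is the "fills from L3 leave likely shared lines in the L3" bullet: a line another core is likely to want stays available in the L3 instead of being moved exclusively into one core's L2.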
    Independent DRAM controllers
    - Concurrency
    - More DRAM banks reduce page conflicts
    - Longer burst length improves command efficiency
    - Dual channel unbuffered 1066 support(applies to socket AM2+ and s1207+ QFX only)
    - Channel Interleaving
    Optimized DRAM paging
    - Increase page hits
    - Decrease page conflicts
    Re-architect northbridge for higher bandwidth
    - Increase buffer sizes
    - Optimize schedulers
    - Ready to support future DRAM technologies
    Write bursting
    - Minimize Rd/Wr Turnaround
    DRAM prefetcher
    - Track positive and negative, unit and non-unit strides
    - Dedicated buffer for prefetched data
    - Aggressively fill idle DRAM cycles
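
For anyone wondering what "positive and negative, unit and non-unit strides" means in practice, below is a minimal stride detector. It is a generic textbook-style sketch, not the K10 prefetcher: the class name, the single tracked stream and the two-confirmation threshold are all assumptions.

```python
class StrideDetector:
    """Tracks one access stream and predicts the next address once the same
    non-zero stride has been seen `confirmations` times in a row."""

    def __init__(self, confirmations=2):
        self.last_addr = None
        self.stride = None
        self.count = 0
        self.confirmations = confirmations

    def observe(self, addr):
        """Feed one access; return a predicted next address or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.stride:
                self.count += 1          # same stride again: more confident
            else:
                self.stride = stride     # new pattern: start over
                self.count = 1 if stride != 0 else 0
            if self.count >= self.confirmations:
                prediction = addr + self.stride
        self.last_addr = addr
        return prediction


up = StrideDetector()
print([up.observe(a) for a in (0, 64, 128, 192)])    # positive stride of one 64-byte line
down = StrideDetector()
print([down.observe(a) for a in (1000, 744, 488)])   # negative stride of four lines
```

A real prefetcher would track many streams at once and place the predicted lines in its own buffer, as the list above says; this only shows the stride arithmetic.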
    Core prefetchers
    - DC Prefetcher fills directly to L1 Cache
    - IC Prefetcher more flexible
    * 2 outstanding requests to any address
    HyperTransport 3
    - up to four 16bit cHT links
    - up to 5200MT/s per link
    - un-ganging mode: each 16bit HT link can be divided into two 8bit virtual links
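
Rough bandwidth arithmetic for the HyperTransport 3 numbers above. The only inputs are the link width and the 5200MT/s transfer rate from the list; the helper name is made up, and the result is raw link bandwidth per direction, ignoring protocol overhead.

```python
def ht_bandwidth_gb_s(width_bits, mega_transfers_per_s):
    """Raw one-way bandwidth in GB/s (1 GB = 10**9 bytes) for one HT link."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000


ganged = ht_bandwidth_gb_s(16, 5200)    # one 16bit link
unganged = ht_bandwidth_gb_s(8, 5200)   # one 8bit virtual link

print(f"16bit link: {ganged:.1f} GB/s per direction, {2 * ganged:.1f} GB/s both ways")
print(f"8bit link:  {unganged:.1f} GB/s per direction")
```

So un-ganging trades per-link bandwidth (half of a 16bit link) for twice as many links out of the same pins.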

    CPU Core IPC Enhancements:
    Advanced branch prediction
    - Dedicated 512-entry Indirect Predictor
    - Double return stack size
    - More branch history bits and improved branch hashing
    History-based pattern predictor
    32B instruction fetch
    - Benefits integer code too
    - Reduced split-fetch instruction cases
    Sideband Stack Optimizer
    - Perform stack adjustments for PUSH/POP operations “on the side”
    - Stack adjustments don’t occupy functional unit bandwidth
    - Breaks serial dependence chains for consecutive PUSH/POPs
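
The idea behind doing stack adjustments "on the side" can be shown with a small sketch. It assumes 8-byte pushes and pops (64-bit mode) and only models the bookkeeping, not any real hardware: a running delta is kept at decode time, so every PUSH/POP gets its address as "initial stack pointer + constant" instead of waiting for the previous instruction's stack-pointer update.

```python
def stack_offsets(ops, word=8):
    """Return ([(op, offset from the initial stack pointer), ...], final delta).

    Each offset is known as soon as the instruction is decoded, so
    consecutive PUSH/POPs no longer form a serial dependence chain.
    """
    delta = 0
    resolved = []
    for op in ops:
        if op == "PUSH":
            delta -= word                 # PUSH decrements first, then stores
            resolved.append((op, delta))
        elif op == "POP":
            resolved.append((op, delta))  # POP loads first, then increments
            delta += word
        else:
            raise ValueError(f"unsupported op: {op}")
    return resolved, delta


print(stack_offsets(["PUSH", "PUSH", "POP", "POP"]))
```

Only the final delta has to be applied to the architectural stack pointer, and only when something other than a PUSH/POP actually reads it.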
    Out-of-order load execution
    - New technology allows load instructions to bypass:
    * Other loads
    * Other stores which are known not to alias with the load
    - Significantly mitigates L2 cache latency
    TLB Optimisations
    - Support for 1G pages
    - 48bit physical address (256TB)
    - Larger TLBs key for:
    * Virtualized workloads
    * Large-footprint databases and
    * transaction processing
    - DTLB:
    * Fully-associative 48-way TLB (4K, 2M, 1G)
    * Backed by L2 TLBs: 512 x 4K, 128 x 2M
    - ITLB:
    * 16 x 2M entries
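
Some arithmetic on the TLB numbers above, to show why the larger pages matter. "Reach" here just means entries times page size; the entry counts are the ones from the list, and the helper names are made up.

```python
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40


def tlb_reach(entries, page_size):
    """Bytes of memory covered when every entry maps a page of page_size."""
    return entries * page_size


# 48bit physical address space
print(2**48 // TIB, "TB addressable")

# L1 DTLB: 48 entries, any of the three page sizes
print(tlb_reach(48, 4 * KIB) // KIB, "KB reach with 4K pages")
print(tlb_reach(48, 2 * MIB) // MIB, "MB reach with 2M pages")
print(tlb_reach(48, 1 * GIB) // GIB, "GB reach with 1G pages")

# L2 TLBs: 512 x 4K and 128 x 2M
print(tlb_reach(512, 4 * KIB) // MIB, "MB reach (L2 TLB, 4K pages)")
print(tlb_reach(128, 2 * MIB) // MIB, "MB reach (L2 TLB, 2M pages)")
```

With 4K pages the whole L2 TLB covers only about 2MB, which is why large-footprint databases and virtualized workloads benefit from 2M and 1G pages.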
    Data-dependent divide latency
    Additional fastpath instructions
    - CALL and RET-Imm instructions
    - Data movement between FP & INT
    Bit Manipulation extensions
    - LZCNT/POPCNT
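
For reference, this is what the two bit-manipulation instructions compute, written as plain Python. It is a software model of the semantics only (the function names and the `width` parameter are just for illustration); the real instructions operate on 16, 32 or 64-bit registers.

```python
def popcnt(x, width=64):
    """Number of set bits in the low `width` bits of x."""
    return bin(x & ((1 << width) - 1)).count("1")


def lzcnt(x, width=64):
    """Number of leading zero bits; returns `width` when x is 0."""
    x &= (1 << width) - 1
    return width - x.bit_length()


print(popcnt(0xFF))        # 8
print(lzcnt(1))            # 63
print(lzcnt(0))            # 64
print(lzcnt(1, width=32))  # 31
```

Unlike BSR, whose result is undefined for a zero input, LZCNT has a defined result (the operand size) in that case.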
    SSE extensions
    - EXTRQ/INSERTQ (SSE4A)
    - MOVNTSD/MOVNTSS (SSE4A)
    - MWAIT/MONITOR (SSE3)
    Comprehensive Upgrades for SSE
    - Dual 128-bit SSE dataflow
    - Up to 4 double-precision FP OPS/cycle
    - Dual 128-bit loads per cycle
    - New vector code, SSE128
    - Can perform SSE MOVs in the FP “store” pipe
    - Execute two generic SSE ops + SSE MOV each cycle (+ two 128-bit SSE loads)
    - FP Scheduler can hold 36 Dedicated x 128-bit ops
    - SSE Unaligned Load-Execute mode:
    * Remove alignment requirements for SSE ld-op instructions
    * Eliminate awkward pairs of separate load and compute instructions
    * To improve instruction packing and decoding efficiency
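
Where the "4 FP OPS/cycle" and "dual 128-bit loads" figures come from, as simple arithmetic. The per-cycle numbers are taken from the list above; the 2.0GHz clock and the 4-core count in the last line are purely hypothetical values chosen to show the calculation, not announced specs.

```python
def flops_per_cycle(sse_ops_per_cycle=2, vector_bits=128, element_bits=64):
    """Peak FP operations per cycle per core for packed SSE."""
    return sse_ops_per_cycle * (vector_bits // element_bits)


def load_bytes_per_cycle(loads_per_cycle=2, load_bits=128):
    """Peak L1 load bandwidth per cycle per core, in bytes."""
    return loads_per_cycle * load_bits // 8


def peak_gflops(clock_ghz, cores, per_cycle):
    """Theoretical peak; real code will not sustain this."""
    return clock_ghz * cores * per_cycle


dp = flops_per_cycle()                  # 2 ops x 2 doubles = 4
sp = flops_per_cycle(element_bits=32)   # 2 ops x 4 floats  = 8
print(dp, "DP FLOPs/cycle,", sp, "SP FLOPs/cycle")
print(load_bytes_per_cycle(), "bytes loaded per cycle")
print(peak_gflops(2.0, 4, dp), "GFLOPS DP at a hypothetical 2.0GHz quad-core")
```

Two 128-bit loads per cycle are exactly enough to feed two packed ops that each take one operand from memory, which is why the load width and the SSE width were doubled together.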

    K8L(K10) large die Shot

    Source: AMD slides, and other sources from Internet

    P.S. Any additional data or information will be highly appreciated.
    Last edited by gOJDO; 02-09-2007 at 09:12 AM.

  2. #2
    Xtreme Member
    Join Date
    Mar 2005
    Posts
    225
    A source would be nice.
    Heatware
    DFI NF4 SLI-DR Expert | Opty 170 0530 @ 2.8Ghz | Scythe Mine | 2x1Gb Mushkin Redline CE-5 @ 280mhz (3-3-2-5)
    2x EVGA Nvidia 8800 GTS 640 SLI @ 660/1960 | Viewsonic VX2025 | 74Gb Raptor & WD 320Gb SE16 | Silverstone Zeus 750 watt

    ASUS DRW-1814BLT W/ LightScribe | Sound Blaster X-Fi | Medusa 5.1 headset | Lian Li PC-V1000BW | Saitek X52


    DFI NF4 Ultra-D| AMD 3800 X2 | XP-90 | 2x1Gb Muskin Blue BE-5 @ 255mhz (2.5-3-2-0)
    EVGA 7900 GS | Viewsonic VX724 | WD 250Gb SE16 | OCZ 520watt

    Samsung 52X/16X CDRW/DVD | Samsung 52X CDRW | SB Audigy Gamer

  3. #3
    XS News
    Join Date
    Aug 2004
    Location
    Sweden
    Posts
    2,010
    A link would be nice

  4. #4
    XS_THE_MACHINE
    Join Date
    Dec 2004
    Location
    Sweden
    Posts
    1,970
    I thought we had just come to realize that K8L doesn't exist, or rather has been with us for quite some time (K8 Low-Power, that is Turion 64). Instead the AMD dual-cores (X2) were K9 and what we used to refer to as K8L is in reality the K10. (According to Inq it was their own Charlie who made up the K8L name.)

    (Thread here.)

    Yes, and source, please?

    ~ Kris
    Quote Originally Posted by Shintai View Post
    I have a feeling that in 5 years. WD, Seagate etc will be some unknown names.
    (Posted by Shintai, 08-18-2008)

  5. #5
    Xtreme Addict
    Join Date
    Jul 2004
    Location
    U.S of freakin' A
    Posts
    1,931
    Quote Originally Posted by LOE
    gOJDO - you got no info on L1 and L2 cache interface - I heard it is going to be twice as wide as K8
    As far as I know, the K10's L1 and L2 cache interface will be 256bits, but dual split like the K8.

    In other words, 2x128. Right now, the K8 has a 128 bit cache bus, which is dual split (2x64).

    The C2D just has one big 256bit wide interface.

    As for which implementation is better, I don't know.

    *Edit* Gojdo just added some more stuff. According to the updated info, the L2 cache is only 128-bits wide! But the L1 cache is 256-bits? WTH?
    Last edited by Carfax; 02-09-2007 at 04:07 AM.

  6. #6
    Xtreme Enthusiast
    Join Date
    Feb 2005
    Posts
    970
    Sounds impressive if true. I have no idea what it all means though. heh
    On the other hand, gOJDO might be just trying to make a point that without a source, it doesn't mean it's true.

  7. #7
    Xtreme Enthusiast
    Join Date
    Apr 2006
    Location
    Brasil
    Posts
    534
    L1 cache size remains the same, 64KB for Instructions and 64KB for Data.

    Yes, L1 bus is 2x128bit and L2 is 1x128bit.

  8. #8
    Xtreme Addict
    Join Date
    Jul 2004
    Location
    U.S of freakin' A
    Posts
    1,931
    Quote Originally Posted by doompc
    L1 cache size remains the same, 64KB for Instructions and 64KB for Data.

    Yes, L1 bus is 2x128bit and L2 is 1x128bit.
    You'd think they would have expanded the bus to full 256 bit so as not to starve the execution engines..

    Then again, I'm no engineer and there is probably a good reason for them choosing to stay with a 128-bit data bus on the L2 cache.

  9. #9
    Banned
    Join Date
    May 2006
    Location
    Skopje, Macedonia
    Posts
    1,716
    Quote Originally Posted by Carfax
    *Edit* Gojdo just added some more stuff. According to the updated info, the L2 cache is only 128-bits wide! But the L1 cache is 256-bits? WTH?
    There is no info about a 256bit L2. Most probably it is 128bit. The bus to the crossbar is 128bit, thus access to the L3 is also 128bit. Because the L3 is going to be used as an exclusive cache, the L2 is most likely 128bit.
    Can anyone confirm this?
    Last edited by gOJDO; 02-09-2007 at 05:37 AM.

  10. #10
    XS_THE_MACHINE
    Join Date
    Dec 2004
    Location
    Sweden
    Posts
    1,970
    Quote Originally Posted by gOJDO
    Source: Internet
    LOL!
    Quote Originally Posted by Shintai View Post
    I have a feeling that in 5 years. WD, Seagate etc will be some unknown names.
    (Posted by Shintai, 08-18-2008)

  11. #11
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Austin, TX
    Posts
    1,346
    Reduced latency L1???? Does that mean it's 2 cycles now? I don't really believe this.

    Also, for the D-TLB, do you mean 48-entry?


    - DTLB:
    * Fully-associative 48-way TLB (4K, 2M, 1G)
    Last edited by Shadowmage; 02-09-2007 at 06:54 AM.

  12. #12
    Xtreme Guru
    Join Date
    Jan 2005
    Location
    Tre, Suomi Finland
    Posts
    3,858
    AMD Analyst Day slides would seem to point to 128bit data paths for L2.
    Quote Originally Posted by Shadowmage
    Also gODO just took it from Ace's Hardware:

    http://www.aceshardware.com/forums/r...5159&forumid=1
    Quote Originally Posted by Carfax on Ace's
    This is something I saw on Xtremesys. The guy who posted this, did not post a source, but if he does, I'll edit this post and include it.
    Last edited by largon; 02-09-2007 at 06:54 AM.
    You were not supposed to see this.

  13. #13
    Xtreme Enthusiast
    Join Date
    Apr 2006
    Location
    Brasil
    Posts
    534
    The path to the northbridge has been widened to 128 bit.
    Athlons have 2 individual L1 buses, and both have been widened to 128 bit. L1 must have a 256 bit bus in order to allow 32-byte instruction fetch.

    L2 seems not to have changed; since it is an exclusive victim cache and the CPU loads data from memory into the L1 cache, that's OK here.

    No word about the L3 cache, but it must be managed by the crossbar, with aggressive data prefetch during idle cycles and a large write-back buffer.
    AMD said that the L3 cache will be neither exclusive nor inclusive; being a shared victim cache, it will be a bit of both.
    Last edited by doompc; 02-09-2007 at 07:08 AM.

  14. #14
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Austin, TX
    Posts
    1,346
    Quote Originally Posted by largon
    That's why I edited that out when I saw that :p

  15. #15
    Xtreme Member
    Join Date
    Jul 2006
    Posts
    146
    Quote Originally Posted by gOJDO
    Source: Internet
    As far as I can tell, most of this is from Dresdenboy's list at SI.

    Ironically, DDB posted a link to it at Ace's yesterday, and it was mostly ignored. Then gOJDO posted it here. Then Carfax saw it here and posted it on Ace's again, which started a big discussion.

    It's all in the formatting.
    Last edited by oldblue; 02-09-2007 at 07:30 AM.

  16. #16
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Oldblue is right, the majority of this info was posted by DDB on the SI board, then he linked it to his post at Ace's, which gOJDO copy/pasted here without crediting DDB.
    Oh well..

  17. #17
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    you forgot the migration bus for the IMC
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  18. #18
    Xtreme Addict
    Join Date
    Jul 2004
    Location
    U.S of freakin' A
    Posts
    1,931
    Quote Originally Posted by doompc
    The path to the northbridge has been widened to 128 bit.
    Athlons have 2 individual L1 buses, and both have been widened to 128 bit. L1 must have a 256 bit bus in order to allow 32-byte instruction fetch.

    L2 seems not to have changed; since it is an exclusive victim cache and the CPU loads data from memory into the L1 cache, that's OK here.

    No word about the L3 cache, but it must be managed by the crossbar, with aggressive data prefetch during idle cycles and a large write-back buffer.
    AMD said that the L3 cache will be neither exclusive nor inclusive; being a shared victim cache, it will be a bit of both.
    Great explanation DoomPC.. You sound like a man who knows what he's talking about

  19. #19
    Banned
    Join Date
    May 2006
    Location
    Skopje, Macedonia
    Posts
    1,716
    @oldblue
    Most of Dresdenboy's list at SI is from AMD slides:







    The list also contains data from other sources that I have collected from the internet and saved on my HD. If it is very important to someone, I can find the links to the sources and post them here. My goal is to make a list of K8L(K10) vs K8 differences and improvements. That way people will be able to find all the information about the K8L(K10) macro/micro architecture in one place, instead of googling and opening tens of useless documents. So if anyone can add something to the list, it will be appreciated.

    Thank you.
    Last edited by gOJDO; 02-09-2007 at 08:16 AM.

  20. #20
    Xtreme Member
    Join Date
    Jul 2006
    Posts
    146
    Quote Originally Posted by gOJDO
    @oldblue
    Most of Dresdenboy's list at SI is from AMD slides:
    I'm not questioning the veracity of most of this information, and I think this list is useful. I hope that as people update it, they list their sources. Hyperlinking each entry would help with this. Otherwise it will become difficult to separate internet rumor from AMD-supplied information.

    But most of your list was clearly copied and pasted from Dresdenboy's list, and I think he should get some credit for that.

  21. #21
    Banned
    Join Date
    May 2006
    Location
    Skopje, Macedonia
    Posts
    1,716
    Quote Originally Posted by oldblue
    I'm not questioning the veracity of most of this information, and I think this list is useful. I hope that as people update it, they list their sources. Hyperlinking each entry would help with this. Otherwise it will become difficult to separate internet rumor from AMD-supplied information.

    But most of your list was clearly copied and pasted from Dresdenboy's list, and I think he should get some credit for that.
    Yes, I copy/pasted from there, rearranged the list and added other info.
    Anyway, 100% of what I copy/pasted from Dresdenboy's list is copy/pasted from the AMD slides that I already have on my HD. Also, there is some info from metro.cl from chilehardware, hypertransport.org, and other AMD slides. So credit goes to all of them.
    Last edited by gOJDO; 02-09-2007 at 08:59 AM.

  22. #22
    I am Xtreme
    Join Date
    Dec 2002
    Posts
    5,931
    wow sounds like a monster.

    good post.

  23. #23
    Xtreme Mentor
    Join Date
    Feb 2004
    Location
    The Netherlands
    Posts
    2,984
    yeah gOJDO thanks for the effort :thumbsup:

    Ryzen 9 3900X w/ NH-U14s on MSI X570 Unify
    32 GB Patriot Viper Steel 3733 CL14 (1.51v)
    RX 5700 XT w/ 2x 120mm fan mod (2 GHz)
    Tons of NVMe & SATA SSDs
    LG 27GL850 + Asus MG279Q
    Meshify C white

  24. #24
    Xtreme Enthusiast
    Join Date
    Apr 2006
    Location
    Brasil
    Posts
    534
    Thank you, Carfax. I'm still learning all this stuff. But it does look good.
    Stephen said K10 is about 10% faster than Core2 in general use, which is what I was expecting.

    Internally both look quite similar: 4x DP FLOP / 2x 128 bit SSE per cycle, etc. They are very different designs but do the same work. Then the IMC will do its job. Point for AMD.
    For quad-core parts the monolithic design plus the L3 cache may speed things up significantly. Another point for AMD.
    And 2P, 4P or even 8P (fully connected by 8 bit HTT3.0 links) systems will perform so strongly that Sun might go back on its Intel partnership.

    But Intel has a better manufacturing process; it may rule on desktop systems due to higher overclocking rates, and in 2P servers due to lower power consumption.

    gOJDO, can you tell me where to get that presentation ?
    Last edited by doompc; 02-09-2007 at 10:53 AM.

  25. #25
    Xtreme Addict
    Join Date
    Mar 2006
    Location
    Sillicon Valley, California
    Posts
    1,261
    Quote Originally Posted by informal
    Oldblue is right, the majority of this info was posted by DDB on the SI board, then he linked it to his post at Ace's, which gOJDO copy/pasted here without crediting DDB.
    Oh well..
    He should not need to, as DDB is merely posting second-hand information.

    EDIT, if DDB wrote an article about it without posting slides, then yes, Gojdo should credit him. No one but AMD should receive credit for posting slides online.
    Athlon 64 3200+ | ASUS M2A-VM 0202 | Corsair XMS2 TWIN2X2048-6400 | 3ware 9650SE 4LPML | Seasonic SS-380HB | Antec Solo
    Core 2 Quad Q6600 @ 3.0GHz | ASUS P5WDG2-WS Pro 1001 | Gigabyte 4850HD Silent | G.Skill F2-6400PHU2-2GBHZ | Samsung MCCOE64G5MPP-0VA SLC SSD | Seasonic M12 650 | Antec P180
    Core i7-2600K @ 4.3 GHz @ 1.30V | ASUS P8P67 Pro | Sparkle GTX 560 Ti | G.Skill Ripjaw X F3-12800CL8 4x4GB @ 933MHz 9-10-9-24 2T | Crucial C300 128GB | Seasonic X750 Gold | Antec P183


    Quote Originally Posted by Shintai View Post
    DRAM production lines are simple and extremely cheap in a ultra low profit market.
