Page 4 of 9
Results 76 to 100 of 212

Thread: Intel Details Nehalem uArch Improvements - 256KB L2, 8MB L3 Confirmed

  1. #76
    Xtreme Cruncher
    Join Date
    Feb 2003
    Location
    Estonia
    Posts
    1,097
    Quote Originally Posted by Movieman
    Somehow I don't think it will take that long...
    ...there's already the sweet smell of XS WCG world domination in the air; our saga of David and Goliath continues.

    I think I need a better job to buy more of these.
    Member of XS WCG since 2006-11-25




  2. #77
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    I want one! Goodbye AMD, forever.

  3. #78
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by anubis
    ...there's already the sweet smell of XS WCG world domination in the air; our saga of David and Goliath continues.

    I think I need a better job to buy more of these.
    Somehow I think we'll see what this beast does on WCG sooner than you think.
    Quote Originally Posted by Calmatory
    I want one! Goodbye AMD, forever.
    Let's hope not; that would be the worst thing that could happen to all of us.
    A monopoly of any kind leads to stagnation and high prices for all.
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us, get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch
    If you have lost faith in humanity, then hold a newborn in your hands.

  4. #79
    Xtreme Addict
    Join Date
    Apr 2005
    Location
    Wales, UK
    Posts
    1,195
    Quote Originally Posted by fellix_bg
    Of course, this comes at the cost of total available L3 size: 8MB - (4 × 256KB) = 7MB. That's the reason for the rather small L2 per core, not purely low-latency design intentions.

    Read it again: inclusive relationship with the L2 arrays!
    The L3 still holds 8MB of data for quick access, which gets pulled into the L2, then the L1 when needed. The L3 is only there to provide a quick access point for the L2 to grab data, and the L2 has access to a pool of 8MB worth of data; when the L2 uses data from the L3, that's a successful hit.

    Even though data may be stored simultaneously in the L2 and L3, each L2 has access to 8MB of L3 (and if the data is non-dependent, you could have all cores using the same data).
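The two posts above are counting different things. A minimal Python sketch, assuming 4 cores with a private 256KB L2 each and an 8MB shared L3 (the configuration in the thread title), and ignoring associativity and lines shared between cores; the helper names are hypothetical. With an inclusive L3 the distinct data held across L2 + L3 is 8MB, at most 1MB of the L3 mirrors L2 contents (fellix_bg's 7MB), and an exclusive design would hold 9MB.

```python
KB = 1024
MB = 1024 * KB


def distinct_capacity(l2_per_core: int, cores: int, l3: int, inclusive: bool) -> int:
    """Bytes of distinct data the private L2s plus the shared L3 can hold at once.

    Inclusive: every line in any L2 is also in the L3, so the L2 copies add
    nothing new and the distinct total is just the L3 size.
    Exclusive: a line lives in an L2 or in the L3, never both, so sizes add.
    """
    if inclusive:
        return l3
    return l3 + l2_per_core * cores


def l3_not_mirroring_l2(l2_per_core: int, cores: int, l3: int) -> int:
    """Inclusive case, worst case: L3 bytes left after mirroring every L2 line.

    Assumes the cores' L2s all hold different lines; a line cached in several
    L2s needs only one L3 copy, so the real figure can be higher.
    """
    return l3 - l2_per_core * cores


# Configuration from the thread title: 4 cores x 256KB L2, 8MB shared L3.
print(distinct_capacity(256 * KB, 4, 8 * MB, inclusive=True) // MB)
print(distinct_capacity(256 * KB, 4, 8 * MB, inclusive=False) // MB)
print(l3_not_mirroring_l2(256 * KB, 4, 8 * MB) // MB)
```

So under these assumptions the price of inclusion is at most 1MB of duplicated capacity. The usual argument for paying it is coherence: if a line misses in an inclusive L3, it cannot be in any core's private L2, so those cores need not be snooped.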

  5. #80
    Xtreme Member
    Join Date
    Jun 2005
    Location
    Bulgaria, Varna
    Posts
    447
    Quote Originally Posted by onewingedangel
    ...
    My point was about data updates in the caches and coherency-state snooping, but yes, 8MB is 8MB however you look at it. I just stressed the inclusive nature in that case and what it contributes to threading overall.
    A simple pointer-chasing graph would tell us enough about the whole picture, anyway!
    Last edited by fellix_bg; 03-18-2008 at 04:59 AM.
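A pointer-chasing test builds a chain in which each load's address comes from the previous load, so latency cannot be hidden by overlapping loads or by prefetching. A minimal Python sketch of the chain construction (Sattolo's algorithm, which yields one random cycle through every slot) and the chase loop; `sattolo_cycle` and `chase` are hypothetical names. In pure Python the per-hop time is dominated by interpreter overhead, so a real latency graph would port the same chain to C and plot time per hop against working-set size, looking for steps at the L1, L2, L3 and DRAM boundaries.

```python
import random
import time


def sattolo_cycle(n: int, seed: int = 0) -> list[int]:
    """Random permutation of range(n) that forms a single cycle.

    Following i -> nxt[i] from any start visits every slot exactly once
    before coming back, in an order a prefetcher cannot guess.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i): Sattolo's variant of Fisher-Yates
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list[int], steps: int) -> tuple[int, float]:
    """Follow the chain for `steps` hops; each index depends on the last load.

    Returns the final index and the average seconds per hop.
    """
    p = 0
    start = time.perf_counter()
    for _ in range(steps):
        p = nxt[p]
    return p, (time.perf_counter() - start) / steps


if __name__ == "__main__":
    for n in (1_000, 100_000):  # small vs larger working set
        _, per_hop = chase(sattolo_cycle(n), steps=200_000)
        print(f"{n:>7} slots: {per_hop * 1e9:.1f} ns/hop")
```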

  6. #81
    Xtreme Addict
    Join Date
    Apr 2005
    Location
    Wales, UK
    Posts
    1,195
    But increasing the L2 size wouldn't negatively affect total L3: even if you had more data duplicated, you still have the same total L3 capacity. The L2 wasn't restricted to 256KB just to reduce duplication (in such a large design another 1MB of cache isn't that much), but rather to keep the L2 as fast as possible.

    Nehalem's L3 will be about as fast as current L2 caches, and the L2 will be faster than current L2 caches.

    Intel could have gone for 512KB of L2 per core, but that would have meant a slower L2, which would more than negate the increased capacity. If Intel can maintain the speed and increase the L2 capacity, I'm willing to bet they will in subsequent generations.
    Last edited by onewingedangel; 03-18-2008 at 05:10 AM.
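The capacity-versus-latency argument can be made concrete with the textbook average-memory-access-time (AMAT) formula, AMAT = t1 + m1 × (t2 + m2 × (t3 + m3 × t_mem)). A minimal Python sketch; every latency and hit rate below is made up for illustration and is not a Nehalem figure.

```python
def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time in cycles.

    levels: (hit_latency, local_hit_rate) per cache level, L1 first.
    Every access that reaches a level pays its latency; the misses carry
    on to the next level, and misses in the last level go to memory.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get this far down the hierarchy
    for latency, hit_rate in levels:
        total += reach * latency
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency


# Hypothetical numbers (cycles, local hit rates), for illustration only.
L1 = (4, 0.95)
L3 = (40, 0.80)
MEMORY = 200

small_l2 = amat([L1, (10, 0.50), L3], MEMORY)  # smaller, faster L2
big_l2 = amat([L1, (14, 0.52), L3], MEMORY)    # bigger, 4 cycles slower, a few more hits
print(f"small L2: {small_l2:.2f} cycles, big L2: {big_l2:.2f} cycles")
```

With these invented numbers the smaller L2 wins (6.50 vs 6.62 cycles); the bigger one only breaks even once its local hit rate reaches 55%. Change the inputs and the answer flips, which is why the trade-off has to be settled with real workload data.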

  7. #82
    Coat It with GOOOO
    Join Date
    Aug 2006
    Location
    Portland, OR
    Posts
    1,608
    Desktop boards will probably have 4 DIMM slots to maintain a basic ATX design: 3 of them interleaved and 1 just extra. Since you don't have to balance the northbridge's trace lengths against all 3 other components on the mobo (CPU/memory/SB), you're left with a bit more freedom in layout, so the boards more creative with PCB real estate will probably be able to cram in 6 slots and fill 2 DIMMs per channel.
    Main-- i7-980x @ 4.5GHZ | Asus P6X58D-E | HD5850 @ 950core 1250mem | 2x160GB intel x25-m G2's |
    Wife-- i7-860 @ 3.5GHz | Gigabyte P55M-UD4 | HD5770 | 80GB Intel x25-m |
    HTPC1-- Q9450 | Asus P5E-VM | HD3450 | 1TB storage
    HTPC2-- QX9750 | Asus P5E-VM | 1TB storage |
    Car-- T7400 | Kontron mini-ITX board | 80GB Intel x25-m | Azunetech X-meridian for sound |


  8. #83
    Xtreme Member
    Join Date
    Jun 2005
    Location
    Bulgaria, Varna
    Posts
    447
    Quote Originally Posted by onewingedangel
    ...as even if you had more data duplicated, you still have the same total L3 capacity
    Then why does Intel bother pushing an inclusive design here?
    From your statements, there is no consequential difference between an inclusive and an exclusive relationship, or am I missing your point?
    By the way, Intel has long had fast enough SRAM cells in much bigger arrays than this skinny 256K one.

  9. #84
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by fellix_bg
    Then why does Intel bother pushing an inclusive design here?
    From your statements, there is no consequential difference between an inclusive and an exclusive relationship, or am I missing your point?
    By the way, Intel has long had fast enough SRAM cells in much bigger arrays than this skinny 256K one.
    Look at Itanium's L2. It mimics that.
    Crunching for Comrades and the Common good of the People.

  10. #85
    Xtreme Member
    Join Date
    Jun 2005
    Location
    Bulgaria, Varna
    Posts
    447
    Itanic is a long-instruction-word architecture, so its cache organization is subordinate to the rather weird specifics of its EPIC design.
    Anyway, I think the L2 in Nehalem is there, by design, to counterbalance the shared L3, not as a decisive performance component of the architecture...
    And by all means, the L3 here should be (relatively) much faster, being tightly coupled, than AMD's K10 implementation and Dunnington's, too.
    Last edited by fellix_bg; 03-18-2008 at 06:06 AM.

  11. #86
    I am Xtreme
    Join Date
    Jul 2004
    Location
    Little Rock
    Posts
    7,204
    Quote Originally Posted by AliG
    Yes, but that will mean AMD will be forced to work harder, which will eventually get rid of Hector, which can only be a good thing for them. They should put someone like Dirk in that position instead; I like him a lot more.

    Anyway, yes, it will hammer AMD's CPUs, but hey, we're the consumers, not the fanboys (well, some of us might be). I go where the performance is, not by company of choice. When AMD was in the lead I bought two Athlon products, but now that Intel is doing the spanking, I'm going to get an Intel product. It's as simple as that.

    The best advice I can give you is not to buy from the company you like if the competition offers something better. Look back at the K8 days: it took AMD almost the full 4-year lead they had to convince Dell to buy their CPUs.
    QFT!
    Quote Originally Posted by Movieman
    With the two approaches to "how" to design a processor WE are the lucky ones as we get to choose what is important to us as individuals.
    For that we should thank BOTH (AMD and Intel) companies!


    Posted by duploxxx
    I am sure JF is relaxed and smiling these days with their intended launch schedule. SNB Xeon servers on the other hand....
    Posted by gallag
    there you go bringing Intel into an AMD thread again lol, if that was someone dropping a dig at AMD you would be crying like a girl.
    qft!

  12. #87
    I am Xtreme
    Join Date
    Jul 2004
    Location
    Little Rock
    Posts
    7,204
    Quote Originally Posted by onewingedangel
    But increasing the L2 size wouldn't negatively affect total L3: even if you had more data duplicated, you still have the same total L3 capacity. The L2 wasn't restricted to 256KB just to reduce duplication (in such a large design another 1MB of cache isn't that much), but rather to keep the L2 as fast as possible.

    Nehalem's L3 will be about as fast as current L2 caches, and the L2 will be faster than current L2 caches.

    Intel could have gone for 512KB of L2 per core, but that would have meant a slower L2, which would more than negate the increased capacity. If Intel can maintain the speed and increase the L2 capacity, I'm willing to bet they will in subsequent generations.
    Not only latency: as was shown in their slides, Intel says its L2 is smarter as well. If it is smarter, less capacity is needed, right?

    Also, it goes back to something Intel learned from the small, very fast L1 and L2 used with the P4. They didn't want to repeat the "Prescott" mistake that added 17% more L2 latency. No matter what's said in this forum, not everything about NetBurst sucked.

  13. #88
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    498
    Speaking of L3, Shintai,

    Quote Originally Posted by Shintai
    100 euro says it's fake...

    There would be no FSB, no L3, etc. And I don't think CPU-Z even knows the word Nehalem... but I could be wrong.

    L1 would also be 2x32KB or more.

    And I'm sure a 2-socket Nehalem system would have more than 2GB of memory...

    Also, why are these screenshots always of such poor quality... to hide the Photoshop marks?

    ------------------------------------------------------
    Not yet, as he even says himself it reads some parts wrong.

    I still don't believe in an L3. It simply makes no sense when looking at the size and past history. Itanium only got an L3 due to the massive sizes of up to 24MB and soon 30MB. And I don't think anyone here on the board has access to a Nehalem system, nor will for the next 3-6 months.

    L3 is a step backwards for mainstream, not upwards.
    Just teasing you, man!

    ------------------------------------------------------

    Here are some interesting comments from knowledgeable people (mainly on Faster Synchronization Primitives, which looks like a nice feature):

    >If using the lock prefix is a legacy operation what are
    >the modern ones?

    Linus Torvalds:
    I don't think there are any - I think they just meant that
    they made the old legacy instructions run faster, instead
    of trying to introduce anything new.

    Which I really look forward to testing. The serialization
    overhead of Core 2 is better than many other processors,
    but everything else is so good that it still stands out
    like a sore thumb. We have lots of kernel loads where one
    of the biggest costs is just locking (even without any
    nasty contention and cacheline ping-pong), because of how
    it serializes the pipeline.

    Now that people are trying to push more and more multi-
    threaded programming paradigms, the locking is finally
    getting some real exposure. It's always been a big issue
    in kernels, but now all the fast user-level locking is
    making it show up in "normal" loads too.

    --------------------------------------
    That's something I'm also looking forward to. Even without contention, acquiring locks is *painful*. Unless the data/code you're protecting takes a significant amount of time to process/execute, you'll be bitten by the sheer cost of the lock/unlock pairs, so there is room for *lots* of improvement there.
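The cost of uncontended lock/unlock pairs can be poked at with a tiny microbenchmark. A minimal Python sketch, assuming CPython's standard `threading.Lock`: it times acquire/release pairs that no other thread ever competes for, against the same loop without the lock. The function name `per_op_ns` is hypothetical. Interpreter overhead dominates here, so the absolute numbers say little about LOCK-prefixed instructions themselves; measuring those faithfully would take C or assembly.

```python
import threading
import time


def per_op_ns(n: int, use_lock: bool) -> float:
    """Average nanoseconds per loop iteration, with or without an
    uncontended lock/unlock pair around the increment."""
    lock = threading.Lock()  # never touched by any other thread
    counter = 0
    start = time.perf_counter_ns()
    if use_lock:
        for _ in range(n):
            lock.acquire()
            counter += 1
            lock.release()
    else:
        for _ in range(n):
            counter += 1
    elapsed = time.perf_counter_ns() - start
    assert counter == n
    return elapsed / n


if __name__ == "__main__":
    n = 200_000
    print(f"plain:  {per_op_ns(n, use_lock=False):.1f} ns/op")
    print(f"locked: {per_op_ns(n, use_lock=True):.1f} ns/op")
```

Typically the locked loop is noticeably slower even though the lock is never contended, which is the effect described above.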

    ----------------------------------------
    +1

    It's not uncommon for Java workloads to waste 10% or more of the time processing uncontended locks, and I've seen up to ~30% in real-world apps(1).

    The underlying reason is that many critical parts of the core Java library are synchronized (StringBuffer, HashTable, many I/O functions). While there are new APIs that avoid this (StringBuilder, HashMap, etc) there is lots of legacy code that uses the old APIs directly or indirectly.

    JVMs use optimization tricks to avoid this (lock removal, lock elision, lazy unlocking, etc.), but that only alleviates the problem and doesn't entirely resolve it.

    -- Henrik


    (1) Measured as the increase in throughput when locks are forcefully disabled in JRockit (using -XXlazyunlocking or just hacking the JVM to not issue CAS instructions). The 30% number comes from a JSP-heavy app I ran into some time back. SPECjbb2005 gains ~10% by the use of -XXlazyunlocking.


    Faster Synchronization Primitives: As multi-threaded software becomes more prevalent, the
    need to synchronize threads is also becoming more common. Next generation Intel
    microarchitecture (Nehalem) speeds up the common legacy synchronization primitives (such
    as instructions with a LOCK prefix or the XCHG instruction) so that existing threaded
    software will see a performance boost.


    That's actually the part I like the most. Better overall IPC is a very nice thing, but lowering the cost of the synchronization primitives is much more interesting. It enables parallelization of 'harder' workloads, which are not really suited to parallelization and gain less from it because of the synchronization overhead.
    http://realworldtech.com/forums/inde...88380&roomid=2
    http://aceshardware.freeforums.org/n...ting-t423.html
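Henrik's footnote measures the throughput gain g from switching locks off; turning that into a share of run time, and then asking what a cheaper lock would buy, is short arithmetic. Since throughput is inversely proportional to run time, the lock share is f = g / (1 + g), and Amdahl's law gives the speedup S = 1 / ((1 - f) + f / k) if locking becomes k times cheaper. A minimal Python sketch, assuming locking is a fixed, separable slice of run time (no contention, no change in overlap with other work); the factor k = 2 is hypothetical, not an Intel figure.

```python
def lock_time_fraction(throughput_gain: float) -> float:
    """Share of run time spent in locking, given the throughput gain seen
    when locking is switched off (0.10 means +10% throughput).

    T_without / T_with = 1 / (1 + g), so the lock share is g / (1 + g).
    """
    return throughput_gain / (1.0 + throughput_gain)


def speedup_if_locks_cheaper(throughput_gain: float, k: float) -> float:
    """Amdahl-style speedup if locking becomes k times cheaper and the rest
    of the run time is unchanged."""
    f = lock_time_fraction(throughput_gain)
    return 1.0 / ((1.0 - f) + f / k)


for gain in (0.10, 0.30):  # SPECjbb2005 and the JSP-heavy app from the footnote
    f = lock_time_fraction(gain)
    print(f"{gain:.0%} gain -> {f:.1%} of time in locks, "
          f"{speedup_if_locks_cheaper(gain, 2.0):.3f}x if locks were 2x cheaper")
```

Under these assumptions a 10% gain means about 9.1% of run time goes to locking, and halving the lock cost would buy roughly 4.8%; the ceiling, with free locks, is the original 10%.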

    Interesting. Let's hope all these on-paper enhancements and buzz turn out to be real. If the claim that Nehalem's lead over Core 2 will exceed Core 2's lead over P4 holds true, then it's going to be really insane.
    Faceman


  14. #89
    Xtreme Addict
    Join Date
    Nov 2003
    Location
    NYC
    Posts
    1,592
    Yum, need more RAM to make my RAM disk dreams possible.

  15. #90
    D.F.I Pimp Daddy
    Join Date
    Jan 2007
    Location
    Still Lost At The Dead Show Parking Lot
    Posts
    5,182
    Looks like Intel is ripping off AMD to me
    SuperMicro X8SAX
    Xeon 5620
    12GB - Crucial ECC DDR3 1333
    Intel 520 180GB Cherryville
    Areca 1231ML ~ 2~ 250GB Seagate ES.2 ~ Raid 0 ~ 4~ Hitachi 5K3000 2TB ~ Raid 6 ~

  16. #91
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by Brother Esau
    Looks like Intel is ripping off AMD to me
    Yeah, cuz I mean, like, AMD was the first ever to use a 3-level cache design, or think of an integrated memory controller!

    Oh, wait...

  17. #92
    Xtreme Guru
    Join Date
    May 2007
    Location
    Ace Deuce, Michigan
    Posts
    3,955
    Quote Originally Posted by terrace215
    Yeah, cuz I mean, like, AMD was the first ever to use a 3-level cache design, or think of an integrated memory controller!

    Oh, wait...
    Yes, but they dumped the idea. You do realize that the original Nehalem was NetBurst on steroids, right? That's why some of the features, like Hyper-Threading, are coming back with Nehalem, as the same division made both.


    If it wasn't for K8 being so successful, there would have been no Conroe, just a beefier NetBurst instead. On top of that, Intel has admitted they like the K10 design, but that it's nearly impossible to produce it properly on a 65nm process. Now I'm not saying AMD hasn't done the same (look at their original products, they were just Intel parts with their name on them), but that doesn't mean Intel didn't use some of the K10 design in Nehalem.
    Quote Originally Posted by Hans de Vries

    JF-AMD posting: IPC increases!!!!!!! How many times did I tell you!!!

    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    .....}
    until (interrupt by Movieman)


    Regards, Hans

  18. #93
    Xtreme Addict
    Join Date
    May 2004
    Posts
    1,755
    Quote Originally Posted by Face
    Speaking of L3, Shintai,



    Just teasing you, man!
    I like this one better

    Quote Originally Posted by Shintai
    Not yet, as he even says himself it reads some parts wrong.

    I still don't believe in an L3. It simply makes no sense when looking at the size and past history. Itanium only got an L3 due to the massive sizes of up to 24MB and soon 30MB. And I don't think anyone here on the board has access to a Nehalem system, nor will for the next 3-6 months.

    L3 is a step backwards for mainstream, not upwards.
    I guess you are disappointed Intel is making a step backward

  19. #94
    I am Xtreme
    Join Date
    Jul 2004
    Location
    Little Rock
    Posts
    7,204
    Quote Originally Posted by Brother Esau
    Looks like Intel is ripping off AMD to me
    Picks self up from floor after laughing so hard. Or did you mean that as a joke?

    Intel® Pentium® 4 processor Extreme Edition 3.20 GHz supporting Hyper-Threading Technology, with an additional 2 Megabytes of L3 cache.

    So if Intel uses it, stops using it, AMD copies Intel and then Intel returns to their original idea, it is Intel copying AMD, LOL!

  20. #95
    Wanna look under my kilt?
    Join Date
    Jun 2005
    Location
    Glasgow-ish U.K.
    Posts
    4,396
    Quote Originally Posted by xlink
    256KB × 8 = 2MB

    You're basically thinking about it wrong, though. Take today's Core-based CPUs and add more cache to the L2 at a very slight latency penalty (it should be about as fast as the 65nm L2 cache).
    Then add an L1.5 cache, which sits somewhere between the L2 and the L1 in speed.
    _________

    Sorry for being so slow to respond.

    The cache doesn't look to be universal: each 256KB is dedicated to one core. The slide even says "per core." What's the orange in between the L2 and the L1-Data? Is that what you've called the L1.5?

    Also, I'm assuming it starts off as a quad, so the ×8 is only accurate for servers.

    I can't see Nehalem having more cache to play with than Penryn for single-threaded apps, depending on how the L3 is used.


    Last edited by K404; 03-18-2008 at 09:35 AM.
    Quote Originally Posted by T_M
    Not sure I totally follow anything you said, but regardless of that you helped me come up with a very good idea....
    Quote Originally Posted by soundood
    you sigged that?

    why?
    ______

    Sometimes, it's not your time. Sometimes, you have to make it your time. Sometimes, it can ONLY be your time.

  21. #96
    I am Xtreme
    Join Date
    Jul 2004
    Location
    Little Rock
    Posts
    7,204
    Quote Originally Posted by AliG
    Yes, but they dumped the idea. You do realize that the original Nehalem was NetBurst on steroids, right? That's why some of the features, like Hyper-Threading, are coming back with Nehalem, as the same division made both.


    If it wasn't for K8 being so successful, there would have been no Conroe, just a beefier NetBurst instead. On top of that, Intel has admitted they like the K10 design, but that it's nearly impossible to produce it properly on a 65nm process. Now I'm not saying AMD hasn't done the same (look at their original products, they were just Intel parts with their name on them), but that doesn't mean Intel didn't use some of the K10 design in Nehalem.
    But that's a two-way street. If there wasn't a P3 replacing the P2s, there would have been an Athlon. Each company pushes its competitors to get better or die. It's way too soon to write off AMD, but pretending they're not getting pimp-slapped right now is worse. There's nothing in K10 Intel wanted to copy.

  22. #97
    Xtreme Member
    Join Date
    Jun 2005
    Location
    Bulgaria, Varna
    Posts
    447
    Regarding P4 (NetBurst): in those days the L2 cache was an essential factor in the performance of that architecture for one simple reason. The P4 doesn't actually have an L1 cache for instructions (macro-ops in Intel's language), but the notorious trace cache, which stores already-decoded µops (it added eight stages to the already long pipeline). That meant loading the macro-op cache lines directly from the... yes, you guessed it, the L2.
    Last edited by fellix_bg; 03-18-2008 at 09:53 AM.

  23. #98
    Banned
    Join Date
    May 2005
    Location
    Belgium, Dendermonde
    Posts
    1,292
    Quote Originally Posted by onewingedangel
    Nehalem's L3 will be about as fast as current L2 caches, and the L2 will be faster than current L2 caches.
    Where do you get this from?
    If true, that really would be impressive.

  24. #99
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by LowRun
    I like this one better



    I guess you are disappointed Intel is making a step backward
    Hehe, I am a bit surprised. But I think the L2s are more like an L1.5: extremely fast, faster than we've ever seen before with L2s. And an L3 with the speed of Core 2 L2s.

    I guess the L2 will be around 5-6 cycles, and the L3 under 15 cycles.

    But it closely mimics Itanium's cache design, and may be an underlying requirement for effective SMT.

  25. #100
    Xtreme Enthusiast
    Join Date
    Mar 2007
    Location
    Portsmouth, UK
    Posts
    963
    Quote Originally Posted by Donnie27
    Picks self up from floor after laughing so hard. Or did you mean that as a joke?

    Intel® Pentium® 4 processor Extreme Edition 3.20 GHz supporting Hyper-Threading Technology, with an additional 2 Megabytes of L3 cache.

    So if Intel uses it, stops using it, AMD copies Intel and then Intel returns to their original idea, it is Intel copying AMD, LOL!
    I do believe Intel got the L3 idea from the DEC Alpha EV-5 21164. It may well have been first used before that but not by Intel/AMD.
