Page 5 of 9
Results 101 to 125 of 212

Thread: Intel Details Nehalem uArch Improvements - 256KB L2, 8MB L3 Confirmed

  1. #101
    I am Xtreme
    Join Date
    Jul 2004
    Location
    Little Rock
    Posts
    7,204
    Quote Originally Posted by DeathReborn View Post
    I do believe Intel got the L3 idea from the DEC Alpha EV-5 21164. It may well have been first used before that but not by Intel/AMD.
    I do too, and I agree. I've been jumped on here for saying Intel and AMD borrow heavily from DEC/Alpha. Intel had an IMC before Alpha and AMD. There'd be no Athlons without Alpha's EV6. Hell, even Timna was the forerunner of Fusion, but tell that to some folks here?

    Slipped and Skipped was the line about RDIMM or Rambus, hehehehe! The return of the, "Awe he didn't say that" LOL!
    Quote Originally Posted by Movieman
    With the two approaches to "how" to design a processor WE are the lucky ones as we get to choose what is important to us as individuals.
    For that we should thank BOTH (AMD and Intel) companies!


    Posted by duploxxx
    I am sure JF is relaxed and smiling these days with there intended launch schedule. SNB Xeon servers on the other hand....
    Posted by gallag
    there yo go bringing intel into a amd thread again lol, if that was someone droping a dig at amd you would be crying like a girl.
    qft!

  2. #102
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by GoThr3k View Post
    where do you get this from?
    if true, that really would be impressive
    The 12MB L3 on an Itanium 2 Montecito has 14 cycles of latency. The L2 has 5 for Int and 7 for FP.

    Core 2 has 14 cycles for L2; K8 has 12 for L2; K10 L2 is 15, L3 is 30 to 45.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  3. #103
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Austin, TX
    Posts
    1,346
    Quote Originally Posted by xlink View Post
    if anything they'd go UP.

    Core CPU
    64k fast cache
    6mb medium cache

    k8 CPU
    128k fast cache
    2mb medium speed cache

    k10 cpu
    128k fast cache
    1mb medium speed cache
    3mb SLOW cache

    nehalem
    64k fast cache
    2mb somewhat fast cache
    8mb medium cache
    What you said is completely incorrect.

    Nehalem's L2 and L3 speeds are comparable to Barcelona.

    Also, K10 has the following:

    128KB L1 (fast)
    512KB L2 (medium)
    3MB L3 (slow)

    Nehalem has:

    64KB L1 (fast)
    256KB L2 (medium)
    8MB L3 (slow)


    Also, I think Shintai owes everyone a lot of money from his 100 euro bet :p
    oh man

  4. #104
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Austin, TX
    Posts
    1,346
    Quote Originally Posted by Shintai View Post
    Hehe, I am a bit surprised. But I think the L2s are more like an L1.5, extremely fast and faster than we've ever seen before with L2s. And an L3 with the speed of Core 2 L2s.

    I guess the L2 will be around some 5-6 cycles. And the L3 under 15 cycles.

    But it very much mimics Itanium's cache design. And maybe it's an underlying requirement for effective SMT.
    That's not possible. The L2's latency is going to be 10-15, just like Barcelona. L3 will be 20-40, just like Barcelona. If you have an L2 latency of around 5-6 cycles, then there is no point in having an L1.

    Also, I think that Shintai owes everyone an apology for calling people ugly names when they ended up being correct :p Now what happened to my sig...
    oh man

  5. #105
    Xtreme Cruncher
    Join Date
    Sep 2005
    Location
    Bay Area, CA
    Posts
    2,819
    OMG, this is 100% pure geek :banana::banana::banana::banana:...
    ------------------------------------------------------------------------------------------------------

    Crunch with us, the XS WCG team
    ------------------------------------------------------------------------------------------------------

  6. #106
    Xtreme Addict
    Join Date
    May 2004
    Posts
    1,755
    Quote Originally Posted by Shintai View Post
    Hehe, I am a bit surprised. But I think the L2s are more like an L1.5, extremely fast and faster than we've ever seen before with L2s. And an L3 with the speed of Core 2 L2s.

    I guess the L2 will be around some 5-6 cycles. And the L3 under 15 cycles.

    But it very much mimics Itanium's cache design. And maybe it's an underlying requirement for effective SMT.
    This is what Franck (CPU-Z author) has to say about Nehalem L3 cache speed. I'll take his word.

    Quote Originally Posted by cpuz View Post
    Hey guys,

    Concerning caches on Nehalem: the L3 is now shared between 4 physical cores, meaning that it offers 4 access ports. The more access ports a cache has, the slower it is. Consequently, it is not surprising that Intel added four small, fast and dedicated (and unified) L2s between the L1s and the L3. These caches keep using an inclusive relationship, so of course this means that the useful size of these L2s is only 128KB. However, those caches are not designed for high hit rates but for speed.
    CPU-Z is wrong on the L1 data size however; it should be 4x32KB and not 4x16KB. And I don't know about FSB.

  7. #107
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Austin, TX
    Posts
    1,346
    Quote Originally Posted by LowRun View Post
    This is what Franck (CPU-Z author) has to say about Nehalem L3 cache speed. I'll take his word.
    Sure, it'll be faster than the current Penryn cache, but not by that much. There's no way it'll be under 10. My guess is a 12 cycle latency.
    oh man

  8. #108
    Xtreme Addict
    Join Date
    May 2004
    Posts
    1,755
    Quote Originally Posted by Shadowmage View Post
    Sure, it'll be faster than the current Penryn cache, but not by that much. There's no way it'll be under 10. My guess is a 12 cycle latency.
    I have no idea but from what Franck said the 4 access ports shouldn't help to make it faster.

  9. #109
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by Shadowmage View Post
    Sure, it'll be faster than the current Penryn cache, but not by that much. There's no way it'll be under 10. My guess is a 12 cycle latency.
    well, if you don't call 12 cycles fast for a 3rd lvl cache then i don't know what you define as fast.

  10. #110
    Xtreme Addict
    Join Date
    Jun 2007
    Location
    Thessaloniki, Greece
    Posts
    1,307
    Franck was implying that the L3 will be slower than the current L2. I think it's unlikely to be below 16 cycles, possibly even higher.
    Seems we made our greatest error when we named it at the start
    for though we called it "Human Nature" - it was cancer of the heart
    CPU: AMD X3 720BE@ 3,4Ghz
    Cooler: Xigmatek S1283(Terrible mounting system for AM2/3)
    Motherboard: Gigabyte 790FXT-UD5P(F4) RAM: 2x 2GB OCZ DDR3 1600Mhz Gold 8-8-8-24
    GPU:HD5850 1GB
    PSU: Seasonic M12D 750W Case: Coolermaster HAF932(aka Dusty )

  11. #111
    Xtreme Enthusiast
    Join Date
    Oct 2006
    Posts
    617
    Quote Originally Posted by Hornet331 View Post
    Quote Originally Posted by Shadowmage View Post
    Sure, it'll be faster than the current Penryn cache, but not by that much. There's no way it'll be under 10. My guess is a 12 cycle latency.
    well, if you don't call 12 cycles fast for a 3rd lvl cache then i don't know what you define as fast.
    i'd guess from :
    Quote Originally Posted by Shadowmage View Post
    That's not possible. The L2's latency is going to be 10-15, just like Barcelona. L3 will be 20-40, just like Barcelona. If you have an L2 latency of around 5-6 cycles, then there is no point in having an L1.
    that he was talking about l2

  12. #112
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Brother Esau View Post
    Looks like Intel is ripping off AMD to me
    How do you figure?
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  13. #113
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by AliG View Post
    I wouldn't say that. AMD can't get their IMC to clock well, but look at IBM's POWER6 monster: it is manufactured on a 65nm SOI process with an IMC and yet supposedly scales to 4.5GHz+ on air (though I haven't heard anything on the temps). As for hyperthreading, that failed previously because of the poor NetBurst design; the concept of it is quite good. Once more multithreaded software appears, you'll see the benefits, not to mention that the much shorter pipeline to transfer data will help with hyperthreading's usefulness.
    IBM's POWER6 also leaks like a sieve. The 4.5 GHz on their 65 nm process was achieved by tuning both architecture and process conditions for high clocks: IBM moved away from an OoO engine to a more in-order one, and simplified the engine to minimize the deepest FO4 delay.

    http://www.research.ibm.com/journal/rd51-6.html (everything you want to know)

    The most important article is this one:
    http://www.research.ibm.com/journal/rd/516/curran.pdf

    Various frequency/cycle-time targets were evaluated
    during an exploratory phase. A cycle time corresponding
    to 13 FO4 inverter delays was selected based on the
    fastest known techniques to achieve back-to-back
    execution of 64-bit dependent, fixed-point instructions.
    IBM restricted themselves to a cycle time of only 13 FO4 delays for the fixed-point latency. This is pretty short, all things considered, but it also means your circuits must be very, very simple (transistor lean). Table 1 shows the FO4 delay reduction from POWER5 to POWER6, for both simple fixed point and fused multiply-add. IBM went in with the aim of achieving high clock speeds, and achieved it through this and process:

    The POWER6 processor chip is fabricated using the IBM
    high-performance 65-nm partially depleted SOI process
    with 40-nm gate length n-FETs, 35-nm gate length
    p-FETs, and 1.05-nm gate oxides
    This gate thickness is about 0.15-0.2 nm thinner than either AMD or Intel at 65 nm (their reported thicknesses were 1.3 nm and 1.25 nm respectively, as I recall). Translation: IBM's POWER6 is a power sucker.
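
    As a quick check on the 13 FO4 cycle-time target quoted above, here is a minimal sketch that converts a clock frequency into a clock period and then into the time budget per FO4 delay. It assumes the whole clock period is exactly 13 FO4 delays and uses the 4.5 GHz figure from this post; the function names are made up for illustration.

```python
def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def fo4_delay_ps(freq_ghz: float, fo4_per_cycle: float) -> float:
    """Time available per FO4 delay, assuming the full clock period
    is divided evenly into fo4_per_cycle FO4 inverter delays."""
    return cycle_time_ps(freq_ghz) / fo4_per_cycle


if __name__ == "__main__":
    # 4.5 GHz and 13 FO4 are the figures quoted in the post.
    period = cycle_time_ps(4.5)
    per_fo4 = fo4_delay_ps(4.5, 13)
    print(f"clock period: {period:.1f} ps, per-FO4 budget: {per_fo4:.1f} ps")
```

    Under those assumptions the period comes out at roughly 222 ps, so each FO4 delay gets about 17 ps.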

    http://www.research.ibm.com/journal/rd/516/berridge.pdf

    Figure 12 shows their leakage curve, following an exponential you would expect for tunneling current in such a thin gate. At nominal operating conditions for a 4.5 GHz processor which is about 8.5 ps, their leakage just through the gate is about 80 Watts.

    This is doable for the market that Power6 is designed for, which are high class enterprise systems where cooling solutions can be specifically designed and, if throughput is high enough, the higher power can be justified.

    IBM's design and the process tweaks they made to get there are a very special application, and are completely inappropriate for the markets AMD or Intel service. Extrapolating or implying that AMD could do 4.5 GHz because IBM can do 4.5 GHz is simply incorrect, and anyone counting on that should not hold their breath... it just ain't gonna happen.

    Jack
    Last edited by JumpingJack; 03-18-2008 at 10:25 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  14. #114
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Shadowmage View Post
    What you said is completely incorrect.

    Nehalem's L2 and L3 speeds are comparable to Barcelona.

    Also, K10 has the following:

    128KB L1 (fast)
    512KB L2 (medium)
    3MB L3 (slow)

    Nehalem has:

    64KB L1 (fast)
    256KB L2 (medium)
    8MB L3 (slow)


    Also, I think Shintai owes everyone a lot of money from his 100 euro bet :p
    We don't know this detail yet.... if Intel does not put the L3 on its own clock domain, Intel's L3 cache will clock at core frequency ... significantly better than AMD's 1800/2000 MHz L3. In other words, it could be faster or it could be slower... we won't know until we have the CPUs in the wild and people actually measure it.
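
    To illustrate why the clock domain matters, here is a minimal sketch that turns a latency in cycles into nanoseconds. The cycle count and the 3.0 GHz core clock are hypothetical placeholders, not measurements; only the 2000 MHz L3 clock comes from the post, and the sketch assumes the latency is counted in cycles of the cache's own clock.

```python
def latency_ns(cycles: float, clock_ghz: float) -> float:
    """Wall-clock latency in nanoseconds for a latency counted in
    cycles of a clock running at clock_ghz."""
    return cycles / clock_ghz


if __name__ == "__main__":
    cycles = 40  # hypothetical L3 latency in cycles, same for both cases

    # L3 on its own 2.0 GHz clock domain (the 2000 MHz figure from the post).
    print(f"2.0 GHz L3: {latency_ns(cycles, 2.0):.1f} ns")

    # L3 running at an assumed 3.0 GHz core clock (hypothetical value).
    print(f"3.0 GHz L3: {latency_ns(cycles, 3.0):.1f} ns")
```

    The same cycle count costs less real time on the faster clock, which is why cycle counts alone cannot settle the comparison.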
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  15. #115
    Xtreme Member
    Join Date
    Oct 2007
    Posts
    407
    i might even go with the extreme processor to start as well!!!
    If you plan to overclock you may have no choice. Bye bye FSB. Hello multiplier locking. I predict this is going to add a huge amount of value to their 'Extreme' chips. Intel tends to leave a lot of overclocking headroom in their chips. Makes me happy that I own my single share of Intel stock .

    Since they may be finally closing off the FSB overclocking loophole I just wish they would include two versions of the Extreme. One at the highest bin for whatever they want to charge. $1099 or something like that. And then a prosumer version with a lower bin but still with an unlocked multiplier for $699 or so. The bleeding edge enthusiasts with deep pockets would still get the high end chip, but the overclockers without so much money might be willing to spend a bit more than usual for the ability to overclock this monster of a chip. I know I would depending on how high it clocked over stock. But no matter how high it clocked it would be difficult to justify spending over $1000 on a cpu.

  16. #116
    Xtreme Addict
    Join Date
    May 2007
    Location
    'Zona
    Posts
    2,346
    Quote Originally Posted by Shadowmage View Post
    Also, I think Shintai owes everyone a lot of money from his 100 euro bet :p
    Jeez... everyone kept saying how dumb I am... but yet I turn out to be right...
    Hmmmm.... makes you wonder.

    Edit- Sadly, it seems like Enjoy was the only one that took the bet, though it sorta seemed like you wanted to as well.
    I personally feel like we should hold him to it.
    Last edited by LordEC911; 03-18-2008 at 11:17 PM.

  17. #117
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by gojirasan View Post
    If you plan to overclock you may have no choice. Bye bye FSB. Hello multiplier locking. I predict this is going to add a huge amount of value to their 'Extreme' chips. Intel tends to leave a lot of overclocking headroom in their chips. Makes me happy that I own my single share of Intel stock .
    AMD CPUs don't have a FSB, how do those get overclocked?
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  18. #118
    Xtreme Member
    Join Date
    Oct 2006
    Location
    S.California
    Posts
    380
    through HTT
    Cpu: Intel Core i7 920 @ 3.9 ghz (cooled w/ Apogee GTZ)
    Mobo: Gigabyte EX58 UD5
    Ram: G.SKill 3x1 GB DDR3 1600
    GPU: GTX 280
    PSU: E Power 1000 Watt

  19. #119
    Xtreme Enthusiast
    Join Date
    Mar 2007
    Posts
    557
    To be more precise, through HTT base frequency, from which all other frequencies are derived.

    The main limiting factor in Nehalem overclocking may be the same as its main advantage over the Core 2 arch: the triple-channel IMC. The increased number of wires required for a 3-channel IMC may negatively affect overclocking headroom.

    Let's wait and see. This should be the most exciting upgrade for me since I built my dual dual-core Opteron machine 3 years ago (I was not really impressed by my 8-core Intel Xeon build )

  20. #120
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Shadowmage View Post
    What you said is completely incorrect.

    Nehalem's L2 and L3 speeds are comparable to Barcelona.
    And you know that how? Intel always had faster caches than AMD. Did they lose all that know-how overnight?
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  21. #121
    Xtreme Mentor
    Join Date
    Aug 2006
    Location
    HD0
    Posts
    2,646
    Quote Originally Posted by Shadowmage View Post
    What you said is completely incorrect.

    Nehalem's L2 and L3 speeds are comparable to Barcelona.

    Also, K10 has the following:

    128KB L1 (fast)
    512KB L2 (medium)
    3MB L3 (slow)

    Nehalem has:

    64KB L1 (fast)
    256KB L2 (medium)
    8MB L3 (slow)


    Also, I think Shintai owes everyone a lot of money from his 100 euro bet :p
    here's the thing, more likely than not it will not have any slow l3 cache. The l3 cache will be comparable to today's l2 cache in terms of speed.

    and don't say that's impossible because they've got dang low latency cache on itanium despite it having A TON of cache so the manufacturing tech is obviously there.

    watch as the l2 is around 6-15 cycles and the l3 is around 10-20 cycles, AND the core is really high clocking making overall latency even lower than today's Core uArch.

    this is netburst on steroids. It's wider, more accurate and emphasizes width over length unlike the original. It's the return of the 20 stage pipeline and I think that this time the world is ready for it unlike the last. Just based off of the fact that it's using DDR3 on a 192-bit bus something tells me it'll be very bandwidth hungry. VERY. Why the heck do you think they redesigned the cache architecture they'd been using for the past decade?
    Last edited by xlink; 03-19-2008 at 01:26 AM.

  22. #122
    Xtreme Addict
    Join Date
    May 2005
    Posts
    1,341
    Quote Originally Posted by JumpingJack View Post
    We don't know this detail yet.... if Intel does not put the L3 on its own clock domain, Intel's L3 cache will clock at core frequency ... significantly better than AMD's 1800/2000 MHz L3. In other words, it could be faster or it could be slower... we won't know until we have the CPUs in the wild and people actually measure it.
    That was the original Barcelona L3 speed. By the time you see Nehalem you will also see Shanghai with several redesigns, including this L3 speed, and you just have to check the AMD forum to see what performance difference this makes on Phenom CPUs...
    Last edited by duploxxx; 03-19-2008 at 02:16 AM.
    Quote Originally Posted by Movieman View Post
    Fanboyitis..
    Comes in two variations and both deadly.
    There's the green strain and the blue strain on CPU.. There's the red strain and the green strain on GPU..

  23. #123
    Xtreme Legend
    Join Date
    Jul 2004
    Location
    France
    Posts
    354
    Quote Originally Posted by LowRun View Post
    I have no idea but from what Franck said the 4 access ports shouldn't help to make it faster.
    Hey guys,
    meanwhile I learned more about the Nehalem caches.
    Unlike what I previously stated, the L3 does not offer 4 access ports, but only one. This is also what explains the presence of these L2s. Let me explain what I understood:

    When several cores share a cache level, this cache has to answer them as fast as possible, so that the cores do not spend too much time waiting. Two methods exist to reduce latencies:

    - increase the number of access ports. This was my 1st thought, since this is the best solution on paper. However, in practice, this drastically increases the complexity, and going from 1 to 4 ports can increase the cache area by a factor of 2 or 3. So this is not possible atm.

    - use a banked access method, a little bit like what is done for DRAMs. This allows the cache to be accessed by different threads at the same time (under certain conditions, exactly like DRAM technology), however the bank accesses result in a significant performance drop. Considering that an 8 MB L3 is already slow due to its size, this is not a good solution either.

    So, Intel chose to reduce the number of accesses to this shared cache. This is what the small L2s are aimed at. These L2s are small, and due to the inclusive relationship with the L1s, the effective size can be as low as 192 KB (and not 128 KB as I previously said). With such a size, the hit rate cannot be very high (see the Celeron), but this is not very important. Let's say the hit rate is only 50% (that is a pessimistic statement); that means that half of the core requests are handled by the L2. So, in the worst case of 4 requests at the same time, only two arrive at the L3. Exactly the same as what currently happens on the Core 2 Duo.
    Moreover, the 50% of the requests handled by the L2 are treated much faster than if they were handled by the L3. So, the overall cache hierarchy efficiency is even better.
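
    The filtering argument above can be sketched in a few lines. This is only an illustration: the 50% hit rate and the 4 simultaneous requests come from the post, while the 10-cycle and 40-cycle latencies are hypothetical placeholders, and the result is an expected (average) value rather than a guarantee for any single moment.

```python
def expected_l3_requests(simultaneous_requests: int, l2_hit_rate: float) -> float:
    """Expected number of requests that miss the private L2s and
    therefore reach the shared L3."""
    return simultaneous_requests * (1.0 - l2_hit_rate)


def average_latency_cycles(l2_hit_rate: float,
                           l2_latency: float,
                           l3_latency: float) -> float:
    """Average latency for requests that reach the L2, assuming every
    L2 miss hits in the L3 and l3_latency is the total latency seen
    by the core for such a request."""
    return l2_hit_rate * l2_latency + (1.0 - l2_hit_rate) * l3_latency


if __name__ == "__main__":
    # 4 cores, 50% L2 hit rate: the figures used in the post.
    print(expected_l3_requests(4, 0.5))

    # Hypothetical latencies: 10 cycles for L2, 40 cycles for L3.
    print(average_latency_cycles(0.5, 10, 40))
```

    With these placeholder latencies, even a 50% L2 hit rate pulls the average well below the L3 latency, which is the point being made about overall hierarchy efficiency.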

    There are some drawbacks however :
    - SMT results in 8 possible simultaneous accesses, and not 4.
    - power dissipation is increased. Adding 1 MB (4x256 KB) results in a 1/8 = 12.5% dissipation increase. For that reason, it is possible that the L3 uses different voltage/clock planes, but I have not had that confirmed yet.
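
    The size arithmetic in this post can be checked with a short sketch. It assumes 64 KB of L1 per core (32 KB instruction plus 32 KB data, as discussed earlier in the thread), a fully inclusive 256 KB L2, and, like the post, that cache power scales roughly linearly with capacity; the function names are made up for illustration.

```python
def effective_unique_l2_kb(l2_kb: int, l1_total_kb: int) -> int:
    """Worst-case L2 capacity not duplicating L1 contents, assuming
    a fully inclusive L2 that mirrors every L1 line."""
    return l2_kb - l1_total_kb


def added_cache_fraction(l2_kb_per_core: int, cores: int, l3_kb: int) -> float:
    """Total private L2 capacity as a fraction of the shared L3."""
    return (l2_kb_per_core * cores) / l3_kb


if __name__ == "__main__":
    # 256 KB L2 and 64 KB of L1 (32 KB I + 32 KB D) per core.
    print(effective_unique_l2_kb(256, 64))

    # Four 256 KB L2s next to an 8 MB (8192 KB) L3.
    print(added_cache_fraction(256, 4, 8192))
```

    The fraction is the 1/8 quoted above; whether dissipation really tracks capacity that closely is the post's assumption, not something the sketch establishes.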

  24. #124
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by cpuz View Post
    ...

    So, Intel chose to reduce the number of accesses to this shared cache. This is what the small L2s are aimed at. These L2s are small, and due to the inclusive relationship with the L1s, the effective size can be as low as 192 KB (and not 128 KB as I previously said). With such a size, the hit rate cannot be very high (see the Celeron), but this is not very important. Let's say the hit rate is only 50% (that is a pessimistic statement); that means that half of the core requests are handled by the L2. So, in the worst case of 4 requests at the same time, only two arrive at the L3. Exactly the same as what currently happens on the Core 2 Duo.
    Moreover, the 50% of the requests handled by the L2 are treated much faster than if they were handled by the L3. So, the overall cache hierarchy efficiency is even better.

    There are some drawbacks however :
    - SMT results in 8 possible simultaneous accesses, and not 4.
    - power dissipation is increased. Adding 1 MB (4x256 KB) results in a 1/8 = 12.5% dissipation increase. For that reason, it is possible that the L3 uses different voltage/clock planes, but I have not had that confirmed yet.
    By simply looking at the die picture you see that things are much more complicated. Look at the L3 controllers (the write buffers), they're freaking huge!

    The new 2nd level TLB also implies a really complex sharing and arbitration mechanism. All of the above, coupled with Intel's second-to-none expertise in fast cache, makes me believe we're all going to be surprised by the performance of Nehalem's cache subsystem.

    http://chip-architect.com/news/Shanghai_Nehalem.jpg
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  25. #125
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Hi guys:
    I read thru here and you folks know much more on the technical end of this than I do.
    I work with the dual socket boards so that's what I tend to look for information on.
    Now we know that the Harpertowns (Penryns) get an approximate 10% increase clock for clock over the Clovertowns (Kentsfields, C2D), and what I'm hearing is that Nehalem will be 20-30% better clock for clock than the Harpertowns.
    That's on pretty good authority.
    Not scientific but let's just say this guy knows what he's talking about and no, not someone from this forum.
    I also wouldn't stick my neck out and say this if I wasn't pretty damned sure this was true.
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.
