I want one! Goodbye AMD, forever.
Crunch with us, the XS WCG team
The XS WCG team needs your support.
A good project with good goals.
Come join us, get that warm fuzzy feeling that you've done something good for mankind.
The L3 still holds 8MB of data for quick access, which gets pulled into L2, then L1 when needed. The L3 is only there to provide a quick access point for the L2 to grab data from, and the L2 has access to a pool of 8MB worth of data - when the L2 cache uses data from the L3, that's a successful hit.
Regardless of the fact that data may be stored simultaneously in the L2 and L3, each L2 has access to 8MB of L3 (and if the data isn't thread-dependent you could have all cores using the same data).
My point was about the data updates in the caches and coherent state snooping, but yes -- 8MB is 8MB, however you look at it. I just stressed the inclusive nature in that case and what it contributes to the threading overall.
A simple pointer-chasing graph would tell us enough about the whole picture, anyway!
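If anyone wants to draw that graph themselves, here's a rough pointer-chasing sketch (my own code, not from this thread): it walks a randomly permuted ring of pointers of a given footprint, so every load depends on the previous one and the ns/load figure tracks whichever cache level the footprint fits in. Build with gcc -O2 -std=gnu99; the 256KB and 8MB breakpoints you'd expect to see are just the Nehalem figures discussed above.

```c
/* Hypothetical sketch: measure average dependent-load latency vs. footprint. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define HOPS (1L << 24)   /* number of dependent loads to time per footprint */

static double chase(size_t bytes)
{
    size_t n = bytes / sizeof(void *);
    void **ring = malloc(n * sizeof(void *));
    size_t *perm = malloc(n * sizeof(size_t));

    for (size_t i = 0; i < n; i++)
        perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {          /* Fisher-Yates shuffle */
        size_t j = rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < n; i++)                /* link the cells into one cycle */
        ring[perm[i]] = &ring[perm[(i + 1) % n]];

    void **p = &ring[perm[0]];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < HOPS; i++)
        p = (void **)*p;                          /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    free(perm);
    free(ring);
    return p ? ns / HOPS : 0.0;                   /* use p so the loop isn't optimized out */
}

int main(void)
{
    for (size_t kb = 16; kb <= 64 * 1024; kb *= 2)
        printf("%6zu KB footprint: %.2f ns/load\n", kb, chase(kb * 1024));
    return 0;
}
```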
But increasing the L2 size wouldn't negatively affect the total L3 - even if you had more data duplicated, you'd still have the same total L3 capacity. The L2 wasn't restricted to 256KB just to reduce duplication (in such a large design another 1MB of cache isn't that much), but rather to keep the L2 as fast as possible.
Nehalem's L3 will be about as fast as current L2 caches, and the L2 will be faster than current L2 caches.
Intel could have gone for 512KB of L2 per core, but this would have meant a slower L2, which would have more than negated the increased capacity. If Intel can maintain the speed and increase the capacity of the L2, I'm willing to bet they will on subsequent generations.
Desktop boards will probably have 4 DIMM slots to maintain the basic ATX design - 3 of them interleaved and 1 just extra. Since you no longer have to balance the northbridge's trace lengths to all 3 other components on the mobo (CPU/mem/SB), you're left with a bit more freedom on layout, so the more creative with PCB real estate will probably be able to cram 6 slots onto a board so they can fill 2 DIMMs per channel.
Main-- i7-980x @ 4.5GHZ | Asus P6X58D-E | HD5850 @ 950core 1250mem | 2x160GB intel x25-m G2's |
Wife-- i7-860 @ 3.5GHz | Gigabyte P55M-UD4 | HD5770 | 80GB Intel x25-m |
HTPC1-- Q9450 | Asus P5E-VM | HD3450 | 1TB storage
HTPC2-- QX9750 | Asus P5E-VM | 1TB storage |
Car-- T7400 | Kontron mini-ITX board | 80GB Intel x25-m | Azunetech X-meridian for sound |
Then why does Intel bother to propagate the inclusive design here?
From your statements there is no consequential difference between an inclusive and an exclusive relationship, or is it just me not getting your point?
By the way, Intel has long had SRAM cells fast enough for much bigger arrays than this skinny 256K one.
Itanic is a long-instruction-word architecture, so the caching organization is subordinate to the rather weird specifics of its EPIC design.
Anyway, I think the L2 in Nehalem is there - by design - to counter the shared L3, not as a decisive performance-critical part of the architecture...
And by all means the L3 here, being so closely coupled, should be way (relatively) faster than AMD's K10 implementation and the Dunnington one, too.
Originally Posted by Movieman
Posted by duploxxx
I am sure JF is relaxed and smiling these days with their intended launch schedule. SNB Xeon servers on the other hand....
Posted by gallag
there you go bringing intel into an amd thread again lol, if that was someone dropping a dig at amd you would be crying like a girl.
qft!
Not only latency - as was shown in their slides, Intel says its L2 is smarter as well. If it is smarter, less size is needed, right?
It also goes back to something Intel learned from the small, very fast L1 and L2 used with the P4. They didn't want to repeat the "Prescott" mistake that added 17% more L2 latency. No matter what's said in this forum, not everything about NetBurst sucked.
Speaking of L3, Shintai,
Just teasing you, man!
------------------------------------------------------
Here are some interesting comments from knowledgeable people (mainly on the Faster Synchronization Primitives, which look like a nice feature):
http://realworldtech.com/forums/inde...88380&roomid=2
>If using the lock prefix is a legacy operation what are
>the modern ones?
Linus Torvalds:
I don't think there are any - I think they just meant that
they made the old legacy instructions run faster, instead
of trying to introduce anything new.
Which I really look forward to testing. The serialization
overhead of Core 2 is better than many other processors,
but everything else is so good that it still stands out
like a sore thumb. We have lots of kernel loads where one
of the biggest costs is just locking (even without any
nasty contention and cacheline ping-pong), because of how
it serializes the pipeline.
Now that people are trying to push more and more multi-
threaded programming paradigms, the locking is finally
getting some real exposure. It's always been a big issue
in kernels, but now all the fast user-level locking is
making it show up in "normal" loads too.
--------------------------------------
That's something I'm also looking forward to. Even without contention, acquiring locks is *painful*. Unless the data/code you're protecting takes a significant amount of time to process/execute, you'll be bitten by the sheer cost of the lock/unlock pairs, so there's room for *lots* of improvement there.
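Just to put a rough number on that, here's a minimal sketch (mine, not from the thread) that times an uncontended pthread_mutex lock/unlock pair around a counter bump versus the bare increment. Build with gcc -O2 -pthread; the absolute numbers are machine-dependent, but the gap is the point.

```c
/* Hypothetical sketch: cost of an uncontended lock/unlock pair vs. a bare increment. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 50000000L

static volatile long counter;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static double run(int locked)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < ITERS; i++) {
        if (locked) pthread_mutex_lock(&mtx);     /* only this thread ever takes it */
        counter++;
        if (locked) pthread_mutex_unlock(&mtx);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / ITERS;
}

int main(void)
{
    printf("plain increment  : %.2f ns/iter\n", run(0));
    printf("locked increment : %.2f ns/iter\n", run(1));
    return 0;
}
```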
----------------------------------------
+1
It's not uncommon for Java workloads to waste 10% or more of the time processing uncontended locks, and I've seen up to ~30% in real-world apps(1).
The underlying reason is that many critical parts of the core Java library are synchronized (StringBuffer, Hashtable, many I/O functions). While there are newer APIs that avoid this (StringBuilder, HashMap, etc.), there is lots of legacy code that uses the old APIs directly or indirectly.
JVMs use optimization tricks to avoid this (lock removal, lock elision, lazy unlocking, etc.), but that only serves to alleviate the problem and doesn't entirely resolve it.
-- Henrik
(1) Measured as the increase in throughput when locks are forcefully disabled in JRockit (using -XXlazyunlocking or just hacking the JVM to not issue CAS instructions). The 30% number comes from a JSP-heavy app I ran into some time back. SPECjbb2005 gains ~10% by the use of -XXlazyunlocking.
Faster Synchronization Primitives: As multi-threaded software becomes more prevalent, the need to synchronize threads is also becoming more common. Next generation Intel microarchitecture (Nehalem) speeds up the common legacy synchronization primitives (such as instructions with a LOCK prefix or the XCHG instruction) so that existing threaded software will see a performance boost.
That's actually the part that I like the most. Better overall IPC is a very nice thing, but lowering the cost of the synchronization primitives is much more interesting. It enables parallelization of 'harder' workloads which aren't really suited to parallelization and reap lower benefits because of the synchronization overhead.
http://aceshardware.freeforums.org/n...ting-t423.html
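As a reference point, here's a tiny sketch (my own, not from Intel's text) of the kind of "legacy" primitives the quote is talking about: on x86, GCC's __sync builtins compile down to LOCK-prefixed instructions and to XCHG (which is implicitly locked) - exactly the operations Nehalem is claimed to speed up.

```c
/* Hypothetical sketch: the legacy atomic primitives (LOCK-prefixed ops and XCHG)
 * as emitted by GCC's __sync builtins on x86. */
#include <stdio.h>

int main(void)
{
    int value = 0;

    /* lock xadd: atomic fetch-and-add */
    int old = __sync_fetch_and_add(&value, 1);

    /* lock cmpxchg: compare-and-swap, the core of most lock implementations */
    int swapped = __sync_bool_compare_and_swap(&value, 1, 42);

    /* xchg (implicitly locked): unconditional exchange, typical spinlock acquire */
    int prev = __sync_lock_test_and_set(&value, 7);

    printf("old=%d swapped=%d prev=%d value=%d\n", old, swapped, prev, value);
    return 0;
}
```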
Interesting.. Let's hope all these <on paper> enhancements and buzz turn out to be real.. If the claim that Nehalem > Core 2 by more than Core 2 > P4 holds true, then it's going to be really insane.
Faceman
Looks like Intel is ripping off AMD to me!
SuperMicro X8SAX
Xeon 5620
12GB - Crucial ECC DDR3 1333
Intel 520 180GB Cherryville
Areca 1231ML ~ 2~ 250GB Seagate ES.2 ~ Raid 0 ~ 4~ Hitachi 5K3000 2TB ~ Raid 6 ~
Yes, but they dumped the idea. You do realize that the original Nehalem was NetBurst on steroids, right? That's why some of the features, like Hyper-Threading, are coming back with Nehalem, as the same division made both.
If it wasn't for the K8 being so successful, there would have been no Conroe, just a beefier NetBurst. On top of that, Intel has admitted they like the K10 design, but that it's near impossible to produce it properly on a 65nm process. Now I'm not saying AMD hasn't done the same - look at their original products, they were just Intel parts with their name on them - but that doesn't mean Intel didn't use some of the K10 design in Nehalem.
Picks self up from the floor from laughing so hard. Or did you mean that as a joke?
Intel® Pentium® 4 processor Extreme Edition 3.20 GHz supporting Hyper-Threading Technology, with an additional 2 Megabytes of L3 cache.
So if Intel uses it, stops using it, AMD copies Intel and then Intel returns to their original idea, it is Intel copying AMD, LOL!
Originally Posted by xlink
Sorry for being so slow to respond.
The cache doesn't look to be universal - each 256KB is dedicated to one core. The slide even says "per core." What's the orange in between the L2 and the L1-Data? Is that what you've called the L1.5?
Also - I'm assuming it starts off as a quad, so the *8 is only accurate for servers.
I can't see Nehalem having more cache to play with than Penryn for single-threaded apps, depending on how the L3 is used.
But that's a two-way street. If there wasn't a P3 replacing the P2s there would have been an Athlon. Each company pushes all of its competitors to get better or die. It's way too soon to write off AMD, but to pretend they're not getting pimp slapped right now is worse. There's nothing in K10 Intel wanted to copy.
Regarding the P4 (NetBurst) - in those times the L2 cache was an essential factor for the performance of that architecture because of one simple fact: the P4 doesn't actually have an L1 cache for instructions (macro-ops in Intel's parlance), but the notorious trace cache, storing the already decoded µOps (which added eight stages to the already long pipeline). That meant loading the macro-op cache lines directly from the... yes, you guessed it - the L2.
Hehe, I am a bit surprised. But I think the L2s are more like an L1.5 - extremely fast, faster than we've ever seen before with L2s. And an L3 with the speed of Core 2's L2s.
I guess the L2 will be around some 5-6 cycles, and the L3 under 15 cycles.
But it very much mimics Itanium's cache design. And maybe it's an underlying requirement for effective SMT.
Crunching for Comrades and the Common good of the People.