>If using the lock prefix is a legacy operation what are
>the modern ones?
Linus Torvalds:
I don't think there are any - I think they just meant that
they made the old legacy instructions run faster, instead
of trying to introduce anything new.
Which I really look forward to testing. The serialization
overhead of Core 2 is better than many other processors,
but everything else is so good that it still stands out
like a sore thumb. We have lots of kernel loads where one
of the biggest costs is just locking (even without any
nasty contention and cacheline ping-pong), because of how
it serializes the pipeline.
Now that people are trying to push more and more multi-
threaded programming paradigms, the locking is finally
getting some real exposure. It's always been a big issue
in kernels, but now all the fast user-level locking is
making it show up in "normal" loads too.
--------------------------------------
That's something I'm also looking forward to. Even without contention, acquiring locks is *painful*. Unless the data/code you're protecting takes a significant amount of time to process/execute, you'll be bitten by the sheer cost of the lock/unlock pairs, so there is room for *lots* of improvement there.
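To make that concrete, here is a minimal single-threaded sketch that times the same increment loop with and without an uncontended synchronized block. The class and method names are invented for illustration, and a serious measurement would use a harness such as JMH to control for warmup and JIT effects.

    // Rough single-threaded comparison of an uncontended synchronized block
    // versus the same work without one. Illustrative only; numbers will vary
    // by JVM and CPU, and a real benchmark would use a proper harness.
    public class UncontendedLockCost {
        private static final Object LOCK = new Object();
        private static long counter;

        private static long plain(int iterations) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                counter++;                      // no lock at all
            }
            return System.nanoTime() - start;
        }

        private static long locked(int iterations) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                synchronized (LOCK) {           // uncontended: only one thread ever runs this
                    counter++;
                }
            }
            return System.nanoTime() - start;
        }

        public static void main(String[] args) {
            int n = 50_000_000;
            plain(n); locked(n);                // warm up the JIT
            System.out.printf("plain : %d ms%n", plain(n) / 1_000_000);
            System.out.printf("locked: %d ms%n", locked(n) / 1_000_000);
        }
    }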
----------------------------------------
+1
It's not uncommon for Java workloads to waste 10% or more of their time processing uncontended locks, and I've seen up to ~30% in real-world apps (1).
The underlying reason is that many critical parts of the core Java library are synchronized (StringBuffer, Hashtable, many I/O classes). While there are newer APIs that avoid this (StringBuilder, HashMap, etc.), there is a lot of legacy code that uses the old APIs directly or indirectly.
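A rough illustration of the difference (the library classes are real, the surrounding program is invented): every call on StringBuffer and Hashtable acquires and releases a monitor even when only one thread ever touches the object, while StringBuilder and HashMap do no locking at all.

    import java.util.HashMap;
    import java.util.Hashtable;
    import java.util.Map;

    // Legacy synchronized classes vs. their unsynchronized replacements.
    public class LegacyVsModern {
        public static void main(String[] args) {
            StringBuffer legacyBuf = new StringBuffer();   // synchronized methods
            legacyBuf.append("a").append("b");             // two lock/unlock pairs

            StringBuilder modernBuf = new StringBuilder(); // no locking
            modernBuf.append("a").append("b");

            Map<String, Integer> legacyMap = new Hashtable<>(); // synchronized methods
            legacyMap.put("x", 1);                              // lock/unlock per call

            Map<String, Integer> modernMap = new HashMap<>();   // no locking
            modernMap.put("x", 1);

            System.out.println(legacyBuf + " " + modernBuf + " " + legacyMap + " " + modernMap);
        }
    }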
JVMs use optimization tricks to avoid this (lock removal, lock elision, lazy unlocking, etc.; see the elision sketch after the footnote), but that only alleviates the problem rather than resolving it entirely.
-- Henrik
(1) Measured as the increase in throughput when locks are forcefully disabled in JRockit (using -XXlazyunlocking or just hacking the JVM to not issue CAS instructions). The 30% number comes from a JSP-heavy app I ran into some time back. SPECjbb2005 gains ~10% by the use of -XXlazyunlocking.
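To illustrate the lock elision case Henrik mentions: in a sketch like the one below, the StringBuffer never escapes the method, so a JIT with escape analysis may remove the monitor operations entirely. Whether a particular JVM actually does so depends on its escape-analysis implementation and settings; this is a candidate for elision, not a guarantee.

    // Sketch of code that is a candidate for lock elision.
    // Each StringBuffer.append() is synchronized, but the buffer is
    // thread-confined (never published), so the lock/unlock pairs can
    // in principle be removed by the JIT.
    public class LockElisionCandidate {

        static String greet(String name) {
            StringBuffer sb = new StringBuffer();   // never escapes this method
            sb.append("Hello, ");                   // synchronized append
            sb.append(name);                        // synchronized append
            sb.append("!");                         // synchronized append
            return sb.toString();                   // only the resulting String escapes
        }

        public static void main(String[] args) {
            System.out.println(greet("world"));
        }
    }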
----------------------------------------
Faster Synchronization Primitives: As multi-threaded software becomes more prevalent, the need to synchronize threads is also becoming more common. Next generation Intel microarchitecture (Nehalem) speeds up the common legacy synchronization primitives (such as instructions with a LOCK prefix or the XCHG instruction) so that existing threaded software will see a performance boost.
That's actually the part I like the most. Better overall IPC is a very nice thing, but lowering the cost of the synchronization primitives is much more interesting. It enables parallelization of 'harder' workloads that don't lend themselves well to it and currently reap smaller benefits because of the synchronization overhead.
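For reference, the primitives the Intel text refers to are what JITs emit underneath operations like the ones below. On x86 the compare-and-set typically becomes a LOCK CMPXCHG and the atomic increment a LOCK XADD, though the exact instruction selection is up to the JVM and CPU generation.

    import java.util.concurrent.atomic.AtomicInteger;

    // The LOCK-prefixed instructions mentioned in the Intel quote sit
    // underneath operations like these; the comments note the typical
    // x86 instruction choices, which are JVM-dependent.
    public class AtomicPrimitives {
        public static void main(String[] args) {
            AtomicInteger counter = new AtomicInteger(0);

            counter.incrementAndGet();                      // typically LOCK XADD on x86
            boolean swapped = counter.compareAndSet(1, 42); // typically LOCK CMPXCHG

            System.out.println(counter.get() + " " + swapped);
        }
    }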