Nehalem won't be a 20-stage design. It will be somewhere around what we have today, roughly 12-14 stages. It is an evolutionary step from Core 2.
DDR3 on a 192-bit bus is the evolutionary step. More cores simply need more bandwidth, and Nehalem is designed to scale to 8 cores (16 threads).
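To put a rough number on the bandwidth point, here is a quick back-of-the-envelope sketch. The DDR3-1066 transfer rate and the 128-bit dual-channel baseline are my own assumptions for illustration, not confirmed specs, and these are theoretical peaks only.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8  # bus width in bytes
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9

# Assumed transfer rate: DDR3-1066 (1066 MT/s), chosen only for illustration.
dual_channel = peak_bandwidth_gbs(128, 1066)    # 2 x 64-bit channels
triple_channel = peak_bandwidth_gbs(192, 1066)  # 3 x 64-bit channels = 192-bit bus

print(f"128-bit bus: {dual_channel:.1f} GB/s")
print(f"192-bit bus: {triple_channel:.1f} GB/s")
print(f"ratio: {triple_channel / dual_channel:.2f}x")
```

At the same memory speed the wider bus gives 1.5x the peak bandwidth, which is the headroom the extra cores would be competing for.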
I think they redesigned the cache structure mainly because of SMT and the speed of the shared cache. Plus, it's the exact same design as Itanium, so Intel already has the knowledge and experience with it.