Intel Details Nehalem uArch Improvements - 256KB L2, 8MB L3 Confirmed

**xlink** · 03-20-2008, 11:00 AM

Originally Posted by gojirasan

I don't think any of us here really know what is going on in the minds of Intel corporate folks. However, I would guess that Intel would actually prefer to stop or at least hinder overclocking if they could, at least in the lower bins.

if they did that they'd kill this market segment and I'd be off to AMD land more likely than not.

besides fried CPU + fried board with intel parts = more profit.

if anything they should encourage us to run our G0s and even B3s at 4GHz day to day for the sake of their profits.

**Blauhung** · 03-20-2008, 11:35 AM

Intel is by no means killing overclocking. The current platforms just might limit its usefulness to only the high end desktop platforms based on Bloomfield. Only time will tell how Lynnfield and Havendale will operate, and there's a chance there is some saleability left in the platform.

**Movieman** · 03-20-2008, 11:37 AM

Originally Posted by Blauhung

Intel is by no means killing overclocking. The current platforms just might limit its usefulness to only the high end desktop platforms based on Bloomfield. Only time will tell how Lynnfield and Havendale will operate, and there's a chance there is some saleability left in the platform.

Hey,hey,hey..Tell them to leave a little for us dual socket guys.
We like to push the systems also.

**Blauhung** · 03-20-2008, 12:05 PM

Originally Posted by Movieman

Hey,hey,hey..Tell them to leave a little for us dual socket guys.
We like to push the systems also.

since the dual socket systems use the exact same silicon that the Smackover platform uses (Nehalem CPU, Tylersburg bridge, each with 1 QPI disabled in UP desktop), I see no possible reason that motherboards won't come out with all the same tweaking options. I don't know if there's a Skull Trail type Intel designed board in the works, but all the groundwork is there to make one.

**shiznit93** · 03-20-2008, 12:07 PM

Originally Posted by xlink

if they did that they'd kill this market segment and I'd be off to AMD land more likely than not.

besides fried CPU + fried board with intel parts = more profit.

Yea, since AMD overclocking is so hot right now

BTW, fried cpu = rma != profit

**gojirasan** · 03-20-2008, 02:13 PM

Originally Posted by Donnie27

Intel did complain about shady VARs selling overclocked systems.

Did I ever say they didn't? They complained, but no one in the overclocking community believed them. Did you? I am sure they were crying a river that they made overclocking so much more difficult. I admit that it's possible that since that time they have seen that the ability to overclock does not really eat into their profits. In fact maybe they will ship all future CPUs with unlocked multipliers instead of just the extreme editions. That would surely build some good will in this community. Do you think they will?

It would be VERY SILLY for Intel to sponsor Fugger's Demo and then do as you suggest

How do you figure?

This small market can't influence Intel or AMD's bottom line=P

On this we agree. At least not significantly. But maybe they are not as sure of that as we are. If not then how do you explain that both companies use multiplier locking?

It would be a waste of time for Intel or anyone else to worry about legit overclockers as compared to some jerk selling a 2.4GHz as a 3GHz. There were plenty of bogus companies selling counterfeit everything, from fake MS mice, re-badged RAM and overclocked processors to Windows all the way back to 3.11 and even DOS LOL!

I agree that it would be a waste of time unless it is very easy and cheap to do. If it is expensive then it is clearly not worth it. If we are lucky it will take some extra and very costly modifications to prevent overclocking Nehalem. As far as there having been 'plenty' of counterfeit CPUs, IIRC wasn't most of the remarking done in Europe? And I thought it was pretty limited even there. Of course pre-overclocked systems are a different story. I have no idea how prevalent that was.

Contradicting your own statements uh?

Yup. I'm human. I realized I was wrong. But now I'm right.

But you're dead wrong about Good-Will. Intel spent too much time and money gaining that BACK from AMD. Even as A64 was barely better and X2 was CLEARLY better, Intel kept Good-will right up until Prescott. Many folks loved their Northwood C's.

IIRC, only the true fanbois liked Northwood. I absolutely refused to buy any Pentium 4 product. In fact I am typing this on a Pentium 3. They made a lot of bad decisions in those days. Rambus and Netburst. My god. And now I have bought a share of Intel stock not to make money but just because they have shown themselves to be so seriously badass. For once an American company I can be proud of. They have obviously learned from the errors of their ways. Now if Nvidia could only do the same.

I don't think 'good will' means a whole lot to most of us. And loyalty is seriously overrated. Enthusiasts are about the least loyal customers they could have. We'll jump ship over a few extra FPS in Crysis or 30 seconds less render time in 3DStudioMax or a price $10 lower. The bleeding edge is the bleeding edge regardless of whose logo is on the box. I don't think Intel is unaware of that either.

Then you're unaware of how much higher the higher Multiplier Processors can go

They hit the wall much later than the Cheaper models.

Indeed I am. Since I scored my E8400 I haven't been paying much attention to the overclocking records. Does the E8500 really clock that much higher? That's surprising since the stock clock is not much higher. I guess I'll have to go take a look at the numbers.

The problem with what you're saying here is that Nehalem *should* start out faster clock for clock, meaning it doesn't have to be overclocked as hard.

Maybe it won't have to be overclocked 'as hard', but if it can't be overclocked at all we may find lots of enthusiasts sticking with Penryn until the next process shrink. IIRC, my lil E8400 with a stock speed of 3GHz has made it up to 4.7 GHz on air and 5+ on phase. That is a lot more than 30% faster. So for an overclocker that may become relevant in a Penryn vs. Nehalem comparison.
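
For reference, the headroom being described works out as follows. This is a minimal sketch using only the clocks quoted above (3GHz stock, 4.7GHz on air, and 5GHz taken as the floor for "5+" on phase); the function name is just for illustration.

```python
def overclock_gain_percent(stock_ghz: float, overclocked_ghz: float) -> float:
    """Return the clock increase over stock as a percentage."""
    return (overclocked_ghz / stock_ghz - 1.0) * 100.0


stock = 3.0  # E8400 stock clock in GHz, as quoted above
for label, clock in [("air", 4.7), ("phase (at least)", 5.0)]:
    print(f"{label}: +{overclock_gain_percent(stock, clock):.1f}%")
```

That is roughly 57% over stock on air and at least 67% on phase, which is the gap a locked-down successor would have to close on clock-for-clock gains alone.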

Last but not least, as was proven at IDF, Intel and most of the folks there are VERY AWARE of overclocking and this site.

So what if they are aware? That doesn't mean they are going to unlock all their multipliers, provide a warranty for overclocked chips, and welcome the overclocking community with open arms. Talk is cheap. Let's see some action if they support us so much.

**Bobsama** · 03-20-2008, 02:30 PM

Originally Posted by Blauhung

since the dual socket systems use the exact same silicon that the Smackover platform uses (Nehalem CPU, Tylersburg bridge, each with 1 QPI disabled in UP desktop), I see no possible reason that motherboards won't come out with all the same tweaking options. I don't know if there's a Skull Trail type Intel designed board in the works, but all the groundwork is there to make one.

MM has it right--DP is where many of us lusting after the highest performance are at. I'd personally like to see a lot more options for overclocking on DP systems. They're the same silicon and they'll likely clock better due to higher binning. I'd like to see all the same options on desktop as on workstation/servers.

BTW, Intel won't unlock everything and replace chips we kill. They should replace those few CPUs that are really dead, not those overclocked to death.

**Donnie27** · 03-20-2008, 07:23 PM

Originally Posted by gojirasan

Did I ever say they didn't? They complained, but no one in the overclocking community believed them. Did you?

I did and no, I wasn't the only one. Intel didn't shut down or punish folks for shipping enthusiast motherboards that did allow overclocking. In fact, one of these folks came up with the current scheme to show the processor's name and original speed in a ROM. That's why you now see something like E6600 at 3200MHz in the current BIOS, so unscrupulous dealers can't sell it as a 3.2GHz for more money. Problem solved.

Originally Posted by gojirasan

I am sure they were crying a river that they made overclocking so much more difficult. I admit that it's possible that since that time they have seen that the ability to overclock does not really eat into their profits. In fact maybe they will ship all future CPUs with unlocked multipliers instead of just the extreme editions. That would surely build some good will in this community. Do you think they will?

Sorry, that isn't even close to how any of this works. These folks are in business to make money, not so you can skate around their speed binning efforts.
Intel always leaves some headroom, that's about all any overclocker should depend on=P

It would be pretty unprofitable for Intel, AMD, IBM or anyone else to sell you an unlocked CPU that would EAT their own profits. AMD only did this when most folks knew its processors would still be slower even after being overclocked. Yet they're still speed binned, and AMD still uses a price difference for the faster models.

Just because you don't believe Intel Created Goodwill even on this forum, doesn't mean everyone else feels the same way you do.

http://www.xtremesystems.org/forums/...ght=Fugger+IDF

Goodwill

How do you figure?

It would be silly for Intel to sponsor Fugger, give props to this site, have a round table of geeks, implement some of their ideas and then turn off the tap. Sorry, that makes no sense whatsoever to me.

On this we agree. At least not significantly. But maybe they are not as sure of that as we are. If not then how do you explain that both companies use multiplier locking?

Yepp, I did several times already. I started building computers in 1994. I remember folks getting sold 300MHz Celerons that were overclocked and passed off as 450MHz. Most folks who dealt with computers back in those days will tell you the same. Overclockers were the only ones loving those overclocking-friendly Celerons.

I agree that it would be a waste of time unless it is very easy and cheap to do. If it is expensive then it is clearly not worth it. If we are lucky it will take some extra and very costly modifications to prevent overclocking Nehalem. As far as there having been 'plenty' of counterfeit CPUs, IIRC wasn't most of the remarking done in Europe? And I thought it was pretty limited even there. Of course pre-overclocked systems are a different story. I have no idea how prevalent that was.

Remarking is done all over the world. One place even tried to sell broken CPUs AMD had thrown away. Oh, we're not innocent either. I remember the SNDS BD where Intel warned folks not to exceed 1.7 v-core. Folks using 1.9v were wondering why their procs were dying. This only affected something like 40% of all processors, so the other 60% said it must have been something unrelated, sheesh!

I do remember those days well. Last year a friend of mine sent my old Bootlegged MS Mouse to Microsoft LOL! I have Rebadged GSkill RAM LOL!

Yup. I'm human. I realized I was wrong. But now I'm right.

I've been wrong as well, and will be wrong again. All I really hope to do is learn something when I'm wrong.

IIRC, only the true fanbois liked Northwood. I absolutely refused to buy any Pentium 4 product. In fact I am typing this on a Pentium 3. They made a lot of bad decisions in those days. Rambus and Netburst. My god. And now I have bought a share of Intel stock not to make money but just because they have shown themselves to be so seriously badass. For once an American company I can be proud of. They have obviously learned from the errors of their ways. Now if Nvidia could only do the same.

Nope not at all. Northwood hung well with its competition, still had better motherboards and overclocked easy as hell. Northwood didn't need RAMBUS, Intel had switched to DDR-400 by then. The cool thing was that Intel announced DDR-400, NW 800MHz FSB and 865/875 long before it shipped.

I don't think 'good will' means a whole lot to most of us. And loyalty is seriously overrated. Enthusiasts are about the least loyal customers they could have. We'll jump ship over a few extra FPS in Crysis or 30 seconds less render time in 3DStudioMax or a price $10 lower. The bleeding edge is the bleeding edge regardless of whose logo is on the box. I don't think Intel is unaware of that either.

So ask the owner of this site?

It doesn't mean locked multipliers are a sign that Intel or AMD is at war with us. I bought NW; it was an upgrade from my old AthlonXP, which had replaced my Thunderbird. A 3500+ replaced that NW and my current Conroe replaced that 3500+. If Phenom had kicked ass, I would have jumped on that bandwagon.

Indeed I am. Since I scored my E8400 I haven't been paying much attention to the overclocking records. Does the E8500 really clock that much higher? That's surprising since the stock clock is not much higher. I guess I'll have to go take a look at the numbers.

I honestly don't know for sure. I did say I believe.

Maybe it won't have to be overclocked 'as hard', but if it can't be overclocked at all we may find lots of enthusiasts sticking with Penryn until the next process shrink. IIRC, my lil E8400 with a stock speed of 3GHz has made it up to 4.7 GHz on air and 5+ on phase. That is a lot more than 30% faster. So for an overclocker that may become relevant in a Penryn vs. Nehalem comparison.

I said we'll have to wait and see. But that's just my opinion, not a fact. I pointed out that I could easily be wrong.

Originally Posted by gojirasan

So what if they are aware? That doesn't mean they are going to unlock all their multipliers, provide a warranty for overclocked chips, and welcome the overclocking community with open arms. Talk is cheap. Let's see some action if they support us so much.

IMHO that's unrealistic=P I don't even think overclockers expect something like that. If I were you, I wouldn't use the term "us" with that claim.

I openly complained to folks trashing the FSB on the desktop. I said that FSB was more flexible and would be easier to overclock, and I got jumped for my efforts.

I wonder if folks still think FSB is so terrible now?

My fingers are still crossed for that legacy Nehalem chip/s for the old fashioned stuff.

**KTE** · 03-21-2008, 03:25 AM

Jack, I really don't want to get into details since it's too lengthy for the time I have set aside to post on a rumor thread, but for the sake of you and other level-headed enthusiasts, here goes

Originally Posted by JumpingJack

Originally Posted by KTE

K10h is a 12 stage pipeline, 65nm, 283mm², 463M transistor, 23.x FO4 delays design. It is not made for high clocks in any way; AMD intended, as presented at one of the global IEEE 2006 conferences, to reach 2-2.8GHz with Barcelona at its rated supply Vdd. Intel Core 2 is a 21 FO4 depth design AFAIK and Penryn is at ~18 FO4; it is supposed to have been reduced substantially since HKMG integration.

The IBM Power6 is neither the lowest nor the only architecture with a 13 FO4 inversion delay; it just happens to be very well tuned for absolute speed and performance. P3 had FO4 15 depth, Willamette P4 FO4 8-10, Alpha 21264 has 15 FO4, and so on. None of those could achieve what IBM did.

If you have references, I would really love to see those.

They are details I've learned from Intel and AMD themselves directly, and thus more authoritative than the online quotes I provide, apart from Penryn Core 2 Extreme, where I was only told that FO4 delay is reduced from Core 2 65nm (drop them an email or ask David/s at RW; they should know, as the ISSCC and IEEE 2006/7 conferences did make brief mention of them). Neither MFG has liked releasing such vital data online since the P4 days, so we're not going to find much on it: for competitive reasons, the little documentation that does exist does not cover either architecture's engineering in such depth until it's old. You only ever hear tidbits through journals and studies now, which isn't the important detailed engineering, and very few daily journalistic sources can catch or even understand the real details that matter in engineering (they're not exactly educated to). The only thing they tend to do is feed extravagant hype 12-18 months early to suit the intended extremists, who do a good enough payroll job each time to propel things in favor of their obsession, as the MFG intended to begin with. Then you have unintelligent, corner-lurking individuals who react like their mother is being held hostage by one of the MFGs, so they have to throw childish tantrums at anyone who speaks ever so slightly critically of that particular MFG, regardless of accuracy or their own knowledge limitations, be it Intel or AMD. It's a sad case that has existed online since '98 and that I wish never had, since we're only interested in the architectures when discussing, and I know I don't favour any MFG in any product; I buy whatever is cheap and okay for my intended tasks in the end, as most of the sane will. They just want our money. It just spoils forums and the usefulness of discussion.
Searching for those FO4 figures online would take more time than I can spend right now, with a network connection that has been intermittently broken for about 7 days, even if they do exist. However, I will try to get you some mentions of those FO4 depths, specifically for Core 2 and K10h, later.

Ok, it wasn't that bad actually, just scanned to approximate how hard it may be to find it and took less than 20 seconds for K10h:
K10h inverter delay mentioned [end of page 2 and start of page 3]: http://www.hypertransport.org/docs/n...a_05-16-07.pdf
This document mentions a few FO4 depths including that of Core 2 (only one I've found so far online): http://www.springerlink.com/index/q88838k207r37554.pdf
This document mentions them of many more CPUs: http://www.realworldtech.com/page.cf...1502231107&p=2
These comparative graphs also look accurate to me, judging by all the lower FO4s I know to be correct: http://www-vlsi.stanford.edu/group/chart/cycleFO4.pdf, http://www-vlsi.stanford.edu/group/c...kFrequency.pdf, http://www-vlsi.stanford.edu/group/c...werDensity.pdf

I'll try and get some word from Intel on Penryn FO4 for you specifically and let you know the full reply by PM (you can then post it if you want, since I don't have any need to post in this thread after my first post and this to answer your enthusiastically put genuine request).

I have not been able to find that type of data readily. Now, true... neither P3 nor Willamette got to these clocks (your FO4s are probably correct, I have not seen the data myself), but they were also not built on a 65 nm process with a 1.09 nm gate.

Yep, exactly. My point wasn't to compare clocks between any of them at all; you and I both know there are major variables which would make that inaccurate. It was that FO4 alone doesn't dictate Frequency@TDP: a wide variety of features, the whole architectural design, and material choice can limit and affect this greatly. If Intel CPUs can clock well, I would never say it is because of one circuitry factor alone. It has been like this since NetBurst, which went from 16+ to 8 FO4 depths (not sure of the maximum, but it was above 16 for sure, and some PEs wager 6 is the lowest FO4 they had). Core 2 has a fairly reserved FO4 above 20 to begin with, yet it can still clock high; although TDPs are high at 65nm, it is still very good.

I'll quickly explain a little for the benefit of genuine and sane-minded knowledge seekers. In any modern microprocessor, the slowest pipeline stage is what determines your maximum operable frequency. In VHDL, the critical path delays are where the major problem for clocking arises, as the delays add up there. The biggest factors affecting a CPU's maximum clock frequency at a constant FO4 delay are:
a) Microarchitecture
b) Process Variation and Accessibility
c) Logic Styles
d) Timing Overheads
e) Cell designs
f) Wiring Size
g) Floorplan and Placement
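
The opening point above, that the slowest pipeline stage sets the maximum clock, can be sketched in a few lines. The stage names and delays below are made-up placeholders, not figures for any real CPU, and the fixed per-stage overhead is an assumed stand-in for latch, skew and jitter costs.

```python
def max_clock_ghz(stage_delays_ps: dict[str, float], overhead_ps: float = 0.0) -> float:
    """Clock limit set by the slowest stage: f = 1 / (worst logic delay + overhead)."""
    worst_ps = max(stage_delays_ps.values()) + overhead_ps
    return 1000.0 / worst_ps  # a period in ps converts to GHz as 1000 / ps


# Hypothetical per-stage logic delays in picoseconds (illustrative only).
stages = {"fetch": 210.0, "decode": 250.0, "execute": 240.0, "writeback": 180.0}

slowest = max(stages, key=stages.get)
print(f"critical stage: {slowest}")
print(f"max clock: {max_clock_ghz(stages, overhead_ps=50.0):.2f} GHz")
```

Speeding up any stage other than the critical one changes nothing here, which is why the factors in the list matter mostly through their effect on the worst path.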

Now, even more so than these are the FO4latch (incl. clock skew and jitter delays), FO4logic and subsequently FO4pipeline delays, which designate the depth of the critical path through logic in one pipeline stage. They greatly affect a CPU's clock frequency within desktop TDPs, as well as how much of the CPU's surface can be covered in one processor cycle. The most important of those parameters affect the critical path lengths and critical path delays (e.g. register propagation delay). Even subthreshold leakage, gate direct tunneling leakage, junction leakage and gate-induced drain leakage greatly affect any CPU's clocking at a given transistor Vdd and Tox. Array power, latch and clock are the primary components of power dissipation in CPUs too. Modern CPUs have a power given by the formula P = Pdynamic + Pleakage, and leakage for SiO2 is supposed to be as much as 40% of the used power, especially as the fabrication node shrinks. Decreasing the threshold voltage of any transistor increases the leakage current exponentially (e.g. decreasing the threshold voltage by 100mV increases the leakage current by a factor of 10), and decreasing the length of transistors increases the leakage current as well. This again poses huge clock frequency barriers to CPUs in real life, as opposed to theoretical simulations, when you shift process size.
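
The threshold-voltage rule of thumb above (about 10x more leakage per 100mV of threshold reduction) can be written out as a small model. This is only a sketch of that stated rule: the 100mV-per-decade slope is taken from the text, while the dynamic and baseline leakage power values are arbitrary placeholders, and dynamic power is held constant for simplicity.

```python
def leakage_scale(delta_vth_mv: float, mv_per_decade: float = 100.0) -> float:
    """Factor by which leakage grows when Vth is lowered by delta_vth_mv."""
    return 10.0 ** (delta_vth_mv / mv_per_decade)


def total_power_w(p_dynamic_w: float, p_leak_base_w: float, delta_vth_mv: float) -> float:
    """P = Pdynamic + Pleakage, with leakage scaled by the Vth reduction."""
    return p_dynamic_w + p_leak_base_w * leakage_scale(delta_vth_mv)


# Placeholder numbers: 60 W dynamic, 5 W leakage at the baseline Vth.
for drop_mv in (0.0, 50.0, 100.0):
    total = total_power_w(60.0, 5.0, drop_mv)
    print(f"Vth lowered by {drop_mv:.0f} mV -> {total:.1f} W total")
```

With these placeholder inputs, a 100mV drop turns a small leakage term into nearly half of the total, which is the kind of budget pressure the paragraph describes.
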
The fan-out-of-four inverter (FO4) metric is an ideal way to compare and estimate clocking based purely on technology scaling, i.e. if you keep the same architecture but just change FO4, it is bound to clock better if design/TDP does not restrict this. The ratio of a CPU's FO4 delay to the minimal signal delay for any CMOS is node independent, and you can calculate it with the formula Fmax ≈ 1/(π·Trise), where Trise = τFO4 (one FO4 delay). So, for a given technology node, FO4 13 at 65nm for CMOS has a maximum theoretical limit of ~7.5GHz, while at 18nm it has a maximum theoretical frequency of ~11.5GHz. Now, this is where FO4 delay alone becomes paramount for CMOS, all other things kept constant. If you decrease the FO4 to one, as is commonly done in engineering to find the maximum theoretical frequency of the circuitry, then the maximum clock frequency possible at 65nm is ~90GHz, whilst at 18nm it's ~225GHz. The industry standard is to measure energy efficiency between FO4s in different CPU designs in power-performance space (some do this as Energy*Delay^2). In this respect, as an electrical assessment, you will see Power6 outperform Netburst, K8, Core 2 and K10h for efficiency.
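
To make the FO4 arithmetic concrete: a pipeline stage that is N FO4 delays deep has a cycle time of N times one FO4 delay, and the clock is the reciprocal of that. The sketch below implements that relation, plus the Fmax formula quoted above, read here as 1/(π·Trise). The 20ps FO4 delay is an arbitrary placeholder, not a measured value for any process, so the output is not meant to reproduce the GHz figures in the post.

```python
import math


def clock_from_fo4_ghz(fo4_depth: float, tau_fo4_ps: float) -> float:
    """Clock implied by a cycle of fo4_depth FO4 delays of tau_fo4_ps each."""
    return 1000.0 / (fo4_depth * tau_fo4_ps)


def rise_time_ceiling_ghz(t_rise_ps: float) -> float:
    """Fmax ~= 1 / (pi * Trise), the ceiling formula as read from the post."""
    return 1000.0 / (math.pi * t_rise_ps)


TAU_FO4_PS = 20.0  # assumed placeholder FO4 delay in picoseconds

for depth in (13, 18, 21):  # FO4 depths mentioned in this thread
    print(f"{depth} FO4 -> {clock_from_fo4_ghz(depth, TAU_FO4_PS):.2f} GHz")
print(f"ceiling at Trise = one FO4: {rise_time_ceiling_ghz(TAU_FO4_PS):.1f} GHz")
```

Swapping in a real per-node FO4 delay is the only change needed to compare designs on equal footing, since the depth in FO4s is what stays constant across a process shrink.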

As I mentioned, Power6 is not a desktop CPU, nor does it compare to desktop CPUs in low-load desktop workloads or in the applications each is designed to drive, but for 65nm CPUs it sure is electrically much better engineering than K8, P4 or Core 2 for absolute Performance/MHz/TDP. It doesn't produce the most gigaflops and throughput among them for no reason, and all the while it stays sub-60C on air cooling even though it is a circuit designed for heavy temperatures (100C-plus was the burn-in testing).

Some excellent and authoritative sources for such knowledge are: Proceedings of the Advanced Metallization Conference 2007, IEEE Transactions on Computers (e.g. Integrated Analysis of Power and Performance for Pipelined Microprocessors), IEEE Transactions on Electron Devices, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Standby and Active Leakage Current Control and Minimization in CMOS VLSI Circuits, Inductance Calculations Working Formula and Tables - Research Triangle Park, Inductance Calculations in a Complex Circuit Environment - IBM J. Res. Develop. and so on.

**NH|Delph1** · 03-21-2008, 07:47 AM

I love your long replies, but I think most people could use a summary at the end

//Andreas

**villa1n** · 03-21-2008, 12:47 PM

Originally Posted by KTE

Jack, I really don't want to get into details since it's too lengthy for the time I have set aside to post on a rumor thread, but for the sake of you and other level-headed enthusiasts, here goes

To be serious, you've filled in some gaps in knowledge I had regarding the actual effect and limitation FO4 and process size have on theoretical max speed.

**JumpingJack** · 03-21-2008, 04:18 PM

KTE -- Thanks for the link, and great post.

Look how shallow the P4's were, the long pipeline did its job there

Another interesting paper on Clock Scaling vs IPC (projection written about 2000 or so): http://www.cs.utexas.edu/ftp/pub/dbu...ers/ISCA00.pdf (table 2)

Jack
