Jack, I really don't want to get into details since it's too lengthy for the time I have set aside to post on a rumor thread, but for the sake of you and other level-headed enthusiasts, here goes.
These are details I've learned from Intel and AMD directly, so they are more authoritative than the online quotes I provide. The exception is the Penryn Core 2 Extreme, where I was only told that the FO4 delay is reduced from the 65nm Core 2 (drop them an email or ask David/s at RW; they should know, as the ISSCC and IEEE 2006/7 conferences did make brief mention of it). Neither MFG has liked releasing such vital data online since the P4 days, so we're not going to find much on it: what little documentation exists does not cover either architecture in such depth, for competitive reasons, until it's old. You only ever hear tidbits through journals and studies now, which isn't the important detailed engineering, and very few daily journalistic sources can catch or even understand the real details that matter in engineering (they're not exactly educated to).
The only thing they tend to do is feed extravagant hype 12-18 months early to suit the intended extremists, who do a good enough payroll job each time to propel things in favor of their obsession, as the MFG intended to begin with. Then you have unintelligent corner-lurking individuals who react like their mother is being held hostage by one of the MFGs, so they throw childish tantrums at anyone who says anything ever so slightly critical or less than perfect about that particular MFG, regardless of accuracy or their own knowledge limitations, be it Intel or AMD. It's a sad state I wish had never existed online since '98, since we're only interested in the architectures when discussing, and I know I don't favour any MFG in any product, only whatever is cheap and okay for my intended tasks in the end, as most of the sane will. They just want our money. It just spoils forums and the usefulness of discussion.
Searching for those two-figure FO4 numbers online will take time I'm not able to spend right now, with a network connection that has been intermittently broken for about 7 days, even if they do exist. However, I will try to get you some mentions of those FO4 depths, specifically for Core 2 and K10h, later.
Ok, it wasn't that bad actually, just scanned to approximate how hard it may be to find it and took less than 20 seconds for K10h:
K10h inverter delay mentioned
[end of page 2 and start of page 3]:
http://www.hypertransport.org/docs/n...a_05-16-07.pdf
This document mentions a few FO4 depths including that of Core 2
(only one I've found so far online):
http://www.springerlink.com/index/q88838k207r37554.pdf
This document mentions them of many more CPUs:
http://www.realworldtech.com/page.cf...1502231107&p=2
These comparative graphs also look accurate to me, judging by all the lower FO4s I know to be correct:
http://www-vlsi.stanford.edu/group/chart/cycleFO4.pdf,
http://www-vlsi.stanford.edu/group/c...kFrequency.pdf,
http://www-vlsi.stanford.edu/group/c...werDensity.pdf
I'll try to get some word from Intel on Penryn FO4 for you specifically and let you know the full reply by PM (you can then post it if you want, since I don't have any need to post in this thread beyond my first post and this one, which answers your enthusiastic and genuine request).
Yep, exactly. My point wasn't to compare clocks between any of them at all; you and I both know there are major variables which would make that inaccurate. It was that FO4 alone doesn't dictate frequency at a given TDP: a wide variety of features, the whole architectural design and the material choices can limit and affect this greatly. If Intel CPUs can clock well, I would never say it is because of one circuitry factor alone. It has been like this since NetBurst, which went from 16+ down to 8 FO4 depths (not sure of the maximum, but it was above 16 for sure, and some PEs wager 6 is the lowest FO4 they had). Core 2 has a fairly reserved FO4 above 20 to begin with, yet it still can clock high, and although that comes with high TDPs at 65nm, it is still very good.
I'll quickly explain a little for the benefit of genuine and sane-minded knowledge seekers. In any modern microprocessor, the slowest pipeline stage is what, more than anything, determines your maximum operating frequency. In a VHDL design, the critical path delays are where the major problem for clocking arises, as the delays add up along that path. The biggest factors affecting a CPU's maximum clock frequency at a constant FO4 delay are:
a) Microarchitecture
b) Process Variation and Accessibility
c) Logic Styles
d) Timing Overheads
e) Cell designs
f) Wiring Size
g) Floorplan and Placement
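The point above, that the slowest pipeline stage sets the clock, can be sketched in a few lines of Python. This is only an illustration: the function name, the stage delays and the overhead figure are made-up placeholders, not data for any real CPU.

```python
def max_clock_ghz(stage_delays_ps, overhead_ps=0.0):
    """Upper bound on clock frequency for a pipeline.

    stage_delays_ps: logic delay of each pipeline stage, in picoseconds.
    overhead_ps: per-cycle timing overhead (latch, skew, jitter), in picoseconds.
    The cycle time can be no shorter than the slowest stage plus the overhead.
    """
    if not stage_delays_ps:
        raise ValueError("need at least one pipeline stage")
    cycle_ps = max(stage_delays_ps) + overhead_ps
    return 1000.0 / cycle_ps  # a 1000 ps cycle corresponds to 1 GHz


# Hypothetical four-stage pipeline: the 400 ps stage is the critical one.
stages = [310.0, 280.0, 400.0, 295.0]
print(max_clock_ghz(stages))                     # limited by the 400 ps stage
print(max_clock_ghz(stages, overhead_ps=100.0))  # same pipeline with timing overhead
```

Speeding up any stage other than the 400 ps one changes nothing here, which is why the factors listed above matter mostly through what they do to the critical path.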
Now, even more so than these are the FO4_latch (incl. clock skew and jitter delays), FO4_logic and subsequently FO4_pipeline delays, which designate the depth of the critical path through logic in one pipeline stage. They greatly affect a CPU's clock frequency within desktop TDPs, as well as how much of the CPU's surface can be covered in one processor cycle. The most paramount of those parameters affects the critical path lengths and critical path delays (i.e. register propagation delay). Even subthreshold leakage, gate direct tunneling leakage, junction leakage and gate-induced drain leakage greatly affect any CPU's clocking at a given transistor V_dd and T_ox. Array power, latch and clock are the primary components of power dissipation in CPUs too. Modern CPUs have a power given by the formula P = P_dynamic + P_leakage, and leakage with SiO2 is supposed to be as much as 40% of the used power, especially as the fabrication node shrinks: decreasing the threshold voltage of any transistor increases the leakage current exponentially (e.g. decreasing the threshold voltage by 100mV increases the leakage current by a factor of 10), and decreasing the length of transistors increases the leakage current as well. This again poses huge clock frequency barriers to CPUs in real life, as opposed to theoretical simulations, when you shift process size [more on it here].
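To put rough numbers on the leakage argument, here is a small Python sketch of the two relations mentioned above: P = P_dynamic + P_leakage, and a tenfold leakage increase per 100mV drop in threshold voltage. The 100mV-per-decade figure is the one quoted in this post; the wattages and function names are invented for illustration only.

```python
def leakage_scale(delta_vth_mv, mv_per_decade=100.0):
    """Factor by which leakage current changes for a threshold-voltage shift.

    delta_vth_mv: change in threshold voltage in mV (negative means lower Vth).
    mv_per_decade: Vth reduction that gives 10x leakage (100 mV, as quoted above).
    """
    return 10.0 ** (-delta_vth_mv / mv_per_decade)


def total_power(p_dynamic_w, p_leakage_w):
    """P = P_dynamic + P_leakage, both in watts."""
    return p_dynamic_w + p_leakage_w


def leakage_share(p_dynamic_w, p_leakage_w):
    """Fraction of total power that is leakage."""
    return p_leakage_w / total_power(p_dynamic_w, p_leakage_w)


# Hypothetical chip: 60 W dynamic, 40 W leakage, i.e. the 40% case above.
p_dyn, p_leak = 60.0, 40.0
print(leakage_share(p_dyn, p_leak))

# Lower Vth by 50 mV to chase clock speed and see what leakage does to the total.
p_leak_fast = p_leak * leakage_scale(-50.0)
print(total_power(p_dyn, p_leak_fast), leakage_share(p_dyn, p_leak_fast))
```

The sketch ignores that dynamic power also rises with frequency and voltage, so the real picture at a fixed TDP is worse than what it prints.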
The fan-out-of-four (FO4) inverter delay is an ideal metric for comparing and estimating clocking that depends on technology scaling only, i.e. if you keep the same architecture but just change FO4, it is bound to clock better if design/TDP does not restrict this. The ratio of a CPU's FO4 delay to the minimal signal delay for any CMOS is node independent, and you can calculate the limit with the formula Fmax ≈ 1/(π·Trise), where Trise = τFO4 (one FO4 delay). So, for a given technology node, FO4 13 at 65nm for a CMOS has a maximum theoretical limit of ~7.5GHz, while at 18nm it has a maximum theoretical frequency of ~11.5GHz. Now, this is where FO4 delay alone becomes paramount for CMOS, all other things kept constant. If you decrease the FO4, as is commonly done in engineering to find the maximum theoretical frequency of the circuitry, to one FO4, then the maximum clock frequency possible at 65nm is ~90GHz whilst at 18nm it's ~225GHz. The industry standard is to compare energy efficiency between FO4s in different CPU designs in the power-performance space (some do this as Energy*Delay^2). In this respect, as an electrical assessment, you will see Power6 outperform NetBurst, K8, Core 2 and K10h for efficiency.
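For anyone who wants to play with the relation, here is a minimal Python sketch of the rise-time formula quoted above, next to the usual convention that the cycle time equals the FO4 depth times one FO4 delay. That second convention and the τFO4 value used are my assumptions for illustration; the value is a placeholder, not a measured figure for any process, so the output is not meant to reproduce the GHz numbers quoted above.

```python
import math


def fmax_ghz(tau_fo4_ps):
    """Fmax ~ 1 / (pi * Trise), with Trise taken as one FO4 delay (in ps)."""
    return 1000.0 / (math.pi * tau_fo4_ps)


def clock_ghz(fo4_depth, tau_fo4_ps):
    """Clock frequency if one cycle is fo4_depth FO4 delays long (assumed convention)."""
    return 1000.0 / (fo4_depth * tau_fo4_ps)


tau = 10.0  # hypothetical FO4 delay in picoseconds, chosen only for illustration
print(fmax_ghz(tau))
for depth in (8, 13, 20):
    # Same process (same tau), different pipeline depth per stage.
    print(depth, clock_ghz(depth, tau))
```

Halving τFO4 doubles every number printed, which is the "technology scaling only" effect; changing the depth at a fixed τFO4 is the architectural knob.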
As I mentioned, Power6 is not a desktop CPU, and it doesn't compare to desktop CPUs either in low-load desktop workloads or in the applications it is designed to drive. But among 65nm CPUs, it sure is electrically much better engineered than K8, P4 or Core 2 for absolute Performance/MHz/TDP. It doesn't produce the most gigaflops and throughput among them for no reason, and all the while it stays sub-60C on air cooling even though it is a circuit designed for heavy temperatures (plus, 100C was the burn-in testing).
Some excellent and authoritative sources for such knowledge are: Proceedings of the Advanced Metallization Conference 2007, IEEE Transactions on Computers (e.g. Integrated Analysis of Power and Performance for Pipelined Microprocessors), IEEE Transactions on Electron Devices, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Standby and Active Leakage Current Control and Minimization in CMOS VLSI Circuits, Inductance Calculations: Working Formulas and Tables (Research Triangle Park), Inductance Calculations in a Complex Circuit Environment (IBM J. Res. Develop.) and so on.