I am not saying that increases in the CPU core frequency yield a linear performance gain greater than 1:1. My example as stated was crude and too simple. In this case the cubic inches, compression ratio, and gearing are static; the supercharger simply lets the engine perform more efficiently (from a power output perspective) by taking greater advantage of the fuel/air intake mixture (plus aggressive timings) it has available as RPMs rise.

Probably a very bad example, but I was trying to make the point that the changes in the architecture of this processor and the new chipsets (HT 3.0, etc.) do not provide any performance advantages (in most cases) over the current platforms until the core clock speed increases, and we start to notice that around 2.4GHz (see below for other reasons at this time). I think I have said this several times since Computex: AMD desperately needs to get the core speeds on this processor architecture improved (above 2.4GHz or so; privately a few people at AMD agree) for it to be really competitive and to take full advantage of their processor/platform improvements.

I do not think AMD ever intended or even believed this CPU would launch at the speeds it will (1.8~2.0GHz, possibly 2.2GHz in Q4), as the processor simply does not perform as efficiently as it should (or appears capable of) based upon the architecture changes. A lot of the early information we had was that Barcelona would launch in the 2.2~2.4GHz range and then scale quickly, with the potential to reach 4GHz in the end. The early performance expectations and claims of performance improvements over current platforms were based on simulations at 2.4~2.6GHz and then scaling upwards. The CPU was designed with these speeds and above in mind. It simply is too slow right now, not to mention that several core improvements have been flipped on/off or just are not as efficient as they should be in early testing.

At least with the early samples we have seen, there are improvements against current processors on a clock for clock basis as the core speed improves. This does not mean a linear performance gain that is greater than 1:1; it simply means the chip is operating more efficiently as the core speed improves. There could be a wide variety of reasons for this, as we have seen dramatic changes in platform performance almost week to week as steppings, chipset revisions, and BIOS code changed. We have seen HT not working or set at 1.0, 2.0, or 3.0 specifications depending upon core speed and chipset, secondary caches turned off or even gated based upon core speed (L3 cache and L2 prefetchers as late as July), floating-point instructions flipped on or off, out of order execution of load algorithms flipping from conservative to aggressive and back depending upon core speed, and even translation lookaside buffers being tinkered with during this time, not to mention a dozen other changes.

Also remember that the DRAM controller is now split into two separate 64-bit controllers. Each controller can be operated independently by the chipset, and there can be some significant improvements in efficiency, especially where the individual cores are working on independent threads and each has its own memory access pattern. This is yet another area where core speeds could create variable results. Added to this is the fact that the data prefetcher now brings data directly into the low latency L1 data cache, as opposed to the L2 cache in the K8. K10 also increased the ability of its L1 instruction cache prefetcher to handle two outstanding requests to any address. These two areas, plus the new DRAM prefetcher on the revised memory controller, are the control mechanisms that we have noticed having the greatest impact on performance, especially with the increase in core speed. It is also the area that I believe has been most "tinkered" with during the prototype and pre-production phases. We have watched the processors go from only needing DDR2-667 in June to being really responsive with DDR2-1066 as the core speeds have increased along with the other improvements/additions to the processor, BIOS, and chipsets.
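
For a rough sense of what the step from DDR2-667 to DDR2-1066 means on paper, here is a back-of-the-envelope sketch of theoretical peak bandwidth across the 64-bit controllers. The function name and the use of the nominal transfer rates (667 and 1066 MT/s) are assumptions made for illustration; these are ceiling figures only, not measurements of what Barcelona/Phenom actually sustains.

```python
def peak_bandwidth_mb_s(transfers_mt_s, bus_width_bits=64, controllers=1):
    """Theoretical peak bandwidth in MB/s (hypothetical helper, not a benchmark).

    transfers_mt_s: nominal transfer rate in mega-transfers per second
    bus_width_bits: width of one controller's data bus (64-bit per the post)
    controllers:    number of controllers active at the same time
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * bytes_per_transfer * controllers


# Nominal rates taken from the module names; real parts run at 666.67 / 1066.67 MT/s.
for name, rate in (("DDR2-667", 667), ("DDR2-1066", 1066)):
    one = peak_bandwidth_mb_s(rate)
    two = peak_bandwidth_mb_s(rate, controllers=2)
    print(f"{name}: {one:.0f} MB/s per controller, {two:.0f} MB/s with both")
```

On paper that is roughly 10.7GB/s versus 17.1GB/s with both controllers busy, so a faster core with more aggressive prefetching has a lot more headroom to ask for before memory becomes the limit.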

When I said that certain features were "idle" in some cases, this is what I was talking about. Until we see production level silicon and final BIOS code, it is extremely difficult to determine what is occurring inside Barcelona/Phenom and what is not on a clock for clock basis. Throw into that mix a whole new generation of chipsets (i.e., RD790) that take further advantage of these changes, and you have a situation that is very fluid, as the initial performance results will be on older HT 2.0 chipsets that are designed for the enterprise environment. There is not a consumer level board available that is tuned for this processor series yet. Trying to use it on one is like using a QX6850 on a VIA PT880: yeah, it works, but look at the results.

That is why we do not want to guesstimate the performance or even provide tangible numbers until we have had a chance to test released product. For whatever reason, in the early tests, the processor operated more efficiently as the core speed increased, and we will find out shortly why it did. I hope this helps. If I could speak in greater detail, I would, but September 10th is getting close. Like I said in my previous message, some people will be happy, some will not, and most will realize that certain hype does not directly translate into expected performance improvements, not until we see some speed (counting on this). In the end, this processor lays the groundwork for what comes next, sort of like how the Core Series did for the Core 2 (imho).

Edited: 09/01/2007 at 11:36 AM by Gary Key
I don't understand. Not "super linear" increase, but "more efficient" at higher clocks, huh?