Well that makes about as much sense..
HT3 enabled CPU, HT3 chipset = HT1 platform?
But the motherboard manufacturers clearly state that the Phenom platform has HT 3.0
http://www.bit-tech.net/news_images/...0-733-news.jpg
It matters for this situation when textures are swapped from main memory to graphics memory and when CPU sends updated frame information for the scene rendering... but even then, current gen and older games do not consume enough HT BW to push it to the limits and bog down. If the graphics card is memory rich, it becomes a non-factor.
I have seen perhaps a 4 or 5 FPS hit at 1x over 5x... albeit I have only tested a few games (Quake 4, FEAR, HL2: Lost Coast).
Other than that, HT is not part of the equation for computational output in a single socket; it serves I/O only: HDs, keyboard, mouse, graphics card...
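To put rough numbers on the 1x vs 5x comparison, here is a minimal sketch of peak HT link bandwidth. It assumes the usual 200 MHz reference clock, a 16-bit link, and double data rate signalling; none of those values are stated in the post, and the function name is made up for illustration.

```python
# Rough peak one-way bandwidth of a HyperTransport link.
# Assumptions (not from the thread): 200 MHz reference clock,
# 16-bit link width, two transfers per clock (DDR).

def ht_bandwidth_gb_s(multiplier, ref_clock_mhz=200, link_width_bits=16):
    """Return peak bandwidth in GB/s, per direction, for an HT multiplier."""
    clock_hz = multiplier * ref_clock_mhz * 1e6
    transfers_per_s = clock_hz * 2            # DDR: two transfers per clock
    bytes_per_transfer = link_width_bits / 8
    return transfers_per_s * bytes_per_transfer / 1e9

for mult in (1, 5):
    print(f"{mult}x: {ht_bandwidth_gb_s(mult):.1f} GB/s each way")
```

Under those assumptions the 1x setting leaves a fifth of the 5x link bandwidth, which fits the point that only texture swaps and frame updates would notice the difference.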
I don't understand. Not a "super linear" increase, but "more efficient" at higher clocks, huh?
Quote:
I am not saying that increases in the CPU core frequency yield a linear performance gain greater than 1:1. My example as stated was crude and too simple. In this case the cubic inches, compression ratio, and gearing are static; the supercharger simply lets the engine perform more efficiently (from a power output perspective) by taking greater advantage of the fuel/air intake mixture (plus aggressive timings) it has available as RPMs rise.
Probably a very bad example, but I was trying to make the point that the changes in the architecture of this processor and the new chipsets (HT 3.0, etc) do not provide any advantages (in most cases) over the current platforms in performance until the core clock speed increases and we start to notice that around 2.4GHz (see below for other reasons at this time). I think I have said this several times since Computex, AMD desperately needs to get the core speeds on this processor architecture improved (above 2.4GHz or so, privately a few people at AMD agree) for it to be really competitive and to take full advantage of their processor/platform improvements.
I do not think AMD ever intended or even believed this CPU would launch at the speeds it will (1.8~2.0, possibly 2.2 in Q4), as the processor simply does not perform as efficiently as it should (or appears capable of) based upon the architecture changes. A lot of the early information we had was that Barcelona would launch in the 2.2~2.4 range and then scale quickly, with a potential to 4GHz in the end. The early performance expectations and claims of performance improvements over current platforms were based on simulations at 2.4~2.6GHz and then scaling upwards. The CPU was designed with these speeds and above in mind; it simply is too slow right now, not to mention that several core improvements have been flipped on/off or just are not as efficient as they should be in early testing.
At least with the early samples we have seen, there are improvements against current processors on a clock for clock basis as the core speed improves. This does not mean a linear performance gain that is greater than 1:1; it simply means the chip is operating more efficiently as the core speed improves. There could be a wide variety of reasons for this, as we have seen dramatic changes in the platform performance almost week to week as new steppings, chipset revisions, and BIOS code were changed. We have seen HT not working or set at 1.0, 2.0, 3.0 specifications depending upon core speed and chipset, secondary caches turned off or even gated based upon core speed (L3 cache and L2 prefetchers as late as July), floating-point instructions flipped on or off, out of order execution of load algorithms flipping from conservative to aggressive and back depending upon core speed, and even translation lookaside buffers being tinkered with during this time, not to mention a dozen other changes.
Also remember that the DRAM controller is now split into two separate 64-bit controllers. Each controller can be operated independently by the chipset and there can be some significant improvements in efficiency, especially where the individual cores are working on independent threads and each have their own memory access patterns, yet another area where core speeds could create variable results. Added to this is the fact that the data prefetcher now brings data directly into the low latency L1 data cache, as opposed to the L2 cache in the K8. K10 also increased the ability of its L1 instruction cache prefetcher to handle two outstanding requests to any address. These two areas plus the new DRAM prefetcher on the revised memory controller are the control mechanisms that we have noticed having the greatest impact on performance, especially with the increase in core speed. It is also the area that we believe has been most "tinkered" with during the prototype and pre-production phases. We have noticed the processors going from only needing DDR2-667 in June to really being responsive with DDR2-1066 as the core speeds have increased along with the other improvements/additions to the processor, BIOS, and chipsets.
When I said that certain features were "idle" in some cases, this is what I was talking about. Until we see production level silicon and final BIOS code, it is extremely difficult to determine what is occurring inside Barcelona/Phenom and what is not on a clock for clock basis. Throw into that mix a whole new generation of chipsets (i.e., RD790) that take further advantage of these changes and you have a situation that is very fluid, as the initial performance results will be on older HT 2.0 chipsets that are designed for the enterprise environment. There is not a consumer level board available that is tuned for this processor series yet. Trying to use it on one is like using a QX6850 on a VIA PT880: yeah, it works, but look at the results.
That is why we do not want to guesstimate the performance or even provide tangible numbers until we have had a chance to test released product. For whatever reason, in the early tests, the processor operated more efficiently as the core speed increased, we will find out shortly why it did. I hope this helps and if I could speak in greater detail, I would, but September 10th is getting close. Like I said in my previous message, some people will be happy, some will not, and most will realize that certain hype does not directly translate into expected performance improvements, not until we see some speed (counting on this). In the end, this processor lays the groundwork for what comes next, sort of like how the Core Series did for the Core 2 (imho).
Edited: 09/01/2007 at 11:36 AM by Gary Key
What he is trying to say is that the current platform (i.e. HT 1.0, IMC, chipset, etc.) bottlenecks the K8 CPU, and that once K10 hits 2.4 GHz, where the throughput would normally bottleneck a K8 platform, that bottleneck no longer exists and observed performance will be better than the current... He does not realize what he is saying, I suspect: one could infer from this logic that IPC really did not improve much at all, and that AMD designed around the concept that BW is the major performance-limiting culprit....
Quote:
Probably a very bad example, but I was trying to make the point that the changes in the architecture of this processor and the new chipsets (HT 3.0, etc) do not provide any advantages (in most cases) over the current platforms in performance until the core clock speed increases and we start to notice that around 2.4GHz (see below for other reasons at this time). I think I have said this several times since Computex, AMD desperately needs to get the core speeds on this processor architecture improved (above 2.4GHz or so, privately a few people at AMD agree) for it to be really competitive and to take full advantage of their processor/platform improvements.
This is rubbish for DT-related work, unless you are running several instances of a high-throughput algorithm that takes up all the memory BW.
In dual socket/server apps... this may very well be true... I have not seen any data that suggests this is the case, but neither have I seen any data that conclusively suggests that it isn't.
AMD spent about 1/2, or more, of their development effort and transistor budget on bandwidth, which I found odd, because BW was not what was holding them up....
Don't know if this has been posted, but here goes:
http://forums.vr-zone.com/showthread.php?t=182403
What Gary is trying to say (only in a few technical terms) is that the chips are not final, nor are the BIOSes. Nothing they had was representative of final shipping silicon since, as he said, they saw week-to-week improvements as they got new samples with new boards/BIOSes. This tells a lot about what kind of EVTs and BIOSes were involved in the whole "testing" process, and this won't change until Sept. 10th.
After reading the whole thread I come to one conclusion.
The CPU sux and will need to get to 4GHz to compete.
When they reach 4GHz (years) it's outdated and sux.
Dude... you are saying the chips are not final; this is fair, as it is not clear... but Gary is clearly saying something other than this when he states that it needs a higher clock to release the potential....
Quote:
Probably a very bad example, but I was trying to make the point that the changes in the architecture of this processor and the new chipsets (HT 3.0, etc) do not provide any advantages (in most cases) over the current platforms in performance until the core clock speed increases and we start to notice that around 2.4GHz (see below for other reasons at this time).
At first he said this:
What is this saying....??? It seems to me that he thinks it scales better after 2.4 GHz... this is just ludicrous. Higher clock => higher performance, this is true... but what he is implying is something else: call the IPC at 2.0 GHz X and the IPC at 2.4 GHz Y... he is saying that past 2.4 GHz Y > X, and this is simply not true.
Quote:
The one caveat that I will add, this chip really does not get into a groove until you get over 2.4GHz and then it scales incredibly well. Also, the first RD790 boards we have will undergo another spin so any Phenom results with those boards are subject to interpretation depending on whether you like AMD or not.
This is old, this video showed up during AMD's July Technology Analyst day... it was paired with the one slide of SPEC2006_FP results that they showed in their presentation.
On that note, Barcey will be a good HPC CPU, the interconnect backbone really speeds things along.... and the FPU has always been strong on the AMD core.
No, he clarified himself later. He meant it was as if the chip was at 80% of its efficiency at 2GHz (due to many constraints, some of which he mentioned as ES related, such as IMC and HT speeds, and some of which are BIOS related).
After the chip gets all of its parts above 2.4GHz (IMC and L3), the rest of the core starts to act the way it was designed (optimal throughput in the appropriate sections).
And all of this was with EVTs and early BIOSes. He also said they saw week-to-week improvement with new ES and boards (this tells a lot about the samples they had and about the BIOS support).
The chip wasn't working as it should at 2GHz... It doesn't scale better than linearly; that is impossible. It just works as it was designed at 2.4GHz, since the IMC frequency and L3 latency are at the point where they make the rest of the core most efficient.
PS: Have you never seen an A64 at, say, 2.8GHz perform worse with higher-latency (CAS 5), lower-frequency memory than the same CPU with low-latency, high-frequency memory? This is not exactly the same comparison, since K10 is a much, much improved design, but it gives you the idea: although the IPC stayed the same, chip efficiency was lower than in the second scenario (say the IPC of the second case was the baseline).
Latency for caches (L3, etc.) is measured in cycles and is constant (in terms of cycles) irrespective of GHz frequency.
Plus, like I figured you would, your K8 example doesn't hold water b/c I specifically said "assuming all else equal" - all else means just that. RAM latency, etc. Your example specifically changes one of those variables.
There is no good, logical reason that anyone can provide of why K10 would disproportionately outperform at 2.4GHz vs. 2.0GHz.
And they say Conroe is FSB limited?
L3 latency is variable in K10 and NOT constant. You should know better.
Second, all of the above relates to the maturity of the boards themselves (BIOS level) and says nothing about the way the chip actually performs. AMD handed over some number of EVTs and that's all. Nothing conclusive can be derived from the coolaler forum tests.
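The disagreement over whether L3 latency is "constant" comes down to which clock the cycles are counted on. A small sketch of both readings follows; the 40-cycle figure and the 2.0 GHz L3 clock are made-up illustration values, not measurements of any K10 part, and the helper names are hypothetical.

```python
def latency_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz

def l3_latency_in_core_cycles(l3_cycles, l3_clock_ghz, core_clock_ghz):
    """L3 latency as the core sees it when the L3 runs on its own clock."""
    return latency_ns(l3_cycles, l3_clock_ghz) * core_clock_ghz

L3_CYCLES = 40        # hypothetical latency, counted on the cache's own clock
L3_CLOCK_GHZ = 2.0    # hypothetical fixed L3/northbridge clock

for core_ghz in (2.0, 2.4):
    # Reading 1: the cache runs at core clock, so cycles stay fixed and ns shrink.
    same_clock_ns = latency_ns(L3_CYCLES, core_ghz)
    # Reading 2: the cache runs at its own fixed clock, so ns stay fixed
    # and the latency counted in core cycles grows with core clock.
    seen_by_core = l3_latency_in_core_cycles(L3_CYCLES, L3_CLOCK_GHZ, core_ghz)
    print(f"core {core_ghz} GHz: {same_clock_ns:.1f} ns if on core clock, "
          f"{seen_by_core:.1f} core cycles if on a fixed L3 clock")
```

Which reading applies to a given sample depends on how that stepping and BIOS clock the L3, which is exactly what the thread cannot settle from engineering samples.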
Well, good thing retail is coming in a few days, so the super speculation on pre-release chips can come to an end.
It still makes no sense.... clock scaling is linear in the absence of bottlenecks... to put it another way, it is not possible to increase clock speed 20% and realize a 30% increase in performance.... his statements are contrary to the way a digital circuit would work.
The argument that there are still tweaks (bug fixes, or workarounds in BIOS) to bring some gains online is reasonable, but to say the "chip turns on scaling at 2.4 GHz" is ludicrous.
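The 20% vs 30% point can be checked with a toy model. It assumes a fixed workload whose run time is a clock-dependent part (core cycles divided by frequency) plus a stall time that does not scale with core clock; the cycle counts and stall times below are arbitrary illustration values.

```python
def run_time_ns(core_cycles, clock_ghz, fixed_stall_ns=0.0):
    """Run time = part that scales with core clock + part that does not."""
    return core_cycles / clock_ghz + fixed_stall_ns

def speedup(core_cycles, fixed_stall_ns, old_ghz, new_ghz):
    """Speedup from raising only the core clock, everything else unchanged."""
    old = run_time_ns(core_cycles, old_ghz, fixed_stall_ns)
    new = run_time_ns(core_cycles, new_ghz, fixed_stall_ns)
    return old / new

# No bottleneck: speedup equals the clock ratio (2.4 / 2.0).
print(speedup(1000, 0.0, 2.0, 2.4))
# With stalls that do not scale with core clock: speedup falls below the ratio.
print(speedup(1000, 200.0, 2.0, 2.4))
```

In this model the clock ratio is an upper bound, so a gain larger than 20% from 2.0 to 2.4 GHz would have to come from something else changing at the same time (a feature enabled, a faster HT or memory setting), not from the clock itself.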