Phenom offers more than 100% clockscaling?!



Jakko
01-08-2008, 03:33 AM
All credits to KTE for pointing this out, I think it deserved its own thread because I really wonder what's going on here:


Guys, check this out in an official review (trustworthy).

I spotted something odd in the CB10 review by Xbit-Labs. I've seen many oddities in a few reviews around but haven't had the time to spend on them or discuss them, so I'll just point this one out briefly. All NB, RAM and HT frequencies remain the same below; only the CPU speed/multi changed for the higher speeds. Thus, theoretically, pure CPU MHz scaling (attached image).

From 2.2GHz to 2.6GHz, a 400MHz increase, there is a 1087 CB difference.
From 2.6GHz to 3.0GHz, a 400MHz increase, there is a 1480 CB difference.

:confused:

Is this what Gary of AnandTech and the rest of the reviewers talked about with needing plus 2.8GHz Phenom to do well?

To put it more clearly:

From 2.2->2.6GHz, there's a +18.18% clock change.
And from 7114->8201 CB, there's a +15.28% performance change.

BUT

From 2.6GHz->3.0GHz, there's a +15.38% clock change.
And from 8201->9681, there's a +18.05% performance change.

Which is obviously above what is usually possible, i.e., above 100% clock scaling. Don't know which numbers I can trust here, but I'll see what I can get with my BE and if those numbers are correct. I can verify the 9500, 9600, 9700, 9900 numbers are correct though from my own runs (although I had higher NB speed/HT). ;)

http://www.xtremesystems.org/forums/showpost.php?p=2681946&postcount=534
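For anyone who wants to redo the arithmetic in the quote, here is a minimal sketch. It only uses the three Cinebench 10 scores quoted above; the helper names `pct_change` and `scaling_ratio` are made up for this example, and "ratio" here simply means performance gain in percent divided by clock gain in percent.

```python
# Cinebench 10 scores quoted above from the Xbit-Labs review (clock in GHz -> score).
scores = {2.2: 7114, 2.6: 8201, 3.0: 9681}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

def scaling_ratio(f0, f1):
    """Performance gain divided by clock gain; 1.0 means exactly 100% scaling."""
    return pct_change(scores[f0], scores[f1]) / pct_change(f0, f1)

for f0, f1 in [(2.2, 2.6), (2.6, 3.0)]:
    print(f"{f0} -> {f1} GHz: clock {pct_change(f0, f1):+.2f}%, "
          f"score {pct_change(scores[f0], scores[f1]):+.2f}%, "
          f"ratio {scaling_ratio(f0, f1):.2f}")
```

A ratio below 1.0 for the first step and above 1.0 for the second is exactly the pattern the quote describes; the sketch says nothing about why it happens.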

Who has any idea?
I do remember Anand from AnandTech saying something about the Phenom really starting to "kick in" around 2.6 GHz, but no one really knew what he meant, because more than 100% clockscaling is not possible according to everyone.
People then assumed he was talking about excellent multicore scaling.

Maybe he did mean better than 100% clockscaling?
:confused:

LightSpeed
01-08-2008, 04:39 AM
i really want to know how phenom scales past 3ghz, i have a feeling it indeed is phenomenal but its just sad as to the way these clock

another interesting review was about multicore gaming performance. i think the phenom @ 2.4 did better than the qx9770 stock in ut3, which is heavily multithreaded. too tired to link, head over to lostcircuits review

Andi64
01-08-2008, 05:13 AM
But:
From 2.2Ghz to 3.0Ghz = 36.36% Core Clock increase.
From 7114 to 9681 = 36.08% Performance increase.

We do not know if another ~15% increase in clock at 3.0GHz will give more than a 15% performance increase; that would be nice.
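A rough sketch of why the end-to-end figure looks almost perfectly linear even though the two halves do not: per-step efficiencies multiply, so a weak step followed by a strong step can cancel. The numbers are the three Xbit scores from this thread; `ratio_steps` and the "efficiency" definition (score ratio over clock ratio) are just assumptions chosen for this illustration.

```python
# Endpoints and midpoint from the Xbit-Labs Cinebench 10 figures quoted in this thread.
clocks_ghz = [2.2, 2.6, 3.0]
cb_scores = [7114, 8201, 9681]

def ratio_steps(values):
    """Ratio of each value to the previous one."""
    return [b / a for a, b in zip(values, values[1:])]

clock_steps = ratio_steps(clocks_ghz)
score_steps = ratio_steps(cb_scores)

# Per-step scaling efficiency: score ratio over clock ratio (1.0 = perfectly linear).
step_eff = [s / c for s, c in zip(score_steps, clock_steps)]

# Efficiencies multiply, so one weak step and one strong step can cancel overall.
overall_eff = (cb_scores[-1] / cb_scores[0]) / (clocks_ghz[-1] / clocks_ghz[0])

print([round(e, 3) for e in step_eff])
print(round(overall_eff, 3))
```

The product of the two step efficiencies equals the overall efficiency, which is why comparing only 2.2GHz against 3.0GHz hides whatever is going on at 2.6GHz.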

Jakko
01-08-2008, 06:25 AM
At what speed does the memory work at different clockspeeds?

Ace123
01-08-2008, 10:27 AM
Great find.
AMD can do nothing now but get the freakin clockspeeds up on these things

Terwin
01-08-2008, 10:54 AM
The confusion I'm getting is there are a bunch of new bios settings in the 790FX mobos that have so far confused me enough I can only hit mid 2.6ghz range stable. People are glossing over the specific settings used to get 3+ghz. I would sure appreciate if they elaborated on them specifically. I'm using the 9500 and the 9600BE. At this rate I'm gonna dropkick the 790FX in favor of the nVidia AM2+ board coming soon.

KTE
01-08-2008, 11:50 AM
No biggy. My quick explanation: http://www.xtremesystems.org/forums/showpost.php?p=2685061&postcount=7


But:
From 2.2Ghz to 3.0Ghz = 36.36% Core Clock increase.
From 7114 to 9681 = 36.08% Performance increase.

True, because when you aggregate high and low values, the high and low extremes cancel out, so you miss the data we're pointing out here. Even a 100MHz core clock speed increase cannot theoretically produce higher than 100% performance scaling, since the core doesn't have more than 100% of its potential. ;)

Hence we know something else is going on underneath, affecting performance "efficiency" at various clock speeds.

There was low clock scaling from 2.2GHz to 2.6GHz. But as soon as you topped 2.6GHz, the performance scaling was better than expected for the clock speeds. I saw this too, as CB showed a x3.78-3.80 speedup until it hit 2.64GHz, where it showed a x3.82 speedup. But then again, I could not keep RAM/HT/NB the same as they did because I didn't have unlocked multis, so my tests are not directly comparable.


The confusion I'm getting is there are a bunch of new bios settings in the 790FX mobos that have so far confused me enough I can only hit mid 2.6ghz range stable. People are glossing over the specific settings used to get 3+ghz. I would sure appreciate if they elaborated on them specifically. I'm using the 9500 and the 9600BE. At this rate I'm gonna dropkick the 790FX in favor of the nVidia AM2+ board coming soon.
Quite perplexing, don't you reckon Brad. :p:

Depends what you're going for TBH. If you want to try for 3GHz, try 200x15 with the BE (BIOS settings below).

Change CPU VID to 1.45
Change CPU VCore to 1.4V
Either drop NB speed using the multi or change NB VID to ~1.35.
HT is better to keep 100MHz below NB IME.

See if you get 3GHz stable at 200MHz HT ref. ;)

If not try 1x multi lower.

Manicdan
01-08-2008, 12:53 PM
remember the preview by some guy who i think worked for AMD, where he compared the phenom to an engine? why can't we consider the change from 2.6 to 3.0ghz as getting past turbo lag in a car?

i know i don't know nearly as much about cpus as some of you guys, but i don't see why it's not possible for phenom to be built to run at 3.0-4.0ghz, with anything below that just not giving it the lower latency to work at its max potential.

Ace123
01-08-2008, 03:06 PM
well the only problem is, CPUs don't work on a forced induction system like a car.


CPUs basically run on the premise of how much work is done per clock cycle.
Which is why this is weird.

Somehow, these AMD chips are virtually doing MORE work per clock cycle once the 2.6GHz threshold is crossed.

It's very unlikely AMD sneaked in special code that would tell the processor it was OK to do more work at a given speed.

Jakalwarrior
01-08-2008, 05:31 PM
The only way I can think of that it could work is if some internal part was on a higher multiplier than the CPU, so it's going up in bigger steps. Possibly with a bottleneck involved.
I'm sure more data and more time will sort this out though.

The only way you could compare it to a turbo would be if the multiplier was stuck on 7 until you got the HT up high enough to spin it to 10.

JumpingJack
01-08-2008, 11:10 PM
I would hold off making a 'super scaling' argument based on the Cinebench data .... if you plot the Cinebench score against the clock speed, it would appear that the 2.6 GHz score is anomalous and does not fit the trend. This is not the first time I have seen Xbit make a graphing/math mistake, or perhaps the unusually low score for the 9900 is real and there is some weird artifact. Nonetheless, using the Xbit data:

http://img256.imageshack.us/img256/9118/xbitcpubphenomscallingza3.jpg

The 3.0 GHz point appears to be sane, but the 2.6 GHz (9900) is off, way too low... as such, doing a simple two-point extrapolation is probably incorrect.

All the other scores are falling in linearly to rolling off slightly, for example:

Here we see a roll off near the top....
http://img529.imageshack.us/img529/7176/xbitxvidbphenomscallingyd7.jpg


3DsMax is scaling very well, darn near linear:
http://img529.imageshack.us/img529/1977/xbit3dsmaxbphenomscallicw6.jpg



(iTunes is the exception, the 3.0 Ghz is slightly higher than linear)
Jack
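Since the linked plot images may not load, here is a rough way to reproduce the Cinebench trend check in text form. It is only a sketch under stated assumptions: the five clock/score pairs are the Xbit figures quoted elsewhere in this thread, the 2.6 GHz point is deliberately left out of the fit (that choice is the assumption being tested), and `fit_line` is a plain least-squares helper written for this example.

```python
# Clock (GHz) -> Cinebench 10 score, Xbit-Labs figures as quoted in this thread.
data = {2.2: 7114, 2.3: 7453, 2.4: 7769, 2.6: 8201, 3.0: 9681}
suspect = 2.6  # assumption: treat the 2.6 GHz point as the possible outlier

def fit_line(points):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Fit only the points that are not under suspicion, then predict the suspect one.
others = [(x, y) for x, y in data.items() if x != suspect]
slope, intercept = fit_line(others)
predicted = slope * suspect + intercept
residual = data[suspect] - predicted

print(f"fit: score = {slope:.0f} * GHz + {intercept:.0f}")
print(f"2.6 GHz: predicted {predicted:.0f}, measured {data[suspect]}, "
      f"residual {residual:+.0f} ({residual / predicted:+.1%})")
```

With these inputs the line through the other four points predicts a score near 8400 at 2.6 GHz, roughly 200 points above the published 8201. That only shows the point sits below the trend; it does not say whether the cause is a review error or real behaviour.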

KTE
01-09-2008, 12:09 AM
Yep, there can't be "superscaling", just like you can't put out 1000hp on the road from a motor that dyno tested at 800hp. The results are no different from what you get in real life (below 2.66GHz I can confirm them), so they give an accurate indication of the processor's performance rather than a user error. Above 2.7GHz, I haven't run CB, so I don't know the results.

However, like I said, running at better efficiency at one clock than another is easily possible, and if that happens you get over-100% scaling figures when comparing against a lower clock speed, since the runs lower down were inefficient. It's just like how some Super Pi guys get faster times than others at the same clocks/settings, usually by 1-2 seconds. There are also regular small fluctuations in clocks, and these can influence the end results between points. That's why many benchmarks are not exactly consistent; they only give roughly repeatable scores. 3DMark is one of them: your results can easily vary 60-70 marks either side for each field. Cinebench multi results usually vary much less.

Nevertheless, I need practical data before I guess any more, and reviews are not the place I'll be getting it. Too many such results can be seen around, which is why I complained. 84xx to 82xx is nowhere near the discrepancy other reviews have shown for Phenom; too many have shown scores 500 marks below what end users get in these tests for me to count them as fully accurate and start basing statistics on them.
Also, performance scaling is rarely linear. That's very hard to find across clock ranges, and linear based on what? One's own starting data?
This is still too inaccurate to base any complete judgments on, since across my first 3 runs the first can be very efficient for the clock speed, the second inefficient and the third efficient again, and that would give me a weird-looking, unexplainable chart again. The same scenario applies if you flip things around. What matters is how the performance scales at each of the clocks we will get at retail or when overclocked, and which software is showing lower results than expected (making others seem higher than expected).

JumpingJack
01-09-2008, 12:55 AM
Yep, there can't be "superscaling" just like you can't output 1000hp on the road out of a 800hp dyno tested motor.

Just a quick comment, I did not read through whole post...

I am not so quick to draw that conclusion just yet, but looking at the Xbit data there is something funny going on (assuming no typo), which makes me wonder what is happening at 2.6 GHz to make the score so low. In any case, there is certainly a 'core count' superscaling that I have seen happen --

Of course this is on barcelona, and it seems odd that you can get a speed up > core count, seems to contradict Amdahl's Law.... but here is the data:
http://classic.chem.msu.su/gran/gamess/barcelona.html

KTE
01-09-2008, 02:07 AM
Cinebench 2.64GHz at 1.92GHz NB/HT, around 480MHz 4-4-4-4 1T RAM gives ~84xx in XP Pro 32-bit. 2.16GHz NB/HT gives around 846x maximum.

I think it's the way you look at it all, Jack, and there are a few ways. We can either tackle the findings and explain them based on what is happening with the software and the core at a basic level, as far as we know it, and the limitations that cause odd performance figures, or we can make absolute conclusions based solely on the limited data we have at present.

One way to look at it would be to say every result is the peak core performance at X speed. In that case, those results above would look to contradict the generally accepted law.

But another way to look at it is compiler inefficiency or software data feed bottlenecks to the core (starved bandwidth for whatever reason, even L3 inefficiency with low core count), which will produce lower scores at 1 core/X MHz when the bandwidth is not fully saturated and higher than normally expected scores at 2 cores/X MHz when there is a greater share of data being fed to each core.

It isn't just Barcelona that goes over 100% scaling there BTW, Clovertown/Harpertown (Test 3) goes over 100% too (although not as frequently).

I can't really say more, they are just my preliminary opinions. More correctly, my thoughts out loud. I'll try asking those in the know for anything.

BadNizze
01-09-2008, 03:52 AM
I noticed that my 9500 @ 250*11 scores 4k in the 3DMark06 CPU test, right? My unlocked 9600BE scores 3.8k @ 14*200... Can it be that Agena is bus hungry like K7?

nemrod
01-09-2008, 08:23 AM
True, because when you aggregate high and low values, the high and low extremes cancel out, so you miss the data we're pointing out here. Even a 100MHz core clock speed increase cannot theoretically produce higher than 100% performance scaling, since the core doesn't have more than 100% of its potential. ;)

7114 / 2.2 = 3233
7453 / 2.3 = 3240
7769 / 2.4 = 3237
8201 / 2.6 = 3154
9681 / 3.0 = 3227

In all cases you have 3234 +- 0.2%, except for the 2.6GHz CPU, which performs worse. :shrug: It's a hasty conclusion to take this bad value as the main comparison point in order to extract some magical gain factor. The 2.6GHz value is simply bad, and that's all.
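The same points-per-GHz check as a small sketch, so the deviations can be read off directly. The scores are the ones listed above; taking the mean of the other four points as the baseline is an assumption that follows the reasoning of the post, not something the review itself states.

```python
# Clock (GHz) -> Cinebench 10 score, as listed in the post above.
data = {2.2: 7114, 2.3: 7453, 2.4: 7769, 2.6: 8201, 3.0: 9681}

# Score divided by clock: constant if scaling were perfectly linear through zero.
per_ghz = {f: s / f for f, s in data.items()}

# Baseline: mean of every point except 2.6 GHz (an assumption, following the post above).
baseline_pts = [v for f, v in per_ghz.items() if f != 2.6]
baseline = sum(baseline_pts) / len(baseline_pts)

# Deviation of each point from that baseline, in percent.
deviation_pct = {f: (v / baseline - 1) * 100 for f, v in per_ghz.items()}

for f in data:
    print(f"{f:.1f} GHz: {per_ghz[f]:7.1f} points/GHz ({deviation_pct[f]:+.2f}%)")
```

On these numbers the four other clocks land within about a quarter of a percent of each other, while 2.6GHz sits roughly 2.5% lower.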

KTE
01-09-2008, 09:30 AM
In all cases you have 3234 +- 0.2%, except for the 2.6GHz CPU, which performs worse. :shrug: It's a hasty conclusion to take this bad value as the main comparison point in order to extract some magical gain factor. The 2.6GHz value is simply bad, and that's all.

That's one way already presented to look at it, but this method avoids dealing with or explaining what we're experiencing here. Calling a value "bad" just because it doesn't fit a trend doesn't explain why it's occurring, which is what we want to know here. Even I can say that much, it's very easy to. :)
See the other benchmarks already shown (2-3 different ones) which see higher than expected scaling, not just this one. The above value is not bad; the value is exactly what you would find in a real-life situation, as I did. Hence, there is more to it than just labeling the value wrong. I'm more interested in an explanation for why it occurs at 2.6GHz, since I have the results and I know that's what I find when I replicate the CB test at those settings 10x repeatedly, i.e. not a user error.

WeStSiDePLaYa
01-09-2008, 10:09 AM
To the people who say it is impossible.


It is not. CPUs are not strictly linear. The extra speed at a point may ease a bottleneck in the CPU.

Take for example heavily bandwidth-limited GFX. You can raise the mem 10% but see an increase of 15% or more, because it is easing a bottleneck.

AliG
01-09-2008, 10:20 AM
I think the reason for that is the L3 cache. The latency issue isn't as bad at the higher clocks, and it bottlenecks the CPU at the lower clocks, so what you see is the bottleneck starting to disappear at the 2.8GHz mark or so. Performance below that is affected by the bottleneck more than performance above it, thus what seems to be more than 100% scaling.

informal
01-09-2008, 11:13 AM
Hi guys :).
Simply put, there is no superscaling going on with K10. The fact is that at lower clocks K10 suffers from a low Northbridge clock and higher L3 latencies. At higher clocks in multithreaded scenarios, the shared L3 does a great job of data sharing and doesn't suffer from high latencies as before.
So in single-threaded apps at low clocks we see sub-par execution due to NB and L3 clock/latency. In MT apps at >2.6GHz clocks we see K10 at its full potential (minus maybe some not widely publicized errata (NB related) that are fixed by a BIOS patch; I remember that 2 more, apart from the TLB one, were posted here at XS by a new member; both were NB related, and I will try to find that post).

Manicdan
01-09-2008, 11:13 AM
the true way to test this is by waiting for one that's stable at like 3.2 and just downclocking it to see how linear it really is.

and btw i remember with my 939 cpu, a 10% increase in cpu speed netted a 16% increase in aquamark3 score (combined, not cpu or graphics alone), and i tested it multiple times and got the same results.

Jakalwarrior
01-09-2008, 11:55 AM
Unless you had an unlocked 939 it was probably memory related. If you didn't use a memory divider then that would cover it; if you did change dividers to keep the speed the same then maybe it just liked that divider better (hidden internal timings linked to it etc...). If everything on the memory remained the same, though, then that would be a little enigma.

BadNizze
01-09-2008, 12:30 PM
Hi guys :).
Simply put, there is no superscaling going on with K10. The fact is that at lower clocks K10 suffers from a low Northbridge clock and higher L3 latencies. At higher clocks in multithreaded scenarios, the shared L3 does a great job of data sharing and doesn't suffer from high latencies as before.
So in single-threaded apps at low clocks we see sub-par execution due to NB and L3 clock/latency. In MT apps at >2.6GHz clocks we see K10 at its full potential (minus maybe some not widely publicized errata (NB related) that are fixed by a BIOS patch; I remember that 2 more, apart from the TLB one, were posted here at XS by a new member; both were NB related, and I will try to find that post).

I agree. I see no gains from running high NB speeds above 2.6-2.7GHz, not in 3DMark anyway. To me it looks like K10 needs 2.6GHz+ to shine. AMD knew this from the start, which is why they planned 2.6GHz+ parts from the start. BUT they found the now famous TLB bug and could not get good enough gains to release chips faster than 2.2-2.3.... K10 was never meant to compete with Conroe, but with Jayhawk and other NetBurst-based CPUs. Then came C2D. My GUESS is that AMD would have liked to release K10 without L3 @ 3GHz or so, with HTT @ 2600MHz and the NB running at the same speed as the CPU. This was not possible with the added L3; AMD had no more money or time to fix the issues and went ahead and released a half-done CPU... That's my 2 cents anyway... I may be crazy, but something does not fit in my book!

nemrod
01-09-2008, 03:08 PM
That's one way already presented to look at it, but this method avoids dealing with or explaining what we're experiencing here. Calling a value "bad" just because it doesn't fit a trend doesn't explain why it's occurring, which is what we want to know here. Even I can say that much, it's very easy to. :)
See the other benchmarks already shown (2-3 different ones) which see higher than expected scaling, not just this one. The above value is not bad; the value is exactly what you would find in a real-life situation, as I did. Hence, there is more to it than just labeling the value wrong. I'm more interested in an explanation for why it occurs at 2.6GHz, since I have the results and I know that's what I find when I replicate the CB test at those settings 10x repeatedly, i.e. not a user error.

Sure, but the most important point is that at 2.6GHz, K10 performs badly, while the performance before and after this point scales perfectly. In other words, some use this "bad value" to see the promise of wonderful performance at higher frequencies, but this is just the CPU doing something wrong at 2.6. So knowing why could be interesting, but other speculation, like the title of this thread "Phenom offers more than 100% clockscaling?!", is only a misinterpretation of the results.

Brother Esau
01-10-2008, 01:46 AM
Speaking of K10, has AMD updated and released the source code yet for this little shindig going on?

Jakko
01-10-2008, 02:34 AM
So it IS possible for a CPU to have better than 100% clockscaling?

For example, if whatever is going on at 2.6 ghz, is also going on to a lesser degree for every frequency below, say 3.2 Ghz, then @ 3.2 Ghz you will start to see better than 100% scaling, right?

Of course unlikely, but possible...
We still haven't found out why 2.6 performance is so low.

malvindo
01-10-2008, 03:47 AM
So it IS possible for a CPU to have better than 100% clockscaling?

For example, if whatever is going on at 2.6 ghz, is also going on to a lesser degree for every frequency below, say 3.2 Ghz, then @ 3.2 Ghz you will start to see better than 100% scaling, right?

Of course unlikely, but possible...
We still haven't found out why 2.6 performance is so low.

wha, i still find it hard to believe that a cpu can scale better than 100%

since none of us has the data, i'd rather believe WeStSiDePLaYa's opinion
maybe @ some point there was a bottleneck
but until somebody can explain it, this is still an interesting mystery

nice thread

kl0012
01-10-2008, 04:37 AM
For some number of independent threads/tasks, superlinear scaling is possible, imho. The reason is the reduced number of task switches. A task switch is a relatively long operation which happens every 100 ms (or something like this) under round-robin scheduling. For example, if you have two tasks running in parallel on different cores, then the CPUs spend 0 time on task switches, whereas two tasks running on the same CPU will force it to spend some (not necessarily minor) time on task switches.
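A toy model of that argument, as a sketch only. Every number in it (1000 ms of work per task, a 100 ms quantum, 1 ms per task switch) is hypothetical and picked just to make the effect visible, and `run_time` is a made-up helper, not a model of any real scheduler. Note that this is about core-count scaling, not clock scaling.

```python
def run_time(work_ms, tasks, cores, quantum_ms, switch_cost_ms):
    """Wall time for `tasks` equal tasks of `work_ms` each, round-robin on `cores`."""
    tasks_per_core = -(-tasks // cores)  # ceiling division
    compute = tasks_per_core * work_ms
    if tasks_per_core == 1:
        return compute  # a core with a single task has nothing to switch to
    switches = compute / quantum_ms  # one switch each time a quantum expires
    return compute + switches * switch_cost_ms

# Hypothetical workload: two tasks, 1000 ms each, 100 ms quantum, 1 ms per switch.
one_core = run_time(1000, tasks=2, cores=1, quantum_ms=100, switch_cost_ms=1)
two_cores = run_time(1000, tasks=2, cores=2, quantum_ms=100, switch_cost_ms=1)

print(f"1 core: {one_core} ms, 2 cores: {two_cores} ms, "
      f"speedup {one_core / two_cores:.2f}x")
```

In this model the speedup from one core to two comes out slightly above 2x, because the single-core run pays for switching and the dual-core run does not. How large the effect is on real hardware depends entirely on the actual quantum and switch cost.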

Devil's Prophet
01-11-2008, 06:03 AM
IMHO you must compare the differences between the original clock and the highest obtained clock. As someone pointed out, the performance gain is completely linear with the clockspeed gain.

The numbers in the OP show that the cpu OC'd to 2.6GHz isn't performing at its maximum level. That should be the reason why the OC from 2.6 to 3.0GHz offers more performance gain than it should gain from the increase in clockspeed. BUT that is IF you consider the performance at 2.6GHz maxed out. Which it isn't, and is clearly shown in the results for the OC from 2.2 to 2.6GHz.

But now that I typed this, I'm beginning to think that I'm only stating the obvious. So, nevermind me if you already figured this one out. ;)

malvindo
02-08-2008, 09:37 AM
IMHO you must compare the differences between the original clock and the highest obtained clock. As someone pointed out, the performance gain is completely linear with the clockspeed gain.

The numbers in the OP show that the cpu OC'd to 2.6GHz isn't performing at its maximum level. That should be the reason why the OC from 2.6 to 3.0GHz offers more performance gain than it should gain from the increase in clockspeed. BUT that is IF you consider the performance at 2.6GHz maxed out. Which it isn't, and is clearly shown in the results for the OC from 2.2 to 2.6GHz.

But now that I typed this, I'm beginning to think that I'm only stating the obvious. So, nevermind me if you already figured this one out. ;)


CMIIW,
u r trying to say that below 2.6Ghz the Phenom architecture is bottlenecked?

hopefully people can push Phenom to 3.4Ghz so we can figure out the scaling ;)