:) Yeah!
334 > 200
& not ?
200 = 200
Gosh said you can't clock AMD's L3 cache; that's not true.
charged3800z24 and I have both gotten our NB to 2.4 GHz, which is the L3 cache speed.
http://www.xtremesystems.org/forums/...1&d=1219028090
And here is some extra data for you. Seems like Crysis doesn't scale well with CF at all. CPU @ 3.6 GHz and a single HD4870 at 780/1000.
Outstanding job, thank you!
Yeah, I have stopped messin' around with Crysis for the moment and moved on to FEAR.... and all I can say is WOWSER. At 1920x1200, with everything maxed to the hilt, it is CPU limited.
It is pointless to run this card below 1920x1200, absolutely pointless. FEAR is old, I know, but it is still a gorgeous game... amazing to see it at 1920x1200 cranked to the max: max AA, max AF, everything maxed.
Ok, about to post FEAR data ... of course, in the world according to Gosh, FEAR is single-threaded, so the Phenom should do poorly.
FEAR, Phenom vs QX9650, both running on the 4870 X2......
I did several experiments; I am posting the results for three of them. One is at my normal stock baseline settings (the ones I do all my clock-for-clock studies at), i.e. both processors at 2.5 GHz with DDR2-800. I then overclock both processors to 3.0 GHz and set the memory to DDR2-1067. I know that in FEAR at low res on an 8800 GTX, the difference between DDR2-800 and DDR2-1067 can have a substantial impact on Phenom performance, so I am also posting a 2.5 GHz DDR2-800 vs DDR2-1067 comparison for the Phenom to show that impact.
Bus speeds are not changed in any of the experiments; all overclocking is done via the multiplier (hence the reason I only buy unlocked-multiplier CPUs).
FEAR does not have a windowed mode, so I cannot capture a CPUID shot for validation; you will need to take my word for it.
Phenom @ 2.5 GHz DDR2-800: Max = 253, Min = 37, Avg = 96
http://forum.xcpus.com/gallery/d/743...00_ALL+MAX.JPG
QX9650 @ 2.5 GHz DDR2-800: Max = 393, Min = 40, Avg = 134
http://forum.xcpus.com/gallery/d/741...200_ALLMAX.JPG
About 40% (EDIT: oops, miscalculated in my head, had to change the old 35% to the correct value) faster clock for clock at high res with everything maxed. Note that the min framerate is the same -- meaning both CPUs will give you the same quality of gameplay.
Next experiment, OC to 3.0 GHz
Phenom @ 3.0 GHz DDR2-1067: Max = 305, Min = 48, Avg = 113
http://forum.xcpus.com/gallery/d/741..._DDR2_1067.JPG
QX9650 @ 3.0 GHz DDR2-1067: Max = 466, Min = 40, Avg = 159
http://forum.xcpus.com/gallery/d/742..._DDR2-1067.JPG
On average 40% faster clock for clock....
Finally, here is the Phenom at DDR2-1067 but still 2.5 GHz, to compare against the above and see the impact of the faster memory:
Phenom @ 2.5 GHz DDR2-1067: Max = 273, Min = 37, Avg = 102
http://forum.xcpus.com/gallery/d/741..._DDR2_1067.JPG
Ok, there you have it. Both scale with clock speed the same way. Oh, and in case you are wondering ... yes, I can reproduce the TechPowerUp FEAR number to within a few FPS using the same CPU clock speed they used.
Jack
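For anyone who wants to double-check the percentages, here is a quick Python sanity check using nothing but the average FPS figures posted above (the helper function is mine; the numbers are straight from the runs):

Code:
def pct(new, old):
    # percent gain of `new` over `old`
    return 100.0 * (float(new) / float(old) - 1.0)

# clock-for-clock gap (QX9650 vs Phenom), average FPS from the runs above
print("2.5 GHz, DDR2-800 : %.1f%%" % pct(134, 96))    # ~39.6%
print("3.0 GHz, DDR2-1067: %.1f%%" % pct(159, 113))   # ~40.7%

# scaling from the 2.5 GHz baseline to the 3.0 GHz / DDR2-1067 setup
print("Phenom 2.5 -> 3.0 GHz: %.1f%%" % pct(113, 96))     # ~17.7%
print("QX9650 2.5 -> 3.0 GHz: %.1f%%" % pct(159, 134))    # ~18.7%

# memory-only impact on the Phenom at 2.5 GHz
print("Phenom DDR2-800 -> DDR2-1067: %.1f%%" % pct(102, 96))   # ~6.3%

Both chips pick up roughly 18% from the 20% clock bump, which is what the "both scale the same" comment above refers to.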
Wow... QX9650 just totally spanked FEAR alive. Max 466... seriously?
But... I'm still seeing something weird here. It seems like the QX9650 always has that 2% between 25 and 40 fps no matter what. What exactly happened there? Maybe a FRAPS measurement would give us a better picture. Maybe some spots in the FEAR benchmark were squeezing FSB bandwidth.
P.S.: And you're welcome for the data. :)
Not quite understanding your question... could you clarify? However, I will do a FRAPS run if you tell me exactly which one you would like to see. EDIT: Ohhh, I see, you are looking at the % bin statistics... good question... hold on, I will FRAPS the 2.5 GHz DDR2-800 runs again.
EDIT: It is becoming very clear that the drivers that shipped with the card and the drivers given to reviewers were two different realities. Some reviewers, it would appear, skipped the 'press' drivers and installed the shipped drivers -- ExtremeTech and Tom's came away with bad impressions of the 4870 X2, for example, and in general my numbers match theirs most closely. I suspect that in a month or two this card is just gonna get better and better.
jack
Um... basically, I'm wondering why, on the QX9650, there was always 2% of the time where the fps was between 25 and 40. If you look at the screenshots again, on the QX9650, although the fps is skyrocketing, 2% is always between 25 and 40. On the Phenom system, once overclocked to 3 GHz, it's always over 40, and the minimum was at 45.
Yeah, I figured that out... I am FRAPSing them now.
It happens as you pass through the glass in the door at the end of the perf run. FRAPS samples about once per second, so it is not capturing the highest or lowest. It would take several runs and a lot of luck to actually capture the event that produces the max and min.
This is a FRAPs plot of the 2.5 GHz DDR2-800 runs for both CPUs....
http://forum.xcpus.com/gallery/d/743...9850vs9650.JPG
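If anyone wants to dig into the bin question themselves, here is a rough sketch of how you could pull the true per-frame min/max and the 25-40 fps share out of a FRAPS frametimes log, rather than relying on the once-per-second samples. It assumes the usual two-column CSV FRAPS writes when frametimes logging is enabled ("Frame, Time (ms)" with cumulative milliseconds); the filename is just a placeholder.

Code:
import csv

def frame_fps(path):
    times = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                      # skip the "Frame, Time (ms)" header
        for row in reader:
            times.append(float(row[1]))   # cumulative time in ms
    # instantaneous fps for each frame-to-frame interval
    return [1000.0 / (b - a) for a, b in zip(times, times[1:]) if b > a]

fps = frame_fps("frametimes.csv")         # placeholder filename
low_bin = sum(1 for x in fps if 25 <= x < 40)
print("frames: %d  min: %.1f fps  max: %.1f fps" % (len(fps), min(fps), max(fps)))
print("share of frames between 25 and 40 fps: %.1f%%" % (100.0 * low_bin / len(fps)))

Note this counts the share of frames in the 25-40 bin, which should land close to (but not exactly on) FEAR's own percentage bins.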
Could be something acting up in the background, too. I get that a lot with my old IDE HDD. Actually, it acts up so much on my system that my 3DMark Vantage scores vary by around 1000...
Next up.... Quake Wars: Enemy Territory.
Ok... so for Quake Wars: Enemy Territory, I use the HOC benchmark utility. Because I keep historical records, I am using an older version of QWET as well.
You can download the benchmark utility here: www.hocbench.com (as of the time of this post, their server appears to be down).
I set up the HOC benchmark to do 3 runs each at 1024x768, 1280x1024, and 1920x1200. The quality settings are set to High, and I am using 16x AF and 4x AA in the HOC utility. I am also using the Quarry script/scene. The output comes in the form of an HTML file, so there are no screen dumps of the actual game.
Phenom @ 2.5 GHz DDR2-800 output:
Resolution: 1024×768
Score = 150 FPS
Score = 153 FPS
Score = 154 FPS
Average score = 152 FPS
Resolution: 1280×1024
Score = 150 FPS
Score = 150 FPS
Score = 152 FPS
Average score = 150 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 147 FPS
Score = 147 FPS
Score = 147 FPS
Average score = 147 FPS
=================================
QX9650 @ 2.5 GHz DDR2-800 output:
Resolution: 1024×768
Score = 175 FPS
Score = 178 FPS
Score = 175 FPS
Average score = 176 FPS
Resolution: 1600×1200
Score = 171 FPS
Score = 171 FPS
Score = 170 FPS
Average score = 170 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 165 FPS
Score = 164 FPS
Score = 166 FPS
Average score = 165 FPS
The output files are attached.
EDIT: Darn it, my bad... for the QX9650 I ran 1600x1200 instead of 1280x1024... rerunning at 1280x1024 to add that info.... sorry.
EDIT2: Here is the QX9650 @ 2.5 GHz DDR2-800 run at 1280x1024:
Resolution: 1280×1024
Score = 176 FPS
Score = 176 FPS
Score = 171 FPS
Average score = 174 FPS
The attachment has been updated with that run as well.
BTW -- QWET uses all 4 cores.
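For the curious, here is the same kind of quick check for the two 2.5 GHz DDR2-800 QWET baselines, using only the averages posted above:

Code:
# QWET @ 2.5 GHz, DDR2-800: average FPS per resolution as posted above
phenom = {"1024x768": 152, "1280x1024": 150, "1920x1200": 147}
qx9650 = {"1024x768": 176, "1280x1024": 174, "1920x1200": 165}

for res in ("1024x768", "1280x1024", "1920x1200"):
    gain = 100.0 * (float(qx9650[res]) / phenom[res] - 1.0)
    print("%-10s  Phenom %3d  QX9650 %3d  -> QX9650 ahead by %.1f%%"
          % (res, phenom[res], qx9650[res], gain))
# roughly 16%, 16% and 12% at the three resolutions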
1600x1200 or not, even the 1920x1200 scores are beating the Phenom to a pulp, even with the Phenom at 1024x768. I think we pretty much have a conclusion.
Not done yet...
http://forum.xcpus.com/gallery/d/7437-2/DESKTOP.JPG
(View the image to see the taskbar slide-ups listing the installed software.)
This is what is currently installed... the Phenom system has exactly the same installation, program for program. So there is still a lot of comparing to be done :)
EDIT: I am gonna stick with QWET for a moment, running again on the QX9650 at a 200 MHz system clock (800 MHz FSB), still @ 2.5 GHz ....
Well, color me purple, check this out.... QWET at a 200 MHz system clock, i.e. 800 MHz FSB..... (all the above runs were done at 333 MHz, i.e. 1333 MHz FSB). EDIT: NOTE -- the memory divider was not changed, so this run is at DDR2-400 :) .... my bad.
Resolution: 1024×768
Score = 147 FPS
Score = 140 FPS
Score = 147 FPS
Average score = 144 FPS
Resolution: 1280×1024
Score = 144 FPS
Score = 141 FPS
Score = 139 FPS
Average score = 141 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 137 FPS
Score = 140 FPS
Score = 142 FPS
Average score = 139 FPS
Let's see what we get when we go to 1600 MHz FSB.
EDIT: OOOPS, my bad... this was 200 MHz (800 MHz FSB), but I forgot to up the memory divider.... I am rerunning now at the 200 MHz system clock (800 MHz FSB) with DDR2-800 instead of DDR2-400. This may prove informative, since I mentioned above that the GPU can go straight to system memory in one hop.
Output for DDR2-800 ...
Resolution: 1024×768
Score = 158 FPS
Score = 163 FPS
Score = 160 FPS
Average score = 160 FPS
Resolution: 1280×1024
Score = 157 FPS
Score = 157 FPS
Score = 160 FPS
Average score = 158 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 155 FPS
Score = 156 FPS
Score = 156 FPS
Average score = 155 FPS
So in this case a 40% decrease in FSB bandwidth translates into roughly a 6-10% hit in FPS, depending on resolution.
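A quick check of that, using the two DDR2-800 averages above (only the FSB changed between them):

Code:
# QX9650 @ 2.5 GHz, DDR2-800 in both cases -- only the FSB changed
fsb_1333 = {"1024x768": 176, "1280x1024": 174, "1920x1200": 165}
fsb_800  = {"1024x768": 160, "1280x1024": 158, "1920x1200": 155}

print("FSB bandwidth cut: %.0f%%" % (100.0 * (1.0 - 800.0 / 1333.0)))   # ~40%

for res in ("1024x768", "1280x1024", "1920x1200"):
    hit = 100.0 * (1.0 - float(fsb_800[res]) / fsb_1333[res])
    print("%-10s  %3d -> %3d FPS  (%.1f%% hit)"
          % (res, fsb_1333[res], fsb_800[res], hit))
# works out to roughly 9.1%, 9.2% and 6.1% at the three resolutions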
Ok.... last one for the day... Half-Life 2: Lost Coast; I will do Episode One and Episode Two a bit later, I suspect. I had to run it windowed so I could do a screen grab that includes a shot of CPUID. The window that HL2 creates is always on top, so after the bench I had to move it slightly off screen in order to reveal the CPUID window.
My standard baseline: Phenom @ 2.5 GHz, DDR2-800, 4870 X2, 1920x1200, max everything including AA and AF.
http://forum.xcpus.com/gallery/d/744..._1920x1200.JPG
QX9650 @ 2.5 GHz, DDR2-800, 4870 X2, 1920x1200, max everything including AA and AF.
http://forum.xcpus.com/gallery/d/744..._1920x1200.JPG
In my personal opinion, the C2Q has no problem with bandwidth itself and never has. The problem with the FSB is its latency. Increasing or decreasing the FSB only changes the bandwidth; the latency always stays the same. You can easily check this with low-level benchmarks.
In other words, the FSB is not a big problem in terms of bandwidth, but there is a physical path across the PCB that the data must travel. Whether the data has to go out over the PCB, or can stay on-die as with K10 and Nehalem, decides whether the coherency latency is on the order of microseconds or nanoseconds. This is a very significant factor, and it is what allows K10 and Nehalem to scale better than the Core 2 Quad.
So what effect does this have under real-life conditions? In the good case, the prefetcher can hide this physical latency: the data is already in the L2 cache and can be worked on, so in effect there is no latency. In the bad case, the prefetcher works inefficiently and the physical latency of the FSB shows up as poor performance. And that is exactly where the K10 outperforms an Intel Core 2 Quad.
Excuse my bad English; if there are any questions, feel free to ask.
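To make the good-case/bad-case picture a bit more concrete, here is a crude little Python/numpy sketch (my own, not a proper low-level latency benchmark) that touches the same data once sequentially, where the prefetcher can stream it in, and once in random order, where most accesses pay the full trip to memory:

Code:
import time
import numpy as np

n = 1 << 24                                  # ~16M elements (~128 MB of float64)
data = np.random.rand(n)

seq_idx  = np.arange(n)                      # sequential access pattern
rand_idx = np.random.permutation(n)          # random access pattern, same amount of work

def timed_gather(idx):
    t0 = time.perf_counter()
    s = data[idx].sum()                      # gather then reduce
    return time.perf_counter() - t0, s

t_seq,  _ = timed_gather(seq_idx)
t_rand, _ = timed_gather(rand_idx)
print("sequential: %.3f s   random: %.3f s   slowdown: %.1fx"
      % (t_seq, t_rand, t_rand / t_seq))

The absolute numbers depend entirely on the machine, but the random walk comes out noticeably slower, and that gap is essentially the memory latency the prefetcher normally hides.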
Give me an Intel rig and I sure will ;). And again, the FSB does not limit things bandwidth-wise; it all comes down to the prefetcher and how well it is actually working.
"Wrong" Data in the Cache (simplyfied) -> accessing memory -> high latency -> bad coherency -> bad performance ;)Quote:
Anandtech:
This is the test that actually screws the whole thing for Intel. It turns out that CBALLS2 calls a function in the Microsoft C Runtime Library (msvcrt.dll) that, when combined with Vista SP1, can magnify the Core architecture's performance penalty when accessing data that is not aligned with cache line boundaries.
Jack, I'm impressed with your data. Have you ever thought about making your own review site?
It would be my number one, since your comparisons are top notch. :up:
Yes, especially for processor-to-memory traffic... latency across the bus to other parts, such as southbridge I/O or even the graphics card, is not a huge issue, because even with the longer latency the response time of the I/O device or graphics card overwhelms any latency on the bus.
This is where the large cache and aggressive prefetchers work well; only in a few cases can you see this really become a problem.