gosh, your posts are just a bunch of excuses for why AMD fails to match or beat Intel.
Well, I can tell you that even I was surprised when I saw the World in Conflict results that Jack posted. I was starting to wonder if he had forgotten to disable the TLB patch. The difference was so big that it can't be explained by differences in how the processors work alone. You need to optimize for one processor to get a difference that big, so I looked it up. And found it: Intel has been helping them…
I know it's hard for a non-programmer to understand. But if a game scales across largely independent threads that don't share memory and don't synchronize, then it works well on Intel. It's like running separate single-threaded applications that have some point where they join their work and then go back to working independently again. This is also easier for programmers to do, but it isn't effective if you really want to use all the power the processor has. I showed you a link about the render-split design before.
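A minimal fork/join sketch of what I mean (my own illustration, not code from any real game): each thread works on its own private data and the only synchronization is one join point per frame.
Code:
#include <functional>
#include <thread>
#include <vector>

void simulate_chunk(std::vector<float>& chunk)        // each thread owns its chunk
{
    for (float& v : chunk) v = v * 0.99f + 1.0f;       // stand-in for per-entity work
}

int main()
{
    std::vector<std::vector<float>> chunks(4, std::vector<float>(1 << 16, 1.0f));
    for (int frame = 0; frame < 100; ++frame) {
        std::vector<std::thread> workers;
        for (auto& c : chunks)                          // fork: one thread per chunk
            workers.emplace_back(simulate_chunk, std::ref(c));
        for (auto& t : workers) t.join();               // join: single sync point per frame
        // (a real engine would now hand the results to the render thread)
    }
}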
I wouldn't be surprised if they used Intel's compiler on World in Conflict as well.
http://yro.slashdot.org/comments.pl?...3&cid=13042922
http://aceshardware.freeforums.org/c...iler-t428.html
Again with the programmer excuse. Seriously, pal, no disrespect intended, but knowing how to code doesn't mean anything at all here. Just because you are a programmer does not mean you know more than others. And you are the one who suddenly comes out claiming Phenom wins against even the Q9450, while hundreds of others find Phenom not even measuring up to the Q6600 in most cases.
If you are so knowledgeable, would you mind explaining why Intel processors do so much better than AMD processors with PCSX2? (In the same manner you have been "explaining" things, that is)
Oh geez, the coding again. Anyway, if it were a coding issue, what's the point in buying a Phenom rig for gaming, considering everything is better optimized to run on Intel anyway if that's the case.
Hmmmm, so basically what we have is: build Intel and get better performance in a desktop environment say 95% of the time due to universally optimized code, or buy Phenom for better bandwidth but be slower 95% of the time at the same task due to the lack of optimized code.
Because AMD will run everything that Intel runs without problems too, and if you add applications doing normal background work, AMD will like it even more.
You can also be certain that Intel is doing what they can to change how developers create applications. Nehalem/Larrabee etc. are thread monsters (more so than Phenom); you need to scale well in order to use all the power in those processors. If ray tracing arrives, then you are probably also going to need a processor that is good at scaling across threads.
The memcpy that Intel's compiler seems to generate is really ugly, and memcpy is one of the most common functions used in DirectX games (maybe game developers know this if they use the Intel compiler and write their own function).
Does not matter. Regardless, games were made for Intel processors, and they run better on Intel because they were written for Intel (you are implying this), so end of story.
You just literally admitted that Intel is better at handling games, sir.
Every type of game. Unless you are telling me you know precisely how each game works even without looking at their source code?
:brick:
I read this entire thread...no amount of factual information is going to get gosh to finally come around. He will continue on his path of circular reasoning until you guys flare out...and then he will claim victory.
Is this a competition?
Do you feel better if I say that you win?
This discussion is about processors. I think we all know where the future is going: it's about threading. Applications will shift from single-threaded to multithreaded more and more. When you will gain from a quad depends on what type of applications you are using today. Most of us will do very well on duals.
The main argument for AMD in gaming, when threaded games are used, hasn't been talked about as much here. It's about bottlenecks. We mostly talk about which CPU can get the highest average FPS.
What's really sad is that reality isn't shown in these tests, and that favours Intel when all these tests are done. Intel will get stronger and stronger; if AMD had been a bit later with the 48xx series they might have been in SERIOUS trouble. If there isn't an AMD then we all lose, and all the shareholders at Intel will be happy.
victory in the intellectual sense...:brick:
There are over a hundred links on Google to show otherwise. And there are more links and data in this thread that say otherwise, too.
Plus you did not answer my question. Why does Intel Core 2 work better with PCSX2 than AMD Phenom? We DO have the PCSX2 source code (the latest SVN is 384), so you can peep in it for technical details right there. Care to elaborate based on your expertise in programming?
I have been asking for those links before; can you show me?
Show me the code and then I will be able to answer. I don't know how they have coded that application, and I think even you will understand that it is impossible for anyone to explain why without it.
I can give you some clues though. It's difficult to do threading; it is probably very difficult to emulate threads from other code efficiently. That program probably wants raw processor power (high clocks) and a fast cache.
:) :) Well, I will never break you of your paranoia, but look at this last line... the threads there update physics, shadow volumes (fog), particles, and trees.... hmmmm, and after the update... that gets sent to the GPU. Frame by frame, each thread works in parallel to complete an update to the frame, which is then sent to the GPU ....
Thanks for the link...
Jack
:) ... I am using B3 silicon, it does not have the TLB bug and turning the patch on or off via BIOS or AOD makes no difference.
Here is a comprehensive effect of the TLB patch on /off on a 2.3 GHz 9600 BE http://forum.xcpus.com/motherboard-c...om-menace.html
I haven't generated the charts yet, but for the all low res/low detail (CPU bound condition) at 1280x1024
pre Patch: Max = 139 Ave = 61 Min = 30
post Patch: Max = 111 Ave = 51 Min = 28
Unlike some of the quick runs posted here just to answer questions, the data in the article above is usually run 3 times for reproducibility and reported as an average; the data I quote above is from the first run.
I can post screen shots if you like.
Computationally, the TLB patch is a huge hit -- especially in games, but considering that games page most of their memory and are so branchy, it is not surprising. However, I have also never observed the TLB bug manifest itself. It was an unfortunate PR disaster for AMD, but running a B2 (TLB bug) processor is perfectly stable over the 20 or so games I tested on it. I would have no reservations recommending someone take a B2 CPU if they so desired; the TLB bug was blown way, way out of proportion.
Jack
To Gosh:
Phenom 9850
http://www.sharkyextreme.com/hardwar...261_3737046__9
http://www.techspot.com/review/93-am...on/page10.html
http://www.tbreak.com/reviews/articl...0&pagenumber=3
http://www.bit-tech.net/hardware/200...9550_b3_cpus/7
http://www.bit-tech.net/hardware/200...9550_b3_cpus/8
http://hothardware.com/Articles/AMD-...vision/?page=7
http://www.extremetech.com/article2/...2282402,00.asp
(The following has 3870X2 data)
http://www.firingsquad.com/hardware/...view/page5.asp
http://www.firingsquad.com/hardware/...view/page6.asp
http://www.firingsquad.com/hardware/...view/page7.asp
http://www.firingsquad.com/hardware/...iew/page10.asp
(The following has TLB and without TLB data)
http://techreport.com/articles.x/14424/4
http://techreport.com/articles.x/14424/5
Phenom 9900
http://www.legitreviews.com/article/597/8/
http://www.legitreviews.com/article/597/9/
I would post more but I am really tired of copying and pasting... there are just too many. And most of them show that between 1280 x 1024 to 1600 x 1200 for instance, Intel leads. And Phenom occasionally breaks through in one or two specific games (F.E.A.R., Company of Heroes) but Phenom can't even measure up to Q6600/Q6700 in the rest of the test. In the Q9300 showdown, Phenom only had an edge with CoH, and mind you, E8500 still won over Phenom in that test at high res.
And PCSX2 source code can be found at http://www.pcsx2.net
Does it hurt to actually do the Googling yourself every once in a while?
ooops... weird, double post.
:) This is because of the configuration -- it throttles at the GPU :) .... you just can't get that :) and even observing the data, watching faster GPUs relieve that bottleneck and go to higher FPS, doesn't convince you. This is just weird.
It's ok Gosh -- nobody believes you, this is why you get the gruff you get. You're really a nice guy overall. Bottom line, it does not matter -- if running at high resolution -- who is better. Even if the GPU is the limiter and AMD scores 4 FPS higher by some forum random postings, the game play will be the same on either.
Unfortunately, if you bought a Phenom with this misconception on a 8800GTX card, thinking you would be able to exploit the latest round of cards ... you have wasted your money on a GPU upgrade, the Phenom will just not feed it fast enough (not because of lack of BW, but because it takes the Phenom longer to complete the computational cycle).
The X2's came today, I should have some data by tonight....
All the data produced in this thread are games that are multithreaded across all 4 cores...
Lost Planet, GRID, World in Conflict utilize all 4 cores. Gosh -- clock for clock, Intel produces a computational result faster than AMD does single threaded, dual threaded or quadruple threaded... there is no question.
Does this make the Phenom a bad CPU? No. Does it require AMD to price it accordingly? Yes. This is why AMD is losing billions, they cannot make a profit and charge 200 bucks or less for a quad core processor, while trying to support the low end with dual cores that are even slower than the competition.
True enough. I get about 20fps in Lost Planet with 8x AA and 16x AF, but the game looks pretty, and still runs mirror smooth. Same goes for Crysis. It just doesn't matter which one runs better. 1 or 2fps isn't that important.
Grats! Start spanking something? I'm expecting lots of graphs and lots of paragraphs in a separate thread. :up:Quote:
The X2's came today, I should have some data by tonight....
very interesting to see :)
is this done with phenom (or is it fake maybe)?
http://www.xtremesystems.org/forums/...d.php?t=197648
http://www.xtremesystems.org/forums/...1&d=1218151935
I partly agree with you ;), there is at least one scenario where Phenom will outperform an Intel C2Q just due to its superior architecture. Excuse me for not adding more to the GPU-bound discussion.
I'll just quote myself:
At least for a game, such a scenario will be very rare, simply because most games have one "main" thread for GPU rendering and some helper threads for the other stuff (AI, ...). That is very convenient for Intel: the prefetcher can "almost always" fetch the data to the right core. From my personal technical view, the C2D is the best dual-core processor, but in terms of quad-core processors K10 is the better one (and so will be Nehalem)Quote:
That is, in principle, the huge advantage of the C2D. It loads the data early enough to work efficiently, so the application can compute immediately, without accessing memory.
This principle runs into problems with many simultaneous threads and ongoing data transfers (depending on the type of transfer; it matters how much data has to be calculated). In terms of streaming data K10 wins; in terms of, e.g., video encoding there is more calculation and the C2D takes advantage of its huge L2 cache and the mighty prefetcher working in the background.
Now here comes the problem: with 4 (or more) threads the prefetcher has trouble finding out which data is needed and only works partially effectively. If the prefetcher guesses well, the result is very good and the L2 and prefetcher work great. But when something unexpected happens, a very long memory access is needed and the whole structure collapses, while K10 can still handle the coherency thanks to the shared L3 cache, and even if the data is not in the L3 cache, K10 can load the data 3 times faster than a C2Q. Again: this will only happen with massive multithreading.
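A small illustration of the prefetcher point (my own sketch, not taken from any benchmark): sequential access streams nicely, while pseudo-random access defeats the hardware prefetcher and pays the full memory latency on most misses.
Code:
#include <cstddef>
#include <cstdint>
#include <vector>

std::uint64_t sum_sequential(const std::vector<std::uint64_t>& data)
{
    std::uint64_t s = 0;
    for (std::uint64_t v : data) s += v;   // linear stream: the prefetcher predicts this
    return s;
}

std::uint64_t sum_scattered(const std::vector<std::uint64_t>& data)
{
    std::uint64_t s = 0, idx = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // pseudo-random stride the hardware prefetcher cannot follow
        idx = (idx * 6364136223846793005ULL + 1) % data.size();
        s += data[idx];                    // most of these are cache misses
    }
    return s;
}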
If you read through the forums, the link that generated this data came from chiphell...
This is the CPU that generated it:
http://www.chiphell.com/?action-view...mid-211-page-3
Q6600, not Phenom.
EDIT: Oooops, then I read lower in the thread and this is not the link. Hold on let me search some more.
EDIT2: What CPU, I see no mention of phenom in there. And I don't know if it is fake because Chiphell does not have a 4870X2 review posted.
Jack
The CPU that did that was a QX9650 at 4GHz.
Why? Cuz...
http://bbs.chiphell.com/viewthread.p...extra=page%3D1
And cuz...
http://bbs.chiphell.com/viewthread.p...extra=page%3D1
Well... cuz the data matches, more or less.
3DMark06
Phenom@2.5 Ghz 4870X2 1680x1050 2xA 16xAF
http://forum.xcpus.com/gallery/d/736...70X2_hires.JPG
Phenom@2.5 Ghz 4870X2 Default Settings
http://forum.xcpus.com/gallery/d/736...X2_default.JPG
QX9650@2.5 Ghz 4870X2 1680x1050 2xA 16xAF
http://forum.xcpus.com/gallery/d/736...70X2_HIRES.JPG
QX9650@2.5 Ghz 4870X2 default
http://forum.xcpus.com/gallery/d/737...X2_default.JPG
Scores are low partly due to clock speed, partly due to me not optimizing. Catalyst is set to default as installed.
...Quote:
".. an ounce of honest data is worth a pound of marketing hype."
C'mon, gosh is just gonna argue that the Phenom lost because the 3DMark SM2 and SM3 tests are mostly single-threaded and that the CPU test is unfair.
I remember when I bought into the whole Phenom is 40% faster than C2Q... :rolleyes:Quote:
".. an ounce of honest data is worth a pound of marketing hype."
mostly single threaded, yes indeed.
phenom could be much more compelling in a multi-threaded universe.
anyway, it seems close, based on the limited info I have read :D
it is single threaded apps (and software generally) that are limiting uptake of quad core cpu's in my opinion.
So yesterday, I produced World in Conflict bench runs, trying to match at least the game settings. This is what I had from yesterday, just copied over from the other page:
Here is 1024x768 run on QX9650 @ 2.67 G 4870 X2Quote:
QX9650 @ 2.67 GHz (333x8) DDR2-1067 8800 GTX in order of resolution
http://forum.xcpus.com/gallery/d/728...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/729...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/728...C_SETTINGS.JPG
Phenom 9850 @ 2.5 GHz (200x7.5) DDR2-800 8800 GTX
http://forum.xcpus.com/gallery/d/730...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/730...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/730...C_SETTINGS.JPG
Phenom 9850 @ 2.5 Ghz (200x7.5) DDR2-1067 8800 GTX
http://forum.xcpus.com/gallery/d/732...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/732...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/732...C_SETTINGS.JPG
QX9650 @ 2.67 GHz
1024x768. Max = 123 Ave = 50 Min = 21
1280x1024 Max = 104 Ave = 47 Min = 22
1650x1080 Max = 95 Ave = 46 Min = 21
Phenom @ 2.5 Ghz DDR2-800
1024x768. Max = 69 Ave = 27 Min = 10
1280x1024 Max = 69 Ave = 27 Min = 10 (this is odd, exactly the same)
1650x1080 Max =70 Ave = 28 Min = 10
Phenom @ 2.5 GHz DDR-1067
1024x768. Max = 67 Ave = 28 Min = 11
1280x1024 Max = 68 Ave = 27 Min = 9
1650x1080 Max =72 Ave = 29 Min = 9
http://forum.xcpus.com/gallery/d/737...C_SETTINGS.JPG
Here are the same runs on the 4870 X2
Phenom 9850@2.5G DDR2-800 4870X2 1024x768 0xAA 16xAF
http://forum.xcpus.com/gallery/d/737...C_SETTINGS.JPG
QX9650@2.67G OCC Settings Max = 106 Ave = 49 min = 17
Phenom@2.5G OCC Settings Max = 78 Ave = 33 min = 11
Hardly did anything at all going to a new card -- hmmmmmm weird.
Ok, next the fun one....
Here are my settings, meant to stress the CPU and take the GPU out of the equation.... with an 8800 GTX I got these results:
Here are today's new 4870 results (mind you, in this experiment I match the clock speeds of the processors).Quote:
QX9650 @ 2.5 GHz, DDR2-800 (in this case, all my baseline data is there)
http://forum.xcpus.com/gallery/d/732...80X1024_r1.JPG
Phenom 9850 @ 2.5 GHz, DDR2-800
http://forum.xcpus.com/gallery/d/733...80x1024_R1.JPG
Phenom@2.5 1280x1024
http://forum.xcpus.com/gallery/d/738...ighphysics.JPG
QX9650 2.5G 1280x1024
http://forum.xcpus.com/gallery/d/738...s_SETTINGS.JPG
To summarize
Phenom 9850 @ 2.5G 1280x1024
...8800 GTX...max =148 ave =61 min =28
...4870 X2.....max =154 ave =68 min = 32
QX9650 @ 2.5G 1280x1024
...8800 GTX...max =302 ave =121 min =53
...4870 X2.....max =234 ave =101 min = 48
The Phenom gained about 10% on average, but the X2 lost ground. I will need to check if there is something wrong, but my suspicion is that the ATI programmers have some more work to do to get WiC running better.
For a more real test Jack, could you test WiC @ 1680x1050 4xAA 8xAF and all the game settings @ Full ? ( except DX10 if you're running XP )
No I don't!
I have tried to explain in this thread.
Here are some of AMD's strong points (as I see them) in gaming and why they will show up in more complex games. I am NOT a game programmer (that type of programming is very boring), but I have some knowledge about DirectX (very little, so I might be wrong here).
If a game has dynamic graphics (faces that can show feelings, wind that can move trees, etc.), then the processor needs to help calculate the picture. This of course increases the burden on the processor, but the cache behaviour for that type of calculation is probably similar on AMD and Intel. The CPU calculates points in 3D space and packs them into memory blocks. When the points for one block have been calculated, the block is transferred to the video card. This is done by allocating video memory (I think memory on the video card is somehow mapped into the computer's address space). There are special commands for this. When the block has been allocated, the calculated points are copied to the video card; maybe they use memcpy in C++ for this. When the points have been copied, some sort of command is issued that operates on those points. Video card communication has high latency, so there can't be too many requests between the video card and the processor; the bandwidth is high, though, so if you can pack more data into blocks you gain speed.
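A rough sketch of that pattern (the gpu_map_buffer / gpu_unmap_and_draw calls are hypothetical stand-ins, not real DirectX functions): build the block on the CPU, then push it to the card in one big copy and one draw command.
Code:
#include <cstddef>
#include <cstring>   // memcpy
#include <vector>

struct Vertex { float x, y, z; };

// Stand-ins for whatever the real API (e.g. a vertex-buffer lock/unlock) provides.
static std::vector<unsigned char> fake_vram;          // pretend mapped video memory
void* gpu_map_buffer(std::size_t bytes) { fake_vram.resize(bytes); return fake_vram.data(); }
void  gpu_unmap_and_draw(std::size_t /*vertex_count*/) { /* real API would queue a draw here */ }

void submit_block(const std::vector<Vertex>& points)
{
    const std::size_t bytes = points.size() * sizeof(Vertex);
    void* dst = gpu_map_buffer(bytes);                 // allocate/map a window of video memory
    std::memcpy(dst, points.data(), bytes);            // one large copy instead of many small ones
    gpu_unmap_and_draw(points.size());                 // single command to draw the whole block
}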
AMD uses HyperTransport for GPU data. If this copying of memory travels over HyperTransport, these blocks will not hold back any memory transfers, because memory transfers are done through the IMC. On Intel, all this copying of points to the GPU has to go through the FSB, and that can sometimes block memory transfers so the latency goes up. If one thread needs that data it will be slower, and if another thread is waiting on that thread it will also be slower. It's a chain reaction in the worst-case scenario.
In single-threaded applications all operations are done in sequence, so there are no conflicts; the advantage of having a separate path for one type of traffic doesn't exist there. In multithreaded applications one thread could be used for sending data while other threads prepare buffers for the sender thread in order to work in parallel. If they are using memory while the sender thread is sending data, there is a conflict on Intel and that will delay operations.
I think dynamic graphics need to send much more data to the video card because the picture has to be recalculated. In Race Driver: GRID, for example, if there is a crash and smoke, you can see that the processor works harder, and probably more data is being sent between the video card and the processor. If parts of the picture aren't changing, then probably not much data is transferred.
In complex games (dynamic graphics), as the resolution goes up the game probably uses more memory (more data needs to be calculated). That makes the processor fetch more data from memory instead of finding it in the cache (the cache on Intel is HUGE, so it might take a very large and complex scene for this to matter). If threads need to communicate more, with smaller work items, that traffic also increases the burden on the FSB for the C2Q. If memory transfers to the video card are running, synchronization will have higher latency. On AMD, threads talk to each other through the L3 cache.
When and how the different system designs come out ahead depends on the game, of course. But complex games that calculate much of the picture, using more than one thread to do it, will let the AMD system design show some of its advantages. Increasing thread counts will shift the system-design balance more in AMD's favour.
If you compare only processor speed, then Intel is faster because of the cache. If AMD and Intel get the SAME FPS, then it is something OTHER than the processor that makes the FPS equal, and the answer to that can only lie in what differs between AMD and Intel.
If you compare these designs you will also see that there are more bottlenecks on Intel. On raw processor power Intel wins, but if the game does something special, or there are big changes and a lot of data needs to be processed or recalculated, that will take more time on Intel.
EDIT: About mapped memory and hypertransport
http://www.amd.com/us-en/assets/cont...docs/40546.pdf
Appendix B
AMD Family 10h processors support four write-combining buffers. Although the number of buffers available for write combining depends on the specific CPU revision, current designs provide as many as four write buffers for WC memory mapped I/O address spaces. These same buffers are used for streaming store instructions. The number of write-buffers determines how many independent linear 64-byte streams of WC data the CPU can simultaneously buffer.
Having multiple write-combining buffers that can combine independent WC streams has implications on data throughput rates (bandwidth), especially when data is written by the CPU to WC memory mapped I/O devices, residing on the AGP, PCI, PCI-X® and PCI Express® buses including:
•Memory Mapped I/O registers—command FIFO, etc.
•Memory Mapped I/O apertures—windows to which the CPU use programmed I/O to send data to a hardware device
•Sequential block of 2D/3D graphic engine registers written using programmed I/O
•Video memory residing on the graphics accelerator—frame buffer, render buffers, textures, etc.
HyperTransport™ Tunnels and Write Chaining
HyperTransport™ tunnels are HyperTransport-to-bus bridges. Many HyperTransport tunnels use a hardware optimization feature called write-chaining. In write-chaining, the tunnel device buffers and combines separate HyperTransport packets of data sent by the CPU, creating one large burst on the underlying bus when the data is received by the tunnel in sequential address order. Using larger bursts results in better throughput since bus efficiency is increased.
[...]
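To make the quoted section concrete, here is a minimal sketch (my own illustration, not code from the AMD guide) of a copy done with SSE2 streaming stores, the kind of non-temporal writes that go through those write-combining buffers; the destination would normally be a WC-mapped aperture such as a locked vertex buffer, here it is just ordinary memory.
Code:
#include <emmintrin.h>   // SSE2: _mm_load_si128, _mm_stream_si128, _mm_sfence
#include <cstddef>

// Both pointers must be 16-byte aligned; bytes must be a multiple of 16.
void stream_copy(void* dst, const void* src, std::size_t bytes)
{
    auto*       d = static_cast<__m128i*>(dst);
    const auto* s = static_cast<const __m128i*>(src);
    for (std::size_t i = 0; i < bytes / 16; ++i)
        _mm_stream_si128(d + i, _mm_load_si128(s + i));   // non-temporal store via WC buffer
    _mm_sfence();   // make the WC writes globally visible before the device reads them
}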
Point is...
We already have several tests in CPU Limited scenarios ( relatively low resolutions and noAA noAF, and medium to high details )... and we all know already and it has been proven a million times that the Core 2 processors are faster ( clearly ) than the Phenoms here.
In real-life scenarios ( medium to full details depending on the game, your VGA and monitor, usually combined with some AA & AF ), once again we've seen some tests, but some people are still skeptical.
So... if you're about to gain anything with a Phenom ( assuming that it's faster at high resolutions as gosh insists ), making it a CPU worth buying ( for a gamer ), it would be some FPS in real-life scenarios ( Crysis @ High @ 1680x1050 2xAA 8xAF, Call Of Duty 4:M.W. @ Full Details @ 1920x1200 4xAA 16xAF, etc ).
In this case my bet goes to "equality".
I believe that both platforms will be "scoring" nearly the same, or within the normal run to run variation margin.
Maybe 1 to 3 exceptions will be some games that are very CPU limited even at those settings ( x3: The Threat, Supreme Commander, come to my mind atm ), maybe not...
Whaa...? If someone can at least explain to me what you are trying to say... :eek: "Maybe they use memcpy in C++" for this...? I am starting to question your programming knowledge, sir. Seriously. (But I don't want to jump to conclusions, as I have noticed that English is not your first language. Either way, I think I'll need to lay off this until someone properly explains exactly what you are trying to say)
:) Ok, I think finally with this post everyone can see where the confusion lies. Let's not gang up on this but try to explain where the problems are.
@ Gosh
In a nutshell -- the CPU is not responsible for creating the actual image that becomes the frame to be displayed. This is what the GPU is for, and the reason it is called a Graphics Processing Unit. Increasing resolution affects the computational load that the GPU must endure, not the CPU.
I will go through this point by point later.... busy at the moment. I will do it in several posts, and it may take a few days :) so please be patient, keep an open mind, and read carefully what I write and the references that I link.
Jack
It's a very common function used in C++ for moving memory. In assembler it is all mov instructions (which is what memcpy uses too; a processor doesn't know that many different commands) and variants of that instruction. What you do is move memory from one location to another. Game applications are very much about moving memory. The reason I brought that function up is that the Intel compiler de-optimized it for non-Intel processors.
I hope someone else can explain in better english what I wrote :)
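For illustration only, here is a minimal sketch of the kind of vendor-string check the linked threads describe (my own code, not ICC's actual runtime; fast_sse2_memcpy is a hypothetical stand-in): the complaint is that the branch keys on the CPUID vendor string rather than on the actual feature bits, so a non-Intel CPU gets the slow generic path even if it supports the same instructions.
Code:
#include <intrin.h>   // __cpuid (MSVC)
#include <cstddef>
#include <cstring>

// hypothetical "optimised" path; a real one would use SSE2 streaming copies
static void* fast_sse2_memcpy(void* d, const void* s, std::size_t n) { return std::memcpy(d, s, n); }

static bool is_genuine_intel()
{
    int r[4] = {};
    __cpuid(r, 0);                        // leaf 0: vendor string in EBX, EDX, ECX
    char vendor[13] = {};
    std::memcpy(vendor + 0, &r[1], 4);    // EBX
    std::memcpy(vendor + 4, &r[3], 4);    // EDX
    std::memcpy(vendor + 8, &r[2], 4);    // ECX
    return std::strcmp(vendor, "GenuineIntel") == 0;
}

void* dispatch_memcpy(void* d, const void* s, std::size_t n)
{
    // vendor check instead of feature check: non-Intel CPUs fall to the generic path
    return is_genuine_intel() ? fast_sse2_memcpy(d, s, n) : std::memcpy(d, s, n);
}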
To those following this thread and interested in an X2 update, my general comment: it is clear that the drivers are still beta-like. Some games are just blazing fast; in other games it is benching slower than my old 8800 GTX -- examples.
Lost Planet Snow is slower than on an 8800 GTX.
Company of Heroes is much faster (much, much faster).
But it appears, from the TechPowerUp data, that it is running on 1 GPU only, and there is no x-fire option in CCC ... so I am working on figuring it out. Good news is that I am reproducing the TechPowerUp single-GPU numbers within a few FPS.
I am anxious for a driver refresh, hopefully they will be able to update the x-fire profile for all the games to harness the true potential of these cards.
jack
OK, I didn't really read this entire thread but, from what I gather, AMD has a faster GPU solution than Intel? Funny, I didn't know Intel released Larrabee or any other discrete GPU.
@gosh
These are within a generally accepted margin of error. There could easily be a larger separation even on the same platform. If the tests were done correctly, there would be multiple runs, averages, and charts for good measure. Because charts make everything seem more official.:D This would help eliminate that margin of error that ONE SCREEN SHOT has.Quote:
Originally Posted by gosh
You do realize that a good dual core (read: E8500) will beat both of those right?
Of course, you have been told this, or similar, many times. IMO, this thread should be closed.
@JJ
Awaiting your results:up:
Well, gotta figure out why it is not x-firing the two GPUs on the card. All my scores align with a single GPU, not the dual GPU. Frustrating.
Just out of curiosity, what is your score in Lost Planet Snow? (At the moment)
I haven't played too much .... I wanted to see low res first...
8800 GTX 640x480 is 185 FPS
4870 X2 640x480 is ~140 FPS
At higher res (1280x1024)
8800 GRX is 101ish
4870 X2 is 84 ish
I have a new PCI device that HW wizard cannot find drivers for... but i think it is due to the HDMI drivers for the card, not anything to do with the actual issue I am seeing here. All i can say is thank god for Wizzard and TechPowerUp... without their data I would have not understood as clearly what was going on.
EDIT: well, resolved the unknown PCI device issue. There is a microsoft UAA bridge driver that was not allowing the identification of the HDMI capability on the 4870 X2. I have been running XP SP2 for the longest time on this build (for consistency), on these two builds I have 4 partitions on the system drive, first partition is for the 'identical builds' of the two rigs, 2nd partition is for 'scratch' work, i.e. testing, checking out new drivers, etc before I disturb the primary build.
XP is great, I really appreciate the fact MS lets you activate XP as many times as you want on the same HW... I could install 20 copies of XP on the same computer if I wanted. They hosed us with Vista.
Mmm... couldn't find 1280 x 1024, so I stuck with 1280 x 960 instead. Settings at default, with Texture Filtering set to "Trilinear"
Q9450 @ 2.66GHz w/ 2GB DDR2 667MHz and HD4870 at 780/1000 (default, my card is a MSI pre-overclocked card)
Snow: 92
Cave: 84
GPU downclocked to 750/900 (default of reference HD4870):
Snow: 87
Cave: 82
CPU overclocked to 3.6GHz and GPU at default 780/1000:
Snow: 93
Cave: 112
Hope that helps... somehow.
Edit: Gameplay data through snow (first mission) at 3.6GHz, 900MHz RAM, 780/1000. All settings at High. 8x AA, 16x AF, retail version, unpatched:
Min: 32fps
Avg: 41fps
Max: 78fps
It does help! I am thinking there will be major improvements in this game after a few revisions of Catalysts.
Or in all games. Although the card is too overkill for my system right now (Crysis works perfectly fine at 4x AA and 1280 x 1024), I still think there's room for improvement on ATI's part. This thing seems like it can be much faster, and currently, the drivers seem to offload some work on the CPU (all 4 cores utilized while running some dual-core games, especially noticeable in applications that fully stress the graphics card).
Actually, if you want to see it offloading work on the CPU, try PCSX2. I've seen up to 50% of the extra two cores of the CPU being utilized while the graphics card is stressed in that software.
Well ... I am beginning to get this all figured out. The 4870 X2 is a mixed bag at the moment, and there is a lot of sensitivity to configuration, especially CPU clock speed and core count. I will not be doing anything systematic just yet, not until I feel comfortable that I can reproducibly recreate scores from various reviews (particularly TechPowerUp -- that was one of the best on the net). However, the problems I mentioned above were not because only a single GPU was firing; it was because my processor was clocked too low.
To make a long story short.... I am interested in comparing processors at a valid base clock speed; my 9850 is stock at 2.5 GHz, so I have been running most of my testing around there. In some games I am seeing huge improvements, in others not so much -- which I thought very odd. However, it is clear that the X2 pushes the current gaming matrix (with a few notable exceptions, Crysis, WIC) into the CPU-limited domain.
To see what I mean...
Here is 3DMark06 (Default values, 1280x1024, no AA, no AF) for a 2.5 Ghz QX9650:
http://forum.xcpus.com/gallery/d/737...X2_default.JPG
Here is 3DMark06 for a 3.67 GHz QX9650:
http://forum.xcpus.com/gallery/d/739...0X2_WOWSER.JPG
Here is why I say 'core count' with respect to 3DM06
Techreport's 3.6GHz 8400 got a 3DM06 score of 17555
Ok, so the summary:
2.5GHz QX9650: 3DMark06 score 14471, SM2 5231, SM3 7316, CPU 4025
3.67GHz QX9650: 3Dmark06 score 20609, SM2 7655, SM3 10285, CPU 5753
I had the same behavior with my Q6600 and 9800GX2... it's because the SM2 and SM3 tests are singlethreaded, so core clock is very important. Try overclocking the GPU with the Qx9650 @ 2.5GHz and you won't see any improvement.
Question: Is there information about the difference in latency for reading and writing data comparing HyperTransport and the Front Side Bus?
HyperTransport is designed to be a very fast point-to-point link (if I am right). If that communication is very fast, it could be one explanation for why AMD is even, or sometimes has slightly better numbers, in some tests of single-threaded games at very high detail and high resolution.
If communication with external hardware were similar on AMD and Intel, then Intel should always win in single-threaded games at the same clocks, even if the main bottleneck is the GPU. 6 MB of L2 cache at 15 cycles, usable by one core, compared to 512 KB of L2 cache on AMD makes a huge difference (I think more than 10% in all single-threaded games) if you compare processor performance alone.
Gosh ....
You are confused a bit about the communication to and from the GPU on the different platforms.
Let's take getting a chunk of data from memory to the GPU (which does happen, just not the volume you think it is)....
AMD's HyperTransport connects the chipset to the CPU, and the CPU to memory. Intel's layout connects the memory to the chipset, and the chipset to the GPU. PCIe 2.0 is spec'ed to be its own bus master, as opposed to earlier implementations which used DMA. On Intel's platform, the GPU has direct access to low-level system memory for various data, and the CPU writes the command buffer in GPU memory directly because it does not need to cross the front-side bus to get to the memory data to begin with. The GPU is only one hop away from memory on the Intel platform; it is two hops away on AMD's.
In terms of the data the GPU gets, it is in fact very little, not enough to saturate the FSB or HT ... the CPU populates the command buffer on the GPU for the GPU to do its work; all the other large data elements are precached in GPU memory (hence the reason GPU card makers keep upping memory, to keep pace with the large textures of today's games)
http://people.cs.uchicago.edu/~robis.../gpu_paper.pdf
This is essentially your second misunderstanding, in that you are thinking that all the data for a game event is stored in system memory, it is not .. I provided you a link that showed the different usages of video memory per game, perhaps you did not realize that that was reporting the video memory on the card and not system memory. Not sure, but all the heavy duty data that is needed for rendering a level is first loaded into the GPU's local memory (textures, vertex data, etc.) this is why when you start a game it takes several seconds (20, 30 or even a few minutes) to load... it is transferring that data over the low BW bus (both HT and FSB are low BW compared to the memory BW of a GPU).Quote:
The GPU is able to make calls to a certain window of the system’s main memory and is responsible for loading the data it will operate on into its own memory. The CPU directs this activity by writing commands to a command buffer on the GPU. On the old fixed-function pipeline, these commands associated matrices with vertex data. Now, CPU commands could very well point to program code that will be fetched and executed on the GPU.
Even nVidia provides you the concept of the partition between main and GPU memory:
http://http.developer.nvidia.com/GPU...s/fig28-01.jpg
http://http.developer.nvidia.com/GPU...gems_ch28.html
The point is.... on an Intel platform the Graphics Memory Controller Hub (GMCH) provides one-hop access to memory for the GPU. AMD's arrangement makes it a two-hop access ... if anything, Intel provides lower-latency access to main memory for the graphics card. In either case it is irrelevant, since all the texture and geometry data is loaded into video RAM (with its high-BW interface) before run time.
This is moot regardless, because the volume of data needed by the GPU from main memory is very small, since all the data that the GPU needs is placed in the Vertex, Texture, Mesh, and other buffers on the local GPU memory. The rendering for the scene is done by the GPU via commands written to the command buffer.
If the bottleneck is the GPU, then AMD and Intel will tie +/- a few FPS just on noise of the measurement, single threaded, multithreaded -- it does not matter. Again, Gosh ... moving to parallel computational methods for gaming code or any code, will simply speed up the computational result than that to be had over a single thread. A single task application will always speed up if you can run segments in parallel over simple sequential execution. The trick, and challenge, of multithreading in gaming is the interdependency of segments on the other. This is why you see some speed up but not a 2x gain, for example, going from single to dual thread. This is really nothing more than an example of Amdahl's Law.
Intel produces a computational result (clock for clock) faster than AMD, and as such, the CPU depended code will finish faster single or multithreaded, hence Intel will be faster in games.
You are being fooled and misled by forum posters who run their tests up to the GPU limit, and then you draw the incorrect conclusion that the result somehow reflects single- vs multithreading. This happens all the time....
At low resolutions, Intel wins by 20, 30, up to 50% clock for clock in gaming, but at high resolutions they appear tied... this is again due to the GPU bottlenecking the computation flow.
nVidia states it in their own words:
http://http.download.nvidia.com/deve...erformance.pdf
They show you a bottleneck flow chart that does exactly what we have been telling you.... to find the GPU bottleneck, vary the resolution, if the FPS varies it is the GPU or some component within the GPU pipeline, if not it is the CPU:
http://http.developer.nvidia.com/GPU...s/fig28-02.jpg
Varying or increasing the graphically important parameters in a game changes the computational workload on the GPU (NOT THE CPU). This is why, when one wants to assess a CPU's computational capability on gaming code, it is important to keep the CPU as the limiter (i.e. low resolutions) in order to make a statement on how well the CPU can handle the parts of the game that require the CPU (i.e. non-graphical code such as physics, AI, boundary collisions, etc.)
Let's go back to your latency question.... it is MOOT, even if the FSB latency was 3x longer it would not make a difference.
At 200 frames per second, the GPU is busy rendering ~ 1/200 seconds or 0.005 seconds. This is 5 milliseconds, or 5000 microseconds, or 5,000,000 nanoseconds. Latency of even 200 nano seconds is a wink compared to the time the GPU is spending in it's calculation, even for a very high frame rate.
The more interesting question to ask is what architectural feature of the Core uArch is allowing Intel to perform so much better at executing gaming code vs AMD's solution?
Jack
What is the volume?
What I wrote was that the points in 3D space are calculated and sent by the CPU. I know that DMA (http://en.wikipedia.org/wiki/Direct_memory_access) is used to load the textures etc. that are used to "paint" the picture. If all that data were sent to the GPU for every frame it would be so slooooow ;). I think there is enough data being sent anyway. Just look at 3D wireframe views with that grid-like look, and add to that all the commands used to tell the GPU how to "paint" the picture.
What I asked was whether there is information about latency comparing HyperTransport and the Front Side Bus?
EDIT: If the GPU needs to use RAM (access RAM on the motherboard) during gameplay, the performance goes down the toilet, as we say :)
Less than what an 800 MHz FSB can support :) You don't need to know the exact number; you can set up an experiment to see if it matters.... i.e. change the FSB speed and see if it changes the results..... I showed you that data http://www.xtremesystems.org/forums/...6&postcount=59.
Nonetheless, the thought exercise was not about the size but how each platform retrieves a chunk of data.
I also showed you that Intel scales a multicored multithreaded game along the same curve as an AMD CPU:
http://forum.xcpus.com/gallery/d/660...reScaling2.jpg
However, if you want to measure it.... you can download Intel's Vtune or AMD's CodeAnalyst and monitor the counters of the bus busy line.
The vertices are already loaded into GPU memory; changes to that geometry (such as a wall blowing up) are sent by the CPU, but the entire 3D mesh is not recalculated every time... camera position and perspective are, and this is what the GPU uses to render the image. The GPU also stores the Z-buffer, which determines which surfaces are visible in front of one another from the camera's perspective.Quote:
What I wrote was the points in 3d space is calculated and sent from the CPU. I know that DMA (http://en.wikipedia.org/wiki/Direct_memory_access) is used to load these textures etc that is used to "paint" the picture.
This is by design, you are correct in saying that the FSB is slow... so is HT... in fact, HT is slower than the FSB in one direction (which would be from CPU to GPU). 2000 Mhz HT line gives 2 bytes of data in one direction, or 4000 MB/sec or 4.0 GB/sec... FSB is half/duplex, giving 1333x8 in one direction or 10.6 GB/sec.... technically speaking, if data needs to get from the CPU to the GPU, Intel would provide more peak BW.
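A quick back-of-the-envelope check of those peak numbers (my own arithmetic; one direction only, and sustained throughput is lower in practice):
Code:
#include <cstdio>

int main()
{
    // 2000 MT/s, 16-bit (2-byte) HT link vs 1333 MT/s, 64-bit (8-byte) front-side bus
    double ht_gbs  = 2000e6 * 2 / 1e9;   // -> 4.0 GB/s per direction
    double fsb_gbs = 1333e6 * 8 / 1e9;   // -> ~10.7 GB/s, shared between both directions
    std::printf("HT:  %.1f GB/s\nFSB: %.1f GB/s\n", ht_gbs, fsb_gbs);
}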
Nonetheless, GPU makers and the HW/software has evolved to move all the BW necessary components to the local memory of the GPU and design in 100 GB/sec BW from v-RAM to GPU for this very reason. All the GPU needs from the CPU is what do I need to do next (a command list, hence a command buffer).
Now, when the resolution goes up high enough and the size of the textures are larger than what can be held in VRAM, then yes... a huge performance hit is taken because now the GPU must fetch a texture it does not have from system memory across that slow FSB or HT link. I have only seen recent examples of this:
http://www.anandtech.com/video/showdoc.aspx?i=3372&p=9
http://www.guru3d.com/article/radeon...w-crossfire/11
Notice the 512 MB cards dropping like a rock going from 1900x1200 to 2560x1600 ... at 2560x1600 the textures are too large to fit into VRAM... and the reviewers correctly conclude that. This is called texture thrashing. I even provided you a link showing modern games' VRAM usage earlier. The vast majority of games, in fact all that I have seen so far, are able to fit a level's worth of textures into 512 MB. GRID is the first I have seen that will exceed the 512 MB barrier -- and it only does so at 2560x1600.
Did you even read the above post... did you not see the bottlenecking flow chart by nVidia... even nVidia argues a GPU or CPU limited scenario.Quote:
If all this data would be sent to the gpu for every picture than it would be so slooooow ;). I think that there is enough data that is sent any way. Just look at 3d drawings that have that grid like looks. And add to that all command used to inform how to "paint" the picture
I am not sure what you are talking about in terms of 3D drawings... ray tracing? Intel is faster there too... significantly faster.
I have not seen any study or data comparing the latency of just the bus, it has always been a convoluted measure of latency through the bus to something else. I can look for you...Quote:
What I asked for was if there is information about latency comparing Hypertransport and the Front Side Bus?
Ok... there is not a good explanation for this... even Lost Planet, at higher resolutions, shows AMD can support higher FPS in GPU-bound scenarios... but if you followed the thread, I can get Intel's high-res GPU-bound FPS to exceed AMD's by increasing the PCIe frequency (i.e. the BW of the PCIe bus).Quote:
If this communications is very fast it could be one explanation why AMD is even or sometimes have a bit better numbers on some tests when they test single threaded games on very high detail and high resolution.
My hypothesis is that the AMD implementation of PCIe 2.0 in the chipset is better than Intel's.
jack
4870 X2 update: well guys, it is gonna be a bust for a while I suspect. Only on a few occasions can I get frame rates that exceed a 8800 GTX under the same test conditions (regardless of CPU used). I am most certain that this is a driver issue, and that the drivers that ship with the card is different than the drivers used by the press-reviews that we saw. I am getting in some cases 1/2 the FPS of what other reviewers have shown, under similar settings, setup, etc. If AMD does not produce a press-like quality driver within the next week, the cards are going back.
This is the quote from Guru3D: "With the latest press-driver used in this review the X2 finally is starting to show some better performance scaling." ... I cannot be certain that the drivers I am using are correct.
jack
And I don't really understand your answer. I asked one simple question and I am getting a big answer where you tell me that I don't understand.
The key to performance on the video card is the same as when they compress videos as much as possible: you just redraw what needs to be redrawn. Don't refresh data that hasn't changed. This isn't a problem when there isn't any action in the game; having a high FPS then isn't that important. You don't want low performance when there is action, though, and when there is action a lot of data needs to be processed. When there is action in the game, the need for fast communication is very important. Processors are extremely fast; they mostly sit and wait for data, and moving all that data needs to be fast.
If you mean that they write to memory on the motherboard first and then copy memory from there to the GPU, that would seem rather stupid. If the call is asynchronous they could get more performance (the processor would only need to wait for the copy to RAM), but that is very hard to do because you don't know when the command is finished (you need synchronization). You, or the driver, have to check that. If it is synchronous, then you need to wait for two memory transfers of the same data. Also, I don't think that is an improvement compared to preparing buffers on the stack and just sending them to allocated memory on the video card. Stack data is normally in the L1 or L2 cache (for both AMD and Intel) if it isn't too big, if I am right.
HyperTransport 3.0 is 20.8 GB/s. I have read that they can't use all that bandwidth on AMD PC motherboards, but the same goes for the FSB. You need some insane OC to go over 10 GB/s, and that is only achieved if data is transferred in long "trains".
That would be very interesting. I looked around before, but data on the speed between CPU and GPU was not easy to find. I did find other numbers, but they seemed too low to be true for video communication.
The released drivers on the CD ... 8.7 from AMD's website will not install, it says valid HW not found. I put the beta 8.8 drivers on (a scratch-partition 'dirty' build), same results. What it looks like to me is that ATI seeded the review sites with an alpha driver, with the correct profiles for the games they were using. I can match 3DMark06, for example, and some settings in COH, but most others are a bust.
The reason I am thinking this is that a few review sites are getting the same results I am:
http://www.tomshardware.com/reviews/...md,1992-4.html (pains me to link this :) )....
They likely use drivers out of the box.... I am just not getting the high octane results I have seen on the other websites, even with the same HW supporting.
EDIT: the actual driver version reported by catalyst is 8.52.6-080709a-048489c-ATI
Because you don't understand, even the context of your question is ludicrously silly. First you complain that no one is explaining; now that it is explained, you do not want a long answer you do not understand.Quote:
asked one simple question and am getting big answer where you tell me that I don’t understand.
Obviously, no matter what data I show you, no matter if I link up even the GPU makers themselves, you will not understand. I have explained it in as simple terms as I can... so I am finished.
Just a bit of advice: do not try to pair a Phenom with a high-end GPU, you will be disappointed.
jack
I am pretty certain that is the case. nVidia has publicly complained about Intel's PCIe implementation for years -- they use that as the reason they don't release SLI on Intel chipsets.
I get better performance out of my 8800 GTX than what I am seeing right now with the 4870 X2's, with just a few exceptions.
Jack, don't even bother with gosh. No matter how much data you show him, he won't try to understand because something green blinds him. :rolleyes:
Remember that much of the data that goes through PCIe to the GPU has ALSO been sent through the FSB. During gaming, most of the data sent through PCIe comes from the CPU, and that data will always travel over the FSB.
The question is… is it really PCIe that is bad, or is the FSB the main bottleneck?
Why has Intel removed the FSB on Nehalem?
It's the PCIe. I can change the FSB speed all I want... no change; I can bump the PCIe 10% and get a 5% improvement right off the bat. This is wasted time and effort, data means nothing to you.
Intel has moved away from the FSB because as core counts go higher, more BW will be needed; they are starting now, before 6- and 8-core parts hit. A 1333 MHz FSB is plenty to satisfy any desktop need in almost all applications, games included. But 4, 6 or 8 cores in server and HPC need that BW.
Well, it could be the other way around ;). I have pointed out Intel's strong areas, but when you say anything good about AMD compared to Intel it seems to create some allergic reaction, and then you get explanations that are very hard to make sense of if you know a bit about the subject, because they don't add up. If you show tests where AMD wins then there is some error; if Intel wins then it is ok. :shrug:
Crysis on a 4870 X2. I ran both the CPU and GPU bench, using resolutions of 1024x768, 1280x1024, and 1680x1050. Each was tested at 4x AA and again at 16x AA so a total of 6 runs for each bench
Phenom @ 2.5 GHz
GPU Bench
http://forum.xcpus.com/gallery/d/739...2_GPUBENCH.JPG
CPU Bench
http://forum.xcpus.com/gallery/d/740...2_CPUBENCH.JPG
QX9650 @ 2.5 GHz (matched clock speed for clock for clock)
GPU Bench
http://forum.xcpus.com/gallery/d/740...2_GPUBENCH.JPG
CPU Bench
http://forum.xcpus.com/gallery/d/740...2_CPUBENCH.JPG
Output is attached.
334 > 200
& not ?
200 = 200
gosh said you can't clock AMD's L3 cache; that's not true.
charged3800z24 and I have both gotten our NB to 2.4 GHz, which is the L3 cache speed.
http://www.xtremesystems.org/forums/...1&d=1219028090
And here is some extra data for you. Seems like Crysis doesn't scale so well with CF at all. CPU @ 3.6GHz and single HD4870 at 780/1000.
Outstanding job, thank you!
Yeah, I have stopped messin' around with Crysis at the moment and went to FEAR.... and all I can say is WOWSER. At 1900x1200, everything maxed to the hilt, and it is CPU limited.
It is pointless to run this card < 1900x1200, absolutely pointless. FEAR is old, I know, but still a gorgeous game... amazing to see it 1900x1200 cranked to the max, max AA, max AF everything max.
Ok, about to post FEAR data ... of course, in the world according to Gosh FEAR is single threaded so Phenom should do poorly.
FEAR on the 4870 X2 Phenom vs QX9650 using the 4870 X2 cards......
I did several experiments, I am posting the results for 3 of them. One at my normal stock baseline settings (that I do all my clock for clock studies on), i.e. both processors at 2.5 Ghz and DDR2-800. I then overclock both processors to 3.0 Ghz and set the memory to DDR2-1067. I know in FEAR at low res on a 8800 GTX that the difference between DDR2-800 and DDR2-1067 can have a substantial impact on the Phenom performance, thus I am also posting a 2.5 Ghz DDR2-800 vs DDR2-1067 for phenom to show the impact.
Bus speeds are not changed in any experiments, all OC is done via the multiplier (hence the reason I only buy unlocked multi CPUs)
FEAR does not have a windowed mode, so I cannot capture a CPUID for validation; you will need to take my word for it.
Phenom @ 2.5 GHz DDR2-800 max =253 min =37 Ave = 96
http://forum.xcpus.com/gallery/d/743...00_ALL+MAX.JPG
QX9650 @ 2.5 GHz DDR2-800 max =393 min =40 Ave =134
http://forum.xcpus.com/gallery/d/741...200_ALLMAX.JPG
About 40% (EDIT: ooops, miscalculated in my head, had to change the old 35% to the correct value) faster clock for clock at high res, max everything. Note, the min framerate is the same -- meaning both CPUs will give you the same quality gameplay.
Next experiment, OC to 3.0 GHz
Phenom @ 3.0 GHz DDR2-1067 max = 305 min= 48 ave = 113
http://forum.xcpus.com/gallery/d/741..._DDR2_1067.JPG
QX9650 @ 3.0 GHz DDR2-1067 max =466 min =40 Ave = 159
http://forum.xcpus.com/gallery/d/742..._DDR2-1067.JPG
On average 40% faster clock for clock....
Finally, here is Phenom at DDR2-1067 but 2.5 Ghz to compare above to see the impact of faster memory:
Phenom @ 2.5 Ghz DDR2-1067 Max = 273 Min = 37 Ave = 102
http://forum.xcpus.com/gallery/d/741..._DDR2_1067.JPG
Ok, there you have it. Both scale their response with clock speed the same way. Oh, and in case you are wondering ... yes, I can reproduce the TechPowerUp FEAR number within a few FPS using the same CPU clock speed he used.
Jack
Wow... QX9650 just totally spanked FEAR alive. Max 466... seriously?
But... I'm still seeing something weird here. Seems like the QX9650 always has that 2% of frames between 25 and 40 fps no matter what. What exactly happened in there? Maybe a FRAPS measurement would give us a better picture. Maybe some spots in the FEAR benchmark were squeezing FSB bandwidth.
P.S.: And you are welcome about the data. :)
Not quite understanding your question... could you clarify... however, I will do a fraps run if you tell me exactly which one you would like to see. EDIT: Ohhh, I see you are looking at the % bin statistics... good question ... Hold on I will FRAPS the 2.5 Ghz DDR2-800 runs again.
EDIT: It is obviously very clear that the drivers that shipped with the card and the drivers given to reviewers were two different realities. Some reviewers, it would appear, skipped the 'press drivers' and installed the shipped drivers -- ExtremeTech and Tom's came away with bad impressions of the 4870 X2, for example, and my numbers are matching theirs most closely in general. I suspect in a month or two this card is just gonna get better and better.
jack
Uhm... basically, I'm wondering why, on the QX9650, there was always 2% of the time that fps was between 25 and 40. If you look at the screenshots again, on the QX9650, although fps is skyrocketing, 2% is always between 25 and 40. On the Phenom system, upon overclocking to 3GHz, it's always over 40. And the minimum was at 45.
yeah, i figured that out... I am FRAPing them now.
It happens as you pass through the glass in the door at the end of the perf run. FRAPs samples about once per second, it is not capturing the highest or lowest. It would take several runs and a lot of luck to actually capture the event that produces the max and min.
This is a FRAPs plot of the 2.5 GHz DDR2-800 runs for both CPUs....
http://forum.xcpus.com/gallery/d/743...9850vs9650.JPG
Could be something flunking in the background, too. I have that a lot with my old IDE HDD. Actually, it's flunking so much on my system that my 3DMark Vantage scores vary in the range of 1000...
next up.... quake wars enemy territory.
Ok... so for Quake Wars Enemy Territory, I utilize the HOCBenchmark utility, which can be downloaded here. Because I keep historical records, I am using an older version of QWET as well.
You can download the benchmark utility here: www.hocbench.com (as of the time of this post, their server appears down).
I setup the HOC benchmark to run 3 runs at 1024x768, 1280x1024, and 1900x1200. The quality settings are set to high, and I am using 16x AF and 4x AA in the HOC utility. I am also using the Quarry script/scene. The output comes in the form of an HTML file, no screen dumps of the actual game.
Phenom @ 2.5 GHz DDR2 - 800 output:
Resolution: 1024×768
Score = 150 FPS
Score = 153 FPS
Score = 154 FPS
Average score = 152 FPS
Resolution: 1280×1024
Score = 150 FPS
Score = 150 FPS
Score = 152 FPS
Average score = 150 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 147 FPS
Score = 147 FPS
Score = 147 FPS
Average score = 147 FPS
=================================
QX9650 @ 2.5 GHz DDR2 - 800 output:
Resolution: 1024×768
Score = 175 FPS
Score = 178 FPS
Score = 175 FPS
Average score = 176 FPS
Resolution: 1600×1200
Score = 171 FPS
Score = 171 FPS
Score = 170 FPS
Average score = 170 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 165 FPS
Score = 164 FPS
Score = 166 FPS
Average score = 165 FPS
The output files are attached.
EDIT: Darnit my bad... the QX9650 instead of 1280x1024, I ran 1600x1200... rerunning the 1280x1024 to add that info.... sorry.
EDIT2: Here is the QX9650@2.5GHz DDR2-800 run at 1280x1024
Resolution: 1280×1024
Score = 176 FPS
Score = 176 FPS
Score = 171 FPS
Average score = 174 FPS
The attachment has been updated with that run as well.
BTW -- QWET uses all 4 cores.
1600 x 1200 or not, even the 1920 x 1200 scores are beating Phenom to a pulp, even when Phenom is at 1024 x 768. I think we pretty much have a conclusion.
Not done yet...
http://forum.xcpus.com/gallery/d/7437-2/DESKTOP.JPG
(View to see the taskbar menu slide ups for installed software)
This is what is currently installed... the Phenom has exactly, program for program, the exact installation. So there is still a lot of comparing to be done :)
EDIT: I am gonna stick on QWET for a moment, running again on the QX9650 at 200 Mhz (800 Mhz FSB) speed, @ 2.5 Ghz ....
Well, color me purple, check this out.... QWET at 200 Mhz system clock, 800 MHz FSB..... (all the above runs were done at 333 Mhz or 1333 MHz FSB) EDIT: NOTE -- memory divider was not changed, this run is at DDR2-400 Mhz :) .... my bad.
Resolution: 1024×768
Score = 147 FPS
Score = 140 FPS
Score = 147 FPS
Average score = 144 FPS
Resolution: 1280×1024
Score = 144 FPS
Score = 141 FPS
Score = 139 FPS
Average score = 141 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 137 FPS
Score = 140 FPS
Score = 142 FPS
Average score = 139 FPS
Let's see what we get when we go to 1600 MHz FSB.
EDIT: OOOPS my bad... this was 200 MHz (800 MHz FSB), but I forgot to up the memory divider.... I am rerunning now at 200 Mhz FSB + DDR2-800 instead of DDR2-400. This may prove informative since I mentioned above that the GPU can go straight to system memory in one hop.
Output for DDR2-800 ...
Resolution: 1024×768
Score = 158 FPS
Score = 163 FPS
Score = 160 FPS
Average score = 160 FPS
Resolution: 1280×1024
Score = 157 FPS
Score = 157 FPS
Score = 160 FPS
Average score = 158 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 155 FPS
Score = 156 FPS
Score = 156 FPS
Average score = 155 FPS
So in this case a 40% decrease in FSB BW translates into a 7-10% hit in FPS.
Ok.... last one for the day... Half-Life2-Lost Coast, I will do episode one and two a bit later I suspect. I had to run it windowed so I could screen grab with a shot of CPUID. The window that HL2 creates is always on top, so after the bench I had to move it slightly off screen in order to reveal the CPUID window.
My standard baseline, Phenom@2.5 Ghz DDR2-800 4870 X2 1920x1200, max everything including AA and AF.
http://forum.xcpus.com/gallery/d/744..._1920x1200.JPG
QX9650 @2.5 Ghz DDR2-800 4870 X2 1920x1200, max everything including AA and AF.
http://forum.xcpus.com/gallery/d/744..._1920x1200.JPG
In my personal opinion, the C2Q has no problem with bandwidth itself and never had. The problem with the FSB is its latency. Increasing or decreasing the FSB only has an effect in terms of bandwidth; the latency always stays the same. You can easily check this with low-level benchmarks.
In other words, the FSB is not a big problem bandwidth-wise, but there is a physical path on the PCB that the data must travel. Whether the data has to cross that path, compared to K10 and Nehalem, decides whether the coherency latency is on the order of µs or ns. This is a very significant factor, which allows K10 and Nehalem to scale better than a Core 2 Quad.
So what effect does this have in real-life conditions? In a good case the prefetcher can hide this physical latency, the data is already in the L2 cache and can be used immediately (in effect there is no latency); in a worse case the prefetcher works inefficiently and the physical latency of the FSB results in poor performance. And that is exactly when K10 outperforms an Intel Core 2 Quad.
Excuse my bad English; if there are any questions, feel free to ask.
Give me an Intel rig and I sure will ;). And again, the FSB does not limit bandwidth-wise; it all depends on the prefetcher and how well it is actually working.
"Wrong" data in the cache (simplified) -> memory access -> high latency -> bad coherency -> bad performance ;)Quote:
Anandtech:
This is the test that actually screws the whole thing for Intel. It turns out that CBALLS2 calls a function in the Microsoft C Runtime Library (msvcrt.dll) that, when combined with Vista SP1, can magnify the Core architecture's performance penalty when accessing data that is not aligned with cache line boundaries.
Jack, I'm impressed with your data. Have you ever thought about making your own review site?
This site would be my number one, since your comparisons are top notch. :up:
Yes, especially for processor to memory... latency across the bus to other parts, such as SB IO or even Graphics card is not a huge issue because even with the longer latency, the timing of the IO and graphics card overwhelms any latency on the bus.
This is where the large cache and aggressive prefetchers work well; only in a few cases can you see this really become a problem.