gosh, your posts are just a bunch of excuses for why AMD fails to match or beat Intel.
Well, I can tell you that even I was surprised when I saw the World in Conflict results that Jack posted. I was starting to wonder if he had forgotten to disable the TLB patch. The difference was so big that it can't be explained by differences in how the processors work alone. You need to optimize for one processor to get a difference that big, so I looked it up. And found it: Intel has been helping them…
I know it's hard for a non-programmer to understand. But if a game scales across largely independent threads that don't share memory and don't synchronize, then it works well on Intel. It's like running separate single-threaded applications that have some point where they join their work and then go back to working independently again. This is also easier for programmers to do, but it isn't effective if you really want to use all the power the processor has. I showed you a link about the render-split design before.
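A minimal fork/join sketch of what I mean (my own illustration, not code from any real game): each thread works on its own private data and the only synchronization is one join point per frame.
Code:
#include <functional>
#include <thread>
#include <vector>

void simulate_chunk(std::vector<float>& chunk)        // each thread owns its chunk
{
    for (float& v : chunk) v = v * 0.99f + 1.0f;       // stand-in for per-entity work
}

int main()
{
    std::vector<std::vector<float>> chunks(4, std::vector<float>(1 << 16, 1.0f));
    for (int frame = 0; frame < 100; ++frame) {
        std::vector<std::thread> workers;
        for (auto& c : chunks)                          // fork: one thread per chunk
            workers.emplace_back(simulate_chunk, std::ref(c));
        for (auto& t : workers) t.join();               // join: single sync point per frame
        // (a real engine would now hand the results to the render thread)
    }
}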
I wouldn't be surprised if they used Intel's compiler on World in Conflict as well.
http://yro.slashdot.org/comments.pl?...3&cid=13042922
http://aceshardware.freeforums.org/c...iler-t428.html
Again with the programmer excuse. Seriously, pal, no disrespect intended, but knowing how to code doesn't mean anything at all here. Just because you are a programmer does not mean you know more than others. And you are the one who suddenly comes out claiming Phenom wins against even the Q9450, while hundreds of others find Phenom not even measuring up to the Q6600 in most cases.
If you are so knowledgeable, would you mind explaining why Intel processors do so much better than AMD processors with PCSX2? (In the same manner you have been "explaining" things, that is)
Oh geez, the coding again. Anyway, if it were a coding issue, what's the point in buying a Phenom rig for gaming, considering everything is better optimized to run on Intel anyway if that's the case.
Hmmmm, so basically what we have is: build Intel and get better performance in a desktop environment say 95% of the time due to universally optimized code, or buy Phenom for better bandwidth but be slower 95% of the time at the same task due to the lack of optimized code.
Because AMD will run everything that Intel runs without problems too, and if you add applications doing normal background work, AMD will like it even more.
You can also be certain that Intel is doing what they can to change how developers create applications. Nehalem/Larrabee etc. are thread monsters (more so than Phenom); you need to scale well in order to use all the power in those processors. If ray tracing arrives, then you are probably also going to need a processor that is good at scaling across threads.
The memcpy that Intel's compiler seems to generate is really ugly, and memcpy is one of the most common functions used in DirectX games (maybe game developers know this if they use the Intel compiler and write their own function).
Does not matter. Regardless, games were made for Intel processors, and they run better on Intel because they were written for Intel (you are implying this), so end of story.
You just literally admitted that Intel is better at handling games, sir.
Every type of game. Unless you are telling me you know precisely how each game works even without looking at their source code?
:brick:
I read this entire thread...no amount of factual information is going to get gosh to finally come around. He will continue on his path of circular reasoning until you guys flare out...and then he will claim victory.
Is this a competition?
Do you feel better if I say that you win?
This discussion is about processors. I think we all know where the future is going: it's about threading. Applications will shift from single-threaded to multithreaded more and more. When you will gain from a quad depends on what type of applications you are using today. Most of us will do very well on duals.
The main argument for AMD in gaming, when threaded games are used, hasn't been talked about as much here. It's about bottlenecks. We mostly talk about which CPU can get the highest average FPS.
What's really sad is that reality isn't shown in these tests, and that favours Intel when all these tests are done. Intel will get stronger and stronger; if AMD had been a bit later with the 48xx series they might have been in SERIOUS trouble. If there isn't an AMD then we all lose, and all the shareholders at Intel will be happy.
victory in the intellectual sense...:brick:
There are over a hundred links on Google to show otherwise. And there are more links and data in this thread that say otherwise, too.
Plus you did not answer my question. Why does Intel Core 2 work better with PCSX2 than AMD Phenom? We DO have the PCSX2 source code (the latest SVN is 384), so you can peep in it for technical details right there. Care to elaborate based on your expertise in programming?
I have been asking for those links before; can you show me?
Show me the code and then I will be able to answer. I don't know how they have coded that application, and I think even you will understand that it is impossible for anyone to explain why without it.
I can give you some clues though. It's difficult to do threading; it is probably very difficult to emulate threads from other code efficiently. That program probably wants raw processor power (high clocks) and a fast cache.
:) :) Well, I will never break you of your paranoia, but look at this last line... the threads there update physics, shadow volumes (fog), particles, and trees.... hmmmm, and after the update... that gets sent to the GPU. Frame by frame, each thread works in parallel to complete an update to the frame, which is then sent to the GPU ....
Thanks for the link...
Jack
:) ... I am using B3 silicon, it does not have the TLB bug and turning the patch on or off via BIOS or AOD makes no difference.
Here is a comprehensive effect of the TLB patch on /off on a 2.3 GHz 9600 BE http://forum.xcpus.com/motherboard-c...om-menace.html
I haven't generated the charts yet, but for the all low res/low detail (CPU bound condition) at 1280x1024
pre Patch: Max = 139 Ave = 61 Min = 30
post Patch: Max = 111 Ave = 51 Min = 28
Unlike some of the quick runs posted here just to answer questions, the data in the article above is usually run 3 times for reproducibility and reported as an average; the data I quote above is from the first run.
I can post screen shots if you like.
Computationally, the TLB patch is a huge hit -- especially in games, but considering that games page most of their memory and are so branchy, it is not surprising. However, I have also never observed the TLB bug manifest itself. It was an unfortunate PR disaster for AMD, but running a B2 (TLB bug) processor is perfectly stable over the 20 or so games I tested on it. I would have no reservations recommending someone take a B2 CPU if they so desired; the TLB bug was blown way, way out of proportion.
Jack
To Gosh:
Phenom 9850
http://www.sharkyextreme.com/hardwar...261_3737046__9
http://www.techspot.com/review/93-am...on/page10.html
http://www.tbreak.com/reviews/articl...0&pagenumber=3
http://www.bit-tech.net/hardware/200...9550_b3_cpus/7
http://www.bit-tech.net/hardware/200...9550_b3_cpus/8
http://hothardware.com/Articles/AMD-...vision/?page=7
http://www.extremetech.com/article2/...2282402,00.asp
(The following has 3870X2 data)
http://www.firingsquad.com/hardware/...view/page5.asp
http://www.firingsquad.com/hardware/...view/page6.asp
http://www.firingsquad.com/hardware/...view/page7.asp
http://www.firingsquad.com/hardware/...iew/page10.asp
(The following has TLB and without TLB data)
http://techreport.com/articles.x/14424/4
http://techreport.com/articles.x/14424/5
Phenom 9900
http://www.legitreviews.com/article/597/8/
http://www.legitreviews.com/article/597/9/
I would post more but I am really tired of copying and pasting... there are just too many. And most of them show that between 1280 x 1024 to 1600 x 1200 for instance, Intel leads. And Phenom occasionally breaks through in one or two specific games (F.E.A.R., Company of Heroes) but Phenom can't even measure up to Q6600/Q6700 in the rest of the test. In the Q9300 showdown, Phenom only had an edge with CoH, and mind you, E8500 still won over Phenom in that test at high res.
And PCSX2 source code can be found at http://www.pcsx2.net
Does it hurt to actually do the Googling yourself every once in a while?
ooops... weird, double post.
:) This is because of the configuration -- it throttles at the GPU :) .... you just can't get that :) and even observing the data, watching faster GPUs relieve that bottleneck and go to higher FPS, doesn't convince you. This is just weird.
It's ok Gosh -- nobody believes you, this is why you get the gruff you get. You're really a nice guy overall. Bottom line, it does not matter -- if running at high resolution -- who is better. Even if the GPU is the limiter and AMD scores 4 FPS higher by some forum random postings, the game play will be the same on either.
Unfortunately, if you bought a Phenom with this misconception on a 8800GTX card, thinking you would be able to exploit the latest round of cards ... you have wasted your money on a GPU upgrade, the Phenom will just not feed it fast enough (not because of lack of BW, but because it takes the Phenom longer to complete the computational cycle).
The X2's came today, I should have some data by tonight....
All the data produced in this thread are games that are multithreaded across all 4 cores...
Lost Planet, GRID, World in Conflict utilize all 4 cores. Gosh -- clock for clock, Intel produces a computational result faster than AMD does single threaded, dual threaded or quadruple threaded... there is no question.
Does this make the Phenom a bad CPU? No. Does it require AMD to price it accordingly? Yes. This is why AMD is losing billions, they cannot make a profit and charge 200 bucks or less for a quad core processor, while trying to support the low end with dual cores that are even slower than the competition.
True enough. I get about 20fps in Lost Planet with 8x AA and 16x AF, but the game looks pretty, and still runs mirror smooth. Same goes for Crysis. It just doesn't matter which one runs better. 1 or 2fps isn't that important.
Grats! Start spanking something? I'm expecting lots of graphs and lots of paragraphs in a separate thread. :up:Quote:
The X2's came today, I should have some data by tonight....
very interesting to see :)
is this done with phenom (or is it fake maybe)?
http://www.xtremesystems.org/forums/...d.php?t=197648
http://www.xtremesystems.org/forums/...1&d=1218151935
I partly agree with you ;), there is at least one scenario where Phenom will outperform an Intel C2Q just due to its superior architecture. Excuse me for not adding more to the GPU-bound discussion.
I'll just quote myself:
At least for a game, such a scenario will be very rare, simply because most games have one "main" thread for GPU rendering and some helper threads for the other stuff (AI, ...). That is very convenient for Intel: the prefetcher can "almost always" fetch the data to the right core. From my personal technical view, the C2D is the best dual-core processor, but in terms of quad-core processors K10 is the better one (and so will be Nehalem)Quote:
That is, in principle, the huge advantage of the C2D. It loads the data early enough to work efficiently, so the application can compute immediately, without accessing memory.
This principle runs into problems with many simultaneous threads and ongoing data transfers (depending on the type of transfer; it matters how much data has to be calculated). In terms of streaming data K10 wins; in terms of, e.g., video encoding there is more calculation and the C2D takes advantage of its huge L2 cache and the mighty prefetcher working in the background.
Now here comes the problem: with 4 (or more) threads the prefetcher has trouble finding out which data is needed and only works partially effectively. If the prefetcher guesses well, the result is very good and the L2 and prefetcher work great. But when something unexpected happens, a very long memory access is needed and the whole structure collapses, while K10 can still handle the coherency thanks to the shared L3 cache, and even if the data is not in the L3 cache, K10 can load the data 3 times faster than a C2Q. Again: this will only happen with massive multithreading.
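A small illustration of the prefetcher point (my own sketch, not taken from any benchmark): sequential access streams nicely, while pseudo-random access defeats the hardware prefetcher and pays the full memory latency on most misses.
Code:
#include <cstddef>
#include <cstdint>
#include <vector>

std::uint64_t sum_sequential(const std::vector<std::uint64_t>& data)
{
    std::uint64_t s = 0;
    for (std::uint64_t v : data) s += v;   // linear stream: the prefetcher predicts this
    return s;
}

std::uint64_t sum_scattered(const std::vector<std::uint64_t>& data)
{
    std::uint64_t s = 0, idx = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // pseudo-random stride the hardware prefetcher cannot follow
        idx = (idx * 6364136223846793005ULL + 1) % data.size();
        s += data[idx];                    // most of these are cache misses
    }
    return s;
}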
If you read through the forums, the link that generated this data came from chiphell...
This is the CPU that generated it:
http://www.chiphell.com/?action-view...mid-211-page-3
Q6600, not Phenom.
EDIT: Oooops, then I read lower in the thread and this is not the link. Hold on let me search some more.
EDIT2: What CPU, I see no mention of phenom in there. And I don't know if it is fake because Chiphell does not have a 4870X2 review posted.
Jack
The CPU that did that was a QX9650 at 4GHz.
Why? Cuz...
http://bbs.chiphell.com/viewthread.p...extra=page%3D1
And cuz...
http://bbs.chiphell.com/viewthread.p...extra=page%3D1
Well... cuz the data matches, more or less.
3DMark06
Phenom@2.5 Ghz 4870X2 1680x1050 2xA 16xAF
http://forum.xcpus.com/gallery/d/736...70X2_hires.JPG
Phenom@2.5 Ghz 4870X2 Default Settings
http://forum.xcpus.com/gallery/d/736...X2_default.JPG
QX9650@2.5 Ghz 4870X2 1680x1050 2xA 16xAF
http://forum.xcpus.com/gallery/d/736...70X2_HIRES.JPG
QX9650@2.5 Ghz 4870X2 default
http://forum.xcpus.com/gallery/d/737...X2_default.JPG
Scores are low partly due to clock speed, partly due to me not optimizing. Catalyst is set to default as installed.
...Quote:
".. an ounce of honest data is worth a pound of marketing hype."
C'mon, gosh is just gonna argue that the Phenom lost because the 3DMark SM2 and SM3 tests are mostly single-threaded and that the CPU test is unfair.
I remember when I bought into the whole Phenom is 40% faster than C2Q... :rolleyes:Quote:
".. an ounce of honest data is worth a pound of marketing hype."
mostly single threaded, yes indeed.
phenom could be much more compelling in a multi-threaded universe.
anyway, it seems close, based on the limited info I have read :D
it is single threaded apps (and software generally) that are limiting uptake of quad core cpu's in my opinion.
So yesterday, I produced World in Conflict bench runs, trying to match at least the game settings. This is what I had from yesterday, just copied over from the other page:
Here is 1024x768 run on QX9650 @ 2.67 G 4870 X2Quote:
QX9650 @ 2.67 GHz (333x8) DDR2-1067 8800 GTX in order of resolution
http://forum.xcpus.com/gallery/d/728...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/729...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/728...C_SETTINGS.JPG
Phenom 9850 @ 2.5 GHz (200x7.5) DDR2-800 8800 GTX
http://forum.xcpus.com/gallery/d/730...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/730...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/730...C_SETTINGS.JPG
Phenom 9850 @ 2.5 Ghz (200x7.5) DDR2-1067 8800 GTX
http://forum.xcpus.com/gallery/d/732...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/732...C_SETTINGS.JPGhttp://forum.xcpus.com/gallery/d/732...C_SETTINGS.JPG
QX9650 @ 2.67 GHz
1024x768. Max = 123 Ave = 50 Min = 21
1280x1024 Max = 104 Ave = 47 Min = 22
1650x1080 Max = 95 Ave = 46 Min = 21
Phenom @ 2.5 Ghz DDR2-800
1024x768. Max = 69 Ave = 27 Min = 10
1280x1024 Max = 69 Ave = 27 Min = 10 (this is odd, exactly the same)
1650x1080 Max =70 Ave = 28 Min = 10
Phenom @ 2.5 GHz DDR-1067
1024x768. Max = 67 Ave = 28 Min = 11
1280x1024 Max = 68 Ave = 27 Min = 9
1650x1080 Max =72 Ave = 29 Min = 9
http://forum.xcpus.com/gallery/d/737...C_SETTINGS.JPG
Here are the same runs on the 4870 X2
Phenom 9850@2.5G DDR2-800 4870X2 1024x768 0xAA 16xAF
http://forum.xcpus.com/gallery/d/737...C_SETTINGS.JPG
QX9650@2.67G OCC Settings Max = 106 Ave = 49 min = 17
Phenom@2.5G OCC Settings Max = 78 Ave = 33 min = 11
Hardly did anything at all going to a new card -- hmmmmmm weird.
Ok, next the fun one....
Here are my settings, meant to stress the CPU and take the GPU out of the equation.... with an 8800 GTX I got these results:
Here are today's new 4870 results (mind you, in this experiment I match the clock speeds of the processors).Quote:
QX9650 @ 2.5 GHz, DDR2-800 (in this case, all my baseline data is there)
http://forum.xcpus.com/gallery/d/732...80X1024_r1.JPG
Phenom 9850 @ 2.5 GHz, DDR2-800
http://forum.xcpus.com/gallery/d/733...80x1024_R1.JPG
Phenom@2.5 1280x1024
http://forum.xcpus.com/gallery/d/738...ighphysics.JPG
QX9650 2.5G 1280x1024
http://forum.xcpus.com/gallery/d/738...s_SETTINGS.JPG
To summarize
Phenom 9850 @ 2.5G 1280x1024
...8800 GTX...max =148 ave =61 min =28
...4870 X2.....max =154 ave =68 min = 32
QX9650 @ 2.5G 1280x1024
...8800 GTX...max =302 ave =121 min =53
...4870 X2.....max =234 ave =101 min = 48
The Phenom gained about 10% on average, but the X2 lost ground. I will need to check if there is something wrong, but my suspicion is that the ATI programmers have some more work to do to get WiC running better.
For a more real test Jack, could you test WiC @ 1680x1050 4xAA 8xAF and all the game settings @ Full ? ( except DX10 if you're running XP )
No I don't!
I have tried to explain in this thread.
Here are some of AMD's strong points (as I see them) in gaming and why they will show up in more complex games. I am NOT a game programmer (that type of programming is very boring), but I have some knowledge about DirectX (very little, so I might be wrong here).
If a game has dynamic graphics (faces that can show feelings, wind that can move trees, etc.), then the processor needs to help calculate the picture. This of course increases the burden on the processor, but the cache behaviour for that type of calculation is probably similar on AMD and Intel. The CPU calculates points in 3D space and packs them into memory blocks. When the points for one block have been calculated, the block is transferred to the video card. This is done by allocating video memory (I think memory on the video card is somehow mapped into the computer's address space). There are special commands for this. When the block has been allocated, the calculated points are copied to the video card; maybe they use memcpy in C++ for this. When the points have been copied, some sort of command is issued that operates on those points. Video card communication has high latency, so there can't be too many requests between the video card and the processor; the bandwidth is high, though, so if you can pack more data into blocks you gain speed.
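A rough sketch of that pattern (the gpu_map_buffer / gpu_unmap_and_draw calls are hypothetical stand-ins, not real DirectX functions): build the block on the CPU, then push it to the card in one big copy and one draw command.
Code:
#include <cstddef>
#include <cstring>   // memcpy
#include <vector>

struct Vertex { float x, y, z; };

// Stand-ins for whatever the real API (e.g. a vertex-buffer lock/unlock) provides.
static std::vector<unsigned char> fake_vram;          // pretend mapped video memory
void* gpu_map_buffer(std::size_t bytes) { fake_vram.resize(bytes); return fake_vram.data(); }
void  gpu_unmap_and_draw(std::size_t /*vertex_count*/) { /* real API would queue a draw here */ }

void submit_block(const std::vector<Vertex>& points)
{
    const std::size_t bytes = points.size() * sizeof(Vertex);
    void* dst = gpu_map_buffer(bytes);                 // allocate/map a window of video memory
    std::memcpy(dst, points.data(), bytes);            // one large copy instead of many small ones
    gpu_unmap_and_draw(points.size());                 // single command to draw the whole block
}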
AMD uses HyperTransport for GPU data. If this copying of memory travels over HyperTransport, these blocks will not hold back any memory transfers, because memory transfers are done through the IMC. On Intel, all this copying of points to the GPU has to go through the FSB, and that can sometimes block memory transfers so the latency goes up. If one thread needs that data it will be slower, and if another thread is waiting on that thread it will also be slower. It's a chain reaction in the worst-case scenario.
In single-threaded applications all operations are done in sequence, so there are no conflicts; the advantage of having a separate path for one type of traffic doesn't exist there. In multithreaded applications one thread could be used for sending data while other threads prepare buffers for the sender thread in order to work in parallel. If they are using memory while the sender thread is sending data, there is a conflict on Intel and that will delay operations.
I think dynamic graphics need to send much more data to the video card because the picture has to be recalculated. In Race Driver: GRID, for example, if there is a crash and smoke, you can see that the processor works harder, and probably more data is being sent between the video card and the processor. If parts of the picture aren't changing, then probably not much data is transferred.
In complex games (dynamic graphics), as the resolution goes up the game probably uses more memory (more data needs to be calculated). That makes the processor fetch more data from memory instead of finding it in the cache (the cache on Intel is HUGE, so it might take a very large and complex scene for this to matter). If threads need to communicate more, with smaller work items, that traffic also increases the burden on the FSB for the C2Q. If memory transfers to the video card are running, synchronization will have higher latency. On AMD, threads talk to each other through the L3 cache.
When and how the different system designs come out ahead depends on the game, of course. But complex games that calculate much of the picture, using more than one thread to do it, will let the AMD system design show some of its advantages. Increasing thread counts will shift the system-design balance more in AMD's favour.
If you compare only processor speed, then Intel is faster because of the cache. If AMD and Intel get the SAME FPS, then it is something OTHER than the processor that makes the FPS equal, and the answer to that can only lie in what differs between AMD and Intel.
If you compare these designs you will also see that there are more bottlenecks on Intel. On raw processor power Intel wins, but if the game does something special, or there are big changes and a lot of data needs to be processed or recalculated, that will take more time on Intel.
EDIT: About mapped memory and hypertransport
http://www.amd.com/us-en/assets/cont...docs/40546.pdf
Appendix B
AMD Family 10h processors support four write-combining buffers. Although the number of buffers available for write combining depends on the specific CPU revision, current designs provide as many as four write buffers for WC memory mapped I/O address spaces. These same buffers are used for streaming store instructions. The number of write-buffers determines how many independent linear 64-byte streams of WC data the CPU can simultaneously buffer.
Having multiple write-combining buffers that can combine independent WC streams has implications on data throughput rates (bandwidth), especially when data is written by the CPU to WC memory mapped I/O devices, residing on the AGP, PCI, PCI-X® and PCI Express® buses including:
•Memory Mapped I/O registers—command FIFO, etc.
•Memory Mapped I/O apertures—windows to which the CPU use programmed I/O to send data to a hardware device
•Sequential block of 2D/3D graphic engine registers written using programmed I/O
•Video memory residing on the graphics accelerator—frame buffer, render buffers, textures, etc.
HyperTransport™ Tunnels and Write Chaining
HyperTransport™ tunnels are HyperTransport-to-bus bridges. Many HyperTransport tunnels use a hardware optimization feature called write-chaining. In write-chaining, the tunnel device buffers and combines separate HyperTransport packets of data sent by the CPU, creating one large burst on the underlying bus when the data is received by the tunnel in sequential address order. Using larger bursts results in better throughput since bus efficiency is increased.
[...]
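To make the quoted section concrete, here is a minimal sketch (my own illustration, not code from the AMD guide) of a copy done with SSE2 streaming stores, the kind of non-temporal writes that go through those write-combining buffers; the destination would normally be a WC-mapped aperture such as a locked vertex buffer, here it is just ordinary memory.
Code:
#include <emmintrin.h>   // SSE2: _mm_load_si128, _mm_stream_si128, _mm_sfence
#include <cstddef>

// Both pointers must be 16-byte aligned; bytes must be a multiple of 16.
void stream_copy(void* dst, const void* src, std::size_t bytes)
{
    auto*       d = static_cast<__m128i*>(dst);
    const auto* s = static_cast<const __m128i*>(src);
    for (std::size_t i = 0; i < bytes / 16; ++i)
        _mm_stream_si128(d + i, _mm_load_si128(s + i));   // non-temporal store via WC buffer
    _mm_sfence();   // make the WC writes globally visible before the device reads them
}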
Point is...
We already have several tests in CPU Limited scenarios ( relatively low resolutions and noAA noAF, and medium to high details )... and we all know already and it has been proven a million times that the Core 2 processors are faster ( clearly ) than the Phenoms here.
In real-life scenarios ( medium to full details depending on the game, your VGA and monitor, usually combined with some AA & AF ), once again we've seen some tests, but some people are still skeptical.
So... if you're about to gain anything with a Phenom ( assuming that it's faster at high resolutions as gosh insists ), making it a CPU worth buying ( for a gamer ), it would be some FPS in real-life scenarios ( Crysis @ High @ 1680x1050 2xAA 8xAF, Call Of Duty 4:M.W. @ Full Details @ 1920x1200 4xAA 16xAF, etc ).
In this case my bet goes to "equality".
I believe that both platforms will be "scoring" nearly the same, or within the normal run to run variation margin.
Maybe 1 to 3 exceptions will be some games that are very CPU limited even at those settings ( x3: The Threat, Supreme Commander, come to my mind atm ), maybe not...
Whaa...? If someone can at least explain to me what you are trying to say... :eek: "Maybe they use memcpy in C++" for this...? I am starting to question your programming knowledge, sir. Seriously. (But I don't want to jump to conclusions, as I have noticed that English is not your first language. Either way, I think I'll need to lay off this until someone properly explains exactly what you are trying to say)
:) Ok, I think finally with this post everyone can see where the confusion lies. Let's not gang up on this but try to explain where the problems are.
@ Gosh
In a nutshell -- the CPU is not responsible for creating the actual image that becomes the frame to be displayed. This is what the GPU is for, and the reason it is called a Graphics Processing Unit. Increasing resolution affects the computational load that the GPU must endure, not the CPU.
I will go through this point by point later.... busy at the moment. I will do it in several posts, and it may take a few days :) so please be patient, keep an open mind, and read carefully what I write and the references that I link.
Jack
It's a very common function used in C++ for moving memory. In assembler it is all mov instructions (which is what memcpy uses too; a processor doesn't know that many different commands) and variants of that instruction. What you do is move memory from one location to another. Game applications are very much about moving memory. The reason I brought that function up is that the Intel compiler de-optimized it for non-Intel processors.
I hope someone else can explain in better english what I wrote :)
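For illustration only, here is a minimal sketch of the kind of vendor-string check the linked threads describe (my own code, not ICC's actual runtime; fast_sse2_memcpy is a hypothetical stand-in): the complaint is that the branch keys on the CPUID vendor string rather than on the actual feature bits, so a non-Intel CPU gets the slow generic path even if it supports the same instructions.
Code:
#include <intrin.h>   // __cpuid (MSVC)
#include <cstddef>
#include <cstring>

// hypothetical "optimised" path; a real one would use SSE2 streaming copies
static void* fast_sse2_memcpy(void* d, const void* s, std::size_t n) { return std::memcpy(d, s, n); }

static bool is_genuine_intel()
{
    int r[4] = {};
    __cpuid(r, 0);                        // leaf 0: vendor string in EBX, EDX, ECX
    char vendor[13] = {};
    std::memcpy(vendor + 0, &r[1], 4);    // EBX
    std::memcpy(vendor + 4, &r[3], 4);    // EDX
    std::memcpy(vendor + 8, &r[2], 4);    // ECX
    return std::strcmp(vendor, "GenuineIntel") == 0;
}

void* dispatch_memcpy(void* d, const void* s, std::size_t n)
{
    // vendor check instead of feature check: non-Intel CPUs fall to the generic path
    return is_genuine_intel() ? fast_sse2_memcpy(d, s, n) : std::memcpy(d, s, n);
}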
To those following this thread and interested in an X2 update, my general comment: it is clear that the drivers are still beta-like. Some games are just blazing fast; in other games it is benching slower than my old 8800 GTX -- examples.
Lost Planet Snow is slower than on an 8800 GTX.
Company of Heroes is much faster (much, much faster).
But it appears, from the TechPowerUp data, that it is running on 1 GPU only, and there is no x-fire option in CCC ... so I am working on figuring it out. Good news is that I am reproducing the TechPowerUp single-GPU numbers within a few FPS.
I am anxious for a driver refresh, hopefully they will be able to update the x-fire profile for all the games to harness the true potential of these cards.
jack
OK, I didn't really read this entire thread but, from what I gather, AMD has a faster GPU solution than Intel? Funny, I didn't know Intel released Larrabee or any other discrete GPU.
@gosh
These are within a generally accepted margin of error. There could easily be a larger separation even on the same platform. If the tests were done correctly, there would be multiple runs, averages, and charts for good measure. Because charts make everything seem more official.:D This would help eliminate that margin of error that ONE SCREEN SHOT has.Quote:
Originally Posted by gosh
You do realize that a good dual core (read: E8500) will beat both of those right?
Of course, you have been told this, or similar, many times. IMO, this thread should be closed.
@JJ
Awaiting your results:up:
Well, gotta figure out why it is not x-firing the two GPUs on the card. All my scores align with a single GPU, not the dual GPU. Frustrating.
Just out of curiosity, what is your score in Lost Planet Snow? (At the moment)
I haven't played too much .... I wanted to see low res first...
8800 GTX 640x480 is 185 FPS
4870 X2 640x480 is ~140 FPS
At higher res (1280x1024)
8800 GRX is 101ish
4870 X2 is 84 ish
I have a new PCI device that HW wizard cannot find drivers for... but i think it is due to the HDMI drivers for the card, not anything to do with the actual issue I am seeing here. All i can say is thank god for Wizzard and TechPowerUp... without their data I would have not understood as clearly what was going on.
EDIT: well, resolved the unknown PCI device issue. There is a microsoft UAA bridge driver that was not allowing the identification of the HDMI capability on the 4870 X2. I have been running XP SP2 for the longest time on this build (for consistency), on these two builds I have 4 partitions on the system drive, first partition is for the 'identical builds' of the two rigs, 2nd partition is for 'scratch' work, i.e. testing, checking out new drivers, etc before I disturb the primary build.
XP is great, I really appreciate the fact MS lets you activate XP as many times as you want on the same HW... I could install 20 copies of XP on the same computer if I wanted. They hosed us with Vista.
Mmm... couldn't find 1280 x 1024, so I stuck with 1280 x 960 instead. Settings at default, with Texture Filtering set to "Trilinear"
Q9450 @ 2.66GHz w/ 2GB DDR2 667MHz and HD4870 at 780/1000 (default, my card is a MSI pre-overclocked card)
Snow: 92
Cave: 84
GPU downclocked to 750/900 (default of reference HD4870):
Snow: 87
Cave: 82
CPU overclocked to 3.6GHz and GPU at default 780/1000:
Snow: 93
Cave: 112
Hope that helps... somehow.
Edit: Gameplay data through snow (first mission) at 3.6GHz, 900MHz RAM, 780/1000. All settings at High. 8x AA, 16x AF, retail version, unpatched:
Min: 32fps
Avg: 41fps
Max: 78fps
It does help! I am thinking there will be major improvements in this game after a few revisions of Catalysts.
Or in all games. Although the card is too overkill for my system right now (Crysis works perfectly fine at 4x AA and 1280 x 1024), I still think there's room for improvement on ATI's part. This thing seems like it can be much faster, and currently, the drivers seem to offload some work on the CPU (all 4 cores utilized while running some dual-core games, especially noticeable in applications that fully stress the graphics card).
Actually, if you want to see it offloading work on the CPU, try PCSX2. I've seen up to 50% of the extra two cores of the CPU being utilized while the graphics card is stressed in that software.
Well ... I am beginning to get this all figured out. The 4870 X2 is a mixed bag at the moment, and there is a lot of sensitivity to configuration, especially CPU clock speed and core count. I will not be doing anything systematic just yet, not until I feel comfortable that I can reproducibly recreate scores from various reviews (particularly TechPowerUp -- that was one of the best on the net). However, the problems I mentioned above were not because only a single GPU was firing; it was because my processor was clocked too low.
To make a long story short.... I am interested in comparing processors at a valid base clock speed; my 9850 is stock at 2.5 GHz, so I have been running most of my testing around there. In some games I am seeing huge improvements, in others not so much -- which I thought very odd. However, it is clear that the X2 pushes the current gaming matrix (with a few notable exceptions, Crysis, WIC) into the CPU-limited domain.
To see what I mean...
Here is 3DMark06 (Default values, 1280x1024, no AA, no AF) for a 2.5 Ghz QX9650:
http://forum.xcpus.com/gallery/d/737...X2_default.JPG
Here is 3DMark06 for a 3.67 GHz QX9650:
http://forum.xcpus.com/gallery/d/739...0X2_WOWSER.JPG
Here is why I say 'core count' with respect to 3DM06
Techreport's 3.6GHz 8400 got a 3DM06 score of 17555
Ok, so the summary:
2.5GHz QX9650: 3DMark06 score 14471, SM2 5231, SM3 7316, CPU 4025
3.67GHz QX9650: 3Dmark06 score 20609, SM2 7655, SM3 10285, CPU 5753
I had the same behavior with my Q6600 and 9800GX2... it's because the SM2 and SM3 tests are singlethreaded, so core clock is very important. Try overclocking the GPU with the Qx9650 @ 2.5GHz and you won't see any improvement.
Question: Is there information about the difference in latency for reading and writing data comparing HyperTransport and the Front Side Bus?
HyperTransport is designed to be a very fast point-to-point link (if I am right). If that communication is very fast, it could be one explanation for why AMD is even, or sometimes has slightly better numbers, in some tests of single-threaded games at very high detail and high resolution.
If communication with external hardware were similar on AMD and Intel, then Intel should always win in single-threaded games at the same clocks, even if the main bottleneck is the GPU. 6 MB of L2 cache at 15 cycles, usable by one core, compared to 512 KB of L2 cache on AMD makes a huge difference (I think more than 10% in all single-threaded games) if you compare processor performance alone.
Gosh ....
You are confused a bit about the communication to and from the GPU on the different platforms.
Let's take getting a chunk of data from memory to the GPU (which does happen, just not the volume you think it is)....
AMD's HyperTransport connects the chipset to the CPU, and the CPU to memory. Intel's layout connects the memory to the chipset, and the chipset to the GPU. PCIe 2.0 is spec'ed to be its own bus master, as opposed to earlier implementations which used DMA. On Intel's platform, the GPU has direct access to low-level system memory for various data, and the CPU writes the command buffer in GPU memory directly because it does not need to cross the front-side bus to get to the memory data to begin with. The GPU is only one hop away from memory on the Intel platform; it is two hops away on AMD's.
In terms of the data the GPU gets, it is in fact very little, not enough to saturate the FSB or HT ... the CPU populates the command buffer on the GPU for the GPU to do its work; all the other large data elements are precached in GPU memory (hence the reason GPU card makers keep upping memory, to keep pace with the large textures of today's games)
http://people.cs.uchicago.edu/~robis.../gpu_paper.pdf
This is essentially your second misunderstanding, in that you are thinking that all the data for a game event is stored in system memory, it is not .. I provided you a link that showed the different usages of video memory per game, perhaps you did not realize that that was reporting the video memory on the card and not system memory. Not sure, but all the heavy duty data that is needed for rendering a level is first loaded into the GPU's local memory (textures, vertex data, etc.) this is why when you start a game it takes several seconds (20, 30 or even a few minutes) to load... it is transferring that data over the low BW bus (both HT and FSB are low BW compared to the memory BW of a GPU).Quote:
The GPU is able to make calls to a certain window of the system’s main memory and is responsible for loading the data it will operate on into its own memory. The CPU directs this activity by writing commands to a command buffer on the GPU. On the old fixed-function pipeline, these commands associated matrices with vertex data. Now, CPU commands could very well point to program code that will be fetched and executed on the GPU.
Even nVidia provides you the concept of the partition between main and GPU memory:
http://http.developer.nvidia.com/GPU...s/fig28-01.jpg
http://http.developer.nvidia.com/GPU...gems_ch28.html
The point is.... on an Intel platform the Graphics Memory Controller Hub (GMCH) provides one-hop access to memory for the GPU. AMD's arrangement makes it a two-hop access ... if anything, Intel provides lower-latency access to main memory for the graphics card. In either case it is irrelevant, since all the texture and geometry data is loaded into video RAM (with its high-BW interface) before run time.
This is moot regardless, because the volume of data needed by the GPU from main memory is very small, since all the data that the GPU needs is placed in the Vertex, Texture, Mesh, and other buffers on the local GPU memory. The rendering for the scene is done by the GPU via commands written to the command buffer.
If the bottleneck is the GPU, then AMD and Intel will tie +/- a few FPS just on noise of the measurement, single threaded, multithreaded -- it does not matter. Again, Gosh ... moving to parallel computational methods for gaming code or any code, will simply speed up the computational result than that to be had over a single thread. A single task application will always speed up if you can run segments in parallel over simple sequential execution. The trick, and challenge, of multithreading in gaming is the interdependency of segments on the other. This is why you see some speed up but not a 2x gain, for example, going from single to dual thread. This is really nothing more than an example of Amdahl's Law.
Intel produces a computational result (clock for clock) faster than AMD, and as such, the CPU depended code will finish faster single or multithreaded, hence Intel will be faster in games.
You are being fooled and misled by forum posters who run their tests up to the GPU limit, and then you draw the incorrect conclusion that the result somehow reflects single- vs multithreading. This happens all the time....
At low resolutions, Intel wins by 20, 30, up to 50% clock for clock in gaming, but at high resolutions they appear tied... this is again due to the GPU bottlenecking the computation flow.
nVidia states it in their own words:
http://http.download.nvidia.com/deve...erformance.pdf
They show you a bottleneck flow chart that does exactly what we have been telling you.... to find the GPU bottleneck, vary the resolution, if the FPS varies it is the GPU or some component within the GPU pipeline, if not it is the CPU:
http://http.developer.nvidia.com/GPU...s/fig28-02.jpg
Varying or increasing the graphically important parameters in a game changes the computational workload on the GPU (NOT THE CPU). This is why, when one wants to assess a CPU's computational capability on gaming code, it is important to keep the CPU as the limiter (i.e. low resolutions) in order to make a statement on how well the CPU can handle the parts of the game that require the CPU (i.e. non-graphical code such as physics, AI, boundary collisions, etc.)
Let's go back to your latency question.... it is MOOT, even if the FSB latency was 3x longer it would not make a difference.
At 200 frames per second, the GPU is busy rendering ~ 1/200 seconds or 0.005 seconds. This is 5 milliseconds, or 5000 microseconds, or 5,000,000 nanoseconds. Latency of even 200 nano seconds is a wink compared to the time the GPU is spending in it's calculation, even for a very high frame rate.
The more interesting question to ask is what architectural feature of the Core uArch is allowing Intel to perform so much better at executing gaming code vs AMD's solution?
Jack
What is the volume?
What I wrote was that the points in 3D space are calculated and sent by the CPU. I know that DMA (http://en.wikipedia.org/wiki/Direct_memory_access) is used to load the textures etc. that are used to "paint" the picture. If all that data were sent to the GPU for every frame it would be so slooooow ;). I think there is enough data being sent anyway. Just look at 3D wireframe views with that grid-like look, and add to that all the commands used to tell the GPU how to "paint" the picture.
What I asked was whether there is information about latency comparing HyperTransport and the Front Side Bus?
EDIT: If the GPU needs to use RAM (access RAM on the motherboard) during gameplay, the performance goes down the toilet, as we say :)
Less than what an 800 MHz FSB can support :) You don't need to know the exact number; you can set up an experiment to see if it matters.... i.e. change the FSB speed and see if it changes the results..... I showed you that data http://www.xtremesystems.org/forums/...6&postcount=59.
Nonetheless, the thought exercise was not about the size but how each platform retrieves a chunk of data.
I also showed you that Intel scales a multicored multithreaded game along the same curve as an AMD CPU:
http://forum.xcpus.com/gallery/d/660...reScaling2.jpg
However, if you want to measure it.... you can download Intel's Vtune or AMD's CodeAnalyst and monitor the counters of the bus busy line.
The vertices are already loaded into GPU memory; changes to that geometry (such as a wall blowing up) are sent by the CPU, but the entire 3D mesh is not recalculated every time... camera position and perspective are, and this is what the GPU uses to render the image. The GPU also stores the Z-buffer, which determines which surfaces are visible in front of one another from the camera's perspective.Quote:
What I wrote was the points in 3d space is calculated and sent from the CPU. I know that DMA (http://en.wikipedia.org/wiki/Direct_memory_access) is used to load these textures etc that is used to "paint" the picture.
This is by design, you are correct in saying that the FSB is slow... so is HT... in fact, HT is slower than the FSB in one direction (which would be from CPU to GPU). 2000 Mhz HT line gives 2 bytes of data in one direction, or 4000 MB/sec or 4.0 GB/sec... FSB is half/duplex, giving 1333x8 in one direction or 10.6 GB/sec.... technically speaking, if data needs to get from the CPU to the GPU, Intel would provide more peak BW.
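A quick back-of-the-envelope check of those peak numbers (my own arithmetic; one direction only, and sustained throughput is lower in practice):
Code:
#include <cstdio>

int main()
{
    // 2000 MT/s, 16-bit (2-byte) HT link vs 1333 MT/s, 64-bit (8-byte) front-side bus
    double ht_gbs  = 2000e6 * 2 / 1e9;   // -> 4.0 GB/s per direction
    double fsb_gbs = 1333e6 * 8 / 1e9;   // -> ~10.7 GB/s, shared between both directions
    std::printf("HT:  %.1f GB/s\nFSB: %.1f GB/s\n", ht_gbs, fsb_gbs);
}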
Nonetheless, GPU makers and the HW/software has evolved to move all the BW necessary components to the local memory of the GPU and design in 100 GB/sec BW from v-RAM to GPU for this very reason. All the GPU needs from the CPU is what do I need to do next (a command list, hence a command buffer).
Now, when the resolution goes up high enough and the size of the textures are larger than what can be held in VRAM, then yes... a huge performance hit is taken because now the GPU must fetch a texture it does not have from system memory across that slow FSB or HT link. I have only seen recent examples of this:
http://www.anandtech.com/video/showdoc.aspx?i=3372&p=9
http://www.guru3d.com/article/radeon...w-crossfire/11
Notice the 512 MB cards dropping like a rock going from 1900x1200 to 2560x1600 ... at 2560x1600 the textures are too large to fit into VRAM... and the reviewers correctly conclude that. This is called texture thrashing. I even provided you a link showing modern games' VRAM usage earlier. The vast majority of games, in fact all that I have seen so far, are able to fit a level's worth of textures into 512 MB. GRID is the first I have seen that will exceed the 512 MB barrier -- and it only does so at 2560x1600.
Did you even read the above post... did you not see the bottlenecking flow chart by nVidia... even nVidia argues a GPU or CPU limited scenario.Quote:
If all this data would be sent to the gpu for every picture than it would be so slooooow ;). I think that there is enough data that is sent any way. Just look at 3d drawings that have that grid like looks. And add to that all command used to inform how to "paint" the picture
I am not sure what you are talking about in terms of 3D drawings... ray tracing? Intel is faster there too... significantly faster.
I have not seen any study or data comparing the latency of just the bus, it has always been a convoluted measure of latency through the bus to something else. I can look for you...Quote:
What I asked for was if there is information about latency comparing Hypertransport and the Front Side Bus?
Ok... there is not a good explanation for this... even Lost Planet, at higher resolutions, shows AMD can support higher FPS in GPU-bound scenarios... but if you followed the thread, I can get Intel's high-res GPU-bound FPS to exceed AMD's by increasing the PCIe frequency (i.e. the BW of the PCIe bus).Quote:
If this communications is very fast it could be one explanation why AMD is even or sometimes have a bit better numbers on some tests when they test single threaded games on very high detail and high resolution.
My hypothesis is that the AMD implementation of PCIe 2.0 in the chipset is better than Intel's.
jack
4870 X2 update: well guys, it is gonna be a bust for a while I suspect. Only on a few occasions can I get frame rates that exceed a 8800 GTX under the same test conditions (regardless of CPU used). I am most certain that this is a driver issue, and that the drivers that ship with the card is different than the drivers used by the press-reviews that we saw. I am getting in some cases 1/2 the FPS of what other reviewers have shown, under similar settings, setup, etc. If AMD does not produce a press-like quality driver within the next week, the cards are going back.
This is the quote from Guru3D: "With the latest press-driver used in this review the X2 finally is starting to show some better performance scaling." ... I cannot be certain that the drivers I am using are correct.
jack
And I don't really understand your answer. I asked one simple question and I am getting a big answer where you tell me that I don't understand.
The key to performance on the video card is the same as when they compress videos as much as possible: you just redraw what needs to be redrawn. Don't refresh data that hasn't changed. This isn't a problem when there isn't any action in the game; having a high FPS then isn't that important. You don't want low performance when there is action, though, and when there is action a lot of data needs to be processed. When there is action in the game, the need for fast communication is very important. Processors are extremely fast; they mostly sit and wait for data, and moving all that data needs to be fast.
If you mean that they write to memory on the motherboard first and then copy memory from there to the GPU, that would seem rather stupid. If the call is asynchronous they could get more performance (the processor would only need to wait for the copy to RAM), but that is very hard to do because you don't know when the command is finished (you need synchronization). You, or the driver, have to check that. If it is synchronous, then you need to wait for two memory transfers of the same data. Also, I don't think that is an improvement compared to preparing buffers on the stack and just sending them to allocated memory on the video card. Stack data is normally in the L1 or L2 cache (for both AMD and Intel) if it isn't too big, if I am right.
HyperTransport 3.0 is 20.8 GB/s. I have read that they can't use all that bandwidth on AMD PC motherboards, but the same goes for the FSB. You need some insane OC to go over 10 GB/s, and that is only achieved if data is transferred in long "trains".
That would be very interesting. I looked around before, but data on the speed between CPU and GPU was not easy to find. I did find other numbers, but they seemed too low to be true for video communication.
The released drivers on the CD ... 8.7 from AMD's website will not install, it says valid HW not found. I put the beta 8.8 drivers on (a scratch-partition 'dirty' build), same results. What it looks like to me is that ATI seeded the review sites with an alpha driver, with the correct profiles for the games they were using. I can match 3DMark06, for example, and some settings in COH, but most others are a bust.
The reason I am thinking this is that a few review sites are getting the same results I am:
http://www.tomshardware.com/reviews/...md,1992-4.html (pains me to link this :) )....
They likely use drivers out of the box.... I am just not getting the high octane results I have seen on the other websites, even with the same HW supporting.
EDIT: the actual driver version reported by catalyst is 8.52.6-080709a-048489c-ATI
Because you don't understand, even the context of your question is ludicrously silly. First you complain that no one is explaining; now that it is explained, you do not want a long answer you do not understand.Quote:
asked one simple question and am getting big answer where you tell me that I don’t understand.
Obviously, no matter what data I show you, no matter if I link up even the GPU makers themselves, you will not understand. I have explained it in as simple terms as I can... so I am finished.
Just a bit of advice: do not try to pair a Phenom with a high-end GPU, you will be disappointed.
jack
I am pretty certain that is the case. nVidia has publicly complained about Intel's PCIe implementation for years -- they use that as the reason they don't release SLI on Intel chipsets.
I get better performance out of my 8800 GTX than what I am seeing right now with the 4870 X2's, with just a few exceptions.
Jack, don't even bother with gosh. No matter how much data you show him, he won't try to understand because something green blinds him. :rolleyes:
Remember that much of the data that goes through PCIe to the GPU has ALSO been sent through the FSB. During gaming, most of the data sent through PCIe comes from the CPU, and that data will always travel over the FSB.
The question is… is it really PCIe that is bad, or is the FSB the main bottleneck?
Why has Intel removed the FSB on Nehalem?
It's the PCIe. I can change the FSB speed all I want... no change; I can bump the PCIe 10% and get a 5% improvement right off the bat. This is wasted time and effort, data means nothing to you.
Intel has moved away from the FSB because as core counts go higher, more BW will be needed; they are starting now, before 6- and 8-core parts hit. A 1333 MHz FSB is plenty to satisfy any desktop need in almost all applications, games included. But 4, 6 or 8 cores in server and HPC need that BW.
Well, it could be the other way around ;). I have pointed out Intel's strong areas, but when you say anything good about AMD compared to Intel it seems to create some allergic reaction, and then you get explanations that are very hard to make sense of if you know a bit about the subject, because they don't add up. If you show tests where AMD wins then there is some error; if Intel wins then it is ok. :shrug:
Crysis on a 4870 X2. I ran both the CPU and GPU bench, using resolutions of 1024x768, 1280x1024, and 1680x1050. Each was tested at 4x AA and again at 16x AA so a total of 6 runs for each bench
Phenom @ 2.5 GHz
GPU Bench
http://forum.xcpus.com/gallery/d/739...2_GPUBENCH.JPG
CPU Bench
http://forum.xcpus.com/gallery/d/740...2_CPUBENCH.JPG
QX9650 @ 2.5 GHz (matched clock speed for clock for clock)
GPU Bench
http://forum.xcpus.com/gallery/d/740...2_GPUBENCH.JPG
CPU Bench
http://forum.xcpus.com/gallery/d/740...2_CPUBENCH.JPG
Output is attached.
334 > 200
& not ?
200 = 200
gosh said you can't clock AMD's L3 cache; that's not true.
charged3800z24 and I have both gotten our NB to 2.4 GHz, which is the L3 cache speed.
http://www.xtremesystems.org/forums/...1&d=1219028090
And here is some extra data for you. Seems like Crysis doesn't scale so well with CF at all. CPU @ 3.6GHz and single HD4870 at 780/1000.
Outstanding job, thank you!
Yeah, I have stopped messin' around with Crysis at the moment and went to FEAR.... and all I can say is WOWSER. At 1900x1200, everything maxed to the hilt, and it is CPU limited.
It is pointless to run this card < 1900x1200, absolutely pointless. FEAR is old, I know, but still a gorgeous game... amazing to see it 1900x1200 cranked to the max, max AA, max AF everything max.
Ok, about to post FEAR data ... of course, in the world according to Gosh FEAR is single threaded so Phenom should do poorly.
FEAR on the 4870 X2 Phenom vs QX9650 using the 4870 X2 cards......
I did several experiments, I am posting the results for 3 of them. One at my normal stock baseline settings (that I do all my clock for clock studies on), i.e. both processors at 2.5 Ghz and DDR2-800. I then overclock both processors to 3.0 Ghz and set the memory to DDR2-1067. I know in FEAR at low res on a 8800 GTX that the difference between DDR2-800 and DDR2-1067 can have a substantial impact on the Phenom performance, thus I am also posting a 2.5 Ghz DDR2-800 vs DDR2-1067 for phenom to show the impact.
Bus speeds are not changed in any experiments, all OC is done via the multiplier (hence the reason I only buy unlocked multi CPUs)
FEAR does not have a windowed mode, so I cannot capture a CPUID for validation; you will need to take my word for it.
Phenom @ 2.5 GHz DDR2-800 max =253 min =37 Ave = 96
http://forum.xcpus.com/gallery/d/743...00_ALL+MAX.JPG
QX9650 @ 2.5 GHz DDR2-800 max =393 min =40 Ave =134
http://forum.xcpus.com/gallery/d/741...200_ALLMAX.JPG
About 40% (EDIT: ooops, miscalculated in my head, had to change the old 35% to the correct value) faster clock for clock at high res, max everything. Note, the min framerate is the same -- meaning both CPUs will give you the same quality gameplay.
Next experiment, OC to 3.0 GHz
Phenom @ 3.0 GHz DDR2-1067 max = 305 min= 48 ave = 113
http://forum.xcpus.com/gallery/d/741..._DDR2_1067.JPG
QX9650 @ 3.0 GHz DDR2-1067 max =466 min =40 Ave = 159
http://forum.xcpus.com/gallery/d/742..._DDR2-1067.JPG
On average 40% faster clock for clock....
Finally, here is Phenom at DDR2-1067 but 2.5 Ghz to compare above to see the impact of faster memory:
Phenom @ 2.5 Ghz DDR2-1067 Max = 273 Min = 37 Ave = 102
http://forum.xcpus.com/gallery/d/741..._DDR2_1067.JPG
Ok, there you have it. Both scale their response with clock speed the same way. Oh, and in case you are wondering ... yes, I can reproduce the TechPowerUp FEAR number within a few FPS using the same CPU clock speed he used.
Jack
Wow... QX9650 just totally spanked FEAR alive. Max 466... seriously?
But... I'm still seeing something weird here. Seems like the QX9650 always has that 2% of frames between 25 and 40 fps no matter what. What exactly happened in there? Maybe a FRAPS measurement would give us a better picture. Maybe some spots in the FEAR benchmark were squeezing FSB bandwidth.
P.S.: And you are welcome about the data. :)
Not quite understanding your question... could you clarify... however, I will do a fraps run if you tell me exactly which one you would like to see. EDIT: Ohhh, I see you are looking at the % bin statistics... good question ... Hold on I will FRAPS the 2.5 Ghz DDR2-800 runs again.
EDIT: It is obviously very clear that the drivers that shipped with the card and the drivers given to reviewers were two different realities. Some reviewers, it would appear, skipped the 'press drivers' and installed the shipped drivers -- ExtremeTech and Tom's came away with bad impressions of the 4870 X2, for example, and my numbers are matching theirs most closely in general. I suspect in a month or two this card is just gonna get better and better.
jack
Uhm... basically, I'm wondering why, on the QX9650, there was always 2% of the time that fps was between 25 and 40. If you look at the screenshots again, on the QX9650, although fps is skyrocketing, 2% is always between 25 and 40. On the Phenom system, upon overclocking to 3GHz, it's always over 40. And the minimum was at 45.
yeah, i figured that out... I am FRAPing them now.
It happens as you pass through the glass in the door at the end of the perf run. FRAPs samples about once per second, it is not capturing the highest or lowest. It would take several runs and a lot of luck to actually capture the event that produces the max and min.
This is a FRAPs plot of the 2.5 GHz DDR2-800 runs for both CPUs....
http://forum.xcpus.com/gallery/d/743...9850vs9650.JPG
Could be something flunking in the background, too. I have that a lot with my old IDE HDD. Actually, it's flunking so much on my system that my 3DMark Vantage scores vary in the range of 1000...
next up.... quake wars enemy territory.
Ok... so for Quake Wars Enemy Territory, I utilize the HOCBenchmark utility, which can be downloaded here. Because I keep historical records, I am using an older version of QWET as well.
You can download the benchmark utility here: www.hocbench.com (as of the time of this post, their server appears down).
I setup the HOC benchmark to run 3 runs at 1024x768, 1280x1024, and 1900x1200. The quality settings are set to high, and I am using 16x AF and 4x AA in the HOC utility. I am also using the Quarry script/scene. The output comes in the form of an HTML file, no screen dumps of the actual game.
Phenom @ 2.5 GHz DDR2 - 800 output:
Resolution: 1024×768
Score = 150 FPS
Score = 153 FPS
Score = 154 FPS
Average score = 152 FPS
Resolution: 1280×1024
Score = 150 FPS
Score = 150 FPS
Score = 152 FPS
Average score = 150 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 147 FPS
Score = 147 FPS
Score = 147 FPS
Average score = 147 FPS
=================================
QX9650 @ 2.5 GHz DDR2 - 800 output:
Resolution: 1024×768
Score = 175 FPS
Score = 178 FPS
Score = 175 FPS
Average score = 176 FPS
Resolution: 1600×1200
Score = 171 FPS
Score = 171 FPS
Score = 170 FPS
Average score = 170 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 165 FPS
Score = 164 FPS
Score = 166 FPS
Average score = 165 FPS
The output files are attached.
EDIT: Darnit my bad... the QX9650 instead of 1280x1024, I ran 1600x1200... rerunning the 1280x1024 to add that info.... sorry.
EDIT2: Here is the QX9650@2.5GHz DDR2-800 run at 1280x1024
Resolution: 1280×1024
Score = 176 FPS
Score = 176 FPS
Score = 171 FPS
Average score = 174 FPS
The attachment has been updated with that run as well.
BTW -- QWET uses all 4 cores.
1600 x 1200 or not, even the 1920 x 1200 scores are beating Phenom to a pulp, even when Phenom is at 1024 x 768. I think we pretty much have a conclusion.
Not done yet...
http://forum.xcpus.com/gallery/d/7437-2/DESKTOP.JPG
(View to see the taskbar menu slide ups for installed software)
This is what is currently installed... the Phenom has exactly, program for program, the exact installation. So there is still a lot of comparing to be done :)
EDIT: I am gonna stick on QWET for a moment, running again on the QX9650 at 200 Mhz (800 Mhz FSB) speed, @ 2.5 Ghz ....
Well, color me purple, check this out.... QWET at 200 Mhz system clock, 800 MHz FSB..... (all the above runs were done at 333 Mhz or 1333 MHz FSB) EDIT: NOTE -- memory divider was not changed, this run is at DDR2-400 Mhz :) .... my bad.
Resolution: 1024×768
Score = 147 FPS
Score = 140 FPS
Score = 147 FPS
Average score = 144 FPS
Resolution: 1280×1024
Score = 144 FPS
Score = 141 FPS
Score = 139 FPS
Average score = 141 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 137 FPS
Score = 140 FPS
Score = 142 FPS
Average score = 139 FPS
Let's see what we get when we go to 1600 MHz FSB.
EDIT: OOOPS my bad... this was 200 MHz (800 MHz FSB), but I forgot to up the memory divider.... I am rerunning now at 200 Mhz FSB + DDR2-800 instead of DDR2-400. This may prove informative since I mentioned above that the GPU can go straight to system memory in one hop.
Output for DDR2-800 ...
Resolution: 1024×768
Score = 158 FPS
Score = 163 FPS
Score = 160 FPS
Average score = 160 FPS
Resolution: 1280×1024
Score = 157 FPS
Score = 157 FPS
Score = 160 FPS
Average score = 158 FPS
Resolution: 1920×1200 (HD WideScreen)
Score = 155 FPS
Score = 156 FPS
Score = 156 FPS
Average score = 155 FPS
So in this case a 40% decrease in FSB BW translates into a 7-10% hit in FPS.
Ok.... last one for the day... Half-Life2-Lost Coast, I will do episode one and two a bit later I suspect. I had to run it windowed so I could screen grab with a shot of CPUID. The window that HL2 creates is always on top, so after the bench I had to move it slightly off screen in order to reveal the CPUID window.
My standard baseline, Phenom@2.5 Ghz DDR2-800 4870 X2 1920x1200, max everything including AA and AF.
http://forum.xcpus.com/gallery/d/744..._1920x1200.JPG
QX9650 @2.5 Ghz DDR2-800 4870 X2 1920x1200, max everything including AA and AF.
http://forum.xcpus.com/gallery/d/744..._1920x1200.JPG
In my personal opinion, the C2Q has no problem with bandwidth itself and never had. The problem with the FSB is its latency. Increasing or decreasing the FSB only has an effect in terms of bandwidth; the latency always stays the same. You can easily check this with low-level benchmarks.
In other words, the FSB is not a big problem bandwidth-wise, but there is a physical path on the PCB that the data must travel. Whether the data has to cross that path, compared to K10 and Nehalem, decides whether the coherency latency is on the order of µs or ns. This is a very significant factor, which allows K10 and Nehalem to scale better than a Core 2 Quad.
So what effect does this have in real-life conditions? In a good case the prefetcher can hide this physical latency, the data is already in the L2 cache and can be used immediately (in effect there is no latency); in a worse case the prefetcher works inefficiently and the physical latency of the FSB results in poor performance. And that is exactly when K10 outperforms an Intel Core 2 Quad.
Excuse my bad English; if there are any questions, feel free to ask.
Give me an Intel rig and I sure will ;). And again, the FSB does not limit bandwidth-wise; it all depends on the prefetcher and how well it is actually working.
"Wrong" data in the cache (simplified) -> memory access -> high latency -> bad coherency -> bad performance ;)Quote:
Anandtech:
This is the test that actually screws the whole thing for Intel. It turns out that CBALLS2 calls a function in the Microsoft C Runtime Library (msvcrt.dll) that, when combined with Vista SP1, can magnify the Core architecture's performance penalty when accessing data that is not aligned with cache line boundaries.
Jack, I'm impressed with your data. Have you ever thought about making your own review site?
This site would be my number one, since your comparisons are top notch. :up:
Yes, especially for processor to memory... latency across the bus to other parts, such as SB IO or even Graphics card is not a huge issue because even with the longer latency, the timing of the IO and graphics card overwhelms any latency on the bus.
This is where the large cache and aggressive prefetchers work well; only in a few cases can you see this really become a problem.