Yeah I completely agree....
A few comments ...
It's really kind of intuitive when you think about it: a high-level Gedanken experiment helps to really understand why game code runs more branches than, say, compressing a file or encoding a movie clip. It is unavoidable; there is no way to program a game engine without it. The player's input actions are unpredictable, and the resulting cause and effect will always require testable conditions -- the crux of any gaming algorithm is ultimately nondeterministic. For the CPU duties of the game engine, yeah, it is still important -- the CPU is responsible for receiving player input, tracking AI, culling, etc.
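A toy sketch of that contrast (all names here are made up for illustration, not from any real engine or codec): the game-style update branches on runtime state the predictor can't foresee, while the encoder-style loop's only branch follows a fixed, trivially predictable pattern.

```cpp
// Hypothetical sketch -- Player, game_update, and encode_sum are
// illustrative names, not from any real engine or codec.
struct Player { int x; int health; bool jumping; };

// Game-style update: every branch depends on runtime state (player input,
// current health) that the branch predictor cannot know ahead of time.
void game_update(Player& p, int input) {
    if (input == 'a')      p.x -= 1;
    else if (input == 'd') p.x += 1;
    if (input == ' ' && !p.jumping) p.jumping = true;
    if (p.health <= 0)     p.x = 0;   // "respawn"
}

// Encoder-style inner loop: the only branch is the loop condition, which
// follows a fixed taken/taken/.../not-taken pattern any predictor nails.
long encode_sum(const int* samples, int n) {
    long acc = 0;
    for (int i = 0; i < n; ++i)
        acc += samples[i] * 3;
    return acc;
}
```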
Kanter sent me a Word copy of this article a few months ago for review, and it is one of the only technical write-ups that really helps rationalize why C2D did such a good job with gaming code when it launched compared to K8 (it was the most dramatic feature of Conroe and really lit up the forums/net, of course): http://www.realworldtech.com/page.cf...2808015436&p=5
Ironically, my opinion at least is that Intel's branch prediction capability, as seen in C2D, was probably the only good thing that ultimately came out of Netburst. The penalty for a mispredicted branch can be as bad as or worse than an L2 cache miss -- you need to flush the pipeline, fetch the new code/data into the front end, reorder again, and repopulate the pipeline. I've seen numbers between 30 and 150 cycles wasted just to correct a single mispredicted branch. To avoid this, I would not doubt that Intel's architects went all out, balls to the wall, to figure out any and all possible ways to improve branch prediction accuracy; even then, that 31-stage pipeline just dragged it all back down, as two or three mispredictions in a thousand would cripple a Prescott.
The branch predictor logic most likely carried over, in some fashion and to some degree, into C2D -- probably the only feature of Netburst to make it into C2D. Who knows, but it makes sense.
So yeah, AMD's branch prediction is weaker than Intel's at the moment, though K8 could kick butt against Netburst gaming-wise -- that was because, even with great branch prediction, that uber-long pipeline just stunk if there was ever a stall. Shortening up the pipeline with C2D, widening it, and coupling it with strong branch predictors = a great scenario for gaming.
EDIT: Here's a good one -- look at figure 4: http://www.research.ibm.com/people/m...s/2004_msp.pdf This is a neat paper; it also shows the L1/L2 misses for FPS games vs. other applications. So you can see why we get the generalized statement "games love L2 cache" -- true, they do, but they also love good branch predictors, both of which C2D/Qs have lots of.
Jack









Actually, after thinking more about the L3's 48-way(?) associativity and heat, I think it is not going to be any kind of problem, as L1 and L2 will (hopefully) have the data already and L3 would be the last resort before DRAM.



