
Thread: Intel Q9450 vs Phenom 9850 - ATI HD3870 X2

  1. #11
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote: Originally Posted by gosh
    Do you have a good link about K10?

    About code that needs speed and has branches: the general rule is to avoid branches as much as you can, and there are numerous ways to do that. A lot of branches also makes the code harder for the compiler to optimize. I don't think games have a lot of branches; logically it would be the opposite. There are a lot of talks on game code that look at how to avoid branches.
    If you do need branches, try to keep one single flow and avoid moving the instruction pointer with a conditional branch.
    http://www.realworldtech.com/page.cf...WT051607033728

    The branch prediction in the K8 also received a serious overhaul. The K8 uses a branch selector to choose between using a bi-modal predictor and a global predictor. The bi-modal predictor and branch selector are both stored in the ECC bits of the instruction cache, as pre-decode information. The global predictor combines the relative instruction pointer (RIP) for a conditional branch with a global history register that tracks the last 8 branches to index into a 16K entry prediction table that contains 2 bit saturating counters. If the branch is predicted as taken, then the destination must be predicted in the 2K entry target array. Indirect branches use a single target in the array, while CALLs use a target and also update the return address stack. The branch target address calculator (BTAC) checks the targets for relative branches, and can correct predictions from the target array, with a two cycle penalty. Returns are predicted with the 12 entry return address stack.
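    To make the global predictor described above concrete, here is a minimal Python sketch, not AMD's actual design. The table size (16K), history length (8) and 2-bit saturating counters come from the quoted text; the XOR hash of RIP and history, the initial counter value and the names GlobalPredictor and accuracy are assumptions made for illustration.

```python
class GlobalPredictor:
    """Toy global-history predictor built from 2-bit saturating counters."""

    def __init__(self, history_bits=8, table_entries=16 * 1024):
        self.history_bits = history_bits
        self.table_entries = table_entries
        self.history = 0                     # recent outcomes, newest in bit 0
        self.counters = [1] * table_entries  # 0-1: predict not taken, 2-3: predict taken

    def _index(self, rip):
        # Assumption: XOR of RIP and history (gshare style); the text only
        # says the two are "combined".
        return (rip ^ self.history) % self.table_entries

    def predict(self, rip):
        return self.counters[self._index(rip)] >= 2

    def update(self, rip, taken):
        i = self._index(rip)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the new outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, rip, outcomes):
    """Fraction of outcomes (a list of booleans) predicted correctly for one branch."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(rip) == taken
        predictor.update(rip, taken)
    return hits / len(outcomes)
```

    Feeding it a strictly alternating taken/not-taken pattern shows why history helps: with history_bits=8 the two phases land on different counters and are learned after a short warm-up, while history_bits=0 (a single counter per address) keeps flipping between the two middle states and mispredicts.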

    Barcelona does not fundamentally alter the branch prediction, but improves the accuracy. The global history register now tracks the last 12 branches, instead of the last 8. Barcelona also adds a new indirect predictor, which is specifically designed to handle branches with multiple targets (such as switch or case statements). Indirect branch prediction was first introduced with Intel’s Prescott microarchitecture and later the Pentium M. Indirect branches with a single target still use the existing 2K entry branch target buffer. The 512 entry indirect predictor allocates an entry when an indirect target is mispredicted; the target addresses are indexed by the global branch history register and branch RIP, thus taking into account the path that was used to access the indirect branch and the address of the branch itself. Lastly, the return address stack is doubled to 24 entries.
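    The indirect predictor can be sketched the same way. The 512 entries, the 12-branch history and the "allocate on mispredict" rule are from the quoted text; the XOR index, the dictionary storage and the names IndirectPredictor, record_branch and hit_rate are hypothetical. The test workload (two conditional branches that decide which of four switch targets is taken) is also made up for illustration.

```python
class IndirectPredictor:
    """Toy indirect-target predictor indexed by branch RIP and global history."""

    def __init__(self, entries=512, history_bits=12):
        self.entries = entries
        self.history_bits = history_bits
        self.history = 0
        self.targets = {}  # index -> predicted target, at most `entries` keys

    def record_branch(self, taken):
        # Conditional branches executed on the way to the indirect branch.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask

    def _index(self, rip):
        # Assumption: simple XOR hash; the real indexing function is not given.
        return (rip ^ self.history) % self.entries

    def predict(self, rip):
        return self.targets.get(self._index(rip))

    def update(self, rip, actual_target):
        i = self._index(rip)
        if self.targets.get(i) != actual_target:
            self.targets[i] = actual_target  # allocate or replace on mispredict


def hit_rate(predictor, cases, rip=0x4010):
    """cases: switch values 0..3; two prior branches encode the value."""
    hits = 0
    for c in cases:
        predictor.record_branch(bool(c & 1))
        predictor.record_branch(bool(c & 2))
        target = 0x1000 + 0x10 * c
        hits += predictor.predict(rip) == target
        predictor.update(rip, target)
    return hits / len(cases)
```

    With history_bits=0 the sketch degenerates to a single remembered target per branch, which is roughly the "single target in the array" behaviour, and it does poorly on a switch whose case changes every time. With path history, the same switch becomes predictable once each path has been seen.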

    According to our own measurements for several PC games, between 16-50% of all branch mispredicts were indirect (29% on average). The real value of indirect branch misprediction is for many of the newer scripting or high level languages, such as Ruby, Perl or Python, which use interpreters. Other common indirect branch culprits include virtual functions (used in C++) and calls to function pointers. For the same set of games, we measured that between 0.5-5% (1.5% on average) of all stack references resulted in overflow, but overflow may be more prevalent in server workloads.
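    The return address stack overflow mentioned above can be illustrated with a small sketch. Only the sizes (12 and 24 entries) come from the article. The behaviour on overflow (oldest entry dropped, later returns mispredicted) and the function name simulate_ras are assumptions; real hardware may behave differently.

```python
from collections import deque


def simulate_ras(call_depths, size):
    """Each item is a nesting depth: call down `depth` levels, then return.

    Returns (overflows, mispredicted_returns, total_returns).
    """
    ras = deque(maxlen=size)  # bounded stack: a push when full drops the oldest entry
    overflows = mispredicted = returns = 0
    for depth in call_depths:
        for level in range(depth):
            if len(ras) == size:
                overflows += 1
            ras.append(level)  # stand-in for the pushed return address
        for level in reversed(range(depth)):
            returns += 1
            if ras and ras.pop() == level:
                continue       # predicted return address matched
            mispredicted += 1  # entry was lost to overflow
    return overflows, mispredicted, returns
```

    Under these assumptions, a call chain 30 levels deep loses 18 entries with a 12-entry stack and only 6 with a 24-entry stack, while chains shallower than the stack size never overflow.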
    Now think game code ... a player shoots a weapon....

    a) In the evaluation loop (the loop itself is a branch, with a condition on when to exit), it needs to check whether the player pulled the trigger (a branch).
    b) If the trigger is pulled, which weapon is he firing (another branch)?
    c) Calculate the physics: does he hit the bad guy, yes or no (another branch)?
    d) Where does he hit the bad guy (head, arm, neck)?

    Game code is the absolute branchiest of all code classes. I am still looking for those papers that show it 30-80% higher than any other major kind of code. The reason for this is the total amount of variability in the way the game plays out. Game code does not know if you are going to jump, crouch, turn left or right, die, or blow up, as opposed to something like, say, a 3D renderer, which only needs to take the data, do a calculation, then move to the next pixel and do the same again. Same with encoding: take one frame of data, calculate the attributes based on the other pixels around it, move to the next pixel... very linear. This is why P4s could do well at multimedia but sucked so bad at gaming: so long as there was little branching in the code, P4s could handle the load.
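    A rough way to see the difference between the two kinds of code is to run both branch patterns through a single 2-bit saturating counter. This is only a sketch: the loop length, the coin-flip model of player input and all function names are hypothetical, and real game branches are neither this random nor this isolated.

```python
import random


def two_bit_accuracy(outcomes):
    """Prediction accuracy of one 2-bit saturating counter over a branch trace."""
    counter = 1  # 0-1: predict not taken, 2-3: predict taken
    hits = 0
    for taken in outcomes:
        hits += (counter >= 2) == taken
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return hits / len(outcomes)


def loop_branch(iterations=10, repeats=1000):
    # Loop back-edge: taken iterations-1 times, then falls through once.
    return ([True] * (iterations - 1) + [False]) * repeats


def player_branch(n=10000, seed=0):
    # Assumption: worst case, the player's action is a coin flip.
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n)]
```

    Comparing two_bit_accuracy(loop_branch()) with two_bit_accuracy(player_branch()) gives roughly nine in ten for the regular "move to the next pixel" style loop and about a coin flip for the input-driven branch, which is the gap a deep pipeline like the P4's pays for on every mispredict.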

    I do agree, branchy code is to be avoided at all costs... but some applications simply demand a large number of checks and conditions that generate new code paths (games are the biggest one; how much fun would a game be if you did the exact same thing every time?).

    Jack
    Last edited by JumpingJack; 09-07-2008 at 07:31 AM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my clothes looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft
