Thread: Conroe 2.4Ghz on 965G mobo, brief test...

  1. #1
    Xtreme Mentor
    Join Date
    Aug 2005
    Location
    Boston, MA, USA
    Posts
    2,883
    Quote Originally Posted by agenda2005

    The 128-bit SSE on Conroe will obviously benefit more than AMD 64 because two 64-bit SSE(x) instructions will now require one cycle, unlike two on Prescott and Athlon 64. This is nobody's fault, since Intel have concentrated more effort on making a better CPU than on crippling competitors' performance in their compiler.

    Just for clarification: SSE3 always has 128 bit instructions. On earlier platforms they might need more cycles to execute, but the 128 bit instructions as such are present in any processor supporting SSE3.

    I don't think it is quite correct to say that two 64 bit SSE operations will now require half the time. This is only the case when ...
    • ... there are indeed independent units to compute. Doing two operations at the same time requires that neither depends on the outcome of the other. That is frequently not the case.
    • ... and unless it is hand-coded assembly, the compiler has to be sure about the previous fact. It can be nontrivial for the compiler to figure this out in a bulletproof way. If the compiler is not entirely sure, it will default to being conservative.
    • ... to be most effective, the compiler has to be able to reorder operations so that it can merge two 64 bit operations into one 128 bit one even if they are at different places in the source code. Proving that this is safe is nontrivial, too, in particular in languages like C/C++ where there is a lot of aliasing going on.
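
To make the dependency and aliasing points concrete, here is a minimal sketch in C. The function names are made up for illustration, and whether a particular compiler actually emits packed 128 bit code for any of these loops depends on the compiler and its flags, so treat the comments as assumptions rather than guarantees.

```c
#include <stddef.h>

/* Each iteration is independent of the others, and `restrict` promises the
 * compiler that out, a and b never overlap. Two neighbouring 64-bit adds
 * can therefore be merged into one 128-bit packed add. */
void add_independent(double *restrict out, const double *restrict a,
                     const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Same loop without `restrict`. If the caller passes out == a + 1, every
 * store feeds the next load, so the compiler must prove that the pointers
 * do not overlap, add a runtime overlap check, or stay scalar. */
void add_maybe_aliased(double *out, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Loop-carried dependency: each result needs the previous one, so simply
 * pairing two iterations into one packed operation does not work. */
void running_sum(double *x, size_t n)
{
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

Calling `add_maybe_aliased(v + 1, v, ones, 4)` on an array of ones is legal C and turns the loop into a chain of dependent adds, which is exactly the case a conservative compiler has to allow for.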

  2. #2
    Xtreme Member
    Join Date
    Oct 2004
    Location
    SC, USA
    Posts
    487
    Quote Originally Posted by uOpt
    Just for clarification: SSE3 always has 128 bit instructions. On earlier platforms they might need more cycles to execute, but the 128 bit instructions as such are present in any processor supporting SSE3.

    I don't think it is quite correct to say that two 64 bit SSE operations will now require half the time. This is only the case when ...
    Agreed, but that is usually the case for highly vectorized codes like MolDyn, BLAS and LINPACK. Two 64-bit SSE and SSE2 operations can be fetched and decoded in one cycle instead of two (as on A64 and Prescott), and the code scales according to its degree of vectorization.
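
As a rough way to see what "scales according to the degree of vectorization" means, the sketch below applies Amdahl's law. It assumes, purely for illustration, that a fraction f of the run time is spent in 128-bit SSE operations that become k times faster (k = 2 for the one-cycle-instead-of-two case) and that everything else is unchanged. The function name is made up, and real codes are also limited by memory bandwidth and other factors.

```c
/* Amdahl's law: overall speedup when a fraction f (0..1) of the run time
 * is sped up by a factor k and the remaining 1 - f is unchanged. */
double estimated_speedup(double f, double k)
{
    return 1.0 / ((1.0 - f) + f / k);
}
```

With k = 2, a code that spends half its time in packed SSE operations (f = 0.5) gains about 1.33x, and only a fully vectorized code (f = 1) gets the full 2x.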

    Quote Originally Posted by uOpt
    • ... there are indeed independent units to compute. Doing two operations at the same time requires that neither depends on the outcome of the other. That is frequently not the case.
    That is where macro-op fusion comes into play. You can fuse two instructions together (for example, a compare and the conditional jump that follows it) and perform the operation in one cycle.
    This will in fact cut down on the branch mispredictions that have plagued Prescott for a long time.

    Quote Originally Posted by uOpt
    • ... and unless it is hand-coded assembly, the compiler has to be sure about the previous fact. It can be nontrivial for the compiler to figure this out in a bulletproof way. If the compiler is not entirely sure, it will default to being conservative.
    You can perform Profile Guided Optimization: a profiling run lets the compiler learn how the code actually behaves, and you then recompile so it can optimize based on that profile.
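
For reference, a minimal sketch of what this looks like with GCC, whose `-fprofile-generate` and `-fprofile-use` options implement profile-guided optimization. The file name `pgo_demo.c` and the function below are made up for illustration, the file is assumed to also contain a `main()` that exercises the function on representative input, and other compilers use different flags.

```c
/* Build steps with GCC, shown as comments:
 *   gcc -O2 -fprofile-generate pgo_demo.c -o pgo_demo   instrumented build
 *   ./pgo_demo                                          training run, writes profile data
 *   gcc -O2 -fprofile-use pgo_demo.c -o pgo_demo        rebuild using the profile
 */
#include <stddef.h>

/* Counts values above a threshold. The profile tells the compiler how often
 * the branch is taken, which it cannot know from the source alone. */
size_t count_above(const double *x, size_t n, double threshold)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] > threshold)
            hits++;
    }
    return hits;
}
```

The profile records how the code behaved on the training input, such as branch frequencies and hot loops. It does not by itself prove that two pointers never alias.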
Core 2 Duo E6600 [L625A] 3330MHz 1.375Vcore 24/7
Core 2 Duo E6600 [L640F] 3330MHz 1.475Vcore
Crucial 10th Anv 2 x 1GB DDR2-667 @ 463MHz 4-4-4-12
ASUS P5B Dlx
FOTRON BLUE STORM 500W
TT BT with stock Fan
Gigabyte Nvidia 7600GSw/ Silent Pipe
WD Cavier 250GB
Antec P160
