Thread: Intel Core i7 Review Thread

  1. #1
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by villa1n View Post
    What do you think accounts for such a huge performance advantage? What is Nehalem "un-bottlenecking"? Those are some pretty numbers, and they look like how we would have hoped SLI/xfire would scale...?
    Won't know until I get a CPU and run the 4870 X2s x-fired. But the BW demands on an FSB (or even HT) in gaming scenarios are not nearly as high as you would think. I posted a lot of data on this in another thread where, for example, I cranked the 2000 MHz HT on the Phenom down to 200 MHz and it made a grand total of about 2% difference at most (basically in the noise). EDIT: Also, while the PCIe lanes will service the command sets from main memory or CPU to card, most of the actual communication between GPUs in multi-GPU setups is arbitrated through the SLI or xFire link -- specifically to move away from the back bus bottlenecks.

    Where you will see the interconnect bottlenecking graphics performance is in cases where the texture memory requirements exceed the onboard VRAM, which you can see in some games at 2560x1600 today, but at 1920x1200 512 MB seems sufficient.

    Nehalem has simply improved clock-for-clock performance. The sites running single GPUs and comparing against a QX9770 are showing gaming situations already railed up against the GPU performance, so no matter what one does, the overall result will make the 'CPUs look the same'. The fact is, i7 is showing similar gains in gaming code execution as it is in 3D rendering or video encoding ... the current crop of reviews/GPUs hides it because the GPU is capping the results.
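
    To make the "GPU is capping the results" point concrete, here is a rough sketch of the usual bottleneck model: the observed frame rate is limited by whichever of the CPU or the GPU setup is slower. The function name and every number below are made up purely for illustration; none of them come from any review.

```python
def observed_fps(cpu_fps: float, gpu_fps: float) -> float:
    """Simplified bottleneck model: the slower stage sets the frame rate."""
    return min(cpu_fps, gpu_fps)


# Hypothetical figures, chosen only to illustrate the effect.
old_cpu, new_cpu = 70.0, 110.0       # frames/s each CPU could feed to the GPU
single_gpu, multi_gpu = 60.0, 150.0  # frames/s each GPU setup could render

for label, gpu in (("single GPU", single_gpu), ("multi GPU", multi_gpu)):
    old = observed_fps(old_cpu, gpu)
    new = observed_fps(new_cpu, gpu)
    gain = 100.0 * (new / old - 1.0)
    print(f"{label}: old CPU {old:.0f} FPS, new CPU {new:.0f} FPS, gain {gain:.0f}%")
```

    In this toy model the two CPUs tie whenever the GPU is the limit, and the gap only shows up once the GPU side is fast enough to get out of the way.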

    People focus on the QPI and IMC as the major changes to Nehalem, but these were not all the major changes -- Intel also deepened the execution window and improved branch prediction (both good for gaming code). However, the tri-SLI results from Guru3D and Tom's (recently posted) actually surprised me -- I was expecting modest gaming improvements but some are just huge...

    Take for example clock for clock -- a 60% improvement in Brothers in Arms (Guru3D data: QX9770 67 FPS, i7 965 107 FPS @ 1920x1200). Even Far Cry 2 shows a massive jump, which surprises me... I was expecting best case maybe 15-20%.

    Who knows for sure at this point; the reviewers are simply publishing their 'study' but do not do additional runs to really test out why... I am certain though, raising the FSB (keeping the same core clock speed) will not make up a 60% difference.

    In the Guru3D charts comparing the QX9770 and i7 for tri-SLI, the QX9770 is clearly showing CPU-capped runs all the way to 1920x1200; the two CPUs only converge at 2560x1600 (a resolution you can say is now GPU limited).
    Last edited by JumpingJack; 11-04-2008 at 02:31 AM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my clothes looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  2. #2
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    Won't know until I get a CPU and run the 4870 X2s x-fired. But the BW demands on an FSB (or even HT) in gaming scenarios are not nearly as high as you would think. I posted a lot of data on this in another thread where, for example, I cranked the 2000 MHz HT on the Phenom down to 200 MHz and it made a grand total of about 2% difference at most (basically in the noise). EDIT: Also, while the PCIe lanes will service the command sets from main memory or CPU to card, most of the actual communication between GPUs in multi-GPU setups is arbitrated through the SLI or xFire link -- specifically to move away from the back bus bottlenecks.
    Don't forget all memory reads/writes are placed on the FSB; if all the data for 3 GTX 280s has to be placed on that same FSB, a serious bottleneck is born.
    I also thought the FSB isn't bidirectional, so you have to wait before you can send something in the other direction.

    And I didn't find large gains with the 4870 X2 either.
    Last edited by Bellisimo; 11-04-2008 at 02:43 AM.

  3. #3
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    Don't forget all memory reads/writes are placed on the FSB; if all the data for 3 GTX 280s has to be placed on that same FSB, a serious bottleneck is born.
    I also thought the FSB isn't bidirectional, so you have to wait before you can send something in the other direction.
    It isn't, that's the fault in your argument -- textures and vertices are stored in VRAM -- this is called precaching -- so you may experience better level loading times, but during actual gameplay the bulk of the massive slug of data is kept in very fast memory next to the GPU with a very high BW connection (due to the necessary high throughput).

    The data communicated between the card and CPU is only the command buffer which, in part, contains the pointers to the textures in VRAM... it is a very small data set, relatively speaking.

    On top of that, PCIe 2.0 enables bus mastering, so to access the command buffer the GPU does not go through the FSB on the Intel platform; it simply masters straight to system RAM.

    Jack

  4. #4
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    It isn't, that's the fault in your argument -- textures and vertices are stored in VRAM -- this is called precaching -- so you may experience better level loading times, but during actual gameplay the bulk of the massive slug of data is kept in very fast memory next to the GPU with a very high BW connection (due to the necessary high throughput).

    The data communicated between the card and CPU is only the command buffer which, in part, contains the pointers to the textures in VRAM... it is a very small data set, relatively speaking.

    On top of that, PCIe 2.0 enables bus mastering, so to access the command buffer the GPU does not go through the FSB on the Intel platform; it simply masters straight to system RAM.

    Jack
    So the CPU doesn't communicate with its own memory through the FSB?
    Come on, Jack.

    It goes like this: CPU --> FSB --> NB --> memory

  5. #5
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    So the CPU doesn't communicate with its own memory through the FSB?
    Come on, Jack.

    It goes like this: CPU --> FSB --> NB --> memory
    It does, but not nearly as much as you are alluding to ... 500 MHz over 400 MHz FSB is not going to make up a 60% performance delta.... what you are seeing in the Guru3D data is pure CPU power over CPU power. This is an easy test.... FSB and mem BW through the FSB are not nearly as critical as you think... if they were, then the superior BW of the Phenom should be mopping up ... and the truth is ... mem BW through the interconnect is one of 100 different components that work together to produce the result.

    Go to the benchmarking section and look at the massive Phenom vs C2Q thread; I show 200 MHz and 400 MHz FSB gaming on a C2Q at 3.0 GHz -- guess what, no difference.

    Do you have a C2Q?? It is an easy experiment to do ... you won't affect your gaming benches by much more than 5-8%. (EDIT: revised this after I pulled up my FSB scaling table).

    Here is an example:
    [Image: QX9650 with 200 MHz FSB using a 4870 X2]

    [Image: QX9650 with 400 MHz FSB using a 4870 X2]

    Snow is 0% difference; cave is 80.1 in the 200 MHz case and 85.7 in the 400 MHz case (that is a 7% difference) ...
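
    For anyone who wants to check the percentages, a quick sketch of the arithmetic. The helper name `percent_gain` is made up here; the inputs are the cave results quoted above and the Guru3D Brothers in Arms figures quoted earlier in the thread.

```python
def percent_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Cave results from this post (200 MHz vs 400 MHz FSB on the QX9650).
cave_fsb_200, cave_fsb_400 = 80.1, 85.7
# Guru3D Brothers in Arms figures quoted earlier in the thread.
bia_qx9770, bia_i7_965 = 67.0, 107.0

print(f"FSB 200 -> 400 MHz (cave): {percent_gain(cave_fsb_200, cave_fsb_400):.1f}%")
print(f"QX9770 -> i7 965 (BiA):    {percent_gain(bia_qx9770, bia_i7_965):.1f}%")
```

    Doubling the FSB buys roughly 7% in the cave run, while the i7 gap in Brothers in Arms works out to roughly 60%.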

    500 MHz is not gonna amount to a hill of beans.
    Last edited by JumpingJack; 11-04-2008 at 02:59 AM.

  6. #6
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    It does, but not nearly as much as you are alluding to ... 500 MHz over 400 MHz FSB is not going to make up a 60% performance delta.... what you are seeing in the Guru3D data is pure CPU power over CPU power.
    Of course it will not make up the 60% perf delta; I didn't say that.
    I said if you overclocked the FSB to 500 and kept the frequency at 3.2 GHz for the QX9770, you would see a similar gain (not as large).
    Also don't forget Intel cores communicate with each other (2 dual-core dies on a quad-core) through the FSB,
    so the FSB is occupied with this:
    - Inter-core traffic
    - Memory access
    - Peripheral stuff
    - Graphics stuff

    A 1600 MHz FSB will not cut it....
    especially when there are 3 very powerful graphics cards in the system.

    That's why I stated the QPI bus is largely responsible for the huge gain with 3-way SLI.

    Quote Originally Posted by JumpingJack View Post
    Do you have a C2Q?? It is an easy experiment to do ... you won't affect your gaming benches by much more than 5 or 6%.
    I don't, I have an E8400, but I don't have 3x GTX 280 so I wouldn't be able to test it.

    http://anandtech.com/cpuchipsets/int...px?i=3448&p=12
    Oblivion is screwed, but look at QW with only 2 GPUs.
    Last edited by Bellisimo; 11-04-2008 at 03:01 AM.

  7. #7
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post

    I don't, I have an E8400, but I don't have 3x GTX 280 so I wouldn't be able to test it.
    That's ok, I already have... I already know the answer... and yes, you did imply that 500 MHz was the delta:
    Crank your FSB up to 500 and you will have better results for the Core 2.
    This is just because of the QPI link, and few people use 3-way SLI; even SLI isn't used a lot.
    Which is, as Savantu states, BS.

  8. #8
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    It goes like this: CPU --> FSB --> NB --> memory
    Also, don't be condescending or I will put you in your place but fast.

    jack

  9. #9
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Bellisimo View Post
    Don't forget all memory reads/writes are placed on the FSB; if all the data for 3 GTX 280s has to be placed on that same FSB, a serious bottleneck is born.
    I also thought the FSB isn't bidirectional, so you have to wait before you can send something in the other direction.

    And I didn't find large gains with the 4870 X2 either.
    Huh? GPUs don't talk to memory over the FSB; they have DMA over the NB. Using DDR3-1600, GPUs have 25.6 GB/s of BW (minus what goes to the FSB) available.
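
    A rough sketch of where a figure like 25.6 GB/s comes from, as theoretical peak numbers only. It assumes dual-channel DDR3-1600 with 64-bit (8-byte) channels, and for comparison a 64-bit FSB at 1600 MT/s; the function name and those configuration details are assumptions, not something stated in the thread.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float, bus_bytes: int, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# Assumed: dual-channel DDR3-1600, 8 bytes per transfer per channel.
ddr3_1600_dual = peak_bandwidth_gbs(1600, 8, channels=2)
# Assumed: 1600 MT/s FSB (400 MHz, quad-pumped), 8 bytes wide.
fsb_1600 = peak_bandwidth_gbs(1600, 8)

print(f"Dual-channel DDR3-1600 peak: {ddr3_1600_dual:.1f} GB/s")
print(f"1600 MT/s FSB peak:          {fsb_1600:.1f} GB/s")
```

    Under those assumptions the memory side peaks at twice what the FSB can carry, which is the headroom left for DMA traffic that never touches the FSB.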

    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  10. #10
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by savantu View Post
    Huh? GPUs don't talk to memory over the FSB; they have DMA over the NB. Using DDR3-1600, GPUs have 25.6 GB/s of BW (minus what goes to the FSB) available.
    He clearly does not understand this, or did not read into the concept that PCIe 2.0 can be a bus master.

  11. #11
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by JumpingJack View Post
    ..
    Nehalem has simply improved clock-for-clock performance. The sites running single GPUs and comparing against a QX9770 are showing gaming situations already railed up against the GPU performance, so no matter what one does, the overall result will make the 'CPUs look the same'. The fact is, i7 is showing similar gains in gaming code execution as it is in 3D rendering or video encoding ... the current crop of reviews/GPUs hides it because the GPU is capping the results.

    People focus on the QPI and IMC as the major changes to Nehalem, but these were not all the major changes -- Intel also deepened the execution window and improved branch prediction (both good for gaming code). However, the tri-SLI results from Guru3D and Tom's (recently posted) actually surprised me -- I was expecting modest gaming improvements but some are just huge...
    Nehalem isn't just Core on steroids (IMC + QPI). They've improved every single part of the core; everything was tinkered with.
    Some changes are more radical, like macro-op fusion in 64-bit mode and support for fusing a larger variety of instructions, SMT, a second-level TLB and so on.

    Where Nehalem stumbles is the slower L1 and small L2. People forget that Penryn was an incredibly high standard to start with (look at AMD being incapable of offering a solution, probably until 2010) and had a nice, massive, very fast 6 MB L2. As long as an app is cache friendly, Penryn will rock.

  12. #12
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by savantu View Post
    Nehalem isn't just Core on steroids (IMC + QPI). They've improved every single part of the core; everything was tinkered with.
    Some changes are more radical, like macro-op fusion in 64-bit mode and support for fusing a larger variety of instructions, SMT, a second-level TLB and so on.

    Where Nehalem stumbles is the slower L1 and small L2. People forget that Penryn was an incredibly high standard to start with (look at AMD being incapable of offering a solution, probably until 2010) and had a nice, massive, very fast 6 MB L2. As long as an app is cache friendly, Penryn will rock.
    Yeah, I agree ... the L1 and L2 cache sizes were probably settled upon based on power and die size constraints; it will be interesting to see if the Westmere tick bumps the L2 to 512 KB or more per core.

    However, what Intel did was unique -- they fashioned a synchronous interface between the cores and L3, which keeps the L3 latency very low. Their L2 latency (because the L2 is small) is also better than the L2 latency on C2D (note I am comparing the same cache level in absolute terms) -- but L1 is one cycle slower.... it looks like Intel did a real balancing act to make the cache hierarchy as balanced as possible.

    Ultimately this proves to be somewhat of a hindrance in single thread, as we can see in the data, but overall the sum of the parts is as good as to very slightly better than Core 2 (single threaded).

    David did a really nice job summarizing the major items in Nehalem... I am sure you have read it.

    Jack
