Page 6 of 20
Results 126 to 150 of 488

Thread: Intel Core i7 Review Thread

  1. #126
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by villa1n View Post
    What do you think accounts for such a huge performance advantage? What is Nehalem "un-bottlenecking"? Those are some pretty numbers, and they're how we would have hoped SLI/xfire would scale...
    Won't know until I get a CPU and run the 4870 X2s x-fired. But the BW demands on an FSB (or even HT) in gaming scenarios are not nearly as high as you would think. I posted a lot of data on this in another thread where, for example, I cranked the 2000 MHz HT on the Phenom down to 200 MHz and it made a grand total of about 2% difference at most (basically in the noise). EDIT: Also, while the PCIe lanes will service the command sets from main memory or CPU to card, most of the actual communication between GPUs in multi-GPU setups is arbitrated through the SLI or xFire link -- specifically to move away from the back bus bottlenecks.

    Where you will see the interconnect bottlenecking graphics performance is in cases where the texture memory requirements exceed the onboard VRAM, which you can see in some games at 2560x1600 today, but at 1920x1200, 512 MB seems sufficient.

    Nehalem has simply improved clock-for-clock performance. The sites running single GPUs and comparing against a QX9770 are simply showing gaming situations already railed up against the GPU's performance, so no matter what one does, the overall result will make the 'CPUs look the same'. The fact is, i7 is showing similar gains in gaming code execution as it is in 3D rendering or video encoding ... the current crop of reviews/GPUs hides it because the GPU is capping the results.

    People focus on the QPI and IMC as the major changes to Nehalem, but these were not all the major changes -- Intel also deepened the execution window, and improved branch prediction (both good for gaming code). However, looking at the tri-SLI results from Guru3D and Toms (recently posted) actually surprised me -- I was expecting modest gaming improvements but some are just huge...

    Take for example clock for clock -- a 60% improvement in Brothers in Arms (Guru3D data: QX9770 67 FPS, i7 965 107 FPS @ 1920x1200). Even Far Cry 2 shows a massive jump, which surprises me... I was expecting best case maybe 15-20%.
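
As a quick sanity check on the 60% figure, here is a minimal Python sketch that recomputes the relative gain from the two frame rates quoted above. The function name `percent_gain` is just an illustrative helper, not something from the review.

```python
def percent_gain(baseline_fps: float, new_fps: float) -> float:
    """Relative improvement of new_fps over baseline_fps, in percent."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


# Figures quoted in the post (Guru3D, Brothers in Arms @ 1920x1200, tri-SLI).
qx9770_fps = 67.0
i7_965_fps = 107.0

gain = percent_gain(qx9770_fps, i7_965_fps)
print(f"i7 965 over QX9770: {gain:.1f}%")
```

The result rounds to the "60%" in the post; the same helper works for any pair of benchmark numbers.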

    Who knows for sure at this point; the reviewers are simply publishing their 'study' but do not do additional runs to really test out why... I am certain though, raising the FSB (keeping the same core clock speed) will not make up a 60% difference.

    In the Guru3D charts comparing the QX9770 and i7 for tri-SLI, the QX9770 is clearly showing CPU-capped runs all the way to 1920x1200; the two CPUs only converge at 2560x1600 (which is a resolution you can say is now GPU limited).
    Last edited by JumpingJack; 11-04-2008 at 02:31 AM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  2. #127
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Bellisimo View Post
    crank your fsb up to 500 and you will have better results for the core2
    this is just because of the qpi link, and few people use 3-way sli, even SLI isn't used a lot
    BS.

    What you're seeing is this :

    - at resolutions below 1600x1200 you're CPU limited, the GPUs can do more
    - at 1920x1200 the GPUs start to feel the pressure
    - at 2560x1600 you're GPU limited

    1st case a more powerful CPU helps
    2nd case a more powerful CPU helps very little
    3rd case a more powerful CPU has no effect
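
The three cases above can be sketched with a deliberately simple model in which the delivered frame rate is the lower of what the CPU and the GPU can each sustain. This is only an illustration: the `min()` model is a simplification, and every number below (the GPU ceilings per resolution and the two CPU frame rates) is hypothetical, not taken from any review.

```python
def delivered_fps(cpu_fps: float, gpu_fps: float) -> float:
    """Frame rate is capped by whichever side is slower (simplified model)."""
    return min(cpu_fps, gpu_fps)


# Hypothetical GPU ceilings (FPS) per resolution; illustrative values only.
GPU_CEILING = {"1280x1024": 200.0, "1920x1200": 75.0, "2560x1600": 45.0}

SLOW_CPU_FPS = 70.0   # hypothetical frame rate the slower CPU can feed
FAST_CPU_FPS = 110.0  # hypothetical frame rate the faster CPU can feed


def cpu_upgrade_gain(resolution: str) -> float:
    """Percent FPS gain from swapping the slow CPU for the fast one."""
    old = delivered_fps(SLOW_CPU_FPS, GPU_CEILING[resolution])
    new = delivered_fps(FAST_CPU_FPS, GPU_CEILING[resolution])
    return (new - old) / old * 100.0


for res in GPU_CEILING:
    print(f"{res}: {cpu_upgrade_gain(res):.1f}% gain from the faster CPU")
```

With these made-up inputs the faster CPU helps a lot at the low resolution, a little at 1920x1200, and not at all at 2560x1600, which is the pattern the list describes.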

    Skulltrail based Nehalem with 4 GPUs is going to be the all around monster.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  3. #128
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    Won't know until I get a CPU and run the 4870 X2s x-fired. But the BW demands on an FSB (or even HT) in gaming scenarios are not nearly as high as you would think. I posted a lot of data on this in another thread where, for example, I cranked the 2000 MHz HT on the Phenom down to 200 MHz and it made a grand total of about 2% difference at most (basically in the noise). EDIT: Also, while the PCIe lanes will service the command sets from main memory or CPU to card, most of the actual communication between GPUs in multi-GPU setups is arbitrated through the SLI or xFire link -- specifically to move away from the back bus bottlenecks.
    don't forget all memory reads/writes are placed on the FSB, if all the data of 3 GTX280s has to be placed on that same FSB, a serious bottleneck is born
    i also thought the fsb isn't bidirectional, so you have to wait before you can send something in a direction

    and i didn't find large gains with 4870x2 either
    Last edited by Bellisimo; 11-04-2008 at 02:43 AM.

  4. #129
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by JumpingJack View Post
    ..
    Nehalem has simply improved clock-for-clock performance. The sites running single GPUs and comparing against a QX9770 are simply showing gaming situations already railed up against the GPU's performance, so no matter what one does, the overall result will make the 'CPUs look the same'. The fact is, i7 is showing similar gains in gaming code execution as it is in 3D rendering or video encoding ... the current crop of reviews/GPUs hides it because the GPU is capping the results.

    People focus on the QPI and IMC as the major changes to Nehalem, but these were not all the major changes -- Intel also deepened the execution window, and improved branch prediction (both good for gaming code). However, looking at the tri-SLI results from Guru3D and Toms (recently posted) actually surprised me -- I was expecting modest gaming improvements but some are just huge...
    Nehalem isn't just Core on steroids (IMC + QPI). They've improved every single part of the core; everything was tinkered with.
    Some changes are more radical, like macro-op fusion in 64-bit mode and support for a larger variety of fused instructions, SMT, a second-level TLB and so on.

    Where Nehalem stumbles is the slower L1 and small L2. People forget that Penryn set an incredibly high standard to start with (look at AMD being unable to offer an answer, probably until 2010) and had a nice, massive, very fast 6 MB L2. As long as an app is cache friendly, Penryn will rock.

  5. #130
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    don't forget all memory reads/writes are placed on the FSB, if all the data of 3 GTX280s has to be placed on that same FSB, a serious bottleneck is born
    i also thought the fsb isn't bidirectional, so you have to wait before you can send something in a direction
    It isn't, that's the flaw in your argument -- textures and vertices are stored in VRAM -- this is called precaching -- so you may see better level loading times, but during actual gameplay the bulk of the massive slug of data is kept in very fast memory next to the GPU with a very high-BW connection (due to the necessary high throughput).

    The data communicated between the card and CPU is only the command buffer which, in part, contains the pointers to the textures in VRAM... it is a very small data set relatively speaking.

    On top of that, PCIe 2.0 enables bus mastering, so to access the command buffer the GPU does not go through the FSB in the Intel platform, it simply masters straight to system ram.

    Jack

  6. #131
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Bellisimo View Post
    don't forget all memory reads/writes are placed on the FSB, if all the data of 3 GTX280s has to be placed on that same FSB, a serious bottleneck is born
    i also thought the fsb isn't bidirectional, so you have to wait before you can send something in a direction

    and i didn't find large gains with 4870x2 either
    Huh? GPUs don't talk to memory over the FSB, they have DMA over the NB. Using DDR3-1600, GPUs have 25.6 GB/s of BW (minus what goes to the FSB) available.
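
For reference, the 25.6 GB/s figure is consistent with the usual peak-bandwidth arithmetic: transfers per second times bytes per transfer times number of channels. The sketch below assumes dual-channel DDR3-1600 with 64-bit channels and a 64-bit, 1600 MT/s FSB (as on the QX9770); those configuration details are assumptions for illustration, and these are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float,
                        bus_width_bits: int,
                        channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Assumed: dual-channel DDR3-1600, 64 bits per channel.
ddr3_1600_dual = peak_bandwidth_gb_s(1600, 64, channels=2)

# Assumed: 1600 MT/s front-side bus, 64 bits wide.
fsb_1600 = peak_bandwidth_gb_s(1600, 64)

print(f"DDR3-1600 dual channel: {ddr3_1600_dual:.1f} GB/s")
print(f"FSB 1600:               {fsb_1600:.1f} GB/s")
```

Under these assumptions the memory controller has roughly twice the peak bandwidth of the FSB, which is why DMA traffic from the GPUs is not limited to what the FSB can carry.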


  7. #132
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    It isn't, that's the flaw in your argument -- textures and vertices are stored in VRAM -- this is called precaching -- so you may see better level loading times, but during actual gameplay the bulk of the massive slug of data is kept in very fast memory next to the GPU with a very high-BW connection (due to the necessary high throughput).

    The data communicated between the card and CPU is only the command buffer which, in part, contains the pointers to the textures in VRAM... it is a very small data set relatively speaking.

    On top of that, PCIe 2.0 enables bus mastering, so to access the command buffer the GPU does not go through the FSB in the Intel platform, it simply masters straight to system ram.

    Jack
    so the cpu doesn't communicate with its own memory through the FSB?
    come on jack

    it goes like this cpu --> FSB --> NB --> memory

  8. #133
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by savantu View Post
    Nehalem isn't just Core on steroids (IMC + QPI). They've improved every single part of the core; everything was tinkered with.
    Some changes are more radical, like macro-op fusion in 64-bit mode and support for a larger variety of fused instructions, SMT, a second-level TLB and so on.

    Where Nehalem stumbles is the slower L1 and small L2. People forget that Penryn set an incredibly high standard to start with (look at AMD being unable to offer an answer, probably until 2010) and had a nice, massive, very fast 6 MB L2. As long as an app is cache friendly, Penryn will rock.
    Yeah, I agree ... the L1 and L2 cache sizes were probably settled upon based on power and die size constraints; it will be interesting to see if the Westmere tick bumps the L2 to 512 KB or more per core.

    However, what Intel did was unique -- they fashioned a synchronous interface between the cores and L3 which keeps the L3 latency very low. Their L2 latency (because it is small) is also better than the L2 latency on C2D (note I state this specifically on the absolute level of cache) -- but L1 is one cycle slower.... what it looks like is Intel did a real balancing act to design the cache hierarchy to be as balanced as possible.

    Ultimately this proves to be somewhat of a hindrance in single thread, as we can see in the data, but overall the sum of the parts is as good as to very slightly better than Core 2 (single threaded).

    David did a really nice job summarizing the major items in Nehalem... I am sure you have read it.

    Jack

  9. #134
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    so the cpu doesn't communicate with its own memory through the FSB?
    come on jack

    it goes like this cpu --> FSB --> NB --> memory
    It does, but not nearly as much as you are alluding to ... 500 MHz over 400 MHz FSB is not going to make up a 60% performance delta.... what you are seeing in the Guru3D data is pure CPU power over CPU power. This is an easy test.... FSB and mem BW through the FSB are not nearly as critical as you think... if they were, then the superior BW of the Phenom should be mopping up ... and the truth is ... mem BW through the interconnect is one of 100 different components that work together to produce the result.

    Go to the benchmarking section and look at the massive Phenom vs C2Q thread; I show 200 MHz and 400 MHz FSB gaming on a C2Q at 3.0 GHz -- guess what, no difference.

    Do you have a C2Q?? It is an easy experiment to do ... you won't affect your gaming benches by much more than 5-8%. (EDIT: revised this after I pulled up my FSB scaling table).

    Here is an example:
    QX9650 with 200 MHz FSB using a 4870 X2

    QX9650 with 400 MHz FSB using a 4870 X2

    Snow is 0% difference, cave is 80.1 in the 200 MHz case, 85.7 in the 400 MHz case (that is a 7% difference) ...

    500 MHz is not gonna amount to a hill of beans.
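
To put rough numbers on that, the sketch below recomputes the cave-scene difference from the two quoted results and then does a crude back-of-the-envelope projection for a 400 to 500 MHz bump. The projection assumes FPS keeps responding to FSB clock in the same proportion as between 200 and 400 MHz; that linear scaling is purely an assumption for illustration, not a measurement.

```python
# Cave-scene results quoted in the post (QX9650, 4870 X2).
fps_at_200 = 80.1
fps_at_400 = 85.7

# Percent FPS gain from doubling the FSB clock (a +100% change).
fps_gain_pct = (fps_at_400 - fps_at_200) / fps_at_200 * 100.0
fsb_gain_pct = (400 - 200) / 200 * 100.0

# FPS gain per percent of extra FSB clock, from the one measured pair.
sensitivity = fps_gain_pct / fsb_gain_pct

# ASSUMPTION: the same sensitivity holds from 400 to 500 MHz (+25%).
projected_gain_pct = sensitivity * ((500 - 400) / 400 * 100.0)

print(f"200 -> 400 MHz: {fps_gain_pct:.1f}% more FPS")
print(f"400 -> 500 MHz (projected, linear assumption): {projected_gain_pct:.1f}%")
```

Even under this generous assumption, the projected gain from 500 MHz stays in the low single digits, nowhere near a 60% delta.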
    Last edited by JumpingJack; 11-04-2008 at 02:59 AM.

  10. #135
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    It does, but not nearly as much as you are alluding to ... 500 MHz over 400 MHz FSB is not going to make up a 60% performance delta.... what you are seeing in the Guru3D data is pure CPU power over CPU power.
    of course it will not make up the 60% perf delta, i didn't say that
    i said if you would overclock the fsb to 500 and keep the frequency at 3.2 ghz for the qx9770 you would see a similar gain (not as large)
    also don't forget intel cores communicate with each other (2 dual-core dies on a quad-core) through the FSB
    so the FSB is occupied with this:
    - Intercore traffic
    - Memory access
    - Peripheral stuff
    - Graphics stuff

    a 1600 MHz fsb will not cut it....
    especially when there are 3 very powerful graphics cards in the system

    that's why i stated the QPI bus is largely responsible for the huge gain with 3-way SLI

    Quote Originally Posted by JumpingJack View Post
    Do you have a C2Q?? It is an easy experiment to do ... you won't affect your gaming benches by much more than 5 or 6%.
    i don't, i have an E8400, but i don't have 3*GTX280 so i wouldn't be able to test it

    http://anandtech.com/cpuchipsets/int...px?i=3448&p=12
    oblivion is screwed, but look at QW with only 2 GPUs
    Last edited by Bellisimo; 11-04-2008 at 03:01 AM.

  11. #136
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    s

    it goes like this cpu --> FSB --> NB --> memory
    Also, don't be condescending or I will put you in your place but fast.

    jack

  12. #137
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post

    i don't, i have a E8400, but i don't have 3*GTX280 so i wouldnt be able to test it
    That's ok, I already have... I already know the answer... and yes you did imply that 500 Mhz was the delta:
    crank your fsb up to 500 and you will have better results for the core2
    this is just because of the qpi link, and few people use 3-way sli, even SLI isn't used a lot
    Which is, as Savantu states, BS.

  13. #138
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Clairvoyant129 View Post

    I wonder why SLI/Crossfire with an AMD platform under performs compared to an Intel platform. According to you, it's just the hyper transport, right?


    It's because Phenoms run at relatively low clocks when compared to Core 2. But soon we'll have a 3 GHz Deneb with hopefully higher Northbridge/L3 clocks, so we'll see how SLI/CF works with that chip at default and north of its default clocks.

  14. #139
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    That's ok, I already have... I already know the answer... and yes you did imply that 500 Mhz was the delta:

    Which is, as Savantu states, BS.
    i said performance would increase, not that it would make up the 60% perf delta, please just read what i type instead of interpreting it wrongly

    Quote Originally Posted by JumpingJack View Post
    Also, don't be condescending or I will put you in your place but fast.

    jack
    should i be scared? i am just posting some relevant stuff you don't agree with, and you start posting stuff like this? I had a lot of respect for you and your knowledge, but you lost it in 5 minutes

  15. #140
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    i said performance would increase, not that it would make up the 60% perf delta, please just read what i type instead of interpreting it wrongly



    should i be scared?
    You said:
    this is just because of the qpi link, and few people use 3-way sli, even SLI isn't used a lot
    This is not true. The BW from the CPU to the cards is not at play here.

  16. #141
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    You said: this is not true. The BW from the CPU to the cards is not at play here.
    well, buy 3 GTX280s and find out yourself?

    i clearly said, extra FSB results in better scaling
    i didn't say (and didn't mean) that 100 MHz extra fsb would result in 60% more performance

    in the next phrase i state that the QPI link is responsible for the much better scaling compared to the FSB
    Last edited by Bellisimo; 11-04-2008 at 03:10 AM.

  17. #142
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    i said performance would increase, not that it would make up the 60% perf delta, please just read what i type instead of interpreting it wrongly



    should i be scared? i am just posting some relevant stuff you don't agree with, and you start posting stuff like this? I had a lot of respect for you and your knowledge, but you lost it in 5 minutes
    You did not show any respect; I was offended. And what you posted has no relevance because it is wrong.

  18. #143
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    You did not show any respect; I was offended. And what you posted has no relevance because it is wrong.
    obviously you mistook the memory i was referring to for GPU memory, that's why i specified it as cpu - fsb - nb - mem

  19. #144
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by savantu View Post
    Huh? GPUs don't talk to memory over the FSB, they have DMA over the NB. Using DDR3-1600, GPUs have 25.6 GB/s of BW (minus what goes to the FSB) available.
    He clearly does not understand this, or did not read into the concept that PCIe 2.0 can be a bus master.

  20. #145
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Bellisimo View Post
    obviously you mistook the memory i was referring to for GPU memory, that's why i specified it as cpu - fsb - nb - mem
    Ahhh, I see... so how much bus BW do you think it takes up to write the command buffer to the GPU? Do you actually have data or numbers? I mean, would you expect that if I halve the BW to the GPU via the FSB I should see roughly 1/2 the FPS performance?

  21. #146
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by Bellisimo View Post
    obviously you mistook the memory i was referring to for GPU memory, that's why i specified it as cpu - fsb - nb - mem
    Let me help simplify it for you.

    CPU goes CPU->FSB->MCH->Memory.

    Your GPUs go GPU->PCIe->MCH->Memory.

    Nowhere does the GPU go through the CPU for its textures.
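
The two paths above can be written down as plain lists to make the point checkable. This is only a toy model of the hop names used in this post (the dictionary keys and helper names are made up for illustration), not a description of any real chipset API.

```python
# Hop-by-hop paths to system memory, as described in the post.
PATHS = {
    "cpu_to_memory": ["CPU", "FSB", "MCH", "Memory"],
    "gpu_to_memory": ["GPU", "PCIe", "MCH", "Memory"],
}


def crosses(path_name: str, hop: str) -> bool:
    """True if the named path passes through the given hop."""
    return hop in PATHS[path_name]


def shared_hops(a: str, b: str) -> list:
    """Hops that two paths have in common, in order of path a."""
    return [hop for hop in PATHS[a] if hop in PATHS[b]]


print("GPU path crosses FSB:", crosses("gpu_to_memory", "FSB"))
print("Shared hops:", shared_hops("cpu_to_memory", "gpu_to_memory"))
```

In this model the only things the two paths share are the MCH and the memory itself; the FSB appears only on the CPU side.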

    i7 and even Core 2 have a lot of untapped power. You can also see that in the low-res benches.

    Last edited by Shintai; 11-04-2008 at 03:19 AM.
    Crunching for Comrades and the Common good of the People.

  22. #147
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Bellisimo View Post
    well, buy 3 GTX280s and find out yourself?

    i clearly said, extra FSB results in better scaling
    i didn't say (and didn't mean) that 100 MHz extra fsb would result in 60% more performance

    in the next phrase i state that the QPI link is responsible for the much better scaling compared to the FSB
    Empirical evidence: why doesn't K10 perform similarly to Nehalem then? After all, HT offers similar BW to QPI.

  23. #148
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by JumpingJack View Post
    Ahhh, I see... so how much bus BW do you think it takes up to write the command buffer to the GPU? Do you actually have data or numbers?
    i don't see your numbers contradicting what i am saying, did you check the anand link? not a lot of improvement from 790i - qx9770 to x58 - 965

  24. #149
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by Shintai View Post
    Nowhere does the GPU go through the CPU for its textures.
    lol, where did i say that?

    Quote Originally Posted by savantu View Post
    Empirical evidence: why doesn't K10 perform similarly to Nehalem then? After all, HT offers similar BW to QPI.
    because k10 is a slow cpu....
    but maybe you should ask jack that question, after all, cpus and their buses are not related to anything a gpu does apparently
    Last edited by Bellisimo; 11-04-2008 at 03:21 AM.

  25. #150
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by Bellisimo View Post
    lol, where did i say that?
    You didn't, but you obviously don't understand what the GPU needs either.

    The only part requiring large bandwidth is textures. And that doesn't go via the FSB on a Core 2. Hence JJ's test showed no difference between 200 and 400.

    And K10 being a "slow" CPU has nothing to do with it.

    You have 3 people, including proof, against you. I think it's time for you to show the evidence for your claims.
    Last edited by Shintai; 11-04-2008 at 03:23 AM.
