Page 20 of 30
Results 476 to 500 of 740

Thread: !!!The Ultimate K8L Thread 2007 & Beyond!!!

  1. #476
    Xtreme Member
    Join Date
    May 2006
    Location
    prospekt Veteranov, Saint-Petersburg, Russia
    Posts
    494
    We've already had confirmation that 40% overall is simply not happening
    Randy Allen and Patrik Patla (AMD directors) told us about 40 per cent, and suddenly brentpresley appears and tells us that 40% overall is simply not happening.
    Take a closer look at the picture.

    The 40% advantage is for heavy multitasking environments;
    10% is for single-threaded applications.
    Last edited by MAS; 02-12-2007 at 05:12 AM.

  2. #477
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by brentpresley
    NOT LIKELY. He has A2/A3 silicon (possibly EVEN B1 at this point), and unless there are MAJOR bugs to be fixed, we will not see more than 2-5% more in performance over the ES chips. That puts it AT BEST 15% better than C2D on average, but right now all we know for sure is that current steppings perform only 10% better. That is a FAR cry from the performance lead that AMD used to have. And like I said earlier, how many of us power users buy these is going to completely depend on how well they overclock.

    I've already said that SSE will run better on K10, but you have SEVERELY misplaced your faith if you think programmers are going to optimize and rework their SW just for that (on average it takes 6-12 months to rework a MAJOR piece of SW to take advantage of instruction-level changes like SSE/2/3/4, and most companies just aren't going to put the resources forward to optimize it when they consider current performance adequate). As someone who has done a few years' worth of programming, I can tell you that the VAST majority of programs will remain integer-based. Only multimedia apps will continue to use floating point operations. There simply is no benefit to making integer apps faster (how FAST can you make Word? Everything is virtually instantaneous as is).

    But hey, keep dreaming.
    Funny, in my favorite programming class the teacher whipped code optimization and code splitting into us: knowing when to use integer approximation and when to do massively parallel floating point. Fun class, but definitely not for beginners.
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was
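The "AT BEST 15%" figure in the quoted post comes from stacking two relative gains. A minimal sketch of that arithmetic, assuming the post's own estimates (a 10% lead for current steppings plus a further 2-5% from later steppings); the function name is purely illustrative:

```python
def compound_gain(*gains: float) -> float:
    """Combine successive relative gains multiplicatively.

    A 10% lead followed by a further 5% improvement is 1.10 * 1.05 - 1,
    not simply 0.10 + 0.05.
    """
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


if __name__ == "__main__":
    # Assumed inputs from the quoted post: a 10% lead today, 2-5% more later.
    for extra in (0.02, 0.05):
        lead = compound_gain(0.10, extra)
        print(f"+{extra:.0%} from new steppings -> {lead:.1%} ahead of C2D")
```

Compounded, the range works out to roughly 12% to 15.5%, which is where the 15% ceiling comes from. These are estimates stacked on estimates, not benchmark results.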

  3. #478
    Banned
    Join Date
    Nov 2006
    Location
    BEYOND THE SUN - SCOTLAND
    Posts
    476
    Quote Originally Posted by brentpresley
    NOT LIKELY. He has A2/A3 silicon (possibly EVEN B1 at this point), and unless there are MAJOR bugs to be fixed, we will not see more than 2-5% more in performance over the ES chips. That puts it AT BEST 15% better than C2D on average, but right now all we know for sure is that current steppings perform only 10% better. That is a FAR cry from the performance lead that AMD used to have. And like I said earlier, how many of us power users buy these is going to completely depend on how well they overclock.
    Everyone who owns a C2D depends on how well it OCs!!! They'd be 30%-50% less powerful if they didn't scale as well.

    Well... how many of you power users OC your C2Ds? 100% of you?

    When you start talkin' OCs, the performance gap will only grow further.

    We all know a C2D HAS to be overclocked in order to attain the performance levels everyone talks of, so why should it be any different for AMD?
    Last edited by SOLDNER-MOFO64; 02-12-2007 at 07:08 AM.

  4. #479
    Xtreme Enthusiast
    Join Date
    Jun 2006
    Location
    Space
    Posts
    769
    All we can go on are performance estimates and a speculative 10% from s7's guy under NDA.

    If this is the only figure we know of, then we can probably expect about 15% more speed than a C2 (clock for clock). Until other figures are released, it is nothing but a pointless argument.

  5. #480
    Banned
    Join Date
    Nov 2006
    Location
    BEYOND THE SUN - SCOTLAND
    Posts
    476
    Quote Originally Posted by brentpresley
    Barcelona INFO straight from AMD:

    http://www.amd.com/us-en/Corporate/V...115794,00.html

    Availability left open as "mid-2007"


    Not much we didn't know, but good to have it in an official statement from AMD.
    Agreed

    Quote Originally Posted by Motiv
    All we can go on are performance estimates and a speculative 10% from s7's guy under NDA.

    If this is the only figure we know of, then we can probably expect about 15% more speed than a C2 (clock for clock). Until other figures are released, it is nothing but a pointless argument.
    Hear, hear

  6. #481
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Just to put an end to the SSE FUD that somehow started: look at C2D vs CD. Is the C2D something like 6x faster? C2D has 6x higher potential SSE throughput, but SSE is not really that much of the total code.

    Less dreaming, more reality please. x87-to-SSE patches for games don't even bring that much.

    And SSE is still missing in many places... MS tries to force it by having no x87 in 64-bit, but mandatory SSE. However... don't dream: SSE is a nice boost but no miracle. It's more a matter of cleaning up the stupid x87 and getting it removed from the CPU over time.
    Last edited by Shintai; 02-12-2007 at 08:04 AM.
    Crunching for Comrades and the Common good of the People.
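Shintai's point (a 6x peak SSE advantage does not make the whole program 6x faster) is Amdahl's law. A small sketch, where the SSE shares of run time are hypothetical values chosen for illustration, not measurements:

```python
def amdahl_speedup(fraction: float, unit_speedup: float) -> float:
    """Overall speedup when only `fraction` of the run time gets `unit_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if unit_speedup <= 0.0:
        raise ValueError("unit_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / unit_speedup)


if __name__ == "__main__":
    PEAK_SSE_RATIO = 6.0  # the C2D vs CD peak SSE ratio quoted in the post
    for sse_share in (0.1, 0.2, 0.5):  # hypothetical SSE shares of run time
        s = amdahl_speedup(sse_share, PEAK_SSE_RATIO)
        print(f"SSE share {sse_share:.0%}: {s:.2f}x overall")
```

Even if 20% of the run time were SSE-bound and that part really ran 6x faster, the whole program would only be 1.2x faster, and no amount of SSE speedup could push it past 1 / 0.8 = 1.25x.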

  7. #482
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by brentpresley
    Then you know first-hand as well how hard it is.

    It is not just a matter of changing a compiler flag and there you go.
    Absolutely. Fortunately, a well-made and well-documented program can be updated rather quickly. I remember helping on a project to convert an audio encryption routine from integer to SSE3; it took a couple of days, but the performance boost was huge.
    So it ultimately comes down to how important performance is to you.

  8. #483
    Xtreme Enthusiast
    Join Date
    Sep 2005
    Location
    Russia, Moscow
    Posts
    548
    the clue is "estimates"
    i hope its atleast half true would still give c2d a run for the money

  9. #484
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by brentpresley
    he he, haven't seen too many of those.

    honestly.
    I definitely agree with you there. Heck, take ten seconds to look at Microsoft source code and you'll wonder how the hell they got it to run. Some of them just seem to love "goto" statements. But I must admit their binary interfaces and the assembly they use for them are extremely well made.
    Unfortunately, the technically skilled aren't the ones writing the most code.
    And if you really want to see a 300% speed increase, port .NET programs to pure C code. Talk about a huge improvement.

  10. #485
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Some interesting bits:
    A 65nm silicon-on-insulator process is used for producing the near-450-million transistor device, with dual stress liners and a silicon germanium process is used to speed up the pFETs. Eleven layers of copper and low-k dielectrics connect the device.

    At 95 degrees Celsius, modelling suggests the processor will run at between 2.2 and 2.8GHz at 1.15 volts. Each of the four cores includes eight temperature sensors. The on-chip northbridge contains a further six.

    The memory interface is 400 to 800Mbps from a 1.7 to 1.9 volt supply for DDR2, and 800 to 1,600Mbps from 1.4 to 1.6 volts for DDR3.

    The HyperTransport interface supports legacy HT1 and 2 modes as well as HT3 at 2.4Gbps with a peak of 5.2Gbps.
    Source:
    http://www.edn.com/article/CA6415782.html?partner=enews

    Enjoy!

    RiG1: Ryzen 7 1700 @4.0GHz 1.39V, Asus X370 Prime, G.Skill RipJaws 2x8GB 3200MHz CL14 Samsung B-die, TuL Vega 56 Stock, Samsung SS805 100GB SLC SDD (OS Drive) + 512GB Evo 850 SSD (2nd OS Drive) + 3TB Seagate + 1TB Seagate, BeQuiet PowerZone 1000W

    RiG2: HTPC AMD A10-7850K APU, 2x8GB Kingstone HyperX 2400C12, AsRock FM2A88M Extreme4+, 128GB SSD + 640GB Samsung 7200, LG Blu-ray Recorder, Thermaltake BACH, Hiper 4M880 880W PSU

    SmartPhone Samsung Galaxy S7 EDGE
    XBONE paired with 55'' Samsung LED 3D TV
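The HyperTransport figures in the EDN quote are per-lane signalling rates. Turning them into link bandwidth needs a link width, which the quote does not give; the sketch below assumes a 16-bit link and counts each direction separately:

```python
def ht_link_gb_s(lane_rate_gbps: float, width_bits: int = 16) -> float:
    """Peak one-direction bandwidth in GB/s of a HyperTransport link.

    lane_rate_gbps: per-lane data rate in Gbit/s (2.4 or 5.2 in the quote).
    width_bits: lanes per direction; 16 is an assumption, not from the quote.
    """
    return lane_rate_gbps * width_bits / 8  # bits -> bytes


if __name__ == "__main__":
    for rate in (2.4, 5.2):
        one_way = ht_link_gb_s(rate)
        print(f"{rate} Gbps/lane -> {one_way:.1f} GB/s each way, "
              f"{2 * one_way:.1f} GB/s both directions")
```

The same arithmetic applied to the memory line (800Mbps per pin, assuming two 64-bit DDR2 channels) gives 800 x 128 / 8 = 12,800 MB/s, i.e. the 12.8GB/s figure that comes up later in the thread.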

  11. #486
    Xtreme Enthusiast
    Join Date
    Dec 2003
    Posts
    510
    Quote Originally Posted by LOE
    64 bit - c2d is slower in 64bit mode due to 2 reasons - the iAMD64 and the lack of macro ops fusion in 64bit mode, c2d could easily lose 7-10% of its performance
    But it's not slower; sometimes it's faster, sometimes it's slower, depending on the application, just like the K8. Overall, it still remains the fastest 64-bit x86 processor available today.

    heavy multithreading - we already see quad FX running inferior chips outperforming core2quad in heavy multithreaded scenarios, that gap will only grow bigger when K10 comes out
    We see one or two unrealistic scenarios where this happens, and they require specific situations that benefit from the Quad FX's additional memory controller. However, in a single-socket system, the desktop versions of Barcelona will only have 1 memory controller and 12.8GB/s of memory bandwidth.

    Most other heavy multi-threaded scenarios have the QX6700 beating the Quad FX just as easily as it does in single-threaded scenarios.

    Are you sure? C2D can process one 128-bit SSE instruction per cycle; do you mean the Pentium has a 21.33-bit (128/6) SSE engine?
    A C2D can execute 1 128-bit multiply, 1 128-bit add plus a load, store and jump in the same cycle.
    Last edited by accord99; 02-12-2007 at 03:44 PM.
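Taking accord99's description at face value (one 128-bit multiply plus one 128-bit add per cycle per core), peak floating-point throughput follows from how many values fit in a 128-bit register. A sketch; the quad-core count and 2.66GHz clock of a QX6700 are used only as example inputs:

```python
REGISTER_BITS = 128


def flops_per_cycle(element_bits: int, ops_per_cycle: int = 2) -> int:
    """FLOPs per cycle per core for packed SIMD with `ops_per_cycle` pipes.

    ops_per_cycle=2 models one 128-bit multiply plus one 128-bit add.
    """
    lanes = REGISTER_BITS // element_bits
    return ops_per_cycle * lanes


def peak_gflops(cores: int, clock_ghz: float, per_cycle: int) -> float:
    """Theoretical peak: every pipe busy on every cycle on every core."""
    return cores * clock_ghz * per_cycle


if __name__ == "__main__":
    dp = flops_per_cycle(64)  # doubles: 2 lanes -> 4 FLOPs/cycle
    sp = flops_per_cycle(32)  # floats: 4 lanes -> 8 FLOPs/cycle
    print(f"QX6700-class, DP: {peak_gflops(4, 2.66, dp):.1f} GFLOPS")
    print(f"QX6700-class, SP: {peak_gflops(4, 2.66, sp):.1f} GFLOPS")
```

These are theoretical peaks only; real code rarely keeps both pipes fed every cycle.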

  12. #487
    Banned
    Join Date
    Nov 2006
    Location
    BEYOND THE SUN - SCOTLAND
    Posts
    476
    Quote Originally Posted by Lightman
    Some interesting bits:


    Source:
    http://www.edn.com/article/CA6415782.html?partner=enews

    Enjoy!


  13. #488
    Xtreme Enthusiast
    Join Date
    Apr 2006
    Location
    Brasil
    Posts
    534
    Quote Originally Posted by accord99
    But it's not slower; sometimes it's faster, sometimes it's slower, depending on the application, just like the K8. Overall, it still remains the fastest 64-bit x86 processor available today.
    All CPUs speed up in 64 bits due to the larger amount of registers and the standard SSE2 instructions.
    But Core2 does not speed up as much as K8 since MacroFusion doesn't work in long mode.

    On SSE execution K10 has little advantage.
    Core2 has 3 SSE units plus one load unit and one store unit.
    K8 has 3 FPUs (which handle SSE) plus a load/store unit that does two loads/stores per cycle; on K10 the FPUs are widened to 128 bits, so it can do three 128-bit SSE operations per cycle plus 2 loads/stores.
    So Core2 does 3 SSE, 1 load and 1 store. K10 does 3 SSE, 1 load and 1 store or 2 loads or 2 stores.
    http://www.xbitlabs.com/articles/cpu...amd-k8l_5.html
    Last edited by doompc; 02-12-2007 at 05:28 PM.
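doompc's comparison boils down to how memory operations can pair up each cycle. A simplified model that only counts load/store port limits and ignores everything else (latency, dependencies, cache misses); the port configurations are taken from the post above, and Carfax disputes them further down the thread:

```python
def core2_mem_cycles(loads: int, stores: int) -> int:
    """Cycles needed with one load port and one store port (Core 2, per the post)."""
    return max(loads, stores)


def k10_mem_cycles(loads: int, stores: int) -> int:
    """Cycles needed with two memory ops per cycle in any mix (K10, per the post).

    1 load + 1 store, 2 loads, or 2 stores can all issue in the same cycle.
    """
    return (loads + stores + 1) // 2  # ceiling of (loads + stores) / 2


if __name__ == "__main__":
    for loads, stores in [(4, 0), (2, 2), (3, 1)]:
        print(f"{loads} loads, {stores} stores: "
              f"Core 2 {core2_mem_cycles(loads, stores)} cycles, "
              f"K10 {k10_mem_cycles(loads, stores)} cycles")
```

In this model the two designs tie on balanced load/store mixes; the K10 arrangement only pulls ahead on load-heavy or store-heavy streams (4 loads: 4 cycles vs 2).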

  14. #489
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by savantu
    Really? Not even Conroe manages 1-1.2 except on a few codes.

    P4 was around 0.3-0.7 and K8 0.5-0.9 at least for SPEC IIRC.

    I don't know why 0.9 to 1.2 keeps sticking in my head, but in some code bases, yes, a P4 could do 0.9 to 1.2 (some apps within the SPEC CPU2000 bench showed IPC this high):

    http://www.princeton.edu/~jdonald/re...uck_pact03.pdf
    The benchmarks that perform best in this environment are mcf, art and swim at 93%, 97% and 98% of peak respectively. eon and wupwise have relatively high instruction throughput of 0.9 and 1.2 IPC respectively, while mcf and swim have relatively low IPCs of .08, .2 and .4 (all IPCs measured in µops). Not unexpectedly, then, those applications with low instruction throughput demands due to poor memory performance are less affected by the statically partitioned execution resources. See Figure 1 for a summary of results from these runs.
    (EDIT: it's from reading this paper some time ago that 0.9 to 1.2 sticks in my head, because my first thought was: wow... a P4 can actually do that.)

    The IPC, of course, is very code dependent (compiler optimizations, instruction ordering, etc.) and depends on how efficiently the architecture extracts ILP, combined with all sorts of other factors. Truth is, I have looked over probably half a dozen to a dozen papers where the IPC is measured/calculated; HT helps, and I have seen IPC as high as 1.6 in some code bases. However, the original point is that it really, really stunk in a general sense... a long pipeline running code that isn't optimized for it will generally crater the efficiency.

    Another example of how well and how poorly the P4 can do IPC-wise:
    http://www.geocities.com/ykchen913/p...ions/CAECW.pdf

    In H.264, the IDCT chain could get as high as 1.16 (see table 4). This is a good paper, as it also shows that FSB utilization on a P4 is quite low even with a high L2 miss rate... this is on a 533 MHz FSB, and multimedia is likely to have the highest demand on the FSB.

    Anyway, C2D I do believe is significantly higher than 1.0 IPC on average (some will be low of course, but others high), but I have not found any studies or data that has measured it.

    Barcelona appears to be heading for a good IPC boost; achieving something higher than C2D will be a true accomplishment, since C2D did a good job in this department. I am anxious to see the data.

    Jack
    Last edited by JumpingJack; 02-12-2007 at 11:21 PM.
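For reference, IPC is retired instructions divided by clock cycles, and delivered throughput is IPC times clock rate. A sketch with hypothetical inputs, not figures from the papers above; note that the first paper counts µops rather than x86 instructions, so its values are not directly comparable to x86-instruction IPC:

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions (or uops, if that is what was counted) per clock cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def instructions_per_second(ipc_value: float, clock_hz: float) -> float:
    """Delivered throughput: a high clock can offset a low IPC, and vice versa."""
    return ipc_value * clock_hz


if __name__ == "__main__":
    # Hypothetical chips: long pipeline + high clock vs shorter pipeline + high IPC.
    long_pipe = instructions_per_second(0.7, 3.8e9)
    wide_core = instructions_per_second(1.2, 2.4e9)
    print(f"long pipeline: {long_pipe / 1e9:.2f} G instr/s")
    print(f"wide core:     {wide_core / 1e9:.2f} G instr/s")
```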

  15. #490
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Lightman
    Some interesting bits:


    Source:
    http://www.edn.com/article/CA6415782.html?partner=enews

    Enjoy!

    95C ... this has gotta be a typo....

  16. #491
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Quote Originally Posted by JumpingJack
    95C ... this has gotta be a typo....
    No, it's not a typo. They are modeling core frequency at high temperatures because most servers run in tight blade cases, people in Africa want to use air cooling, etc.
    To be serious, I think it is the industry standard for these measurements. Look at the speed modeling of other CPUs, like the Intel 80-core monster: its speed was modeled at the same 95C.

    This of course has good prospects for us, because if you keep this core at around 65C, then at 1.15V you can go a bit higher than 2.8GHz on a quad core. Add some volts and job done! 3GHz should be easy....

  17. #492
    Banned
    Join Date
    May 2006
    Posts
    20
    Quote Originally Posted by brentpresley
    Keep on flamebaiting and you will go the way of Serge84.

    Does it make your e-Pen_is bigger to throw barbs when you can't come up with any facts to refute my arguments. Only 2 year olds throw temper tantrums.

    He didn't violate NDA to me, just confirmed for me what my other friend at DOE had already said.
    It's not a fact, lol, it's only a delay. XD And friends tend to lend you things.

    Anyways back on topic...
    By all means, please show us the K10 your friend has. We would all love to see it. I mean, if he really did show you these performance numbers, why don't you/he post them? Any kind of numbers. If you're being truthful in your arguments, please state/show more facts for what you're saying. I'm not trying to flame or anything; my statement is not meant that way. It's only that you're not the kind of person who backs up their claims very well, and we all just want to know more. That's what this thread is all about, after all.

    Quote Originally Posted by Lightman
    No, it's not a typo. They are modeling core frequency at high temperatures because most servers run in tight blade cases, people in Africa want to use air cooling, etc.
    To be serious, I think it is the industry standard for these measurements. Look at the speed modeling of other CPUs, like the Intel 80-core monster: its speed was modeled at the same 95C.

    This of course has good prospects for us, because if you keep this core at around 65C, then at 1.15V you can go a bit higher than 2.8GHz on a quad core. Add some volts and job done! 3GHz should be easy....
    These processors can get hot, and Opterons are built to take this heat. Server conditions usually range from 70C to 80C+ in a standard blade. Some of us don't know about server conditions, but those who do should be taken at their word. It gets hotter in a blade server than most would like, lol. Besides, CPUs are not that fragile: Xeons and Opterons alike, you would be amazed at the punishment they can take in testing as well as in 24/7 use.

    Quote Originally Posted by doompc
    All CPUs speed up in 64 bits due to the larger amount of registers and the standard SSE2 instructions.
    But Core2 does not speed up as much as K8 since MacroFusion doesn't work in long mode.

    On SSE execution K10 has little advantage.
    Core2 has 3 SSE units plus one load unit and one store unit.
    K8 has 3 FPUs (which handle SSE) plus a load/store unit that does two loads/stores per cycle; on K10 the FPUs are widened to 128 bits, so it can do three 128-bit SSE operations per cycle plus 2 loads/stores.
    So Core2 does 3 SSE, 1 load and 1 store. K10 does 3 SSE, 1 load and 1 store or 2 loads or 2 stores.
    http://www.xbitlabs.com/articles/cpu...amd-k8l_5.html
    Great find.

    Quote Originally Posted by LOE
    Are you sure? C2D can process one 128-bit SSE instruction per cycle; do you mean the Pentium has a 21.33-bit (128/6) SSE engine?

    Pentium D needs 2 cycles to execute one 128-bit SSE instruction... it has a 64-bit engine.

    Core 2 Duo has 2x the SSE throughput of the Pentium.

    Yes, it is missing in Minesweeper, but every serious app supports SSE; there are even apps that DO NOT RUN if they don't detect at least SSE.

    One of the reasons C2D is 50-100% faster than the Pentium clock for clock is its double SSE throughput; check apps that render 3D or encode video.
    I took this as follows...

    Quote Originally Posted by accord99
    But it's not slower; sometimes it's faster, sometimes it's slower, depending on the application, just like the K8. Overall, it still remains the fastest 64-bit x86 processor available today.

    We see one or two unrealistic scenarios where this happens, and they require specific situations that benefit from the Quad FX's additional memory controller. However, in a single-socket system, the desktop versions of Barcelona will only have 1 memory controller and 12.8GB/s of memory bandwidth.

    Most other heavy multi-threaded scenarios have the QX6700 beating the Quad FX just as easily as it does in single-threaded scenarios.

    A C2D can execute 1 128-bit multiply, 1 128-bit add plus a load, store and jump in the same cycle.
    I agree; only K10 has dual memory controllers in the specs, according to data in the previous K10 threads. Current bandwidth on AM2 is 20GB/s; it will be nearly 3x that bandwidth-wise. Double that on memory bandwidth. According to DailyTech, was it...

    http://www.channelinsider.com/print_...ls/191008.aspx

    Quad-core parts and other Revision H parts are rumored to have two 64-bit independent memory controllers, each with its own physical address space, thus giving an opportunity to better utilize the available bandwidth in case of random memory accesses occurring in a heavily multi-threaded environment. This approach is in contrast to the previous "interleaved" design, where the two 64-bit data channels are bound to a single common address space. It will be the first single-chip implementation of the non-uniform memory access architecture.

    http://www.realworldtech.com/page.cf...0206035626&p=1

    http://www.realworldtech.com/page.cf...0206035626&p=2

    http://www.google.com/search?hl=en&q...rs&btnG=Search

    Just some more info on K10 from previous threads. But you all should really read that thread to get the lowdown on K10. I should post a link on the front page as a continuation, instead of constantly re-hunting for data while people act like it never existed. Sometimes it's silly for somebody to have to repeat themselves. XD

    http://www.xtremesystems.org/forums/...d.php?t=117702
    Last edited by Grayfox84; 02-13-2007 at 01:40 AM.

  18. #493
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by LOE
    Are you sure? C2D can process one 128-bit SSE instruction per cycle; do you mean the Pentium has a 21.33-bit (128/6) SSE engine?

    Pentium D needs 2 cycles to execute one 128-bit SSE instruction... it has a 64-bit engine.

    Core 2 Duo has 2x the SSE throughput of the Pentium.

    Yes, it is missing in Minesweeper, but every serious app supports SSE; there are even apps that DO NOT RUN if they don't detect at least SSE.

    One of the reasons C2D is 50-100% faster than the Pentium clock for clock is its double SSE throughput; check apps that render 3D or encode video.

    Don't try to mix numbers in your favour. Core has 1 SSE port that's 64-bit. Core 2 has 3 SSE ports that are 128-bit. Yet Core at the same FSB/clock isn't much slower than Core 2, and that's with all the other improvements too.
    Crunching for Comrades and the Common good of the People.

  19. #494

  20. #495
    Xtreme Enthusiast
    Join Date
    Dec 2003
    Posts
    510
    Quote Originally Posted by Grayfox84
    I agree; only K10 has dual memory controllers in the specs, according to data in the previous K10 threads. Current bandwidth on AM2 is 20GB/s; it will be nearly 3x that bandwidth-wise. Double that on memory bandwidth. According to DailyTech, was it...
    AM2's memory bandwidth is 12.8GB/s. If you plug a Barcelona core into an AM2 board, that's all you get, since there are physically only two channels connecting the memory to the socket.

    http://www.channelinsider.com/print_...ls/191008.aspx

    Quad-core parts and other Revision H parts are rumored to have two 64-bit independent memory controllers, each with its own physical address space, thus giving an opportunity to better utilize the available bandwidth in case of random memory accesses occurring in a heavily multi-threaded environment. This approach is in contrast to the previous "interleaved" design, where the two 64-bit data channels are bound to a single common address space. It will be the first single-chip implementation of the non-uniform memory access architecture.
    Intel's current DDR2 memory controllers are already this way.

  21. #496
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Quote Originally Posted by accord99
    AM2's memory bandwidth is 12.8GB/s. If you plug a Barcelona core into an AM2 board, that's all you get, since there are physically only two channels connecting the memory to the socket.


    Intel's current DDR2 memory controllers are already this way.

    Yep! At the rated PC6400 speed, of course. At the moment AMD and a few other companies are trying to push a higher memory specification through JEDEC. I heard PC8500 is their target.
    This would allow the DESKTOP version of AMD's quad core to get 17GB/s of memory bandwidth...

    Of course, servers are different animals, and I think the maximum we will see is PC6400 registered DIMMs (DDR2-800).
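The 17GB/s figure follows from the module naming: a DDR2 PCxxxx rating is the module's peak MB/s on a 64-bit (8-byte) bus, so dividing by 8 gives the transfer rate. A sketch assuming a dual-channel desktop setup, as in the posts above:

```python
BUS_BYTES = 8  # one 64-bit DIMM channel


def transfer_rate_mts(pc_rating: int) -> float:
    """Megatransfers per second implied by a PCxxxx module rating."""
    return pc_rating / BUS_BYTES


def system_bandwidth_gb_s(pc_rating: int, channels: int = 2) -> float:
    """Peak bandwidth in GB/s for `channels` 64-bit channels of the given rating."""
    return pc_rating * channels / 1000


if __name__ == "__main__":
    for rating in (6400, 8500):
        print(f"PC{rating}: ~{transfer_rate_mts(rating):.1f} MT/s, "
              f"{system_bandwidth_gb_s(rating):.1f} GB/s dual channel")
```

PC6400 maps to 800 MT/s (DDR2-800), giving 12.8GB/s on two channels. PC8500 is a rounded label for DDR2-1066 (1066 x 8 = 8528 MB/s), which is why the computed rate comes out at 1062.5 rather than exactly 1066.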

  22. #497
    Xtreme Addict
    Join Date
    Jul 2004
    Location
    U.S of freakin' A
    Posts
    1,931
    Regarding SSE2, I still expect Core 2 to have an edge over the K10, as it has a higher peak theoretical throughput.

    Core 2 can issue a max of 6 SSE instructions per cycle, while the K10 can do 3.

    Of course, there are other factors involved besides peak SIMD throughput, like latency and memory bandwidth, and the K10 will have the edge there.

    But not enough to trounce C2D IMO.

    As for INT, C2D should still maintain a healthy lead as the C2D is a beast in INT. It will be interesting to see which processor holds the performance crown for gaming, as games tend to be far more INT based than FP.

  23. #498
    Xtreme Addict
    Join Date
    Jul 2004
    Location
    U.S of freakin' A
    Posts
    1,931
    Quote Originally Posted by Shintai
    Don't try to mix numbers in your favour. Core has 1 SSE port that's 64-bit. Core 2 has 3 SSE ports that are 128-bit. Yet Core at the same FSB/clock isn't much slower than Core 2, and that's with all the other improvements too.
    Core Duo has 2 64-bit SSE2 ports, not 1.

    As for the closer than expected performance delta between C2D and CD, I put it down to two things:

    1) Merom is FSB limited at 667, far more so than Yonah.

    2) Yonah was already a very efficient high IPC processor. Actually, it was even faster than the K8 clock for clock in everything but FP intensive apps.

  24. #499
    Xtreme Addict
    Join Date
    Jul 2004
    Location
    U.S of freakin' A
    Posts
    1,931
    Quote Originally Posted by doompc
    All CPUs speed up in 64 bits due to the larger amount of registers and the standard SSE2 instructions.
    But Core2 does not speed up as much as K8 since MacroFusion doesn't work in long mode.
    I'm willing to bet this will be addressed in Penryn.

    On SSE execution K10 has little advantage.
    Core2 has 3 SSE units plus one load unit and one store unit.
    K8 has 3 FPUs (which handle SSE) plus a load/store unit that does two loads/stores per cycle; on K10 the FPUs are widened to 128 bits, so it can do three 128-bit SSE operations per cycle plus 2 loads/stores.
    So Core2 does 3 SSE, 1 load and 1 store. K10 does 3 SSE, 1 load and 1 store or 2 loads or 2 stores.
    http://www.xbitlabs.com/articles/cpu...amd-k8l_5.html
    I don't know how accurate this information is. As far as I know, the K10 can issue 2 SSE operations, and one SSE MOV per cycle in the floating point store pipe.

    So that's three instructions peak. Core 2, on the other hand, can potentially do double the K10's SSE issue rate.

  25. #500
    Xtreme Enthusiast
    Join Date
    Apr 2006
    Location
    Brasil
    Posts
    534
    Carfax, it's not clear whether the FMISC unit (which does FLOAD in K8) will be widened to 128 bits. If not, it could not do one 128-bit SSE load per cycle.

    Core2 can theoretically issue 6 micro-ops per cycle, and it decodes a maximum of 2+3 instructions, but it fetches 16 bytes, which is only 128 bits. With the data in the buffer waiting to be decoded, it may decode an average of 3 instructions per cycle.
    I bet the 32 Byte instruction fetch will keep K10's FPUs much busier than Conroe's.
    Last edited by doompc; 02-13-2007 at 05:35 AM.
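doompc's "about 3 per cycle" estimate is a fetch-bandwidth bound: bytes fetched per cycle divided by average instruction length. A sketch; the 5-byte average for SSE-heavy code is an assumption (SSE instructions carry prefix and escape bytes), and decode width, buffering and branches are all ignored:

```python
def fetch_bound_ipc(fetch_bytes_per_cycle: int, avg_instr_bytes: float) -> float:
    """Upper bound on instructions delivered per cycle by the fetch unit alone."""
    if avg_instr_bytes <= 0:
        raise ValueError("average instruction length must be positive")
    return fetch_bytes_per_cycle / avg_instr_bytes


if __name__ == "__main__":
    AVG_LEN = 5.0  # assumed average length of SSE-heavy x86 code, in bytes
    for name, fetch in (("Core 2 (16-byte fetch)", 16), ("K10 (32-byte fetch)", 32)):
        print(f"{name}: at most {fetch_bound_ipc(fetch, AVG_LEN):.1f} instructions/cycle")
```

Under these assumptions a 16-byte fetch caps Core 2 at about 3.2 instructions per cycle, in line with the estimate in the post, while a 32-byte fetch moves the bottleneck from fetch to the decoders.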
