
Thread: AMD cuts to the core with 'Bulldozer' Opterons

  1. #501
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Also he keeps quoting those AT game benchmarks, while in real-world gaming the results are pretty much even across these chips. LC already showed the artifact in benchmarking (the "highest fps") which other websites don't bother addressing. What matters is minimum and average (minimum more than average), since these define "gameplay fluidity".
    Quote Originally Posted by MS @ LC
    The only obvious differences are in the max frame rates, we have the explanation below:

    The highest frame rates only show at the very beginning of the benchmark and are an artifact of the measuring routine, whereas everything else is essentially constant between the three CPUs tested here.
    And Boris, the 12% number you so often quote is meaningless and has nothing to do with "single-core IPC improvement" as you want to believe...
    And no, you have it backwards: if it (BD) scales "badly" with many cores, then the single-thread performance will be higher... The problem with the "bad scaling" argument is 1) it won't scale badly, and 2) you can't extrapolate any figure from the 50% AMD gave us.
    Last edited by informal; 08-05-2010 at 05:36 AM.

  2. #502
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by informal View Post
    Oh so now X6 can't compare (you mean compete?) with i7s? Right.
    Lost Circuits is not a joke; it's probably the best HW website out there. The guy who runs it is the VP of OCZ Tech. And 3D rendering is one major part of performance when it comes to modern-day desktop chips, just like A/V encoding. Gaming comes last, since you can game just as well with a lower-end QC like a Q6600 or Phenom 9850/810.
    It is the best because it is the only one which has weird scores where Phenom CPUs match or exceed i7s
    I trust the plethora of other reviews, both by websites and independent benchers, where i7s leave the Phenoms in the dust.

    The best review site is, IMO, Techreport by far. And Techreport's numbers are in line with those of all other major sites except LostCircuits.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  3. #503
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by Manicdan View Post
    comparing a server chip built for super perfect thread scaling with gaming, where a dual core still offers the best fps/dollar, is fail I think

    also, if the 50% faster with 33% more cores is at the same clock speeds, please someone say it, cause so far I don't think it has been mentioned, and it will greatly affect the IPC calculations people like so much
    Hope you don't mean me, because I'm not doing IPC calculations. The reason I compared IPC between the i5 and Phenom II is that they have almost the same attainable frequency range, even if the i5 might have a bit higher potential.
    Therefore, IPC is a good measurement for comparing differences between architectures like Phenom II and iX without HT.

  4. #504
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by Manicdan View Post
    comparing a server chip built for super perfect thread scaling with gaming, where a dual core still offers the best fps/dollar, is fail I think

    also, if the 50% faster with 33% more cores is at the same clock speeds, please someone say it, cause so far I don't think it has been mentioned, and it will greatly affect the IPC calculations people like so much
    -50% is the integer performance gain for 8 modules vs. 12 cores.
    -FP performance improves by a lot more than 50%; check the old chart.
    -JF said that is a conservative number, so it's >+50%.

    So single-thread performance on BD is going to take an impressive boost, more than you think.

    In 3D games at low res Intel is fast, but at high res with filters the difference between a 980 and a 1090T is very low.

    Who plays without filters today? The mainstream norm is 1920x1200 with 4x AA and 16x AF. Most of the time, if 8x isn't too slow, you enable it too.

    And the value for money is far better in AMD products.

    Why pay for four motherboards with the same (or almost the same) chipset?

    P965, do you remember?

    A guy who has a Thuban now doesn't change RAM, doesn't change motherboard; just a BIOS update.

    A good AM3 motherboard is around 150€. A good Intel motherboard is 199-250€.

    Intel wins 15-20% at low res? So what? I would not pay for that (again).

  5. #505
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by informal View Post
    Also he keeps quoting those AT game benchmarks, while in real-world gaming the results are pretty much even across these chips. LC already showed the artifact in benchmarking (the "highest fps") which other websites don't bother addressing. What matters is minimum and average (minimum more than average), since these define "gameplay fluidity".
    Letting the GPU bottleneck the processor is a very bad way of measuring how long the processor will be capable of running the latest games. You have to let the processor be the bottleneck if you want to know the true capacity of the processor, which is essential for estimating longevity.

    At low resolutions in 2007, an A64 X2 3800+ and a C2D Q6600 could perform the same. Which one would be able to run the latest games today?

    Quote Originally Posted by informal View Post
    And no, you have it backwards: if it (BD) scales "badly" with many cores, then the single-thread performance will be higher... The problem with the "bad scaling" argument is 1) it won't scale badly, and 2) you can't extrapolate any figure from the 50% AMD gave us.
    That's exactly what I said! If Bulldozer scales terrifically, then it can achieve 50% over MC in multithreaded scenarios and still be slower per core.

    Let me make up some numbers to show:

    BD performance per core = 1 bogomark.
    With optimal scaling, a full 16-core Bulldozer will achieve 16 bogomarks.

    MC performance per core = 1.1 bogomarks, but it scales less.
    With only 80% efficiency and 12 cores, it will achieve 10.56 bogomarks.

    In this example, with numbers I just pulled out of my arse, I clearly show why Bulldozer's rumored scaling advantage really is a single-thread disadvantage if 50% is all it can achieve.

    That's why I am hoping it scales lousy; that would mean much higher single-thread performance.
    Last edited by -Boris-; 08-05-2010 at 05:59 AM.
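    The bogomark arithmetic above can be checked with a quick script (every figure is a made-up number from the post, not a measurement):

```python
# Back-of-the-envelope version of the bogomark example above.
# Every number here is hypothetical, straight from the post.

def aggregate(per_core, cores, efficiency):
    """Total throughput: per-core score x core count x scaling efficiency."""
    return per_core * cores * efficiency

bd = aggregate(per_core=1.0, cores=16, efficiency=1.0)  # optimal scaling
mc = aggregate(per_core=1.1, cores=12, efficiency=0.8)  # 80% efficiency

print(bd)                  # 16.0
print(round(mc, 2))        # 10.56
print(round(bd / mc, 2))   # 1.52 -> roughly the rumored +50%
```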

  6. #506
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    @Savantu
    The only outlier I can see is the MainConcept test, for which there must be some explanation (maybe the compiler/switches used when building the program).

    And Boris, nobody games at low resolution (with no filters applied). So no, testing at low resolution is not representative of real-world gaming, and I explicitly said real-world gaming.
    As for BD and performance per core, like I said, the 50% number can't be used to extrapolate the single-core performance increase, period.
    Last edited by informal; 08-05-2010 at 06:01 AM.

  7. #507
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    there is just so little known about what's involved with that 50%. the socket as a whole is faster, but what is still the limiting factor? could it be the new architecture is just not that powerful, or could it be that the TDP was reached too quickly since the 32nm process is still so new and leaky?

    if we find out that 50% is done at 1GHz, what would people say then? and what if we find out desktop is going to be 2GHz, would we hate on it? or would we say it has a 500MHz advantage over Intel?

    i do like trying to figure out stuff, but there's a difference between doing "what if" calculations and putting specs into JF's mouth that he never said.

  8. #508
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by -Boris- View Post
    Letting the GPU bottleneck the processor is a very bad way of measuring how long the processor will be capable of running the latest games. You have to let the processor be the bottleneck if you want to know the true capacity of the processor, which is essential for estimating longevity.

    At low resolutions in 2007, an A64 X2 3800+ and a C2D Q6600 could perform the same. Which one would be able to run the latest games today?



    That's exactly what I said! If Bulldozer scales terrifically, then it can achieve 50% over MC in multithreaded scenarios and still be slower per core.

    Let me make up some numbers to show:

    BD performance per core = 1 bogomark.
    With optimal scaling, a full 16-core Bulldozer will achieve 16 bogomarks.

    MC performance per core = 1.1 bogomarks, but it scales less.
    With only 80% efficiency and 12 cores, it will achieve 10.56 bogomarks.

    In this example, with numbers I just pulled out of my arse, I clearly show why Bulldozer's rumored scaling advantage really is a single-thread disadvantage if 50% is all it can achieve.

    That's why I am hoping it scales lousy; that would mean much higher single-thread performance.
    In the server market, gaining 25% efficiency in scaling would be terrific.

    I expect crunching power from BD; it's 33% wider per core in integer. It would be awesome even if the per-core IPC is slower.

    And 256-bit FMAC... it's just omg.

    Even at a modest frequency (on high-end desktop), it should be on par with a dual X4.

    The biggest change of architecture from AMD was K6-III to Athlon. Athlon to Athlon XP was small; Athlon 64 was a nice improvement for a low number of transistor changes, so internally it was a small change too.

    Phenom was expected to change more dramatically, but in fact was not so big internally either.

    Just reading the spec, BD is an awesome amount of work. The execution unit is completely new, for the first time since the Athlon.

    A shared L2 per module is a very nice idea, I think taken from Core 2. And CMT is like the first step toward reverse HT, so we are not so far from it.

    I just can't wait for the 24th to learn more about the insides of this awesome chip.
    Last edited by madcho; 08-05-2010 at 06:29 AM.

  9. #509
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by informal View Post
    And Boris, nobody games at low resolution (with no filters applied). So no, testing at low resolution is not representative of real-world gaming, and I explicitly said real-world gaming.
    Real-world performance is all that matters, I agree. But the real world in a few years is that no processor from today will cut it.
    Again: an X2 3800+ could perform exactly like a Q6600 in 2007, but in modern games with modern graphics cards, only one of them still works satisfactorily. Already in 2008 one of them failed hard when GTA IV was released.
    That's why you measure how good a processor is at games in a CPU-bound bench, and don't let the GPU bottleneck. You can't say two processors will be equal in games for 3-4 years based on a bottlenecked bench today.

    I promise you, a 980X will be able to run the latest games for many years; you can't say that about all the processors in your graph.


    Quote Originally Posted by informal View Post
    As for BD and performance per core, like I said, the 50% number can't be used to extrapolate the single-core performance increase, period.
    Well, I agree, we don't know. But we have to make up our minds here, IF these numbers of 50% and 33% are correct:
    Bulldozer will either scale well and have disappointing single-core performance, or it will scale badly and have good single-core performance.
    You can't have both.

    I hope you agree on this?

  10. #510
    Xtreme Member
    Join Date
    Nov 2006
    Posts
    324
    Quote Originally Posted by -Boris- View Post
    Let me make up some numbers to show:

    BD performance per core = 1 bogomark.
    With optimal scaling a full 16 core Bulldozer will achieve 16 bogomarks.

    MC perforamnce per core = 1.1 bogomarks. But it scales less.
    With only 80% effiency and 12 cores it will achieve 10.56 bogomarks.

    In this example with numbers I just pulled out of my arse. I clearly show why bulldozers rumored scaling advantage, really is a singlethread disadvantage if 50% is all it can achieve.

    That's why I am hoping it scales lousy, that would mean much higher single thread performance.
    Boris, I already gave an example of the scaling of Phenom II in another thread:
    Thuban in Cinebench R10: the multicore result is 14156 and the single-core result is 3089, so 6 cores give a 4.58x speedup, not 6x, and scaling is 4.58/6 = 76%.

    12 cores would be worse, and take into account that scaling is non-linear as cores increase. Generally, the more cores, the worse the scaling.
    This is just Cinebench; in other applications the numbers will be different. But I would not expect 100% scaling, as JF-AMD told us that the 50% performance number applies to a major set of server loads, not some hand-picked application with 100% scaling (if such a thing is even possible with 12 to 16 threads).
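    That efficiency arithmetic generalizes to a two-line helper; a quick sketch using the Cinebench R10 scores quoted above:

```python
def scaling_efficiency(multi_score, single_score, cores):
    """Return (speedup over one core, scaling efficiency per core)."""
    speedup = multi_score / single_score
    return speedup, speedup / cores

# Thuban Cinebench R10 scores from the post above
speedup, eff = scaling_efficiency(14156, 3089, 6)
print(round(speedup, 2), f"{eff:.0%}")  # 4.58 76%
```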


    And I repeat the second thing: there is no word about frequencies for that 50% performance increase. So your 12.5% IPC number is doubly wrong.


    Quote Originally Posted by -Boris-
    Bulldozer will either scale well, and have a disappointing singlecore performance, or it will scale bad, and have good singlecore performance.
    You can't have both.
    We already have a number for BD scaling: 80%. Sooooo????
    Last edited by SEA; 08-05-2010 at 06:30 AM.
    Windows 8.1
    Asus M4A87TD EVO + Phenom II X6 1055T @ 3900MHz + HD3850
    APUs

  11. #511
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    we've had quads for 3+ years now, and how many games use more than 2 cores?

  12. #512
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by SEA View Post
    Boris, I already gave an example of the scaling of Phenom II in another thread:
    Thuban in Cinebench R10: the multicore result is 14156 and the single-core result is 3089, so 6 cores give a 4.58x speedup, not 6x, and scaling is 4.58/6 = 76%.
    that's due to turbo

    i set my turbo multiplier to 14x on my 1055T and my speedup was 5.8x; i was very impressed. i can do it again tonight and give you a SS, until then it's just my word (or some googling)

  13. #513
    Xtreme Member
    Join Date
    Nov 2006
    Posts
    324
    Quote Originally Posted by Manicdan View Post
    that's due to turbo

    i set my turbo multiplier to 14x on my 1055T and my speedup was 5.8x; i was very impressed. i can do it again tonight and give you a SS, until then it's just my word (or some googling)
    No need, Google is OK:
    The numbers for an AMD Phenom II 940 (no turbo at all) are 12790 for 4 cores and 3463 for a single core, which works out to a 3.69x multicore speedup. That is 92%. And continuing to add cores will lower this percentage significantly at some point.
    Last edited by SEA; 08-05-2010 at 06:44 AM.

  14. #514
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by -Boris- View Post
    Real-world performance is all that matters, I agree. But the real world in a few years is that no processor from today will cut it.
    Again: an X2 3800+ could perform exactly like a Q6600 in 2007, but in modern games with modern graphics cards, only one of them still works satisfactorily. Already in 2008 one of them failed hard when GTA IV was released.
    That's why you measure how good a processor is at games in a CPU-bound bench, and don't let the GPU bottleneck. You can't say two processors will be equal in games for 3-4 years based on a bottlenecked bench today.

    I promise you, a 980X will be able to run the latest games for many years; you can't say that about all the processors in your graph.




    Well, I agree, we don't know. But we have to make up our minds here, IF these numbers of 50% and 33% are correct:
    Bulldozer will either scale well and have disappointing single-core performance, or it will scale badly and have good single-core performance.
    You can't have both.

    I hope you agree on this?
    The same old "present-day CPUs in modern games a few years ahead" story has been told for a number of years now, and what happened? Not much, since modern-day games are mainly shader bound (meaning GPU bound), and the only thing apart from cache amount that matters today and didn't matter a few years back is the core count, since the additional core(s) are/can be used by PhysX (or Havok). Also, you compare an X2 @ 2GHz vs. a C2Q @ 2.4GHz, and there is no surprise the dual core won't cut it vs. the quad core, whose natural opponent actually is Agena/Phenom I, which almost ties the 65nm C2Q. So to sum it up: modern-day and future games will be GPU bound, and you will need a many-core CPU with a solid amount of cache in order to have a good experience (many cores for AI/PhysX offloading). Naturally, a strong GPU is a must.

    As for BD, again, the number is an average across many server workloads, not one application or two. You can bet that many of those just won't scale perfectly with more cores (nothing to do with the HW abilities of BD), and this alone makes the number inappropriate for extrapolating single-core performance. Oh, and the unknown clock rate for BD doesn't help either, like SEA and many others wrote.

  15. #515
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by SEA View Post
    Boris, I already gave an example of the scaling of Phenom II in another thread:
    Thuban in Cinebench R10: the multicore result is 14156 and the single-core result is 3089, so 6 cores give a 4.58x speedup, not 6x, and scaling is 4.58/6 = 76%.

    12 cores would be worse, and take into account that scaling is non-linear as cores increase. Generally, the more cores, the worse the scaling.
    This is just Cinebench; in other applications the numbers will be different. But I would not expect 100% scaling, as JF-AMD told us that the 50% performance number applies to a major set of server loads, not some hand-picked application with 100% scaling (if such a thing is even possible with 12 to 16 threads).
    Read my latest post: if Bulldozer scales better and still only delivers 50% with 33% more cores, that means it will gain less than 12.5% per core in single-threaded applications. To perform better per core combined with better scaling, it would need more than a 50% increase in multithreaded applications.


    Quote Originally Posted by SEA View Post
    And I repeat the second thing: there is no word about frequencies for that 50% performance increase. So your 12.5% IPC number is doubly wrong.
    If I mentioned 12.5% higher IPC, it must be a mistake; I don't talk about IPC for Bulldozer at all. 12.5% is the maximum increase in performance per core if Bulldozer scales better than MC.

    Quote Originally Posted by SEA View Post
    We already have a number for BD scaling: 80%. Sooooo????
    That number is for the second core inside a module; it can't represent the scaling going from one module to four or eight.

  16. #516
    Xtreme Member
    Join Date
    Nov 2006
    Posts
    324
    Quote Originally Posted by -Boris- View Post
    Read my latest post: if Bulldozer scales better and still only delivers 50% with 33% more cores, that means it will gain less than 12.5% per core in single-threaded applications. To perform better per core combined with better scaling, it would need more than a 50% increase in multithreaded applications.
    OK, now I see your point. But I don't get why you would bring the 50% down to per-core performance if you don't know the scaling factor or the frequency factor???
    In this particular case, the server chip's per-core performance was lowered to fit more cores while staying in the same ACP (power envelope).
    And that gives us the clue to what 50% really means:
    at the same ACP as MC, BD will bring these extra 50%.
    In this light, your 12.5% is absolutely meaningless, sorry.

    That number is for the second core inside a module; it can't represent the scaling going from one module to four or eight.
    So how would one expect scaling of more than 90% if half of all the cores are at 80%??? Sorry.
    Last edited by SEA; 08-05-2010 at 08:00 AM.

  17. #517
    Xtreme Member
    Join Date
    Oct 2007
    Location
    Sweden
    Posts
    127
    Quote Originally Posted by -Boris- View Post
    So I'm hoping for the opposite to prove me wrong: that Bulldozer scales badly due to modules instead of cores,.....
    We are supposed to have that as a yes already.

    IIRC, Mr. Fruehe has stated somewhere that a module with two ALU clusters, which AMD likes to call cores, scales to the given number of 1.8 when fully loaded, where a "true" dual-core solution scales to 2.0.

    I've read this as: if only one ALU/core is under full load, it runs at 100%; when two cores are fully loaded, each core can reach 90% of its peak capacity.
    This is due to the parts of the module that are shared.

    Edit: Oh my! This thread runs fast
    Last edited by Kej; 08-05-2010 at 07:38 AM. Reason: quite a party :-)

  18. #518
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by informal View Post
    The same old "present-day CPUs in modern games a few years ahead" story has been told for a number of years now, and what happened? Not much, since modern-day games are mainly shader bound (meaning GPU bound), and the only thing apart from cache amount that matters today and didn't matter a few years back is the core count, since the additional core(s) are/can be used by PhysX (or Havok). Also, you compare an X2 @ 2GHz vs. a C2Q @ 2.4GHz, and there is no surprise the dual core won't cut it vs. the quad core, whose natural opponent actually is Agena/Phenom I, which almost ties the 65nm C2Q. So to sum it up: modern-day and future games will be GPU bound, and you will need a many-core CPU with a solid amount of cache in order to have a good experience (many cores for AI/PhysX offloading). Naturally, a strong GPU is a must.
    CPUs are still important. And you totally missed my point: two processors can have equal performance when bottlenecked but perform totally differently when not. It doesn't matter whether it's due to quad vs. dual, frequencies, or IPC. Just because a GPU-limited bench says two processors are equal doesn't mean they really are. You think it's unfair that one processor in the example is a dual core? But the bench showed they were equals, right?
    And a couple of years back, people said that CPUs weren't a limiting factor anymore. At that time the X2 4800+ was king of the hill.
    The same goes for today: saying a CPU isn't a limiting factor today, let's say an Athlon II X4 2.9GHz, doesn't mean it will be just as good as a 980X in 2014.

    Quote Originally Posted by informal View Post
    As for BD, again, the number is an average across many server workloads, not one application or two. You can bet that many of those just won't scale perfectly with more cores (nothing to do with the HW abilities of BD), and this alone makes the number inappropriate for extrapolating single-core performance. Oh, and the unknown clock rate for BD doesn't help either, like SEA and many others wrote.
    As for that number, I trust that Fruehe didn't give us numbers that were negative for BD; I trust these applications scale pretty well. And besides, it doesn't matter: I'm talking about scaling vs. performance per core. Even if these are applications that don't scale too well, they don't scale too well on MC either.

  19. #519
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by SEA View Post
    OK, now I see your point. But I don't get why you would bring the 50% down to per-core performance if you don't know the scaling factor or the frequency factor???
    In this particular case, the server chip's per-core performance was lowered to fit more cores while staying in the same ACP (power envelope).
    And that gives us the clue to what 50% really means:
    at the same ACP as MC, BD will bring these extra 50%.
    In this light, your 12.5% is absolutely meaningless, sorry.
    Since they are working within the same thermal envelope, I don't see why it would be meaningless.

    If Bulldozer performs exactly 12.5% better per core, then it scales to 16 cores just as well as MC scales from 1 to 12 cores.

    If Bulldozer performs <12.5% better per core, then it scales to 16 cores better than MC scales from 1 to 12 cores.

    If Bulldozer performs >12.5% better per core, then it scales to 16 cores worse than MC scales from 1 to 12 cores.
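    That breakeven point falls straight out of the core counts; a quick sketch using only the 50% and 16-vs-12 figures from this thread (equal clocks and equal scaling efficiency assumed):

```python
# 16 BD cores delivering 1.5x the throughput of 12 MC cores,
# assuming equal scaling efficiency and equal clocks.
mc_cores, bd_cores = 12, 16
total_gain = 1.5  # the rumored +50%

per_core_ratio = total_gain / (bd_cores / mc_cores)
print(round(per_core_ratio, 3))  # 1.125 -> the 12.5% breakeven point
```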

    Quote Originally Posted by SEA View Post
    So how would one expect scaling of more than 90% if half of all the cores are at 80%??? Sorry.
    1.8 times the performance is 90% scaling; 90% of 2 is 1.8.

    And we don't know about the scaling between modules.

    Quote Originally Posted by Kej View Post
    We are supposed to have that as a yes already.

    IIRC, Mr. Fruehe has stated somewhere that a module with two ALU clusters, which AMD likes to call cores, scales to the given number of 1.8 when fully loaded, where a "true" dual-core solution scales to 2.0.

    I've read this as: if only one ALU/core is under full load, it runs at 100%; when two cores are fully loaded, each core can reach 90% of its peak capacity.
    This is due to the parts of the module that are shared.

    Edit: Oh my! This thread runs fast
    The fact that the scaling within modules isn't the same as the scaling between modules complicates the situation a bit.



    Is it just me who thinks this discussion is going somewhere?
    I think we are a bit closer to concluding that, due to modules, Bulldozer scales a bit worse per core than MC, and that Bulldozer might have quite high single-threaded performance. I think that a module, with all its shared parts, is better at single threads than a dual core but, as AMD states, scales a bit worse. It all comes down to how well BD scales between modules.
    Last edited by -Boris-; 08-05-2010 at 08:15 AM.

  20. #520
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Boris, using the Athlon II for a future comparison isn't the best choice due to its lack of L3. IMO, any other 45nm core (Penryn, Deneb, Thuban, Lynnfield) with a solid amount of cache will do great in future games. Bulldozer is supposed to bring another level of shared cache (L2, shared by core pairs), and with more cache and much improved cores (8 for desktop) I bet it will do great in games.
    BD scaling (the often-quoted 80% figure) just means that instead of perfect 8x scaling you can look at 4x1.8 = 7.2x for applications that can scale to 8 threads perfectly (like Linpack). This is around an 11% penalty; in other words, pretty good scaling if the application is able to use the cores efficiently.

  21. #521
    Banned
    Join Date
    May 2006
    Location
    Brazil
    Posts
    580
    Quote Originally Posted by informal View Post
    This is around an 11% penalty; in other words, pretty good scaling if the application is able to use the cores efficiently.
    Isn't an ~11% penalty what we'd get even with ''independent'' cores?
    Sounds fishy
    Last edited by -Sweeper_; 08-05-2010 at 08:36 AM.

  22. #522
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by informal View Post
    Boris, using the Athlon II for a future comparison isn't the best choice due to its lack of L3. IMO, any other 45nm core (Penryn, Deneb, Thuban, Lynnfield) with a solid amount of cache will do great in future games. Bulldozer is supposed to bring another level of shared cache (L2, shared by core pairs), and with more cache and much improved cores (8 for desktop) I bet it will do great in games.
    But they perform the same when bottlenecked?
    So L3 matters, but performance doesn't? A Phenom II 910 and a Core i7 980X will perform the same in games in 2014? I have a feeling that the benchmarks made today which aren't bottlenecked give a good clue about the 980X scenario.


    Quote Originally Posted by informal View Post
    BD scaling (the often-quoted 80% figure) just means that instead of perfect 8x scaling you can look at 4x1.8 = 7.2x for applications that can scale to 8 threads perfectly (like Linpack). This is around an 11% penalty; in other words, pretty good scaling if the application is able to use the cores efficiently.
    This is all wrong; there are no numbers for scaling across multiple modules. The only number we have is within a module, between one and two cores.
    And 1.8x is 90% performance per core.

    How can you get from 80% scaling to an 11% penalty? The numbers are 90% and 10%.
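    For what it's worth, the module arithmetic is easy to check; a quick sketch using the 1.8x intra-module figure quoted in this thread (perfect scaling across modules is assumed here, which nobody has confirmed):

```python
modules = 4
intra_module_speedup = 1.8  # JF-AMD's figure for two cores in one module

# Treat each module as a "super core" and assume perfect scaling
# between modules (an assumption, not a confirmed number).
effective_cores = modules * intra_module_speedup
penalty = 1 - effective_cores / (modules * 2)

print(round(effective_cores, 1))  # 7.2
print(f"{penalty:.0%}")           # 10%
```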

  23. #523
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by -Sweeper_ View Post
    Isn't an ~11% penalty what we'd get even with ''independent'' cores?
    That's the penalty due to the shared front end. You get around a 10% penalty for much, much less die-space investment (no need for another front-end stage, the int cores can use the full potential of the 2 FMAC units, shared L2 per module, etc.).
    Also, this means that each core inside a module performs a bit better on its own (not counting the new Turbo BD will have).

    Quote Originally Posted by -Boris- View Post
    But they perform the same when bottlenecked?
    So L3 matters, but performance doesn't? A Phenom II 910 and a Core i7 980X will perform the same in games in 2014? I have a feeling that the benchmarks made today which aren't bottlenecked give a good clue about the 980X scenario.




    This is all wrong; there are no numbers for scaling across multiple modules. The only number we have is within a module, between one and two cores.
    And 1.8x is 90% performance per core.

    How can you get from 80% scaling to an 11% penalty? The numbers are 90% and 10%.
    The Phenom II 910 is a QC; the other is a 6-core Westmere. Yes, L3 matters, and you can see this with Agena vs. Deneb. By 2014 games will use more than 4 cores, so yes, it (Westmere) will be the better performer, but not because of the things you believe.

    I can't see what you don't understand. You have a scaling figure within a module. Anything outside of the module should behave the same as Lisbon does today (scaling to 6 cores which communicate over a shared L3). Each module is a "super core" if you will, and each of those will scale the same; hence 4x1.8 (or 8 x 0.9, since you like the 90% number more).

    Quote Originally Posted by savantu View Post
    Back again to the same old stuff: ICC 8.0 did a check for vendor ID; newer versions (currently ICC 10) have the check removed and will check for feature flags (basically, whatever the CPU supports, the compiler will throw at it). However, Intel claims no responsibility for code quality and bugs.
    They say the check in 8.0 was introduced simply because AMD did not give them the detailed errata list for their CPUs (obviously, AMD refrains from sending samples to Intel for validation).
    It would be like AMD now sampling BD to Intel so future updates to Intel's compiler can support BD features.
    You should read Agner Fog's latest blogs then

    Quote Originally Posted by Agner Fog
    Intel have released a new version of their Math Kernel Library (v. 10.3) in beta test.

    I have tested the new libraries and found that the CPU dispatching works basically the same way as before. The standard math library, vector math library, short vector math library and the 64-bit version of other math kernel library functions still use an inferior code path for non-Intel processors.

    I have found the following differences from previous versions:

    * Many functions now have a branch for the forthcoming AVX instruction set, but still only for Intel processors. This will increase the difference in performance between Intel and AMD processors on these functions. Both Intel and AMD are planning to support AVX in 2011.

    * The CPU dispatcher for the vector math library has a new branch for non-Intel processors with SSE2. Unlike the generic branch, the new non-Intel SSE2 branch is used only on non-Intel processors, and it is inferior in many cases to the branch used by Intel processors with the same instruction set. The non-Intel SSE2 branch is implemented in the 32-bit Windows version and the 32-bit Linux version, but not in the 64-bit versions of the library.

    * A new Summary Statistics library uses the same CPU dispatcher as the vector math library.

    Obviously, I haven't tested all functions in the library. There may be more differences that I haven't discovered. But it is clear that many functions in the new version of the library still cripple performance on non-Intel processors. I don't understand how they can do this without violating the legal settlement with AMD.
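The difference between the two dispatch strategies Agner Fog describes can be mocked up in a few lines. This is a toy model only; the function names, path names, and flags are invented for illustration and are not Intel's actual code:

```python
# Toy model of the two CPU-dispatch strategies discussed above.
# All names here are hypothetical.

def dispatch_by_vendor(vendor, features):
    """ICC 8.0-style dispatch: the fast path is gated on the
    vendor string, not on what the CPU can actually do."""
    if vendor == "GenuineIntel" and "sse2" in features:
        return "sse2_fast_path"
    return "generic_path"

def dispatch_by_features(vendor, features):
    """Feature-flag dispatch: pick the best path the CPU reports,
    regardless of who made it."""
    if "avx" in features:
        return "avx_fast_path"
    if "sse2" in features:
        return "sse2_fast_path"
    return "generic_path"

intel_cpu = ("GenuineIntel", {"sse2"})
amd_cpu = ("AuthenticAMD", {"sse2"})

# Identical feature set, different treatment under vendor dispatch:
print(dispatch_by_vendor(*intel_cpu))   # sse2_fast_path
print(dispatch_by_vendor(*amd_cpu))    # generic_path
print(dispatch_by_features(*amd_cpu))  # sse2_fast_path
```

Agner Fog's complaint is that parts of MKL still behave like the first function: two CPUs reporting the same instruction-set flags get different code paths depending on the vendor string.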
    Last edited by informal; 08-05-2010 at 08:45 AM.

  24. #524
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    I believe that the scaling of a multicore CPU is not architecture dependent (well, cache hierarchy and internal bandwidth have some influence). It depends on the software implementation and the platform (mainly latency caused by memory operations).

    Some algorithms can't scale well across many cores by their nature. Some algorithms are I/O bound (RAM). Some algorithms can scale linearly in theory, but in practice the rest of the platform causes bottlenecks, not really the CPU by its design.

    And forget about hyper-threading; it can only help with non-optimal code (superscalar dependency stalls, cache misses, branch mispredicts) by making the CPU do useful work while the execution units would otherwise idle, improving efficiency.

    In which cases does single-thread performance matter? Show me an example of a real-world program which doesn't scale to multiple threads and is time critical. Apart from games, are there any? For me, anything I could think of using a multicore CPU for already scales across multiple cores, and in those cases there's usually a bigger bottleneck from I/O delays than from CPU throughput.
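The "some algorithms can't scale by their nature" point above is exactly what Amdahl's law quantifies: a fixed serial fraction caps the speedup no matter how many cores you add. A minimal sketch:

```python
# Amdahl's law: with serial fraction s, speedup on n cores is
#   S(n) = 1 / (s + (1 - s) / n)
def amdahl_speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Even a modest serial portion caps multicore gains:
for s in (0.05, 0.25, 0.50):
    print(f"serial={s:.0%}: "
          f"4 cores -> {amdahl_speedup(s, 4):.2f}x, "
          f"8 cores -> {amdahl_speedup(s, 8):.2f}x")
```

With a 50% serial portion, eight cores buy you less than a 2x speedup, which is why doubling the core count helps so little for poorly threaded workloads regardless of the CPU's design.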

  25. #525
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by informal View Post
    Phenom II 910 is a quad core, the other is a 6-core Westmere. Yes, L3 matters, and you can see this with Agena vs Deneb. By 2014 games will use more than 4 cores, so yes, it (Westmere) will be the better performer, but not for the reasons you believe.
    Stop this nonsense. You claim total performance doesn't matter in the real world, with this as proof?!
    But then you claim that L3 has an invisible impact, and core count too?
    Let me get this straight: I have my Phenom II running a bunch of games today, clocked at 2GHz and at 4.8GHz in two runs. Limited by my GPU, I get the exact same FPS in both cases. Will both perform equally in the games released in 2014? Same core count, same cache, same everything except frequency.

    Quote Originally Posted by informal View Post
    I can't see what you don't understand. You have scaling within a module. Anything outside a module should behave the same as Lisbon does today (scaling to 6 cores which communicate over a shared L3). Each module is a "super core", if you will, and each of those will scale the same, hence 4 x 1.8 (or 8 x 0.9, since you like the 90% number more).
    I'm not too sure; we don't know how fast the L3 is. It could be improved a lot. I often see i7 with 3 times the bandwidth. A faster L3 could improve scaling.
    Last edited by -Boris-; 08-05-2010 at 09:03 AM.
