
Thread: What to Expect From AMD at ISSCC 2011

  1. #51
    Registered User
    Join Date
    Jun 2010
    Posts
    61
    Quote Originally Posted by Glow9 View Post
    By not doing his research you mean not copying down what the engineer said correctly? You in denial or something?
    You're not related to terrace215, are you? I'll trust JF-AMD over a news article that is using the magic 90% number incorrectly.


    EDIT
    Found the quote I was looking for.

    Quote Originally Posted by JF-AMD
    <totally rhetorical question, NOT real numbers>
    Which would you rather have:
    80% of the performance with 50% of the cost and 50% of the power consumption
    100% of the performance with 120% of the cost and 120% of the power consumption
    </end rhetorical question>

    People keep seeing that 80% number and thinking that it is a compromise. What they don't understand is that by sharing components we are able to add more cores in the same die space and same power budget.

    It is by no means 80% of today's performance.
    Last edited by EvilOne; 02-22-2011 at 12:26 AM. Reason: Added quote.
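
The rhetorical question above is really about performance per unit of cost (or power), not raw performance. A minimal sketch of that arithmetic, using only the explicitly not-real numbers from the quote; the helper name `perf_per_unit` is hypothetical:

```python
def perf_per_unit(performance_pct: float, unit_pct: float) -> float:
    """Performance delivered per unit of cost (or power), both given in percent."""
    return performance_pct / unit_pct


# The two options from the rhetorical question (explicitly NOT real numbers).
shared = perf_per_unit(80, 50)      # 80% performance at 50% cost/power
unshared = perf_per_unit(100, 120)  # 100% performance at 120% cost/power

print(f"80% perf at 50% cost:   {shared:.2f} perf per unit of cost")
print(f"100% perf at 120% cost: {unshared:.2f} perf per unit of cost")
print(f"advantage of the first option: {shared / unshared:.2f}x")
```

Under these made-up figures the first option delivers nearly twice the performance per unit of cost, which is the point the quote is making about spending the saved die space and power on more cores.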

  2. #52
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Glow9 View Post
    By not doing his research you mean not copying down what the engineer said correctly? You in denial or something?
    How about we just wait a bit and read the articles from other sources?
    We have all heard the 90% claim, which came from the 80% performance improvement a module sees when fully loaded (as opposed to the 90-100% that a 10h dual core sees). My guess is that the reporter just didn't understand what was being said in the presentation. AMD's goal was clearly to milk almost perfect scaling from a fused compute unit that shares resources between two functional and independent cores. They seem to achieve this goal, claiming an average speed-up of 80% (or 1.8x), which equates to 0.9x or 90% per core when a fully loaded module is observed. This has nothing to do with a direct IPC comparison against "K10.5" or any other previous AMD core. It is just the scaling of cores inside a compute unit (module) and the relatively small die investment for the claimed 1.8x throughput scaling.
    Last edited by informal; 02-22-2011 at 12:34 AM.
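
The step from "1.8x per module" to "90% per core" described above is a single division. A minimal sketch, assuming the claimed 1.8x figure and the 1.9x to 2.0x range quoted for a 10h dual core; the function name is hypothetical:

```python
def per_core_fraction(module_speedup: float, cores_per_module: int = 2) -> float:
    """Throughput of each core in a fully loaded module, relative to one core running alone."""
    return module_speedup / cores_per_module


# Claimed Bulldozer module scaling: +80% from the second core, i.e. 1.8x in total.
bulldozer = per_core_fraction(1.8)

# 10h dual core scaling as quoted in the post: +90% to +100%, i.e. 1.9x to 2.0x.
dual_core_low = per_core_fraction(1.9)
dual_core_high = per_core_fraction(2.0)

print(f"Bulldozer module: {bulldozer:.0%} per core")
print(f"10h dual core:    {dual_core_low:.0%} to {dual_core_high:.0%} per core")
```

The 90% therefore describes how well two cores in one module scale against a single core of the same design, not how a Bulldozer core compares with an older core.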

  3. #53
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Hungary (EU)
    Posts
    1,376
    Still no news about yesterday's presentation?

    This eetimes article is just a rewrite of the information from the event's program.
    -

  4. #54
    I am Xtreme
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    Hm, maybe they meant that if an X2 = 100% in multithreading, then 1 module = 90% in multithreading? Some kind of % efficiency comparison to K10.5?
    ROG Power PCs - Intel and AMD
    CPUs:i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350,1x AMD FX-8320,1x AMD FX-8300, 2x AMD FX-6300,2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K,AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K,2x i7-4770K, Intel i7-3930KAMD Cinebench R10 challenge AMD Cinebench R15 thread Intel Cinebench R15 thread

  5. #55
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,886
    Quote Originally Posted by Oliverda View Post
    Still no news about yesterday's presentation?

    This eetimes article is just a rewrite of the information from the event's program.

    ding ding ding!!! we have a winner
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD View Post
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

  6. #56
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264
    "issue up to four instructions per cycle. The unit helps the core meet its target of delivering 90 percent of performance of past AMD cores"


    Hang on.. a 4-issue front end helps achieve 90% of the performance of a 3-issue front end?

    ..The EETimes article has issues.

    As said, AMD's statement was 90% of the performance of an equivalent dual core, which says nothing about end performance (it could be rubbish, good, or somewhere in between).
    Last edited by mAJORD; 02-22-2011 at 03:00 AM.

  7. #57
    I am Xtreme
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    JF, help us

  8. #58
    Xtreme Mentor
    Join Date
    Feb 2009
    Location
    Bangkok,Thailand (DamHot)
    Posts
    2,693
    I expect high overclock
    Intel Core i5 6600K + ASRock Z170 OC Formula + Galax HOF 4000 (8GBx2) + Antec 1200W OC Version
    EK SupremeHF + BlackIce GTX360 + Swiftech 655 + XSPC ResTop
    Macbook Pro 15" Late 2011 (i7 2760QM + HD 6770M)
    Samsung Galaxy Note 10.1 (2014) , Huawei Nexus 6P
    [history system]80286 80386 80486 Cyrix K5 Pentium133 Pentium II Duron1G Athlon1G E2180 E3300 E5300 E7200 E8200 E8400 E8500 E8600 Q9550 QX6800 X3-720BE i7-920 i3-530 i5-750 Semp140@x2 955BE X4-B55 Q6600 i5-2500K i7-2600K X4-B60 X6-1055T FX-8120 i7-4790K

  9. #59
    I am Xtreme
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    Yes, but I thought Zambezi would be about 20-25% better per clock than K10.5. And don't expect a stock clock higher than 3.5 GHz at launch....

  10. #60
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,886
    JF-AMD won't help us before launch day, or at least until the NDAs are lifted ...

    but then again guys, we have at best 3 months before the product hits the shelf ... so it ain't that bad of a wait anymore

  11. #61
    Registered User
    Join Date
    Apr 2010
    Posts
    8
    Quote Originally Posted by Sn0wm@n View Post
    JF-AMD won't help us before launch day, or at least until the NDAs are lifted ...
    I think JF can help us, by confirming that IPC increases (as he's said before). Because that 90% reads like a decrease.

  12. #62
    Xtreme Member
    Join Date
    Jul 2008
    Posts
    260
    Any new info from ISSCC???

  13. #63
    Xtreme Member
    Join Date
    Apr 2008
    Posts
    239
    With 90% performance compared to the old core it had better turbo like crazy, but will that be enough to compete with, say, an i5-2400 in software that can't make use of its eight cores?

  14. #64
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    The Netherlands
    Posts
    896
    It isn't performing at 90% of current AMD cores, because JF-AMD already said Bulldozer would have higher IPC. What they mean (as I understand it) is that when a module is fully loaded, each core in that module performs at 90% of what it would deliver if it were the only loaded core in the module. Per-core performance drops to 90% when the module is fully loaded because both cores in the same module share some of the resources. In my opinion, the performance hit for sharing resources is _really_ small.

  15. #65
    Xtreme Member
    Join Date
    May 2005
    Posts
    159
    I don't know if the reporter got something wrong, but I remember hearing "90% of the performance of today's cores at low power" in the past, referring to the Bobcat core.

    It sounds like a mixup.

    Anyone who still goes on about it, do you really think AMD would release a highly awaited top-end processor that is a downgrade in performance? Come on, really?
    Quote Originally Posted by Movieman
    been lots of years since I played with an AMD and this is just an hour so bear with me..
    My first thoughts on it is that it's fast, it's smoothe and it's fun.
    Quote Originally Posted by Movieman
    Yes, the i7 does have the edge in pure grunt but then again the AMD has that little something I can't quite put my finger on except to use that word 'smoother" and that will get me flamed faster than posting kiddy :banana::banana::banana::banana: on the Christian networks site.
    Main Rig: Phenom II 550 (x4) @3.9Ghz - Gigabyte 6950@6970 - Asus M4A-785D M Pro - Samsung HDs 2x2TB,1x1.5TB,2x1TB - Season X-650 | OpenCL mining rigs: 2x Phenom II 555(x4) - 1xMSI 890FXA-GD70 - 1xGB 990FXA-UD7 (SICK ) - 1xHD6990 - 1x6950@70 - 6x5850 - 2xCooler Master Silent Pro Gold 1kW

  16. #66
    Xtreme Member
    Join Date
    Apr 2008
    Posts
    239
    Quote Originally Posted by JkS View Post
    Anyone who still goes on about it, do you really think AMD would release a highly awaited top-end processor that is a downgrade in performance? Come on, really?
    Well it wouldn't really be a downgrade since BD will clock higher and have an aggressive turbo mode.

  17. #67
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    OK, daddy is going to do some math, everyone follow along please.

    First: There is only ONE performance number that has been legally cleared, 16-core Interlagos will give 50% more throughput than 12-core Opteron 6100. This is a statement about throughput and about server workloads only. You CANNOT make any client performance assumptions about that statement.

    Now, let's get started.

    First, everything that I am about to say below is about THROUGHPUT and throughput is different than speed. If you do not understand that, then please stop reading here.

    Second, ALL comparisons are against the same cores; these are not comparisons between different generations, nor are they comparisons between different architectures.

    Assume that a processor core has 100% throughput.

    Adding a second core to an architecture is typically going to give ~95% greater throughput. There is obviously some overhead because the threads will stall, the threads will wait for each other and the threads may share data. So, two completely independent cores would equal 195% (100% for the first core, 95% for the second core.)


    Looking at SPEC int and SPEC FP, Hyperthreading gives you 14% greater throughput for integer and 22% greater throughput for FP. Let's just average the two together.

    One core is 100%. Two threads on one core with HT are 118%. Everyone following so far? We have 195% for 2 threads on 2 cores and we have 118% for 2 threads on 1 core.

    Now, one bulldozer core is 100%. Running 2 threads on 2 separate modules would lead to ~195%; it's consistent with running on two independent cores.

    Running 2 threads on the same module is ~180%.

    You can see why the strategy is more appealing than HT when it comes to threaded workloads. And, yes, the world is becoming more threaded.

    Now, where does the 90% come from? What is 180% /2? 90%.

    People have argued that there is a 10% overhead for sharing because you are not getting 200%. But, as we saw before, 2 cores actually only equals 195%, so the net per core if you divide the workload is actually 97.5%, so it is roughly a 7-8% delta from just having cores.

    Now, before anyone starts complaining about this overhead and saying that AMD is compromising single thread performance (because the fanboys will), keep in mind that a processor with HT equals ~118% for 2 threads, so per thread that equals 59%, so there is a ~36% hit for HT. This is specifically why I think that people need to stay away from talking about it. If you want to pick on AMD for the 7-8%, you have to acknowledge the ~36% hit from HT. But ultimately that is not how people judge these things. Having 5 people in a car consumes more gas than driving alone, but nobody talks about the increase in gas consumption because it is so much less than 5 individual cars driving to the same place.
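
The per-thread figures above can be reproduced with a short sketch. It assumes only the illustrative two-thread totals from this post (195%, 180%, 118%); the names are hypothetical. Note that the post's "~36%" for HT appears to be measured against the 95% second core (95 - 59); measured against the 97.5% per-core average, the gap works out to 38.5 points.

```python
# Illustrative two-thread throughput totals from the post (percent of one core).
TWO_THREAD_TOTALS = {
    "two independent cores": 195.0,
    "one Bulldozer module": 180.0,
    "one core with HT": 118.0,
}


def per_thread(total_pct: float, threads: int = 2) -> float:
    """Average throughput per thread when the total is split evenly."""
    return total_pct / threads


baseline = per_thread(TWO_THREAD_TOTALS["two independent cores"])

for name, total in TWO_THREAD_TOTALS.items():
    share = per_thread(total)
    gap = baseline - share
    print(f"{name}: {share:.1f}% per thread, {gap:.1f} points below independent cores")
```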

    So, now you know the approximate metrics about how the numbers work out. But what does that mean to a processor? Well, let's do some rough math to show where the architecture shines.

    An Orochi die has 8 cores. Let's say, for sake of argument, that if we blew up the design and said not modules, only independent cores, we'd end up with about 6 cores.

    Now let's compare the two with the assumption that all of the cores are independent on one and in modules on the other. For sake of argument we will assume that all cores scale identically and that all modules scale identically. The fact that incremental cores scale to something less than 100% is already comprehended in the 180% number, so don't fixate on that. In reality the 3rd core would not be at 95% but we are holding that constant for example.

    Mythical 6-core bulldozer:
    100% + 95% + 95% + 95% + 95% + 95% = 575%

    Orochi die with 4 modules:
    180% + 180% + 180% + 180% = 720%

    What if we had just done a 4 core and added HT (keeping in the same die space):
    100% + 95% + 95% + 95% + 18% + 18% + 18% + 18% = 457%

    What about a 6 core with HT (has to assume more die space):
    100% + 95% + 95% + 95% + 95% + 95% + 18% + 18% + 18% + 18% + 18% + 18% = 683%

    (Spoiler alert - this is a comparison using the same cores, do NOT start saying that there is a 25% performance gain over a 6-core Thuban, which I am sure someone is already starting to type.)
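
The four totals above can be checked with a short sketch. It uses only the illustrative scaling figures from this post (100% for the first core, 95% for each extra core, 18% per core for HT, 180% per module), which are not measured numbers; the function and variable names are hypothetical.

```python
FIRST_CORE = 100  # throughput of the first core, in percent
EXTRA_CORE = 95   # each additional independent core
HT_GAIN = 18      # extra throughput per core from a second HT thread
MODULE = 180      # one two-core module, fully loaded


def independent_cores(count: int, hyperthreading: bool = False) -> int:
    """Total throughput of `count` independent cores, optionally with HT on every core."""
    total = FIRST_CORE + EXTRA_CORE * (count - 1)
    if hyperthreading:
        total += HT_GAIN * count
    return total


def modules(count: int) -> int:
    """Total throughput of `count` two-core modules, holding module scaling constant."""
    return MODULE * count


configs = {
    "mythical 6-core": independent_cores(6),
    "Orochi, 4 modules": modules(4),
    "4-core with HT": independent_cores(4, hyperthreading=True),
    "6-core with HT": independent_cores(6, hyperthreading=True),
}

for name, total in configs.items():
    print(f"{name}: {total}%")

# Same-core comparison only, as the post stresses: modules vs. independent cores.
ratio = configs["Orochi, 4 modules"] / configs["mythical 6-core"]
print(f"4 modules vs. mythical 6-core: {ratio:.2f}x")
```

The last line gives roughly 1.25x, which is where the 25% in the spoiler alert comes from; it compares two layouts of the same core and says nothing about Thuban.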

    The reality is that by making the architecture modular and by sharing some resources you are able to squeeze more throughput out of the design than if you tried to use independent cores or tried to use HT. In the last example I did not take into consideration that the HT circuitry would have delivered an extra 5% circuitry overhead....

    Every design has some degree of tradeoff involved, there is no free lunch. The goal behind BD was to increase core count and get more throughput. Because cores scale better than HT, it's the most predictable way to get there.

    When you do the math on die space vs. throughput, you find that adding more cores is the best way to get to higher throughput. Taking a small hit on overall performance but having the extra space for additional cores is a much better tradeoff in my mind.

    Nothing I have provided above would allow anyone to make a performance estimate of BD vs. either our current architecture or our competition, so, everyone please use this as a learning experience and do not try to make a performance estimate, OK?
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  18. #68
    Xtreme Addict
    Join Date
    Feb 2006
    Location
    Vienna, Austria
    Posts
    1,940
    Quote Originally Posted by JkS View Post
    I don't know if the reporter got something wrong, but I remember hearing "90% of the performance of today's cores at low power" in the past, referring to the Bobcat core.

    It sounds like a mixup.

    Anyone who still goes on about it, do you really think AMD would release a highly awaited top-end processor that is a downgrade in performance? Come on, really?
    QFT

    Bobcat is 90% of today's cores clock for clock.

    Why should a core that is more than 2 times wider (one module is more than 4 times wider than a Bobcat core), with a significantly more advanced memory subsystem, perform the same as Bobcat????

    This "90% of K10 performance" number is complete nonsense, and eetimes failed big time with their article.
    Core i7 2600k|HD 6950|8GB RipJawsX|2x 128gb Samsung SSD 830 Raid0|Asus Sabertooth P67
    Seasonic X-560|Corsair 650D|2x WD Red 3TB Raid1|WD Green 3TB|Asus Xonar Essence STX


    Core i3 2100|HD 7770|8GB RipJawsX|128gb Samsung SSD 830|Asrock Z77 Pro4-M
    Bequiet! E9 400W|Fractal Design Arc Mini|3x Hitachi 7k1000.C|Asus Xonar DX


    Dell Latitude E6410|Core i7 620m|8gb DDR3|WXGA+ Screen|Nvidia Quadro NVS3100
    256gb Samsung PB22-J|Intel Wireless 6300|Sierra Aircard MC8781|WD Scorpio Blue 1TB


    Harman Kardon HK1200|Vienna Acoustics Brandnew|AKG K240 Monitor 600ohm|Sony CDP 228ESD

  19. #69
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,886
    Quote Originally Posted by Game_boy View Post
    I think JF can help us, by confirming that IPC increases (as he's said before). Because that 90% reads like a decrease.

    It has already been posted in this very thread and in other threads... thanks for making it your very first post in this forum, though





    Quote Originally Posted by AKM View Post
    With 90% performance compared to the old core it had better turbo like crazy, but will that be enough to compete with, say, an i5-2400 in software that can't make use of its eight cores?

    Again ... it has been said multiple times by JF-AMD, who was authorised and cleared by AMD's legal team to tell us, that Bulldozer will have an IPC increase ... where do you think that informal's funny quote came from???


    The same thread in which JF-AMD confirmed the IPC increase .... so can we please stop with this nonsense about a decrease in performance ....
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD View Post
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

  20. #70
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    Quote Originally Posted by Sn0wm@n View Post
    It has already been posted in this very thread and in other threads... thanks for making it your very first post in this forum, though

    Again ... it has been said multiple times by JF-AMD, who was authorised and cleared by AMD's legal team to tell us, that Bulldozer will have an IPC increase ... where do you think that informal's funny quote came from???

    The same thread in which JF-AMD confirmed the IPC increase .... so can we please stop with this nonsense about a decrease in performance ....
    Until I see numbers, I'll assume the worst and say it's a 50% decrease. That's why all of AMD's management is leaving, because they know a disaster is looming. I'm being totally sarcastic of course, but you can see what effect the secrecy is having on the general public. Most people I speak to think AMD's next chip is going to be a turd.
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  21. #71
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,886
    Quote Originally Posted by freeloader View Post
    Until I see numbers, I'll assume the worst and say it's a 50% decrease. That's why all of AMD's management is leaving, because they know a disaster is looming. I'm being totally sarcastic of course, but you can see what effect the secrecy is having on the general public. Most people I speak to think AMD's next chip is going to be a turd.

    Can't blame AMD for being secretive about a 10-year-long project, can we???
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD View Post
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

  22. #72
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Thanks JF-AMD for additional clarifications!

    More info at official BD blog:

    http://blogs.amd.com/work/2011/02/21...ign-solutions/

    Quote Originally Posted by extract
    This paper also details design of the Bulldozer Floating Point Unit (FPU) shown in Figure 2. High performance computing relies heavily on vector (packed integer) and floating point operations, both handled in the FPU. Bulldozer was designed to execute these operations at higher performance and using less power than the current generation of microprocessors. Key to Bulldozer’s performance and power improvements are FPU changes, including completely redesigned arithmetic units and control structures. This paper covers logic and circuit design goals and tradeoffs for the FP scheduler, datapaths, and register files. As previously described at HotChips 2010, the Bulldozer FPU supports new instructions including SSSE3, SSE4.1, SSE4.2, AVX, AES, and advanced Multiply-Add/Accumulate operations. Fitting these features into the available silicon area, power, and frequency required significant circuit innovations, including pipeline restructuring and a completely new floorplan.

    Changes to the Integer Execution Unit will be described in another ISSCC paper (Session 4.6) titled “40-entry Unified, Out-of-Order Scheduler and Integer Execution Unit for the AMD Bulldozer x86-64 Core” (http://isscc.org/program/index.html). “Bulldozer’s” integer data and processor control sequencing are handled in the Integer Execution Unit (EX). This unit consists of a 1-cycle out-of-order instruction scheduler, four integer pipelines, and a Level1 Data Cache. The design also includes significant circuit and floorplan changes to improve frequency (speed) while reducing per core power rating over previous designs. Handling up to four 64-bit instructions per thread, the EX unit improves instruction scheduling and execution while significantly improving frequency and power over previous designs. The ISSCC paper will describe the logic and circuit design of the EX instruction scheduler and datapaths. It will also detail how Bulldozer’s high frequency is achieved using robust, reliable circuit designs.
    http://blogs.amd.com/work/2011/02/21...hats-in-a-box/

    Quote Originally Posted by extract
    So what’s inside those boxes? I’ll give some insight here, but today at ISSCC I’ll present Paper 4.6, “40-Entry unified Out-of-Order Scheduler and Integer Execution Unit for the AMD Bulldozer x86-64 Core,” to discuss the physical design of the scheduler and execution unit for AMD’s two-core Bulldozer module. Even though the module allows AMD to build a chip with many cores on a single die, single-threaded integer performance cannot be compromised. The out-of-order scheduler must efficiently pick up to four ready instructions for execution and wake up dependent instructions so that they may be picked in the next cycle. The execution units must compute results in a single cycle and forward them to dependent operations in the following cycle. All of this is required so that the module gives high architectural performance, measured in the number of instructions completed per cycle (IPC).

  23. #73
    Xtreme Addict
    Join Date
    Sep 2006
    Location
    Stamford, UK
    Posts
    1,336
    Quote Originally Posted by JF-AMD View Post
    OK, daddy is going to do some math, everyone follow along please.

    First: There is only ONE performance number that has been legally cleared, 16-core Interlagos will give 50% more throughput than 12-core Opteron 6100. This is a statement about throughput and about server workloads only. You CANNOT make any client performance assumptions about that statement.

    Now, let's get started.

    First, everything that I am about to say below is about THROUGHPUT and throughput is different than speed. If you do not understand that, then please stop reading here.

    Second, ALL comparisons are against the same cores, these are not comparison different generations nor are they comparisons against different architectures.

    Assume that a processor core has 100% throughput.

    Adding a second core to an architecture is typically going to give ~95% greater throughput. There is obviously some overhead because the threads will stall, the threads will wait for each other and the threads may share data. So, two completely independent cores would equal 195% (100% for the first core, 95% for the second core.)


    Looking at SPEC int and SPEC FP, Hyperthreading gives you 14% greater throughput for integer and 22% greater throughput for FP. Let's just average the two together.

    One core is 100%. Two cores are 118%. Everyone following so far? We have 195% for 2 threads on 2 cores and we have 118% for 2 threads on 1 core.

    Now, one bulldozer core is 100%. Running 2 threads on 2 seperate modules would lead to ~195%, it's consistent with running on two independent cores.

    Running 2 threads on the same module is ~180%.

    You can see why the strategy is more appealing than HT when it comes to threaded workloads. And, yes, the world is becoming more threaded.

    Now, where does the 90% come from? What is 180% /2? 90%.

    People have argued that there is a 10% overhead for sharing because you are not getting 200%. But, as we saw before, 2 cores actually only equals 195%, so the net per core if you divide the workload is actually 97.5%, so it is roughly a 7-8% delta from just having cores.

    Now, before anyone starts complaining about this overhead and saying that AMD is compromising single thread performance (because the fanboys will), keep in mind that a processor with HT equals ~118% for 2 threads, so per thread that equals 59%, so there is a ~36% hit for HT. This is specifically why I think that people need to stay away from talking about it. If you want to pick on AMD for the 7-8%, you have to acknowledge the ~36% hit from HT. But ultimately that is not how people jusdge these things. Having 5 people in a car consumes more gas than driving alone, but nobody talks about the increase in gas consumption because it is so much less than 5 individual cars driving to the same place.

    So, now you know the approximate metrics about how the numbers work out. But what does that mean to a processor? Well, let's do some rough math to show where the architecture shines.

    An Orochi die has 8 cores. Let's say, for sake of argument, that if we blew up the design and said not modules, only independent cores, we'd end up with about 6 cores.

    Now let's compare the two with the assumption that all of the cores are independent on one and in modules on the other. For sake of argument we will assume that all cores scale identically and that all modules scale identically. The fact that incremental cores scale to something less than 100% is already comprehended in the 180% number, so don't fixate on that. In reality the 3rd core would not be at 95% but we are holding that constant for example.

    Mythical 6-core bulldozer:
    100% + 95% + 95% + 95% + 95% + 95% = 575%

    Orochi die with 4 modules:
    180% + 180% + 180% + 180% = 720%

    What if we had just done a 4 core and added HT (keeping in the same die space):
    100% + 95% +95% +95% + 18% + 18% + 18% + 18% = 457%

    What about a 6 core with HT (has to assume more die space):
    100% + 95% +95% +95% +95% +95% + 18% + 18% + 18% + 18% + 18% + 18% = 683%

    (Spoiler alert - this is a comparison using the same cores, do NOT start saying that there is a 25% performance gain over a 6-core Thuban, which I am sure someone is already starting to type.)

    The reality is that by making the architecture modular and by sharing some resources you are able to squeeze more throughput out of the design than if you tried to use independent cores or tried to use HT. In the last example I did not take into consideration that the HT circuitry would have delivered an extra 5% circuitry overhead....

    Every design has some degree of tradeoff involved, there is no free lunch. The goal behind BD was to increase core count and get more throughput. Because cores scale better than HT, it's the most predictable way to get there.

    When you do the math on die space vs. throughput, you find that adding more cores is the best way to get to higher throughput. Taking a small hit on overall performance but having the extra space for additional cores is a much better tradeoff in my mind.

    Nothing I have provided above would allow anyone to make a performance estimate of BD vs. either our current architecture or our compeition, so, everyone please use this as a learning experience and do not try to make a performance estimate, OK?

    So based on this we can assume that BD is 90% faster than...
    Thanks for the post JF, top notch and very informative as usual!
    FX8350 @ 4.0Ghz | 32GB @ DDR3-1200 4-4-4-12 | Asus 990FXA @ 1400Mhz | AMD HD5870 Eyefinity | XFX750W | 6 x 128GB Sandisk Extreme RAID0 @ Aerca 1882ix with 4GB DRAM
    eXceed TJ07 worklog/build

  24. #74
    Xtreme Enthusiast
    Join Date
    Feb 2005
    Posts
    970
    Quote Originally Posted by freeloader View Post
    Until I see numbers, I'll assume the worst and say it's a 50% decrease. That's why all of AMD's management is leaving, because they know a disaster is looming. I'm being totally sarcastic of course, but you can see what effect the secrecy is having on the general public. Most people I speak to think AMD's next chip is going to be a turd.
    Who are you speaking to, people from Intel?

  25. #75
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    The Netherlands
    Posts
    896
    Quote Originally Posted by eXceededgoku View Post

    So based on this we can assume that BD is 90% faster than...
    Thanks for the post JF, top notch and very informative as usual!
    Is it so difficult to see he was merely showing the scaling differences between extra cores, hyperthreading and the way bulldozer does it: Extra cores which share resources inside a module, which saves die space versus extra cores which don't share any resources?

    IMO it was a very informative post. Don't forget JF-AMD is under NDA and simply can't give us any performance numbers.
