Results 201 to 225 of 262

Thread: Dresdenboys' blog: AMD Bulldozer - Patent based research

  1. #201
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by Particle View Post
    What hype?
    Guess you need to read some posts in here again.. :p

    Quote Originally Posted by Particle View Post
    Also remember that K10 was actually a pretty big improvement over K8, before you all forget that.
    Clock for clock they were only ~5% faster than the Athlons. Their big "face saver" was quad-core, and the trend of apps using more than 2 threads.

    Anyway, that's the past, but as I mentioned, there was also a lot of hype about Phenom before we got any hard facts.
    This just reminds me of that.

  2. #202
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Hornet331 View Post
    Guess you need to read some posts in here again.. :p

    Clock for clock they were only ~5% faster than the Athlons. Their big "face saver" was quad-core, and the trend of apps using more than 2 threads.

    Anyway, that's the past, but as I mentioned, there was also a lot of hype about Phenom before we got any hard facts.
    This just reminds me of that.
    5%?? Are you joking?
    10h @ 65nm is a solid 15-20% over 65nm/90nm K8 on average, per core and per clock... A big jump in SSE (more than 20%, naturally). 45nm 10h is a solid 8-10% again, on average, over 10h @ 65nm. All this in client workloads. In server ones, 10h killed K8 (factoring in 2x more cores and IPC improvements).
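    As a rough sketch, the two per-clock steps quoted above can be chained to see what they would imply for 45nm 10h against K8. The percentages are only the estimates from this post, not measurements, and the helper name `compound` is made up for illustration.

```python
# Chain the per-clock gains quoted in the post (poster's estimates, not benchmark data).
def compound(*gains):
    """Multiply a chain of fractional gains, e.g. 0.15 for +15%."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0

low = compound(0.15, 0.08)   # K8 -> 65nm 10h -> 45nm 10h, low end of both ranges
high = compound(0.20, 0.10)  # same chain, high end of both ranges
print(f"45nm 10h vs K8, per clock: +{low:.1%} to +{high:.1%}")
```

    If both estimates held, the gains would multiply rather than add, so the combined step comes out a little larger than the simple sum.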

  3. #203
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Location
    Singapore
    Posts
    970
    Quote Originally Posted by qcmadness View Post
    Long time ago, I think the difference between 3 and 4 issue INT core makes a big difference, but the major difference is about the branch prediction and cache miss.

    Yes, I have to agree on this.
    Intel has a better branch prediction hit rate than AMD.
    I guess that's all thanks to the long-pipeline P4 architecture. If it weren't for that, I guess their branch prediction would have stayed the same.
    Main Rig:
    Processor & Motherboard:AMD Ryzen5 1400 ' Gigabyte B450M-DS3H
    Random Access Memory Module:Adata XPG DDR4 3000 MHz 2x8GB
    Graphic Card:XFX RX 580 4GB
    Power Supply Unit:FSP AURUM 92+ Series PT-650M
    Storage Unit:Crucial MX 500 240GB SATA III SSD
    Processor Heatsink Fan:AMD Wraith Spire RGB
    Chasis:Thermaltake Level 10GTS Black

  4. #204
    Banned
    Join Date
    Sep 2009
    Posts
    97
    Quote Originally Posted by savantu View Post
    Bulldozer looks to be all about throughput. I think it will be weaker in single-threaded integer apps than current K10.5s since the INT units were simplified.
    For FP, a cluster will beat a single-core K10.5 but it will be similar to a dual-core K10.5.

    So while in commercial apps it will be great, in desktop ones it won't be any miracle.
    That's my 2 cents.
    When your 2 cents include Nehalem being "at least" 50% faster than C2D, and being the biggest achievement ever in CPU history, they really don't mean much.

  5. #205
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by Particle View Post
    What hype? People are doing their best to say it'll actually be worse. Also remember that K10 was actually a pretty big improvement over K8, before you all forget that.
    Exactly, AMD just had problems with the TLB bug and target dates. Whether it means much to most people or not, the fact remains that it was the first for many things, such as a true native quad-core... Before everyone jumps on me: if it was not a good thing, then why is Nehalem the same way?
    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  6. #206
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by LesGrossman View Post
    When your 2 cents include Nehalem being "at least" 50% faster than C2D, and being the biggest achievement ever in CPU history, they really don't mean much.
    That's quite impossible since it doesn't scale perfectly.

    At most it is 30% in some workloads.
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  7. #207
    Xtreme Guru
    Join Date
    May 2007
    Location
    Ace Deuce, Michigan
    Posts
    3,955
    Quote Originally Posted by sdsdv10 View Post
    So if I understand what you're saying, the argument is based on logic, not benchmark information. Is this correct?
    That is partially correct, but benchmarks don't always tell the whole story either. For instance, Intel will get scores that are often better than twice as good as AMD's due to the L2 cache latency, but in reality Intel only sees about a 5-10% IPC increase over K10.5.

    My argument is based on the diagrams we've seen so far, which all point to sizable gains. AMD themselves said 35%; I personally think it'll be much closer to 20% on average, which is still a really big increase. The rest of the math was done based on logic.

    Quote Originally Posted by savantu View Post
    Bulldozer looks to be all about throughput. I think it will be weaker in single-threaded integer apps than current K10.5s since the INT units were simplified.
    For FP, a cluster will beat a single-core K10.5 but it will be similar to a dual-core K10.5.

    So while in commercial apps it will be great, in desktop ones it won't be any miracle.
    That's my 2 cents.
    You can't do math, can you? All the evidence so far has pointed to fairly good gains in IPC (just look at Dresdenboy's work); there is literally no logical reason to think that it will be slower.

    Really the only major question mark will be the effects of CMT, as none of us (as far as I know) have seen it in action before, so we have no clue what it will do to single-threaded performance.

    Quote Originally Posted by qcmadness View Post
    Long time ago, I think the difference between 3 and 4 issue INT core makes a big difference, but the major difference is about the branch prediction and cache miss.
    If by long time ago you mean Core 2 Duo vs K10, then yes. It still does make a big difference and is one of the main reasons why AMD can't touch Intel on IPC right now.
    Quote Originally Posted by Hans de Vries View Post

    JF-AMD posting: IPC increases!!!!!!! How many times did I tell you!!!

    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    .....}
    until (interrupt by Movieman)


    Regards, Hans

  8. #208
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    I think a big problem in all these discussions about per-core performance is that looking at the whole processor's performance is not the best way to guess single-core performance. With hypothetical perfect scaling, a 12-core BD-based processor (the name of the core actually is "Orochi"; BD was just the project, according to JF) could show a higher per-core performance than MC while its single-core performance is lower, because MC needs a higher single-core performance to offset losses due to scaling.
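    A minimal sketch of that argument, with made-up numbers. The single-core scores and the 0.85 scaling efficiency for MC are pure assumptions chosen to illustrate the effect; nothing here is a measured or leaked figure, and `per_core_from_chip` is a hypothetical helper.

```python
# Per-core figure as people derive it from a whole-chip result:
# chip throughput divided by core count. All inputs are illustrative assumptions.
def per_core_from_chip(single_core_perf, cores, scaling_efficiency):
    chip_throughput = single_core_perf * cores * scaling_efficiency
    return chip_throughput / cores

mc_single, bd_single = 1.00, 0.95                 # assumed: BD is slower with one thread
mc = per_core_from_chip(mc_single, 12, 0.85)      # assumed: MC loses 15% to scaling
bd = per_core_from_chip(bd_single, 12, 1.00)      # hypothetical perfect scaling

print(f"MC: single-core {mc_single:.2f}, per-core from chip {mc:.2f}")
print(f"BD: single-core {bd_single:.2f}, per-core from chip {bd:.2f}")
```

    With these inputs BD comes out ahead on the chip-derived per-core number even though its assumed single-core score is lower, which is why the whole-chip figure is a poor guide to single-thread speed.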
    Last edited by Dresdenboy; 03-09-2010 at 10:56 AM.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  9. #209
    Xtreme Addict
    Join Date
    Mar 2005
    Location
    Rotterdam
    Posts
    1,553
    Well if this is in fact true and single core/thread perf. of BD could be the same or possibly lower than MC, isn't this really dangerous for Desktop performance on these chips considering single thread perf. is quite important there?

    How is BD supposed to compete with Intel's future gen. architecture with single threaded perf. lower than MC? I mostly doubt this will be true but hearing Dresdenboy comment on this possibility worries me.
    Gigabyte Z77X-UD5H
    G-Skill Ripjaws X 16Gb - 2133Mhz
    Thermalright Ultra-120 eXtreme
    i7 2600k @ 4.4Ghz
    Sapphire 7970 OC 1.2Ghz
    Mushkin Chronos Deluxe 128Gb

  10. #210
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    I for one seriously doubt that per-core performance will be lower than what we have today with 10h. My opinion is the situation will be quite the opposite.

  11. #211
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by Dimitriman View Post
    Well if this is in fact true and single core/thread perf. of BD could be the same or possibly lower than MC, isn't this really dangerous for Desktop performance on these chips considering single thread perf. is quite important there?

    How is BD supposed to compete with Intel's future gen. architecture with single threaded perf. lower than MC? I mostly doubt this will be true but hearing Dresdenboy comment on this possibility worries me.
    This was just an argument against the per-core performance analysis. What counts is how fast your apps run, with what energy consumption and at which price. Matters also become more complex because of more advanced energy management methods etc. Recently I tried to scale back the Interlagos performance advantage to 4 and 6 cores (2 and 3 modules respectively) and got performance numbers comparable to 4.8 and 3.8 GHz quad- and six-core K10. OTOH I have also heard that BD scales really well. This would mean somewhat lower numbers for the smaller core counts.

    And then add an even more efficient turbo for the lightly threaded apps, or even for ones which don't utilize some of the units that much (e.g. FPU, caches). We might see scalable caches, TLBs, FPUs, maybe even integer cores... A lot of such stuff has been patented. So I have no problem agreeing with informal here.

    BTW I'll be offline for a week. So don't expect updates or answers during that time, unless I find some internet terminal.

  12. #212
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    I don't get it.
    What makes Dresdenboy and everyone else believe that the 4 pipelines should be translated to 2 ALUs and 2 AGUs?
    In Athlon the 3 pipelines consist of 3 ALUs and 3 AGUs.

    What I read as 33% more integer performance, others read as 33% less.
    Explain please.

  13. #213
    Xtreme Member
    Join Date
    Nov 2008
    Posts
    117
    Quote Originally Posted by -Boris- View Post
    I don't get it.
    What makes Dresdenboy and everyone else believe that the 4 pipelines should be translated to 2 ALUs and 2 AGUs?
    In Athlon the 3 pipelines consist of 3 ALUs and 3 AGUs.

    What I read as 33% more integer performance, others read as 33% less.
    Explain please.
    One pipeline has an ALU + AGU, not an ALU or an AGU.
    When AMD had 64-bit and Intel had only 32-bit, they tried to tell the world there was no need for 64-bit. Until they got 64-bit.
    When AMD had IMC and Intel had FSB, they told the world "there is plenty of life left in the FSB" (actual quote, and yes, they had *math* to show it had more bandwidth). Until they got an IMC.
    When AMD had dual core and Intel had single core, they told the world that consumers don't need multi core. Until they got dual core.
    When intel was using MCM, they said it was a better solution than native dies. Until they got native dies. (To be fair, we knocked *unconnected* MCM, and still do, we never knocked MCM as a technology, so hold your flames.)
    by John Fruehe

  14. #214
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by vietthanhpro View Post
    One pipeline has an ALU + AGU, not an ALU or an AGU.
    Exactly!
    I simply don't understand why everyone believes in Dresdenboy's 2-pipeline config for Bulldozer.

  15. #215
    Xtreme Member
    Join Date
    Nov 2008
    Posts
    117
    Quote Originally Posted by -Boris- View Post
    Exactly!
    I simply don't understand why everyone believes in Dresdenboy's 2-pipeline config for Bulldozer.
    AMD hasn't confirmed it!
    I think it's 3 pipelines for ALU+AGU and 1 pipeline for the load/store unit.

  16. #216
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Please calm down. These numbers are just multiple hypotheses. One is based on stuff like this:
    Within each cluster 150, execution units 154 may support the concurrent execution of various different types of operations. For example, in one embodiment execution units 154 may support two concurrent load/store address generation (AGU) operations and two concurrent arithmetic/logic (ALU) operations, for a total of four concurrent integer operations per cluster.
    (contained in several Bulldozer related patents)

    Some of my related posts:
    2 ALU/2 AGU hypothesis:
    http://citavia.blog.de/2009/11/13/ho...-have-7366681/
    4 ALU/4 AGU hypothesis:
    http://citavia.blog.de/2009/11/16/bu...ought-7383623/
    2 ALU/2 AGU hypothesis by Hiroshige Goto:
    http://citavia.blog.de/2009/12/19/bu...ussed-7605288/
    2 ALU/2 AGU possible confirmation reported by Yusuke Ohara:
    http://citavia.blog.de/2010/01/11/an...japan-7737558/

    But it could also go in another direction:
    In one embodiment, the ALU 220 and the AGU 222 are implemented as the same unit.
    (also from some patents related to a BD like architecture, maybe a successor)

    And to round it off, there is my hypothesis that there are just 2 ALUs and 2 AGUs, but running at a significantly higher clock than today (maybe even as a double-pumped design).

    What counts is the resulting performance of the chip, not whether it has 4 ALUs/4 AGUs at only 2 GHz. The goal is not to achieve the ultimate absolute raw power per clock, but the highest performance for different workloads inside a given TDP or ACP "envelope".
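    To put the double-pumped idea into numbers, here is a minimal sketch of peak ALU issue rate as units x clock x pump factor. The 2 GHz base clock is only the example figure from this post, the pump factor of 2 is the hypothesis, and `peak_alu_ops` is a hypothetical helper; peak rate says nothing about sustained performance.

```python
# Peak ALU operations per second, in billions (G ops/s).
# Clocks and unit counts are illustrative; they are not confirmed Bulldozer specs.
def peak_alu_ops(alus, base_clock_ghz, pump_factor=1):
    return alus * base_clock_ghz * pump_factor

wide = peak_alu_ops(alus=4, base_clock_ghz=2.0)                        # 4 ALUs at 2 GHz
narrow_pumped = peak_alu_ops(alus=2, base_clock_ghz=2.0, pump_factor=2)  # 2 ALUs, double pumped

print(f"4 ALUs @ 2 GHz:                {wide:.1f} G ops/s")
print(f"2 ALUs @ 2 GHz, double pumped: {narrow_pumped:.1f} G ops/s")
```

    On paper both configurations reach the same peak, so the count of ALUs alone would not settle the question of integer throughput.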

  17. #217
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by Dresdenboy View Post
    And to round it off, there is my hypothesis that there are just 2 ALUs and 2 AGUs, but running at a significantly higher clock than today (maybe even as a double-pumped design).

    What counts is the resulting performance of the chip, not whether it has 4 ALUs/4 AGUs at only 2 GHz. The goal is not to achieve the ultimate absolute raw power per clock, but the highest performance for different workloads inside a given TDP or ACP "envelope".
    So you say two pipelines despite AMD's charts saying otherwise, and assume a higher frequency, maybe double-pumped pipes, to compensate?

    EDIT:
    Still don't get it. Your theory about 2 pipes seems to be based on performance charts hinting at similar performance per thread. But when you get information about higher integer performance, you start talking about higher frequencies instead of the four pipes.

    That performance chart could be based on a 4-module, 8-core Interlagos; we still don't know if the 16-core part will be 2011 or later.
    Last edited by -Boris-; 04-19-2010 at 03:27 AM.

  18. #218
    Xtreme Member
    Join Date
    Oct 2008
    Location
    Colorado
    Posts
    312
    Quote Originally Posted by ajaidev View Post
    To put things into perspective I made these. The BD modules are supposed to perform at ~80-90% of a dual-core CPU, let's say a Deneb-based one, so a dual-module BD will come close to a Deneb quad-core and a quad-module BD will come close to an 8-core Deneb. Now as things stand a x5550 ~ x4 2435, which means AMD has to offer 6 cores to achieve what Intel does with 4. In BD terms, you will need three Bulldozer modules to come close to a Nehalem-based CPU, but what we have seen with Westmere "hex core" is great and I expect Sandy Bridge to only improve in performance. So much so that I expect only a four-module Bulldozer may be able to compete with a four-core Sandy Bridge.

    If Intel releases a 6-core Sandy Bridge, AMD will have to release a 6-module CPU, IMO.
    If they do that, Bulldozer will be one big die. 6 modules means 12 physical cores.
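    The module arithmetic in the quoted post can be sketched roughly as below. The only input is the quoted guess that one module performs at about 80-90% of a dual-core Deneb; the function name is made up, and none of this is benchmark data.

```python
# Convert a module count into "Deneb core equivalents", assuming one module
# performs at a given fraction of a dual-core Deneb (the 0.80-0.90 guess quoted above).
def deneb_core_equivalents(modules, module_vs_dual_core):
    return modules * 2 * module_vs_dual_core

for modules in (2, 3, 4, 6):
    low = deneb_core_equivalents(modules, 0.80)
    high = deneb_core_equivalents(modules, 0.90)
    print(f"{modules} modules ~ {low:.1f} to {high:.1f} Deneb cores")
```

    Under that assumption three modules land somewhat short of six Deneb cores, which fits the "come close" wording in the quote.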
    My rig the Kill-Jacker

    CPU: AMD Phenom II 1055T 3.82GHz
    Mobo: ASUS Crosshair IV Extreme
    Game GPU: EVGA GTX580
    Secondary GPU 2: EVGA GTX470
    Memory: Mushkin DDR3 1600 Ridgeback 8GB
    PSU: Silverstone SST-ST1000-P
    HDD: WD 250GB Blue 7200RPM
    HDD2: WD 1TB Blue 7200RPM
    CPU Cooler: TRUE120 Rev. B Pull
    Case: Antec 1200


    FAH Tracker V2 Project Site

  19. #219
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    I read some pages ago that BD will be able to do micro-ops and macro-ops at the same time. This is totally different from the macro/micro-op fusion mentioned before. Is this feature confirmed?

    AMD said in the past that they are looking at doing "better cores"; I don't think their goal with BD is to do less in single thread. I think that AMD wants to use shared modules to do better in single threads, by using the 'sleeping' part of the second core and adding fewer transistors per core. MP speed-up should be lower than with previous CPU cores if I'm right, but the speed of the whole chip should be way faster in common use. It will depend on the yields.

  20. #220
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by -Boris- View Post
    So you say two pipelines despite AMD's charts saying otherwise, and assume a higher frequency, maybe double-pumped pipes, to compensate?

    EDIT:
    Still don't get it. Your theory about 2 pipes seems to be based on performance charts hinting at similar performance per thread. But when you get information about higher integer performance, you start talking about higher frequencies instead of the four pipes.

    That performance chart could be based on a 4-module, 8-core Interlagos; we still don't know if the 16-core part will be 2011 or later.
    2 ALU pipes and 2 AGU pipes make 4, just that they are dedicated and not shared.

    There are just more signs pointing to 2 ALU pipes + 2 AGU pipes than 4 ALU/AGU pipes.

    And the high-frequency hypothesis (which is not just an attempt to explain 2+2) is supported by:

    • different patents (not older than 2005), which describe
      • units designed for high frequency
      • local clock generators with dividers like 0.5
      • an integer execution unit designed for high frequency containing ALU0, AGU0, ALU1, AGU1
      • a simplified, recycling AGU (for higher frequency operation) with just one adder
    • a recent AMD paper (there are only a few in total) about low power low delay adder designs
    • the thought that, with increasing leakage, additional units that sit idle more often and use area might be less efficient than fewer, but more saturated ones

  21. #221
    Xtreme Member
    Join Date
    Oct 2008
    Location
    Colorado
    Posts
    312
    Dres, can you explain how they have the module system set up now in relation to cores and threads? I'm a little confused about whether it's still like what Anand had a while back or if they changed the 1 module, 2 cores idea.

  22. #222
    Xtreme Member
    Join Date
    Nov 2008
    Posts
    117
    Quote Originally Posted by Hans de Vries
    Quote Originally Posted by TacoBell
    Quote Originally Posted by yfe
    Quote Originally Posted by jack
    Yes. K10 core has 3 integer ALUs, while Bulldozer "cluster" has 2 integer ALUs (Bulldozer core has 4 integer ALUs in total).
    Where did you get it from? Don't you mix it with Bobcat?
    On Bulldozer pic, four pipelines/core are shown only, w/o any details.
    Ars claims 4 pipes = 2 ALU + load + store
    The patents do show this and it corresponds with AMD's Bobcat diagram.
    Bulldozer's diagram however shows 4 integer ALU's and no load & store.

    This indicates the load and store address calculations are probably
    done by the ALU's. (It's already been a while since the 3 operand adds
    were used in the address calculation)


    Regards, Hans
    link:http://aceshardware.freeforums.org/p...57.html#p12357

  23. #223
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by =SOC= Admiral View Post
    Dres, can you explain how they have the module system set up now in relation to cores and threads? I'm a little confused about whether it's still like what Anand had a while back or if they changed the 1 module, 2 cores idea.
    As we were told in November, a module contains two integer cores and the components shared between them. In the patents these cores were named "cluster" and the module was the "core". Each integer core (cluster) can execute one program thread. So the whole module can execute two of them.

    Such a module (maybe called so in relation to M-SPACE and their modular design philosophy) is the smallest compute unit to be used in processor designs.

    However marketing wise it just might be harder to sell 4 cores (for 8 threads) at e.g. 3 GHz than to sell 8 cores (again for 8 threads). Even if one of the former type of cores would be more powerful than any other x86 core on the market by then.

  24. #224
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by vietthanhpro View Post
    This is even supported by the patents describing an ALU and AGU as being the same unit (see my quote above).

    Wireloop's variant (from one of his comments on my blog):
    Pipe 0 -> multiplier, simple ops (add, subtract, logical)
    Pipe 1 -> AGU-like, barrel shifter, branch (both direct & indirect), simple ops
    Pipe 2 -> ABM, simple ops
    Pipe 3 -> AGU-like, barrel shifter, branch (both types too), simple ops
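    Wireloop's layout can be written down as a small lookup table to see which pipes could take which kind of op. This is only a sketch of the speculative variant listed above; the capability labels and the `pipes_for` helper are made up for illustration and do not describe a confirmed design.

```python
# Speculative pipe layout from the list above, as a capability table.
PIPES = {
    0: {"multiply", "simple"},
    1: {"agu", "shift", "branch", "simple"},
    2: {"abm", "simple"},
    3: {"agu", "shift", "branch", "simple"},
}

def pipes_for(op):
    """Return the pipe numbers that could execute the given op class."""
    return sorted(pipe for pipe, caps in PIPES.items() if op in caps)

for op in ("simple", "agu", "multiply", "abm"):
    print(f"{op:8s} -> pipes {pipes_for(op)}")
```

    In this variant every pipe handles simple ops, so four simple integer ops could issue together, while address generation is limited to two pipes.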

  25. #225
    Xtreme Member
    Join Date
    Oct 2008
    Location
    Colorado
    Posts
    312
    Quote Originally Posted by Dresdenboy View Post
    As we were told in November, a module contains two integer cores and the components shared between them. In the patents these cores were named "cluster" and the module was the "core". Each integer core (cluster) can execute one program thread. So the whole module can execute two of them.

    Such a module (maybe called so in relation to M-SPACE and their modular design philosophy) is the smallest compute unit to be used in processor designs.

    However marketing wise it just might be harder to sell 4 cores (for 8 threads) at e.g. 3 GHz than to sell 8 cores (again for 8 threads). Even if one of the former type of cores would be more powerful than any other x86 core on the market by then.
    Thanks for the info.
