
Thread: Dresdenboys' blog: AMD Bulldozer - Patent based research

  1. #1
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by Dimitriman View Post
    Well if this is in fact true and single core/thread perf. of BD could be the same or possibly lower than MC, isn't this really dangerous for Desktop performance on these chips considering single thread perf. is quite important there?

    How is BD supposed to compete with Intel's future gen. architecture with single threaded perf. lower than MC? I mostly doubt this will be true but hearing Dresdenboy comment on this possibility worries me.
    That was just an argument against analyzing performance per core. What counts is how fast your apps run, at what energy consumption, and at what price. Matters also become more complex because of more advanced power management methods, etc. Recently I tried to scale the Interlagos performance advantage back to 4 and 6 cores (2 and 3 modules, respectively) and got performance numbers comparable to a 4.8 GHz quad-core and a 3.8 GHz six-core K10. OTOH, I have also heard that BD scales really well, which would mean somewhat lower numbers for the smaller core counts.

    And then add an even more efficient turbo for lightly threaded apps, or even for ones that don't use some of the units much (e.g. FPU, caches). We might see scalable caches, TLBs, FPUs, maybe even integer cores... A lot of this has been patented. So I have no problem agreeing with informal here.

    BTW, I'll be offline for a week, so don't expect updates or answers during that time, unless I find an internet terminal.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  2. #2
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    I don't get it.
    What makes Dresdenboy and everyone else believe that the 4 pipelines translate to 2 ALUs and 2 AGUs?
    In Athlon, the 3 pipelines consist of 3 ALUs and 3 AGUs.

    What I read as 33% more integer performance, others read as 33% less.
    Explain please.
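
    The two readings can be put side by side with a quick calculation. This is only a sketch of the arithmetic: it counts ALUs and nothing else, takes the 3 ALU+AGU pipelines of the Athlon as the baseline, and the function name is made up for illustration.

```python
def relative_change(new_units: int, old_units: int) -> float:
    """Percent change in unit count relative to the old design."""
    return (new_units - old_units) / old_units * 100.0

# Baseline from the post: Athlon has 3 pipelines, each with an ALU and an AGU.
K10_ALUS = 3

# Reading A: 4 pipelines, each with ALU + AGU, so 4 ALUs.
four_pipes = relative_change(4, K10_ALUS)

# Reading B: the 4 pipelines are split into 2 ALUs and 2 AGUs, so 2 ALUs.
two_alus = relative_change(2, K10_ALUS)

print(f"4 ALU+AGU pipes vs. Athlon: {four_pipes:+.0f}%")
print(f"2 ALU / 2 AGU vs. Athlon:   {two_alus:+.0f}%")
```

    Unit counts alone say nothing about clock speed or how well the units are kept busy, so neither figure is a performance prediction.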

  3. #3
    Xtreme Member
    Join Date
    Nov 2008
    Posts
    117
    Quote Originally Posted by -Boris- View Post
    I don't get it.
    What makes Dresdenboy and everyone else believe that the 4 pipelines translate to 2 ALUs and 2 AGUs?
    In Athlon, the 3 pipelines consist of 3 ALUs and 3 AGUs.

    What I read as 33% more integer performance, others read as 33% less.
    Explain please.
    One pipeline has an ALU + AGU, not an ALU or an AGU.
    When AMD had 64-bit and Intel had only 32-bit, they tried to tell the world there was no need for 64-bit. Until they got 64-bit.
    When AMD had IMC and Intel had FSB, they told the world "there is plenty of life left in the FSB" (actual quote, and yes, they had *math* to show it had more bandwidth). Until they got an IMC.
    When AMD had dual core and Intel had single core, they told the world that consumers don't need multi core. Until they got dual core.
    When Intel was using MCM, they said it was a better solution than native dies. Until they got native dies. (To be fair, we knocked *unconnected* MCM, and still do, we never knocked MCM as a technology, so hold your flames.)
    by John Fruehe

  4. #4
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by vietthanhpro View Post
    One pipeline has an ALU + AGU, not an ALU or an AGU.
    Exactly!
    I simply don't understand why everyone believes in Dresdenboy's 2-pipeline config for Bulldozer.

  5. #5
    Xtreme Member
    Join Date
    Nov 2008
    Posts
    117
    Quote Originally Posted by -Boris- View Post
    Exactly!
    I simply don't understand why everyone believes in Dresdenboy's 2-pipeline config for Bulldozer.
    AMD hasn't confirmed it!
    I think there are 3 pipelines for ALU+AGU and 1 pipeline for the load/store unit.

  6. #6
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Please calm down. These numbers are just several hypotheses. One is based on material like this:
    Within each cluster 150, execution units 154 may support the concurrent execution of various different types of operations. For example, in one embodiment execution units 154 may support two concurrent load/store address generation (AGU) operations and two concurrent arithmetic/logic (ALU) operations, for a total of four concurrent integer operations per cluster.
    (contained in several Bulldozer-related patents)

    Some of my related posts:
    2 ALU/2 AGU hypothesis:
    http://citavia.blog.de/2009/11/13/ho...-have-7366681/
    4 ALU/4 AGU hypothesis:
    http://citavia.blog.de/2009/11/16/bu...ought-7383623/
    2 ALU/2 AGU hypothesis by Hiroshige Goto:
    http://citavia.blog.de/2009/12/19/bu...ussed-7605288/
    2 ALU/2 AGU possible confirmation reported by Yusuke Ohara:
    http://citavia.blog.de/2010/01/11/an...japan-7737558/

    But it could also go in another direction:
    In one embodiment, the ALU 220 and the AGU 222 are implemented as the same unit.
    (also from some patents related to a BD-like architecture, maybe a successor)

    And to round it off, there is my hypothesis that there are just 2 ALUs and 2 AGUs, but running at a significantly higher clock than today (maybe even as a double-pumped design).

    What counts is the resulting performance of the chip, not whether it has 4 ALUs/4 AGUs at only 2 GHz. The goal is not to achieve the ultimate raw power per clock, but the highest performance for different workloads inside a given TDP or ACP "envelope".
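
    To illustrate the trade-off between unit count and clock, here is a rough sketch of peak ALU throughput. All inputs are assumptions made up for the example (a 2 GHz base clock for both designs, a pump factor of 2 for the narrow one); they are not AMD figures, and peak numbers ignore dependencies, scheduling, and memory stalls.

```python
def peak_alu_ops_per_sec(alus: int, clock_ghz: float, pump_factor: int = 1) -> float:
    """Theoretical peak ALU operations per second.

    pump_factor models a double-pumped design: the ALUs run at
    pump_factor times the base clock (an assumption for illustration).
    """
    return alus * clock_ghz * 1e9 * pump_factor

# Hypothetical wide design: 4 ALUs at a 2 GHz clock.
wide = peak_alu_ops_per_sec(alus=4, clock_ghz=2.0)

# Hypothetical narrow design: 2 ALUs, double pumped on the same 2 GHz base clock.
narrow = peak_alu_ops_per_sec(alus=2, clock_ghz=2.0, pump_factor=2)

print(f"4 ALUs @ 2 GHz:               {wide / 1e9:.1f} G ops/s")
print(f"2 ALUs @ 2 GHz, double pumped: {narrow / 1e9:.1f} G ops/s")
```

    Under these assumptions both designs reach the same peak, which is why the unit count alone does not settle the question.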

  7. #7
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by Dresdenboy View Post
    And to round it off, there is my hypothesis that there are just 2 ALUs and 2 AGUs, but running at a significantly higher clock than today (maybe even as a double-pumped design).

    What counts is the resulting performance of the chip, not whether it has 4 ALUs/4 AGUs at only 2 GHz. The goal is not to achieve the ultimate raw power per clock, but the highest performance for different workloads inside a given TDP or ACP "envelope".
    So you say two pipelines despite AMD's charts saying otherwise, and assume a higher frequency, maybe double-pumped pipes, to compensate?

    EDIT:
    I still don't get it. Your theory about 2 pipes seems to be based on performance charts hinting at similar performance per thread. But when you get information about higher integer performance, you start talking about higher frequencies instead of the four pipes.

    That performance chart could be based on a 4-module, 8-core Interlagos; we still don't know if the 16-core part will arrive in 2011 or later.
    Last edited by -Boris-; 04-19-2010 at 03:27 AM.
