
Thread: Dresdenboy's blog: AMD Bulldozer - Patent-based research


  1. #1
    -Boris-
    Xtreme Enthusiast
    Join Date: Oct 2008
    Posts: 678
    I don't get it.
    What makes Dresdenboy and everyone else believe that the 4 pipelines should be translated to 2 ALUs and 2 AGUs?
    In the Athlon, the 3 pipelines consist of 3 ALUs and 3 AGUs.

    What I read as 33% more integer performance, others read as 33% less.
    Please explain.
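    The two readings differ only in which unit count is compared against the Athlon's 3 ALUs. A minimal sketch of that arithmetic, assuming (as a simplification) that peak integer throughput scales linearly with the number of ALUs and ignoring clock speed, scheduling, and dependencies; the names below are illustrative only:

```python
from fractions import Fraction

# Athlon: 3 pipelines, each with one ALU and one AGU -> 3 ALUs per core.
ATHLON_ALUS = 3

def relative_change(new_units: int, old_units: int) -> Fraction:
    """Fractional change in unit count, e.g. Fraction(1, 3) means +33%."""
    return Fraction(new_units - old_units, old_units)

# Reading 1: 4 combined ALU+AGU pipelines -> 4 ALUs vs. 3.
more = relative_change(4, ATHLON_ALUS)
# Reading 2: the 4 pipelines split into 2 ALUs + 2 AGUs -> 2 ALUs vs. 3.
less = relative_change(2, ATHLON_ALUS)

print(f"4 ALUs vs 3: {float(more):+.0%}")
print(f"2 ALUs vs 3: {float(less):+.0%}")
```

    The first line prints +33% and the second -33%, which is exactly the disagreement in the question.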

  2. #2
    vietthanhpro
    Xtreme Member
    Join Date: Nov 2008
    Posts: 117
    Quote originally posted by -Boris-:
      I don't get it.
      What makes Dresdenboy and everyone else believe that the 4 pipelines should be translated to 2 ALUs and 2 AGUs?
      In the Athlon, the 3 pipelines consist of 3 ALUs and 3 AGUs.

      What I read as 33% more integer performance, others read as 33% less.
      Please explain.

    Each pipeline has an ALU + an AGU, not an ALU or an AGU.
    --
    When AMD had 64-bit and Intel had only 32-bit, they tried to tell the world there was no need for 64-bit. Until they got 64-bit.
    When AMD had IMC and Intel had FSB, they told the world "there is plenty of life left in the FSB" (actual quote, and yes, they had *math* to show it had more bandwidth). Until they got an IMC.
    When AMD had dual core and Intel had single core, they told the world that consumers don't need multi core. Until they got dual core.
    When Intel was using MCM, they said it was a better solution than native dies. Until they got native dies. (To be fair, we knocked *unconnected* MCM, and still do, we never knocked MCM as a technology, so hold your flames.)
    by John Fruehe

  3. #3
    -Boris-
    Xtreme Enthusiast
    Join Date: Oct 2008
    Posts: 678
    Quote originally posted by vietthanhpro:
      Each pipeline has an ALU + an AGU, not an ALU or an AGU.

    Exactly!
    I simply don't understand why everyone believes in Dresdenboy's 2-pipeline config for Bulldozer.

  4. #4
    vietthanhpro
    Xtreme Member
    Join Date: Nov 2008
    Posts: 117
    Quote originally posted by -Boris-:
      Exactly!
      I simply don't understand why everyone believes in Dresdenboy's 2-pipeline config for Bulldozer.

    AMD hasn't confirmed it!
    I think it's 3 pipelines for ALU+AGU and 1 pipeline for the load/store unit.

  5. #5
    Dresdenboy
    Xtreme Member
    Join Date: Jul 2004
    Location: Berlin
    Posts: 275
    Please calm down. These numbers are just competing hypotheses. One is based on passages like this:
    Within each cluster 150, execution units 154 may support the concurrent execution of various different types of operations. For example, in one embodiment execution units 154 may support two concurrent load/store address generation (AGU) operations and two concurrent arithmetic/logic (ALU) operations, for a total of four concurrent integer operations per cluster.
    (contained in several Bulldozer-related patents)

    Some of my related posts:
    2 ALU/2 AGU hypothesis:
    http://citavia.blog.de/2009/11/13/ho...-have-7366681/
    4 ALU/4 AGU hypothesis:
    http://citavia.blog.de/2009/11/16/bu...ought-7383623/
    2 ALU/2 AGU hypothesis by Hiroshige Goto:
    http://citavia.blog.de/2009/12/19/bu...ussed-7605288/
    2 ALU/2 AGU possible confirmation reported by Yusuke Ohara:
    http://citavia.blog.de/2010/01/11/an...japan-7737558/

    But it could also go in another direction:
    In one embodiment, the ALU 220 and the AGU 222 are implemented as the same unit.
    (also from some patents related to a BD-like architecture, maybe a successor)

    And to round it off, there is my hypothesis that there are just 2 ALUs and 2 AGUs, but running at a significantly higher clock than today's designs (maybe even double-pumped).

    What counts is the resulting performance of the chip, not whether it has 4 ALUs/4 AGUs at only 2 GHz. The goal is not to achieve the ultimate raw power per clock, but the highest performance for different workloads within a given TDP or ACP "envelope".
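    The argument that unit count alone does not fix throughput can be checked with a back-of-the-envelope peak-rate calculation. This is a sketch under stated assumptions: the clock values are hypothetical placeholders, not Bulldozer figures; "double-pumped" is modeled simply as a unit completing two operations per core clock; stalls, dependencies, and power are ignored. The function name is illustrative.

```python
def peak_alu_gops(alus: int, core_clock_ghz: float, pump: int = 1) -> float:
    """Peak ALU operations per second, in billions (G ops/s).

    pump=2 is a simplifying model of a double-pumped unit that completes
    two operations per core clock cycle.
    """
    return alus * core_clock_ghz * pump

# Hypothetical configurations; the numbers are illustrative only.
wide_slow = peak_alu_gops(alus=4, core_clock_ghz=2.0)              # 4 ALUs at 2 GHz
narrow_pumped = peak_alu_gops(alus=2, core_clock_ghz=2.0, pump=2)  # 2 double-pumped ALUs
narrow_fast = peak_alu_gops(alus=2, core_clock_ghz=3.0)            # 2 ALUs at a higher clock

print(wide_slow, narrow_pumped, narrow_fast)
```

    Equal peak numbers (8.0 vs 8.0 here) say nothing about which layout fits a TDP/ACP envelope better; that depends on energy per operation and leakage, which this sketch does not model.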
    --
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  6. #6
    -Boris-
    Xtreme Enthusiast
    Join Date: Oct 2008
    Posts: 678
    Quote originally posted by Dresdenboy:
      And to round it off, there is my hypothesis that there are just 2 ALUs and 2 AGUs, but running at a significantly higher clock than today's designs (maybe even double-pumped).

      What counts is the resulting performance of the chip, not whether it has 4 ALUs/4 AGUs at only 2 GHz. The goal is not to achieve the ultimate raw power per clock, but the highest performance for different workloads within a given TDP or ACP "envelope".

    So you're saying two pipelines, despite AMD's charts saying otherwise, and assuming a higher frequency, maybe double-pumped pipes, to compensate?

    EDIT:
    I still don't get it. Your theory about 2 pipes seems to be based on performance charts hinting at similar per-thread performance. But when you get information about higher integer performance, you start talking about higher frequencies instead of the four pipes.

    That performance chart could be based on a 4-module, 8-core Interlagos; we still don't know whether the 16-core part will arrive in 2011 or later.
    Last edited by -Boris-; 04-19-2010 at 03:27 AM.

  7. #7
    Dresdenboy
    Xtreme Member
    Join Date: Jul 2004
    Location: Berlin
    Posts: 275
    Quote originally posted by -Boris-:
      So you're saying two pipelines, despite AMD's charts saying otherwise, and assuming a higher frequency, maybe double-pumped pipes, to compensate?

      EDIT:
      I still don't get it. Your theory about 2 pipes seems to be based on performance charts hinting at similar per-thread performance. But when you get information about higher integer performance, you start talking about higher frequencies instead of the four pipes.

      That performance chart could be based on a 4-module, 8-core Interlagos; we still don't know whether the 16-core part will arrive in 2011 or later.

    2 ALU pipes and 2 AGU pipes make 4; they are just dedicated rather than shared.

    There are simply more signs pointing to 2 ALU pipes + 2 AGU pipes than to 4 ALU/AGU pipes.
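    The difference between "4 dedicated pipes" (2 ALU + 2 AGU) and "4 combined pipes" (each with an ALU and an AGU, as in the Athlon) shows up when the ready operations are skewed toward one type. A toy per-cycle issue model, assuming any ready operation can go to any free unit of its type and ignoring scheduler windows, port restrictions, and latencies; the function name and the operation mix are hypothetical:

```python
from typing import Tuple

def issued_per_cycle(alu_ready: int, agu_ready: int,
                     alus: int, agus: int) -> Tuple[int, int]:
    """Return (ALU ops, AGU ops) issued in one cycle, capped by the number
    of units of each kind."""
    return min(alu_ready, alus), min(agu_ready, agus)

# Hypothetical cycle with 3 ready ALU ops and 1 ready AGU op.
# Dedicated layout: 2 ALU pipes + 2 AGU pipes -> 2 ALUs, 2 AGUs.
dedicated = issued_per_cycle(alu_ready=3, agu_ready=1, alus=2, agus=2)
# Combined layout: 4 pipes, each pairing an ALU with an AGU -> 4 ALUs, 4 AGUs.
combined = issued_per_cycle(alu_ready=3, agu_ready=1, alus=4, agus=4)

print("dedicated:", dedicated, "combined:", combined)
```

    For this ALU-heavy mix the dedicated layout issues 3 operations and the combined one 4; for a balanced mix of 2 ALU and 2 AGU operations both issue 4, which is why "4 pipes" alone does not settle the question.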

    And the high-frequency hypothesis (which is not just an attempt to explain the 2+2 layout) is supported by:

    • several patents (none older than 2005), which describe
      • units designed for high frequency
      • local clock generators with dividers like 0.5
      • an integer execution unit designed for high frequency containing ALU0, AGU0, ALU1, AGU1
      • a simplified, recycling AGU (for higher-frequency operation) with just one adder
    • a recent AMD paper (there are only a few in total) about low-power, low-delay adder designs
    • the idea that, with increasing leakage, additional units that sit idle more often (and take up area) might be less efficient than fewer, more saturated ones
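    The last point (idle units still leak) can be illustrated with a deliberately crude power model. This is a sketch under loudly stated assumptions: leakage is taken as a fixed wattage per unit paid whether or not the unit is busy, dynamic energy per operation is the same for both layouts, and every number below is a hypothetical placeholder rather than a measured value:

```python
def ops_per_joule(units: int, utilization: float, clock_ghz: float,
                  leak_w_per_unit: float, dyn_nj_per_op: float) -> float:
    """Operations per joule for a toy core model (all inputs hypothetical).

    units:           number of identical execution units
    utilization:     average fraction of cycles each unit is busy (0..1)
    leak_w_per_unit: static (leakage) power per unit, paid even when idle
    dyn_nj_per_op:   dynamic energy per executed operation, in nanojoules
    """
    ops_per_s = units * utilization * clock_ghz * 1e9
    dynamic_w = ops_per_s * dyn_nj_per_op * 1e-9
    static_w = units * leak_w_per_unit
    return ops_per_s / (dynamic_w + static_w)

# Same delivered throughput (2 ops/cycle at 2 GHz) from two hypothetical layouts:
# 4 units that are busy half the time vs. 2 fully saturated units.
wide = ops_per_joule(units=4, utilization=0.5, clock_ghz=2.0,
                     leak_w_per_unit=0.5, dyn_nj_per_op=0.1)
narrow = ops_per_joule(units=2, utilization=1.0, clock_ghz=2.0,
                       leak_w_per_unit=0.5, dyn_nj_per_op=0.1)

print(f"wide: {wide:.3g} ops/J, narrow: {narrow:.3g} ops/J")
```

    With leakage set to zero the two layouts come out equal in this model; the larger the per-unit leakage, the more the saturated layout wins, which is the direction of the argument above. Real leakage does not scale this simply with unit count, and a higher-clocked narrow design would pay more dynamic energy per operation, which the sketch leaves out.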
