
Thread: AMD's Bobcat and Bulldozer

  1. #676
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Motiv View Post
    Being an absolute noob, could someone explain this to me.

    How many pipelines are on the P2 (x4 for arguments sake), how do they feed the ALU & AGU normally.

    To me it looks like bulldozer has cut down by 1 ALU&AGU per 'core'.
    Agner Fog's microarchitecture.pdf is a good place to start. It has a section that tries to identify the bottlenecks in every major x86 design today, including 10h (often wrongly called K10). In theory, 10h can issue up to nine "micro ops"* per cycle but retire only three "macro ops"**, so there is a bottleneck in the retirement stage of the design. As the document says, the utilization of the nine units can't be measured effectively in the real world, but it is clear that the execution units are underutilized some of the time, especially the third AGU, which is redundant because the L1D cache has only two ports.

    *a macro op is split into these micro instructions, which are then sent to the execution units
    **a macro op is the instruction the decoder deals with; 1 x86 instruction is typically 1 or 2 macro ops
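
    A rough way to see the retirement bottleneck described above. This is a toy model, not something taken from Agner Fog's document: the function names are made up, and the figure of 2 micro ops per macro op is only an illustrative assumption.

```python
def max_busy_units(units: int, retire_width: int, micro_ops_per_macro_op: int) -> int:
    """Upper bound on execution units kept busy per cycle over the long run.

    Toy model: every micro op belongs to a macro op that must retire, so the
    retire stage caps the sustained micro-op rate no matter how many units exist.
    """
    return min(units, retire_width * micro_ops_per_macro_op)


def min_cycles_to_retire(macro_ops: int, retire_width: int) -> int:
    """Fewest cycles needed to retire a given number of macro ops."""
    return -(-macro_ops // retire_width)  # ceiling division


# 10h-like numbers from the post: 9 units, 3 macro ops retired per cycle.
# The "2" is an assumed micro-op count per macro op, for illustration only.
print(max_busy_units(9, 3, 2))      # 6
print(min_cycles_to_retire(12, 3))  # 4
```

    Under these assumptions at least a third of the nine units sit idle on average, however wide the issue stage is.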

    edit:
    continued on to Bulldozer
    The front end can take up to 4 x86 instructions (I can't tell how these relate to the RISC-like macro ops of the 10h decoder stage) and dispatch them in 2 groups of 4 (macro ops?). Each integer core can do 4 instructions per cycle (2 arithmetic and 2 address, though the AGen units may be able to do some math work too). A lot is still unknown, so we can't say what else is in there or how AMD organized it, at least not until launch.
    Last edited by informal; 08-30-2010 at 05:17 PM.

  2. #677
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by nn_step View Post
    Correction

    while(!interrupted)
    {
    cout << "terrace215 post: " << random(bull_sh1t_reason) << endl;

    cin >> wait_responses;

    if (wait_responses == true) {

    cout << "terrace215 post: " << random(spout_more_sh1t) << endl;

    }

    }
    fixed...although might not work well if it was a real program ;p.
    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  3. #678
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by god_43 View Post
    fixed...although might not work well if it was a real program ;p.
    wait for response isn't required, since it is apparent that such activity doesn't actually exist. [At least in most of the posts made]
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  4. #679
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Location
    Hong Kong
    Posts
    526
    Quote Originally Posted by Motiv View Post
    Being an absolute noob, could someone explain this to me.

    How many pipelines are on the P2 (x4 for arguments sake), how do they feed the ALU & AGU normally.

    To me it looks like bulldozer has cut down by 1 ALU&AGU per 'core'.
    http://www.xbitlabs.com/articles/cpu...0_6.html#sect0

    Upon the availability of data, the scheduler may issue one integer operation to ALU and one address operation to AGU from each queue. There can be maximum two simultaneous memory requests. So, up to 3 integer operations and 2 memory operations (64-bit read/write in any combination) may be issued for execution per clock. Micro-operations from various arithmetic MOPs are issued for execution from their queues in an out-of-order manner, depending on the readiness of the data.
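
    The issue rule in that quote can be sketched in a few lines. This is only an illustration of the quoted limits (one ALU op and one AGU op per queue, at most two memory requests per clock); the function name and the list-of-booleans representation are my own assumptions, not anything from the article.

```python
MAX_MEMORY_REQUESTS = 2  # "maximum two simultaneous memory requests"


def ops_issued_per_clock(alu_ready, agu_ready):
    """Return (integer_ops, memory_ops) issued in one clock.

    alu_ready[i] / agu_ready[i] say whether queue i has an ALU op / an AGU op
    whose operands are available this clock.
    """
    if len(alu_ready) != len(agu_ready):
        raise ValueError("one ALU flag and one AGU flag per queue expected")
    integer_ops = sum(1 for ready in alu_ready if ready)
    memory_ops = min(sum(1 for ready in agu_ready if ready), MAX_MEMORY_REQUESTS)
    return integer_ops, memory_ops


# Three queues, everything ready: the best case described in the quote.
print(ops_issued_per_clock([True, True, True], [True, True, True]))  # (3, 2)
```

    Even with all three AGUs ready, the memory side caps out at two, which is where the "3 integer + 2 memory" figure comes from.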

  5. #680
    Xtreme Member
    Join Date
    Jan 2006
    Location
    Los Angeles, CA
    Posts
    172
    Isn't Bulldozer going to be released in the 2nd quarter of 2011, when the 8-core Sandy Bridge arrives to do battle?
    Last edited by jtdigital; 08-30-2010 at 06:46 PM.

  6. #681
    Xtreme Member
    Join Date
    Sep 2008
    Posts
    235
    page 251 of: http://support.amd.com/us/Processor_TechDocs/25112.PDF
    A.3 Superscalar Processor

    The AMD Athlon 64 and AMD Opteron processors are aggressive, out-of-order, three-way
    superscalar AMD64 processors. They can fetch, decode, and issue up to three AMD64 instructions
    per cycle with a centralized instruction control unit (ICU) and two independent instruction
    schedulers—an integer scheduler and a floating-point scheduler. These two schedulers can
    simultaneously issue up to nine micro-ops to the three general-purpose integer execution units
    (ALUs), three address-generation units (AGUs), and three floating-point execution units. The
    processors move integer instructions down the integer execution pipeline, which consists of the
    integer scheduler and the ALUs, as shown in Figure 6 on page 252. Floating-point instructions are
    handled by the floating-point execution pipeline, which consists of the floating-point scheduler and
    the floating-point execution units.
    or alternatively:

    http://www.chip-architect.com/news/2...Core.html#1.20


    But don't forget that the average number of ALU instructions is something like 0.4 per cycle,
    which is 4 to 5 times less than what two ALUs can provide.
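
    The arithmetic behind that remark, as a small sketch. The 0.4 figure is the rough average quoted in this post; the function names are made up for the example.

```python
def alu_headroom(alus: int, avg_alu_ops_per_cycle: float) -> float:
    """How many times more ALU ops per cycle the hardware offers than the code uses."""
    return alus / avg_alu_ops_per_cycle


def alu_utilization(alus: int, avg_alu_ops_per_cycle: float) -> float:
    """Fraction of the available ALU slots that are used on average."""
    return avg_alu_ops_per_cycle / alus


print(round(alu_headroom(2, 0.4), 2))     # 5.0
print(round(alu_utilization(2, 0.4), 2))  # 0.2
```

    So with two ALUs and the quoted average, roughly four out of five ALU slots go unused.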


    Regards, Hans
    Last edited by Hans de Vries; 08-30-2010 at 07:06 PM.

  7. #682
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    Quote Originally Posted by Hans de Vries View Post
    forever{
    terrace215 post: IPC decreases, because .....
    terrace215 post: IPC decreases, says .... of AMD
    terrace215 post: IPC decreases, according to AMD's presentation.
    terrace215 post: IPC decreases, don't trust marketing guys.
    terrace215 post: IPC decreases, because of the 2 ALUs..
    terrace215 post: IPC decreases, the marketing guy isn't talking about IPC
    terrace215 post: IPC decreases, because of the 16KB caches
    terrace215 post: IPC decreases, AMD has given up improving IPC.
    terrace215 post: IPC decreases, The AMD architect says it decreases by 5%
    terrace215 post: IPC decreases, Bulldozer is only optimized for server workloads.
    terrace215 post: IPC decreases, AMD presentation sheet no.X tells us so.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    .....}
    until (interrupt by Movieman)


    Regards, Hans
    You've summed it up better than we all could.

  8. #683
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by nn_step View Post
    wait for response isn't required, since it is apparent that such activity doesn't actually exist. [At least in most of the posts made]
    lool thats true..i have failed. cast out of troll programming school : (.


    on topic. yeah its supposed to be 2011 q2, should be fun!

  9. #684
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by god_43 View Post
    lool thats true..i have failed. cast out of troll programming school : (.


    on topic. yeah its supposed to be 2011 q2, should be fun!
    on topic, it is 2011. that is all that has ever been said.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  10. #685
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Guys ... David has posted a terrific summary of Bulldozer ... http://www.realworldtech.com/page.cf...2610181333&p=1
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  11. #686
    Xtreme Member
    Join Date
    Feb 2010
    Posts
    138
    Quote Originally Posted by Movieman View Post
    Hello Hans.
    .... Oh, forgot,no one drives me nuts. I just smile, grab my hammer and hit them upside the head so hard their grandchildren will walk with a 15degree list.


    Honestly, I am waiting like thousands of others to get a sneak peek... :P I don't live in the city I mentioned earlier, the one with the cave running the (early sample) hardware in question... or I'd have done anything, akin to Indiana Jones (which is lame-o, methinks), to get to the 16-core unobtanium-optronium! :P

  12. #687
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by Hornet331 View Post
    Yes, one module is capable of 4 AGU/ALU ops, but that requires 2 threads. For a single thread you're down to 2/2, but with a front end that's much more capable than that of K8.



    You're making a mistake: you're talking about relative performance, which is not related to IPC at all, or is only one part of the equation.
    No, a BD Core has 2 ALUs AND 2 AGUs available. 2+2=4. A Phenom II has 3 ALUs OR 3 AGUs. 6/2 = 3.



    EDIT:
    And PLEASE, can we dedicate this thread to Bulldozer and not forum moderation? I too welcome the ban, but I'm sure we got enough criticism and back-patting here. There are other places we can continue doing that.
    Last edited by -Boris-; 08-30-2010 at 11:05 PM.

  13. #688
    NooB MOD
    Join Date
    Jan 2006
    Location
    South Africa
    Posts
    5,799
    I heard via the grapevine that it'll be 1H2011 for server, 4Q2011 for desktop
    Xtreme SUPERCOMPUTER
    Nov 1 - Nov 8 Join Now!


    Quote Originally Posted by Jowy Atreides View Post
    Intel is about to get athlon'd
    Athlon64 3700+ KACAE 0605APAW @ 3455MHz 314x11 1.92v/Vapochill || Core 2 Duo E8500 Q807 @ 6060MHz 638x9.5 1.95v LN2 @ -120'c || Athlon64 FX-55 CABCE 0516WPMW @ 3916MHz 261x15 1.802v/LN2 @ -40c || DFI LP UT CFX3200-DR || DFI LP UT NF4 SLI-DR || DFI LP UT NF4 Ultra D || Sapphire X1950XT || 2x256MB Kingston HyperX BH-5 @ 290MHz 2-2-2-5 3.94v || 2x256MB G.Skill TCCD @ 350MHz 3-4-4-8 3.1v || 2x256MB Kingston HyperX BH-5 @ 294MHz 2-2-2-5 3.94v

  14. #689
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Quote Originally Posted by JumpingJack View Post
    Guys ... David has posted a terrific summary of Bulldozer ... http://www.realworldtech.com/page.cf...2610181333&p=1
    nice article by David and thanks for the heads up...
    Coming Soon

  15. #690
    Xtreme Member
    Join Date
    Sep 2007
    Posts
    380
    Quote Originally Posted by Oj101 View Post
    I heard via the grapevine that it'll be 1H2011 for server, 4Q2011 for desktop
    come on AMD make it the other way round :P

  16. #691
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by geo View Post
    come on AMD make it the other way round :P
    Might see a Bulldozer FX at the same time as the server chips.

  17. #692
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by JumpingJack View Post
    Guys ... David has posted a terrific summary of Bulldozer ... http://www.realworldtech.com/page.cf...2610181333&p=1
    Thanks, that is a great article.

  18. #693
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by -Boris- View Post
    No, a BD Core has 2 ALUs AND 2 AGUs available. 2+2=4. A Phenom II has 3 ALUs OR 3 AGUs. 6/2 = 3.

    ..
    K10 has 3 ALUs and 3 AGUs. No matter how hard you and others try to downplay K10 execution resources, fact is, a K10 integer core has more resources than a BD integer core.

    The docs linked by Hans are pretty clear.

    http://www.xtremesystems.org/forums/...&postcount=681
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  19. #694
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Bobcat is a 2-way (2 ALU + 2 AGU) design, has 90% of Propus performance, and is a low-power design with solid performance. One can expect a Bulldozer core to stomp all over a Bobcat core, but both have fewer ALUs/AGUs than 10h. The number of units means nothing if you can't use them effectively, and you know that. The number of core-level changes is pretty big: L/S improvements, prefetch, BP, shared L2, etc. As Anand wrote (info from AMD), per-core performance will be better than 10h.

  20. #695
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by savantu View Post
    K10 has 3 ALUs and 3 AGUs. No matter how hard you and others try to downplay K10 execution resources, fact is, a K10 integer core has more resources than a BD integer core.

    The docs linked by Hans are pretty clear.

    http://www.xtremesystems.org/forums/...&postcount=681
    No, you are wrong. Old architecture has shared resources, new architecture has dedicated resources.

    A BD integer core will do more IPC and perform single threads faster than an old core.

    Why do you keep saying these things even though I have posted the information in multiple places?

  21. #696
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    Quote Originally Posted by JF-AMD View Post
    No, you are wrong. Old architecture has shared resources, new architecture has dedicated resources.

    A BD integer core will do more IPC and perform single threads faster than an old core.

    Why do you keep saying these things even though I have posted the information in multiple places?
    JF...have you personally seen running BD chips yet or whatever the server variant is called? Just wondering how you're so sure if you haven't bench tested one yet.

    Does anyone here know when BD compatible socket motherboards will go on sale?
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  22. #697
    Xtreme Mentor
    Join Date
    Nov 2006
    Location
    Spain, EU
    Posts
    2,949
    Quote Originally Posted by JF-AMD View Post
    No, you are wrong. Old architecture has shared resources, new architecture has dedicated resources.
    He's right. K10 has more resources, shared or not.
    Friends shouldn't let friends use Windows 7 until Microsoft fixes Windows Explorer (link)


    Quote Originally Posted by PerryR, on John Fruehe (JF-AMD) View Post
    Pretty much. Plus, he's here voluntarily.

  23. #698
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by freeloader View Post
    JF...have you personally seen running BD chips yet or whatever the server variant is called? Just wondering how you're so sure if you haven't bench tested one yet.

    Does anyone here know when BD compatible socket motherboards will go on sale?
    I'm pretty sure that when you have the position JF has in a company you get pretty accurate numbers from engineering and so on. There is no need for him to sit down and bench engineering samples personally. Would be quite stupid if engineering lied about the performance in internal reviews and documents.
    You know this isn't Dilbertland right?

  24. #699
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by STaRGaZeR View Post
    He's right. K10 has more resources, shared or not.
    If you can't use it, it isn't a resource. Phenom has only three integer pipes. In each of those pipes the AGU and ALU have to take turns being part of the resource pool.
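
    The two ways of counting used in this thread, side by side. A sketch only: the function names are mine, and whether the ALU and AGU in a pipe really have to take turns is exactly the point being argued here, so neither result should be read as settled.

```python
def ops_per_clock_taking_turns(pipes: int) -> int:
    """Counting used in this post: each pipe issues an ALU op OR an AGU op per clock."""
    return pipes


def ops_per_clock_dedicated(alus: int, agus: int) -> int:
    """Counting that treats every ALU and every AGU as usable in the same clock."""
    return alus + agus


print(ops_per_clock_taking_turns(3))  # 3  (Phenom II, "3 ALUs OR 3 AGUs")
print(ops_per_clock_dedicated(2, 2))  # 4  (BD core, "2 ALUs AND 2 AGUs")
print(ops_per_clock_dedicated(3, 3))  # 6  (K10 as counted in post #693)
```

    Which of the two functions applies to 10h is the whole disagreement; the numbers themselves are not in dispute.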

  25. #700
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    10h can retire 3 macro ops. A BD integer core/FP core should be able to do 4.
