Thread: AMD Zambezi news, info, fans !

  1. #1
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    Quote Originally Posted by TESKATLIPOKA View Post
    I will add a value of 14-16 for the pipeline, at least until we know the real number.

    2x64-bit store effectively 256-bits/cycle.
    You meant 128 bits/cycle, right?
    I mean 2x64-bit store or 2x128-bit load for 10h. Because 10h can't execute 256-bit AVX instructions, it loads data in 128-bit chunks.
    That is 128 bits/cycle for stores, or 256 bits/cycle for loads.
    In a Bulldozer core (not module), there is a 256-bit load plus a 128-bit store at the same time. A Bulldozer module doubles those operations.
    A Bulldozer core can calculate 2 addresses at the same time because it has 2 AGUs (address generation units).
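
    The per-cycle figures above can be tallied in a short sketch. The dictionary layout and the function names are made up for illustration. The `concurrent` flag is one reading of the post: "or" for 10h is taken to mean loads and stores do not overlap in a cycle, while the Bulldozer core does both at once. These are not measured numbers.

```python
# Per-cycle widths in bits, copied from the figures in the post.
# "concurrent" is an assumption: True if a load and a store can
# happen in the same cycle, False if it is one or the other.
K10_CORE = {"load": 2 * 128, "store": 2 * 64, "concurrent": False}
BD_CORE = {"load": 256, "store": 128, "concurrent": True}


def module_of(core, cores=2):
    """Scale per-core widths to a module, assuming plain doubling."""
    return {
        "load": core["load"] * cores,
        "store": core["store"] * cores,
        "concurrent": core["concurrent"],
    }


def peak_bits_per_cycle(unit):
    """Peak combined load/store width under the 'concurrent' assumption."""
    if unit["concurrent"]:
        return unit["load"] + unit["store"]
    return max(unit["load"], unit["store"])


BD_MODULE = module_of(BD_CORE)

for name, unit in (("10h core", K10_CORE),
                   ("BD core", BD_CORE),
                   ("BD module", BD_MODULE)):
    print(f"{name}: load {unit['load']}, store {unit['store']}, "
          f"peak {peak_bits_per_cycle(unit)} bits/cycle")
```

    Under these assumptions the 10h core peaks at the load width, while the Bulldozer core adds its load and store widths together.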

    A Sandy Bridge core can also do 2 address operations at once, because it has 2 L/S AGUs. It takes a slightly different approach to stores: the SB store unit is attached to the scheduler.

    You are right, I wrote double the number of LS units. I will fix it right away.

    If anything else is wrong, just say so.
    Yes, per core it has 2 ALUs and 2 AGUs. I've made detailed diagrams of the Bulldozer module, K10, Nehalem and, of course, the Sandy Bridge HT core architecture.
    Attachment 118765
    Last edited by drfedja; 08-09-2011 at 07:20 AM.
    "That which does not kill you only makes you stronger." ---Friedrich Nietzsche
    PCAXE

  2. #2
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Hungary (EU)
    Posts
    1,376
    Quote Originally Posted by drfedja View Post
    I mean 2x64-bit store or 2x128-bit load for 10h. Because 10h can't execute 256-bit AVX instructions, it loads data in 128-bit chunks.
    That is 128 bits/cycle for stores, or 256 bits/cycle for loads.
    In a Bulldozer core (not module), there is a 256-bit load plus a 128-bit store at the same time. A Bulldozer module doubles those operations.
    A Bulldozer core can calculate 2 addresses at the same time because it has 2 AGUs (address generation units).

    A Sandy Bridge core can also do 2 address operations at once, because it has 2 L/S AGUs. It takes a slightly different approach to stores: the SB store unit is attached to the scheduler.


    Yes, per core it has 2 ALUs and 2 AGUs. I've made detailed diagrams of the Bulldozer module, K10, Nehalem and, of course, the Sandy Bridge HT core architecture.

    BTW, you interpret the Address Generation Units as units that calculate linear addresses as well as INC/LEA values. The Optimization Guide also refers to them as simple integer execution units (AGLU).

    Would you briefly explain what kind of operations these units can execute?

    Thanks

  3. #3
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    Quote Originally Posted by Oliverda View Post
    BTW, you interpret the Address Generation Units as units that calculate linear addresses as well as INC/LEA values. The Optimization Guide also refers to them as simple integer execution units (AGLU).

    Would you briefly explain what kind of operations these units can execute?

    Thanks
    I'd also like to know what types of instructions the AGLU can execute. My knowledge of it comes from the Optimisation Manual too, and my assumption is that the AGLU can execute address calculations and LEA, and probably INC, given that AMD's manual says the AGLU can execute simple ALU operations. Maybe I'm wrong about INC, but such a unit could plausibly support instructions other than CALL and LEA.
    If it can calculate an address, that unit can also execute a simple ADD or INC on an unsigned integer (address + offset) and some logical operations like XOR or AND. That is my speculation, because the Optimisation Manual probably isn't complete yet.
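
    To make the "address + offset" argument concrete, here is a minimal sketch of x86 effective-address arithmetic (base + index * scale + displacement, wrapped to 64 bits). The function names are hypothetical and flags are ignored entirely. It only shows that the same adder arithmetic covers ADD and INC on unsigned integers. It says nothing about which operations the real AGLU accepts.

```python
MASK64 = (1 << 64) - 1  # 64-bit wraparound, as in 64-bit address arithmetic


def effective_address(base=0, index=0, scale=1, disp=0):
    """Return base + index * scale + disp, truncated to 64 bits.

    This is the value an LEA with a full addressing mode produces.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


def add(a, b):
    """Unsigned ADD expressed as an address calculation (flags ignored)."""
    return effective_address(base=a, index=b)


def inc(a):
    """INC expressed as base + displacement of 1 (flags ignored)."""
    return effective_address(base=a, disp=1)


# An array access such as table[4] with 8-byte elements and a 16-byte header.
print(hex(effective_address(base=0x1000, index=4, scale=8, disp=16)))
print(add(2, 3), inc(41))
```

    Logical operations like XOR or AND would need more than this adder, so that part of the speculation is not covered by the sketch.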

    The Optimisation Guide also says this:
    There are four integer execution units per core. Two units which handle all arithmetic, logical and
    shift operations (EX). And two which handle address generation and simple ALU operations
    (AGLU). Figure 2 shows a block diagram for one integer cluster. There are two such integer clusters
    per compute unit.
    The Optimisation Manual says that the AG0|AG1 units execute the LEA instruction when it works with 3 operands. Legacy 2-operand LEA, however, can be executed only on the EX0|EX1 units. AG0|AG1 can also execute CALL instructions, which are decoded as a double op: the first op executes on EX and the second op executes on AGLU.
    The CALL instruction clearly transfers control to another procedure, and the RET instruction returns to the instruction following the call.
    But that isn't a big difference compared to K10. K10 also executes CALL as a double op, but on BD, CALL disp (near) and CALL reg (near) have 50% lower latency than on 10h, and CALL mem (near) is hardwired (double decoded), whereas on 10h it is microcoded.

    According to the Optimisation Manual, the main difference between the BD AGUs and the 10h AGUs is that the BD AGUs can execute LEA when it works with three operands, and CALL is fully hardwired, with slightly lower latencies.

    Quote Originally Posted by danielkza View Post
    I'd be even more thankful if there happened to be a version in English.
    Use Google Translate to learn Serbian... :P
    I will translate those diagrams into English, that isn't a problem, but I think they are understandable as they are. A picture is worth a thousand words! :P
    Last edited by drfedja; 08-09-2011 at 03:07 PM.
