
Thread: What to Expect From AMD at ISSCC 2011

  1. #151
    Xtreme Member
    Join Date
    Dec 2008
    Location
    Sweden
    Posts
    450
    Quote Originally Posted by JkS View Post
    Agreed, it's the best of both worlds when it comes to multithreading and IPC.

    If you're a guy who doesn't need more than 4 cores, the cores get the FPU and cache to themselves and thus more single threaded performance.*

    If you're a guy who likes threaded applications, you get to leverage the threaded power of the architecture for more threaded performance.

    *I'm curious to know if the Bulldozer design tries to prioritize single threaded apps to their own modules, does anyone know if this is the case?
    It could be more efficient to use one module instead of two when running only a few threads, if that means it can turbo (higher). At least from a power consumption perspective, it could be smarter to use one module instead of two.

    2 threads in one module at 90% of the IPC of separate modules, but at 110% clock speed, will give ~100% performance, and power consumption should be lower.
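
    As a quick check of that arithmetic, here is a minimal sketch that models per-thread performance as IPC times clock. The function name and the simple multiplicative model are assumptions for illustration; the 90% and 110% figures are the poster's hypothetical numbers, not measurements.

```python
def relative_throughput(ipc_factor: float, clock_factor: float) -> float:
    """Per-thread performance relative to baseline, modelled as IPC x clock."""
    return ipc_factor * clock_factor


# Two threads sharing one module: 90% IPC, 110% clock (hypothetical figures).
shared = relative_throughput(ipc_factor=0.90, clock_factor=1.10)
print(f"{shared:.2f}")  # prints 0.99
```

    So the two effects roughly cancel under this model, which is where the "~100%" comes from.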

  2. #152
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Good points, marten_larsson. There are still a few things we don't know about the design, so I guess we should wait and see how the cores handle workloads that use fewer threads than the core count.

  3. #153
    Xtreme Cruncher
    Join Date
    Apr 2005
    Location
    TX, USA
    Posts
    898
    Quote Originally Posted by JkS View Post
    Agreed, it's the best of both worlds when it comes to multithreading and IPC.

    If you're a guy who doesn't need more than 4 cores, the cores get the FPU and cache to themselves and thus more single threaded performance.*

    If you're a guy who likes threaded applications, you get to leverage the threaded power of the architecture for more threaded performance.

    *I'm curious to know if the Bulldozer design tries to prioritize single threaded apps to their own modules, does anyone know if this is the case?
    I would expect that responsibility ultimately ends up at the mercy of the OS thread scheduler (or the use of direct thread binding). It should still pose a similar situation to Hyper-Threading/SMT: the scheduler could use whatever logical processors it wants, but it *might* not necessarily know how to optimize for the underlying micro-architectural implications, e.g. the logical-to-physical mapping and the resource-sharing relationship between them.
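
    For reference, direct binding can be sketched with Python's `os.sched_setaffinity`, which is only available on some platforms (Linux in particular). The helper name `pin_current_process` is made up for this example, and which logical CPUs share a module is not something this code can tell you; that mapping is assumed to come from elsewhere.

```python
import os


def pin_current_process(cpus):
    """Restrict the calling process to the given logical CPUs.

    Returns the resulting affinity set, or None on platforms that do not
    expose sched_setaffinity (for example Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


# Pin to the lowest-numbered CPU we are currently allowed to use, then restore.
original = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
pinned = None
if original:
    pinned = pin_current_process({min(original)})
    os.sched_setaffinity(0, original)  # put the original mask back
print(pinned)
```

    Binding like this overrides the scheduler's choice, so it only helps if you already know which logical processors share resources.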

    The more I hear about Bulldozer, the more I see it as an adaptation of the ideas inherent in the UltraSPARC T2/3 architecture, with AMD mainly emphasizing single-thread latency instead of going all out on throughput.
    Last edited by rcofell; 02-24-2011 at 07:44 AM.



  4. #154
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by marten_larsson View Post
    It could be more efficient to use one module instead of two when running only a few threads, if that means it can turbo (higher). At least from a power consumption perspective, it could be smarter to use one module instead of two.

    2 threads in one module at 90% of the IPC of separate modules, but at 110% clock speed, will give ~100% performance, and power consumption should be lower.
    i think i understand what you mean

    example:
    4 threads each on their own module might use 70% of the TDP and overclock 10-15%
    vs
    4 threads on 2 modules which would use about 50% of the TDP and overclock 20-25%, but have 5-10% less IPC due to shared resources.

    In the end it might actually break even, or the differences might be negligible either way, which is kinda good since it means that no matter how badly Windows assigns threads, BD knows how to overclock for max performance.
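
    Plugging the example numbers into a simple IPC-times-clock model shows why the two routes could land close together. This is only a sketch: `perf_range` is a hypothetical helper, and the clock gains and IPC losses are the guesses from the example above, not measured values.

```python
def perf_range(clock_gain, ipc_loss=(0.0, 0.0)):
    """Return (worst, best) per-thread performance relative to stock.

    clock_gain and ipc_loss are (low, high) fractions, e.g. (0.10, 0.15).
    """
    worst = (1 + clock_gain[0]) * (1 - ipc_loss[1])
    best = (1 + clock_gain[1]) * (1 - ipc_loss[0])
    return worst, best


# 4 threads on 4 modules: +10-15% clock, no sharing penalty.
spread = perf_range(clock_gain=(0.10, 0.15))
# 4 threads on 2 modules: +20-25% clock, 5-10% less IPC.
packed = perf_range(clock_gain=(0.20, 0.25), ipc_loss=(0.05, 0.10))

print(f"spread: {spread[0]:.2f} to {spread[1]:.2f}")  # spread: 1.10 to 1.15
print(f"packed: {packed[0]:.2f} to {packed[1]:.2f}")  # packed: 1.08 to 1.19
```

    The two ranges overlap, so under these assumed numbers neither placement is a clear winner.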
    2500k @ 4900mhz - Asus Maxiums IV Gene Z - Swiftech Apogee LP
    GTX 680 @ +170 (1267mhz) / +300 (3305mhz) - EK 680 FC EN/Acteal
    Swiftech MCR320 Drive @ 1300rpms - 3x GT 1850s @ 1150rpms
    XS Build Log for: My Latest Custom Case

  5. #155
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Manicdan, there is also a cooperative prefetch bonus when threads share the same L2 cache, which in turn may increase performance to some degree. But maybe it's already counted in the previously mentioned 80% boost when a module is fully loaded. The shared front end will also act similarly to Intel's SMT in another beneficial way: it will smooth out inefficient usage by filling unused integer pipelines as quickly as possible, just as SMT does on one Intel Nehalem/SB core. Only in AMD's case we have a whole core to grab the data, not just a vacant port in the execution stack. That's why AMD can see an 80% increase from 2 threads and Intel sees 20%-25%.

  6. #156
    Xtreme Member
    Join Date
    Aug 2010
    Location
    Athens, Greece
    Posts
    116
    Quote Originally Posted by zalbard View Post
    It appears that it's been pulled.
    Found another source:



    One extra integer pipeline, then?
    Looks like a damn smart and efficient design, tbh, hope it works out well...
    If I'm not mistaken, Deneb's integer execution unit has 3 integer pipes PLUS 3 Load/Store (AGU) pipes, whereas Bulldozer's integer execution unit has 2 integer pipes PLUS 2 AGen (Address Generator) pipes.

    So we have 3 int pipes for DENEB vs 2 int pipes for each BD Integer Execution Unit.
    Intel Core i7 920@4GHz, ASUS GENE II, 3 x 4GB DDR-3 1333MHz Kingston, 2x ASUS HD6950 1G CU II, Intel SSD 320 120GB, Windows 7 Ultimate 64bit, DELL 2311HM

    AMD FX8150 vs Intel 2500K, 1080p DX-11 gaming evaluation.

  7. #157
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by Aten-Ra View Post
    If I'm not mistaken, Deneb's integer execution unit has 3 integer pipes PLUS 3 Load/Store (AGU) pipes, whereas Bulldozer's integer execution unit has 2 integer pipes PLUS 2 AGen (Address Generator) pipes.

    So we have 3 int pipes for DENEB vs 2 int pipes for each BD Integer Execution Unit.
    It's 3 ALU/AGU for K10; they are general-purpose pipes (the third pipe gives 5% more performance, according to AMD).

    Bulldozer has 2 specialised ALUs and 2 specialised AGUs, so it should be faster on integer work. I also think this is a more efficient design in terms of power consumption.

  8. #158
    Registered User
    Join Date
    Jun 2010
    Location
    Denmark
    Posts
    90

  9. #159
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Quote Originally Posted by andos View Post
    That's early! Epic!

    Edit: wait, it's not a release, it's a presentation. Or do I get it wrong?
    Last edited by zalbard; 02-24-2011 at 02:53 PM.
    Quote Originally Posted by jayhall0315 View Post
    If you are really extreme, you never let informed facts or the scientific method hold you back from your journey to the wrong answer.

  10. #160
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    ok lets all hold hands and pray for just one benchmark

  11. #161
    Xtreme Enthusiast
    Join Date
    Jan 2007
    Location
    San Antonio, TX
    Posts
    836
    Quote Originally Posted by andos View Post
    How credible would we consider cebit here?

    Ryzen 3800X @ 4.4Ghz
    MSI X570 Unify
    32GB G.Skill 3600Mhz CL14
    Sapphire Nitro Vega 64
    OCZ Gold 850W ZX Series
    Thermaltake LV10

  12. #162
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Quote Originally Posted by FlawleZ View Post
    How credible would we consider cebit here?
    That's an official statement.

  13. #163
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    That's the official Cebit page... Pretty credible if you ask me. But unveiling does not equate launching.

  14. #164
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by informal View Post
    That's the official Cebit page... Pretty credible if you ask me. But unveiling does not equate launching.
    if it is 50% then... it has to be BD? perhaps they will name them phenom III?
    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  15. #165
    Xtreme Member
    Join Date
    Aug 2010
    Location
    Athens, Greece
    Posts
    116


    Quote Originally Posted by madcho View Post
    It's 3 ALU/AGU for K10; they are general-purpose pipes (the third pipe gives 5% more performance, according to AMD).

    Bulldozer has 2 specialised ALUs and 2 specialised AGUs, so it should be faster on integer work. I also think this is a more efficient design in terms of power consumption.
    Well, first, the above slide is a little bit misleading for someone who doesn't know the Deneb (Magny Cours) architecture.

    Secondly, I believe the slide wants to show the difference between CMP and Bulldozer's Cluster-Based Multithreading design, so the slide is not an accurate representation of Deneb's (Magny Cours) integer/FP execution units.

    Deneb's integer execution unit has 6x pipelines, 3x ALUs (Integer) and 3x AGUs (Load/Store), and the Integer Scheduler can issue 6x micro-ops (uops) to it.

    Bulldozer's integer execution unit has 4x pipelines, 2x ALUs (Integer) and 2x AGen (Address Generators), plus a Load/Store unit (40 Load/24 Store), and the Integer Scheduler can issue 4 uops to it.

    Deneb doesn't have FP FMACs but 3x pipelines: FADD, FMUL, FMISC.
    Bulldozer's FP has 4x pipelines, 2x 128-bit FMACs and 2x 128-bit MMX, and one shared FP Scheduler that can issue 4x uops to the FP execution unit.

  16. #166
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    lol isnt the MC pic supposed to be of bobcat?

  17. #167
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Hungary (EU)
    Posts
    1,376
    Quote Originally Posted by god_43 View Post
    if it is 50% then... it has to be BD? perhaps they will name them phenom III?
    No more Phenom, no more Athlon. Probably it will be simply called AMD FX.
    -

  18. #168
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by Aten-Ra View Post
    Deneb's integer execution unit has 6x pipelines, 3x ALUs (Integer) and 3x AGUs (Load/Store), and the Integer Scheduler can issue 6x micro-ops (uops) to it.

    Bulldozer's integer execution unit has 4x pipelines, 2x ALUs (Integer) and 2x AGen (Address Generators), plus a Load/Store unit (40 Load/24 Store), and the Integer Scheduler can issue 4 uops to it.
    I think you're mixing up one major thing:
    Deneb has 3x pipelines, but they can do EITHER, NOT BOTH at the same time.

    so 2 ALU and 1 AGU
    or 3 AGU and 0 ALU
    but not 3 and 3 at the same time

    BD is always going to do 2 AGU and 2 ALU per core
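
    Taking that description at face value, the trade-off can be sketched as a toy issue model: three general pipes that each take either kind of op, versus two dedicated ALU pipes plus two dedicated AGU pipes. Everything here is an assumption for illustration (the function names, one op per pipe per cycle, no dependencies, no scheduler or load/store limits); it is not a model of the real hardware.

```python
import math


def cycles_shared_pipes(alu_ops: int, agu_ops: int, pipes: int = 3) -> int:
    """Cycles to issue all ops when every pipe accepts either ALU or AGU ops."""
    return math.ceil((alu_ops + agu_ops) / pipes)


def cycles_dedicated_pipes(alu_ops: int, agu_ops: int,
                           alu_pipes: int = 2, agu_pipes: int = 2) -> int:
    """Cycles to issue all ops when ALU and AGU ops have separate pipes."""
    return max(math.ceil(alu_ops / alu_pipes), math.ceil(agu_ops / agu_pipes))


# Pure ALU burst: the three flexible pipes win.
print(cycles_shared_pipes(6, 0), cycles_dedicated_pipes(6, 0))  # prints 2 3
# Balanced ALU/AGU mix: the 2+2 split wins.
print(cycles_shared_pipes(4, 4), cycles_dedicated_pipes(4, 4))  # prints 3 2
```

    In this simplified picture, which layout comes out ahead depends entirely on the instruction mix.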

  19. #169
    Registered User
    Join Date
    Nov 2010
    Posts
    4
    First off, I don't like the design, nor do I trust the bold assumption/statement made for Bulldozer here (perhaps I just have to see it myself, as I currently still consider AMD to be making the same mistake as Intel). Also, I think the HT calculation is a bit inaccurate, considering the ratio is closer to 165%, not 118% (and that's not from being an Intel fanboy, I've just been testing a lot of their cores), but even this 165% sucks, as it's never stable for general application code compared to real cores. I wonder what similar application pressure would do to AMD Bulldozer: under something like server database pressure at high clocks, would it uberly fail like Intel's HT, or still calculate correctly?

    Whatever happened to the 'real cores only' strategy?

  20. #170
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    Quote Originally Posted by genetix View Post
    First off, I don't like the design, nor do I trust the bold assumption/statement made for Bulldozer here (perhaps I just have to see it myself, as I currently still consider AMD to be making the same mistake as Intel). Also, I think the HT calculation is a bit inaccurate, considering the ratio is closer to 165%, not 118% (and that's not from being an Intel fanboy, I've just been testing a lot of their cores), but even this 165% sucks, as it's never stable for general application code compared to real cores. I wonder what similar application pressure would do to AMD Bulldozer: under something like server database pressure at high clocks, would it uberly fail like Intel's HT, or still calculate correctly?

    Whatever happened to the 'real cores only' strategy?
    "Real cores" still exist. Instead of using up die space for real cores, AMD has managed to almost achieve the performance of two real cores by using shared resources. Think about it. You can achieve 85 to 90% performance of two real cores with 1 module but you're reducing logic die space, heat and power consumption. All good IMHO.
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  21. #171
    Registered User
    Join Date
    Nov 2010
    Posts
    4
    Quote Originally Posted by freeloader View Post
    "Real cores" still exist. Instead of using up die space for real cores, AMD has managed to almost achieve the performance of two real cores by using shared resources. Think about it. You can achieve 85 to 90% performance of two real cores with 1 module but you're reducing logic die space, heat and power consumption. All good IMHO.
    I do understand this. I just prefer solid real cores only; even if the "virtualized" cores did a clean 200%, that only means the actual module/core could too. It makes for a big decision when we talk about real cores: if it comes down to 4 modules or a clean 8-core Xeon, you can be pretty sure it ain't gonna be modules.

    But that's a bad example. I am just saying I dislike the way current architectures are built.

  22. #172
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by informal View Post
    Good points, marten_larsson. There are still a few things we don't know about the design, so I guess we should wait and see how the cores handle workloads that use fewer threads than the core count.
    Just a reminder -- the CPU does not schedule, the OS does. Questions about how threads are distributed should be investigated at the OS level. I.e. How much has AMD worked with MS to understand the most effective, efficient scheduling?
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  23. #173
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    I think JF once said they are collaborating with MS on how BD thread scheduling will be handled, per AMD's own suggestions (since they should know best, as they designed the thing).

  24. #174
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by informal View Post
    I think JF once said they are collaborating with MS on how BD thread scheduling will be handled, per AMD's own suggestions (since they should know best, as they designed the thing).
    Yep. Wouldn't surprise me. Regardless if it is SMT or CMT, there is an asymmetric distribution of resources between two shared vs two physically different cores (or modules to retain the same level of meaning).

    Best performance would be to distribute over modules first, then pair up. I am certain this is being comprehended within the OS for BD.
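
    The "modules first, then pair up" order can be sketched as a simple placement function. This is only an illustration of the ordering idea; the names `module_first_order` and `place_threads` are made up, and it says nothing about how any real OS scheduler implements it.

```python
def module_first_order(modules: int, cores_per_module: int = 2):
    """List (module, core) slots so each module gets one thread before any gets two."""
    return [(m, c) for c in range(cores_per_module) for m in range(modules)]


def place_threads(n_threads: int, modules: int, cores_per_module: int = 2):
    """Assign n_threads to slots in module-first order."""
    order = module_first_order(modules, cores_per_module)
    if n_threads > len(order):
        raise ValueError("more threads than available cores")
    return order[:n_threads]


# 3 threads on a 4-module part: each thread gets a module to itself.
print(place_threads(3, modules=4))  # [(0, 0), (1, 0), (2, 0)]
# 5 threads: the fifth has to share module 0.
print(place_threads(5, modules=4))  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1)]
```

    A power-oriented policy like the one discussed earlier in the thread would do the opposite and fill a module before waking the next one.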

  25. #175
    Xtreme Member
    Join Date
    Aug 2010
    Location
    Athens, Greece
    Posts
    116
    Quote Originally Posted by Manicdan View Post
    I think you're mixing up one major thing:
    Deneb has 3x pipelines, but they can do EITHER, NOT BOTH at the same time.

    so 2 ALU and 1 AGU
    or 3 AGU and 0 ALU
    but not 3 and 3 at the same time

    BD is always going to do 2 AGU and 2 ALU per core
    Yes my bad, Deneb has 3 pipes
