Page 1 of 3
Results 1 to 25 of 65

Thread: AMD sheds light on Bulldozer

  1. #1
    Xtreme Cruncher PoppaGeek's Avatar
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    3,276
    Thanks
    11
    Thanked 13 Times in 7 Posts

    AMD sheds light on Bulldozer

    The TechReport

    Incidentally, AMD put those recent 128-bit rumors to rest by saying Bulldozer's floating-point multiply and accumulate (FMAC) units will be able to process two 64-bit double-precision or four 32-bit single-precision operations simultaneously, but not single, 128-bit operations.
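    To make the FMAC wording concrete: a multiply-accumulate computes a*b + c, and a 128-bit unit does that on every lane of a register at once, either four 32-bit singles or two 64-bit doubles. Below is a minimal Python sketch of that lane layout. `fmac_lanes` is a hypothetical helper, not an AMD or compiler API, and because Python does the arithmetic in double precision it does not reproduce the exact single-rounding behaviour of a real fused unit.

```python
from array import array


def fmac_lanes(a: array, b: array, c: array) -> array:
    """Lane-wise multiply-accumulate: d[i] = a[i] * b[i] + c[i].

    Each lane is computed in Python (double precision) and stored back in the
    operands' element type, so this shows the lane layout only, not the single
    rounding step of a real fused multiply-accumulate unit.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all operands need the same number of lanes")
    return array(a.typecode, (x * y + z for x, y, z in zip(a, b, c)))


# One 128-bit register holding four 32-bit singles ...
singles = fmac_lanes(array("f", [1.0, 2.0, 3.0, 4.0]),
                     array("f", [2.0] * 4),
                     array("f", [0.5] * 4))
# ... or two 64-bit doubles.
doubles = fmac_lanes(array("d", [1.0, 2.0]),
                     array("d", [3.0, 3.0]),
                     array("d", [1.0, 1.0]))

for name, reg in (("singles", singles), ("doubles", doubles)):
    # lanes * bytes per lane * 8 = register width in bits
    print(name, list(reg), "->", len(reg) * reg.itemsize * 8, "bits")
```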
    Are WCG WUs double or single precision? Are we looking at 2 WUs per core, or 4, or am I not understanding this? The WUs are 32-bit, right? Even with 64-bit BOINC and OS?
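    The thread never settles which precision WCG projects use, but one part of the question can be checked directly: single vs double precision is a property of the number type a program uses, not of whether BOINC or the OS is 64-bit. A quick Python check of the C type sizes on whatever machine runs it (on mainstream platforms a float is 4 bytes and a double 8, whatever the pointer size):

```python
import struct
import sys

# Native sizes of the C types behind these struct format codes.
FLOAT_BYTES = struct.calcsize("f")    # single precision (C float)
DOUBLE_BYTES = struct.calcsize("d")   # double precision (C double)
POINTER_BYTES = struct.calcsize("P")  # pointer size: 8 in a 64-bit process

# sys.maxsize exceeds 2**32 only in a 64-bit Python build.
is_64bit_process = sys.maxsize > 2**32

print(f"float: {FLOAT_BYTES * 8} bits, double: {DOUBLE_BYTES * 8} bits")
print(f"pointers: {POINTER_BYTES * 8} bits, 64-bit process: {is_64bit_process}")
```

Running it as 32-bit or 64-bit changes only the pointer line; the float stays 32 bits.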
    Whatever you do will be insignificant, but it is very important that you do it.
    Mahatma Gandhi

  2. #2
    Xtreme Cruncher Chumbucket843's Avatar
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Thanks
    0
    Thanked 0 Times in 0 Posts
    No, that's just SSE. 128-bit refers to the registers, so on a single-precision work unit Bulldozer will work on 4 values in parallel, like a GPU. Sandy Bridge will have 256-bit AVX, so it will be able to process 8 simultaneously, and Larrabee will have 512-bit, so it will be able to process 16 values. Each core has its own SIMD unit, so if Bulldozer is 8 cores then theoretically it will be as fast as Sandy Bridge clock for clock. I am pretty sure most projects are single precision. If the CMT rumors are true, then 2 WUs per core for Bulldozer.
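    The arithmetic behind those 4/8/16 figures is simply register width divided by element width. A small Python sketch using only the widths quoted in this post (`simd_lanes` is a made-up helper name). Note that lanes are values handled by one instruction inside one thread, not extra work units:

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """How many values of `element_bits` fit side by side in one register."""
    if register_bits % element_bits:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


# Register widths as quoted in the post.
REGISTER_BITS = {"SSE": 128, "AVX": 256, "Larrabee": 512}

for name, bits in REGISTER_BITS.items():
    print(f"{name:8s} {bits:3d}-bit: {simd_lanes(bits, 32):2d} singles, "
          f"{simd_lanes(bits, 64):2d} doubles")
```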

  3. #3
    Xtreme Cruncher PoppaGeek's Avatar
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    3,276
    Thanks
    11
    Thanked 13 Times in 7 Posts
    OK, still not understanding.

    The article says 4 32-bit single-precision operations per core, and you say 4 values in parallel for single precision, but you end by saying 2 WUs per core. Try dumbing down your explanation.

    2 per core is cool and would make me happy, if affordable, but I'm trying to understand.

    Thanks!

  4. #4
    BANNED!!!!!!!!!!! lkiller123's Avatar
    Join Date
    Dec 2008
    Location
    Los Angeles/Hong Kong
    Posts
    3,037
    Thanks
    20
    Thanked 2 Times in 2 Posts
    Well, it still won't happen until 2011

  5. #5
    Xtreme Cruncher informal's Avatar
    Join Date
    Jun 2006
    Posts
    6,000
    Thanks
    204
    Thanked 88 Times in 58 Posts
    Quote Originally Posted by Chumbucket843 View Post
    No, that's just SSE. 128-bit refers to the registers, so on a single-precision work unit Bulldozer will work on 4 values in parallel, like a GPU. Sandy Bridge will have 256-bit AVX, so it will be able to process 8 simultaneously, and Larrabee will have 512-bit, so it will be able to process 16 values. Each core has its own SIMD unit, so if Bulldozer is 8 cores then theoretically it will be as fast as Sandy Bridge clock for clock. I am pretty sure most projects are single precision. If the CMT rumors are true, then 2 WUs per core for Bulldozer.
    Let's see: per "core" (the thing in my avatar) and in SIMD workloads, a BD core will theoretically be as potent as Sandy Bridge per clock, meaning it will be 256-bit AVX compatible. That's the thing with ISA extension compatibility: you have to make it work 100% according to spec, and the spec is what Sandy Bridge will have. The BD cores will have an FPU/SIMD unit capable of 1x256-bit or 2x128-bit (or 1x128-bit and other combos) operation. This is to cut power draw and optimize performance at the same time, since the 2 threads will be shared among the 2 integer clusters and extract maximum ILP from SIMD code (similar to what Intel is doing with Nehalem and the future Sandy Bridge; the difference is the additional integer sharing hardware in AMD's approach).

    @PoppaGeek

    In blunt terms, the theoretical SIMD execution potential of one Bulldozer core will be 2x that of a Deneb core (256-bit SIMD support vs. 128-bit in Deneb). You can even see two 128-bit threads running through the BD core depicted in my avatar.
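    A rough way to picture the 1x256 vs 2x128 point and the "2x Deneb" claim: the same eight single-precision values take one pass through a 256-bit-wide unit and two passes through a 128-bit-wide one, with identical results. This Python sketch only counts passes; `vector_add` and `unit_lanes` are illustrative names, Python floats merely stand in for SIMD lanes, and real throughput also depends on latency and scheduling:

```python
from typing import List, Tuple


def vector_add(a: List[float], b: List[float], unit_lanes: int) -> Tuple[List[float], int]:
    """Add two vectors `unit_lanes` values at a time.

    Returns the sums and the number of passes issued. unit_lanes=4 models a
    128-bit unit on 32-bit floats; unit_lanes=8 models a 256-bit unit.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out: List[float] = []
    passes = 0
    for i in range(0, len(a), unit_lanes):
        out.extend(x + y for x, y in zip(a[i:i + unit_lanes], b[i:i + unit_lanes]))
        passes += 1
    return out, passes


a = [float(i) for i in range(8)]  # eight single-precision values = 256 bits
b = [1.0] * 8

wide, wide_ops = vector_add(a, b, unit_lanes=8)      # one 256-bit pass
narrow, narrow_ops = vector_add(a, b, unit_lanes=4)  # two 128-bit passes
print(wide == narrow, wide_ops, narrow_ops)  # True 1 2
```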
    Last edited by informal; 11-11-2009 at 04:11 PM.

  6. #6
    Xtreme Cruncher PoppaGeek's Avatar
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    3,276
    Thanks
    11
    Thanked 13 Times in 7 Posts
    Quote Originally Posted by lkiller123 View Post
    Well, it still won't happen until 2011
    Well I'll have Thuban to keep me warm and happy until then.

    Thanks for the explanation, Informal and Chumbucket.

  7. #7
    Xtreme Cruncher informal's Avatar
    Join Date
    Jun 2006
    Posts
    6,000
    Thanks
    204
    Thanked 88 Times in 58 Posts
    No problem.
    One small addition; a picture is worth a thousand words:


    Take note of a couple of things: marketed 8-core (octal) BD-based CPUs will have 4 modules, each consisting of 2 integer clusters (or, better worded, cores) capable of (naturally) 2 hardware threads, plus one 256-bit FP/SIMD unit capable of at most 2 (but also only 1 if need be) SIMD threads! The efficiency of the SIMD unit will probably be even better than that of a hypothetical Deneb core with a 256-bit-wide SIMD unit, since that Deneb design would not have dual threads extracting maximum ILP from SIMD code, as the Bulldozer module will.
    The 4-module models will be marketed as 8-core chips since in essence they will be 8-core chips, and 8 threads will be present in Task Manager. This is somewhat similar to what we have with Nehalem, but in Nehalem Intel is not sharing execution resources the way AMD will be (the AMD way: dedicated parts for some things, like integer cores that can communicate; one dual-thread SIMD unit; shared front end and back end).
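    The counting in this post can be written down as a tiny model. This is a sketch assuming exactly what the post describes (two integer cores and one shared FP/SIMD unit per module, one hardware thread per integer core); the class and field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BulldozerChip:
    """Counting model of a Bulldozer chip as described in the post (assumed)."""
    modules: int
    int_cores_per_module: int = 2  # two integer clusters per module
    fpus_per_module: int = 1       # one shared 256-bit FP/SIMD unit per module

    @property
    def integer_cores(self) -> int:
        return self.modules * self.int_cores_per_module

    @property
    def shared_fpus(self) -> int:
        return self.modules * self.fpus_per_module

    @property
    def os_threads(self) -> int:
        # One hardware thread per integer core, so Task Manager shows one CPU per core.
        return self.integer_cores


octal = BulldozerChip(modules=4)
print(octal.integer_cores, octal.shared_fpus, octal.os_threads)  # 8 4 8
```

With `modules=6` or `modules=8` the same model gives 12 and 16 cores.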
    Last edited by informal; 11-11-2009 at 04:37 PM.

  8. #8
    Xtreme Cruncher Otis11's Avatar
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    4,803
    Thanks
    102
    Thanked 36 Times in 30 Posts
    Sweet! So this is gonna be good?!?

    So, it has 8 threads on 8 cores? or is it 8 threads on 2 "cores"?

    Is it just me or are they starting to blur the line between cores and threads...

  9. #9
    A thing of beauty is a joy forever! Movieman's Avatar
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    33,006
    Thanks
    78
    Thanked 384 Times in 145 Posts
    Quote Originally Posted by PoppaGeek View Post
    OK, still not understanding.

    The article says 4 32-bit single-precision operations per core, and you say 4 values in parallel for single precision, but you end by saying 2 WUs per core. Try dumbing down your explanation.

    2 per core is cool and would make me happy, if affordable, but I'm trying to understand.

    Thanks!
    Poppa, I'm with you. These kids are WAY too smart for me.
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us, get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  10. #10
    BANNED!!!!!!!!!!! lkiller123's Avatar
    Join Date
    Dec 2008
    Location
    Los Angeles/Hong Kong
    Posts
    3,037
    Thanks
    20
    Thanked 2 Times in 2 Posts
    I am already confused by my homework; don't make it worse.

  11. #11
    Xtreme Cruncher PoppaGeek's Avatar
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    3,276
    Thanks
    11
    Thanked 13 Times in 7 Posts
    Quote Originally Posted by Movieman View Post
    Poppa, I'm with you. These kids are WAY too smart for me.
    I wasn't too sure they were speaking English.


  12. #12
    Xtreme Cruncher xVeinx's Avatar
    Join Date
    Jul 2006
    Posts
    1,326
    Thanks
    3
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by Otis11 View Post
    Sweet! So this is gonna be good?!?

    So, it has 8 threads on 8 cores? or is it 8 threads on 2 "cores"?

    Is it just me or are they starting to blur the line between cores and threads...
    Threads would be a software term, and cores a hardware one (well, mostly). They are changing the structure of what we have traditionally conceived of as a core, sort of like the change in GPUs with the advent of stream processors. I don't think the terms are being blurred, but threads may get mentioned more prominently simply because that is easier for most people to comprehend than the complex interactions within the newer modular cores.

  13. #13
    c[_] STEvil's Avatar
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    20,201
    Thanks
    20
    Thanked 83 Times in 49 Posts
    A little bit of tweaking and you could use this as a form of speedup for single threaded apps.
    Heatware || 01.01.08; AND 111.2%

    Dead Z7S-WS? Click!. || XS Z7S-WS Thread || Current Dead Asus Z7S-WS count: 26+ ($15,000 in dead motherboards).
    All along the watchtower the watchmen watch the eternal return.
    Want to use my Anti-asus logo? Go ahead, but use this link please!: http://i853.photobucket.com/albums/a...sus/noasus.gif
    Bring back the game. http://reclaimyourgame.com/. EA are mean.

  14. #14
    Xtreme Cruncher rcofell's Avatar
    Join Date
    Apr 2005
    Location
    TX, USA
    Posts
    913
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Otis11 View Post
    Sweet! So this is gonna be good?!?

    So, it has 8 threads on 8 cores? or is it 8 threads on 2 "cores"?

    Is it just me or are they starting to blur the line between cores and threads...
    IMHO I'd say 8 threads on 8 cores (in this section's context), since each hardware thread has its own dedicated integer "core", which is essentially the basic requirement for having a thread; the FP unit could be viewed as just an extension for each of these threads, since it can't independently support a thread on its own (no control instructions, for instance).

    Overall, it's certainly a novel concept that challenges the perception of what's a core, and the term module does more accurately represent the situation, although I'd still prefer the term core in some contexts :X

    From my perspective it seems like it hinges on how tightly you couple the software concept of a thread to a discrete unit of hardware; you could say in SMT's case there isn't really a discrete unit of hardware for each thread, as it's all shared, but in this case each thread has its own INT unit, while the FP unit is shared between the two.
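    The distinction drawn here (SMT shares everything between threads, while a Bulldozer module gives each thread its own integer unit and shares only the FP unit) can be summarised as a lookup table. A deliberately simplified Python sketch; the resource list is an assumption that ignores caches, decoders and everything else real cores share:

```python
from typing import Dict, List

# Which execution units each of two hardware threads gets to itself.
# Simplified: caches, decoders and schedulers are deliberately left out.
DESIGNS: Dict[str, Dict[str, str]] = {
    "SMT (Nehalem Hyper-Threading)": {"integer": "shared", "fp_simd": "shared"},
    "CMT (Bulldozer module)": {"integer": "private", "fp_simd": "shared"},
    "Two separate cores": {"integer": "private", "fp_simd": "private"},
}


def private_units(design: str) -> List[str]:
    """Execution units that are not shared between the two threads."""
    return [unit for unit, kind in DESIGNS[design].items() if kind == "private"]


for name in DESIGNS:
    print(f"{name:30s} private per thread: {private_units(name) or 'none'}")
```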

    Anyways, it's mostly semantics... it is what it is



  15. #15
    Xtreme Member Enoc's Avatar
    Join Date
    May 2006
    Location
    Dominican Republic (Caribbean)
    Posts
    215
    Thanks
    0
    Thanked 0 Times in 0 Posts
    This is from another forum: a member (Oliverda) asking JF-AMD (John Fruehe, who works in AMD's server division).

    Quote Originally Posted by Oliverda
    So, if I'm not mistaken, the quad-, six- or octal-core (in the usual sense, so X4, X6, X8) Bulldozer CPUs will contain four, six or eight Bulldozer modules, and the OS will detect these as octal-, twelve- or sixteen-core CPUs. Am I right?
    Response:

    Quote Originally Posted by JF-AMD
    No.

    Interlagos:
    12-core = 6 bulldozer modules
    16-core = 8 bulldozer modules

    OS will see 12 and 16 core respectively.

    The Bulldozer module is a logical way to group components and allow for better power efficiency and a more modular scalable path.

  16. #16
    Xtreme Addict tbone8ty's Avatar
    Join Date
    Feb 2007
    Location
    West hartford, CT
    Posts
    2,408
    Thanks
    102
    Thanked 34 Times in 30 Posts
    Maybe a Deneb picture alongside the Bulldozer pic would help?
    FX-8350(1249PGT) @ 4.6ghz 1.44v, lapped Havik 140 push/pull
    Asus Crosshair Formula 5 Am3+ bios v1703
    G.skill Trident X (2x4gb) ~1200mhz @ 10-12-12-31-45-1T @ 1.65v
    MSI 7950 TwinFrozr *960/1250* Cat.13.1
    OCZ ZX 850w psu
    Lain-Li Lancool K62
    Samsung 830 128g
    2 x 1TB Samsung SpinpointF3
    Win7 Home 64bit
    http://farm7.staticflickr.com/6042/6...39baeed6_b.jpg

  17. #17
    Xtreme Cruncher PoppaGeek's Avatar
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    3,276
    Thanks
    11
    Thanked 13 Times in 7 Posts
    Quote Originally Posted by tbone8ty View Post
    Maybe a Deneb picture alongside the Bulldozer pic would help?
    Probably not. All I asked was whether it will do 1, 2 or 4 WUs per core. Not sure I have my answer yet. But if you have a pic of a Bulldozer CPU from AMD, I am sure many here would like to see it.

  18. #18
    c[_] STEvil's Avatar
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    20,201
    Thanks
    20
    Thanked 83 Times in 49 Posts
    1 WU per thread
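    Put as a rule of thumb: a BOINC-style client runs one task per logical CPU the OS reports, so the Bulldozer question reduces to "how many CPUs does Task Manager show". A hedged Python sketch of that policy (the function name is hypothetical, and real BOINC clients also honour user preferences that can lower the count); `os.cpu_count()` is the standard-library call that returns the logical CPU count:

```python
import os


def concurrent_work_units(logical_cpus: int) -> int:
    """One WU per logical CPU the OS reports (assumed BOINC-style policy).

    SIMD width never appears here: 4-wide or 8-wide vector units can make
    each WU finish sooner, but they do not add WU slots.
    """
    if logical_cpus < 1:
        raise ValueError("need at least one logical CPU")
    return logical_cpus


# Logical CPU counts for chips mentioned in this thread.
examples = {
    "4-module Bulldozer (8 cores, per AMD's plans)": 8,
    "Thuban (6 cores, no SMT)": 6,
    "Core i7 920 (4 cores + Hyper-Threading)": 8,
}
for chip, cpus in examples.items():
    print(f"{chip}: {concurrent_work_units(cpus)} WUs at once")

# os.cpu_count() can return None, so fall back to 1.
print("this machine:", concurrent_work_units(os.cpu_count() or 1))
```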

  19. #19
    BANNED!!!!!!!!!!! lkiller123's Avatar
    Join Date
    Dec 2008
    Location
    Los Angeles/Hong Kong
    Posts
    3,037
    Thanks
    20
    Thanked 2 Times in 2 Posts
    Wonder how much it will cost; I hope it will be in the sub-$1k market.

  20. #20
    Xtreme Cruncher PoppaGeek's Avatar
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    3,276
    Thanks
    11
    Thanked 13 Times in 7 Posts
    I think I may kill myself.

    How many WUs per core?
    Not thread, not unit, module or Integer Unit or FMAC or anything else.

    1 core = how many WUs run?

    I may be stupid, but this is a simple question, or at least it should be. A simple answer would be nice.

    If it is a desktop CPU then AMD cannot price it too high, or else people will just buy Intel. It either has to be faster or cheaper. I hope Thuban is less than $300. If AMD goes higher than that for a CPU, it will be out of my price range.

  21. #21
    Xtreme Cruncher Otis11's Avatar
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    4,803
    Thanks
    102
    Thanked 36 Times in 30 Posts
    Based on my limited knowledge I would say 1 WU per "core" since AMD does not have HT...

    But there are 2 "cores" per Bulldozer unit, if I understand correctly?

  22. #22
    Xtreme Cruncher PoppaGeek's Avatar
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    3,276
    Thanks
    11
    Thanked 13 Times in 7 Posts
    One small addition; a picture is worth a thousand words:


    Bulldozer's floating-point multiply and accumulate (FMAC) units will be able to process two 64-bit double-precision or four 32-bit single-precision operations simultaneously, but not single, 128-bit operations.
    Let me try again using Informal's pic.

    In the pic there are 2 FMAC units. Article says, each FMAC unit can process 2 64-bit double-precision or 4 32-bit single-precision operations simultaneously. I stated I thought WUs were 32 bit single-precision. Right? So does that mean each FMAC unit would do 4 WUs? In the pic it looks like each core has one FMAC unit. I do not care about modules or threads. I do not care about games, benchmark programs or whatever else. WCG WUs.

    How many WUs from WCG will run on one core? 1 core = 1 FMAC unit. The pic shows core 1 and core 2 and 2 FMAC units, one for each core. 1 FMAC unit will run 4 32-bit single-precision operations.

    I dunno if it is just the change in architecture, terminology or a lack of people reading the first post. I think Informal and Enoc may have answered this but I am not understanding so I appreciate their patience.
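    The missing step is where the four lanes get their data from. In SIMD code all four lanes of one FMAC work on numbers from the same work unit, so they make that one WU finish faster rather than running four WUs side by side. A Python sketch of one WU's dot product done four values per step; the data and function name are hypothetical, and whether a given WCG application is vectorised this way is not known from this thread:

```python
from typing import List, Tuple

LANES = 4  # four 32-bit values per 128-bit FMAC, as in the article


def dot_product_simd(x: List[float], y: List[float]) -> Tuple[float, int]:
    """Dot product of ONE work unit's data, processed four values per step.

    Returns the result and the number of 4-wide steps used.
    """
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    acc = [0.0] * LANES  # one running sum per lane
    steps = 0
    for i in range(0, len(x), LANES):
        for lane, (a, b) in enumerate(zip(x[i:i + LANES], y[i:i + LANES])):
            acc[lane] += a * b  # the multiply-accumulate each lane performs
        steps += 1
    return sum(acc), steps


# One hypothetical work unit with 10 values: 3 four-wide steps instead of 10 scalar ones.
wu_x = [1.0] * 10
wu_y = [2.0] * 10
print(dot_product_simd(wu_x, wu_y))  # (20.0, 3)
```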

  23. #23
    Xtreme Cruncher Otis11's Avatar
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    4,803
    Thanks
    102
    Thanked 36 Times in 30 Posts
    Quote Originally Posted by PoppaGeek View Post
    One small addition; a picture is worth a thousand words:




    Let me try again using Informal's pic.

    In the pic there are 2 FMAC units. Article says, each FMAC unit can process 2 64-bit double-precision or 4 32-bit single-precision operations simultaneously. I stated I thought WUs were 32 bit single-precision. Right? So does that mean each FMAC unit would do 4 WUs? In the pic it looks like each core has one FMAC unit. I do not care about modules or threads. I do not care about games, benchmark programs or whatever else. WCG WUs.

    How many WUs from WCG will run on one core? 1 core = 1 FMAC unit. The pic shows core 1 and core 2 and 2 FMAC units, one for each core. 1 FMAC unit will run 4 32-bit single-precision operations.

    I dunno if it is just the change in architecture, terminology or a lack of people reading the first post. I think Informal and Enoc may have answered this but I am not understanding so I appreciate their patience.

    Ah! Well, for me it was just not understanding the question due to the new arch and terminology... Honestly, this is nothing more than a guess, but looking at it, it seems to me that it could send a WU through each of those pipelines as long as they total at most 128 bits (which they do).

    If so, this could be a beast of a cruncher... :drool:

    Confirmation or refutation?

  24. #24
    Xtreme Cruncher PoppaGeek's Avatar
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    3,276
    Thanks
    11
    Thanked 13 Times in 7 Posts
    Quote Originally Posted by Otis11 View Post
    Ah! Well, for me it was just not understanding the question due to the new arch and terminology... Honestly, this is nothing more than a guess, but looking at it, it seems to me that it could send a WU through each of those pipelines as long as they total at most 128 bits (which they do).

    If so, this could be a beast of a cruncher... :drool:

    Confirmation or refutation?
    Yeah that is why I am trying to understand it. From the way I read it 4 WUs per core.

  25. #25
    Xtreme Enthusiast poke349's Avatar
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    680
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Correct me if I'm wrong, but here's what I'm getting.

    They're saying that each core gets 2 threads (like HT).

    But unlike Intel, each core gets two integer units and one FPU, whereas each Intel core only has one of each.

    So running integer-heavy multi-threaded code will theoretically double the speed, since each thread gets its own integer unit. But since both threads on a core share the same FPU, floating point won't be much faster.

    But then when you factor in AVX, floating-point throughput will double.


    I'm not too sure about the integer SSE unit... Is that also part of the integer unit? So will it also double in speed? Does that mean 4x faster when you add in AVX?
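    Here is the idealised model from this post, written out. It assumes perfect scaling, that one thread can saturate the shared FPU (real code rarely does) and that the FP code actually uses 256-bit AVX when the wider FPU is counted; what the post calls a core is what AMD calls a module. `ideal_module_throughput` is a hypothetical name:

```python
def ideal_module_throughput(workload: str, threads: int, fpu_bits: int = 128) -> float:
    """Relative throughput of one module under the post's idealised assumptions.

    Baseline 1.0 = one thread doing that kind of work on a 128-bit FPU.
    """
    if threads not in (1, 2):
        raise ValueError("a module runs one or two threads")
    if workload == "integer":
        return float(threads)  # each thread has its own integer unit
    if workload == "float":
        return fpu_bits / 128  # both threads share one FPU; only its width matters
    raise ValueError("workload must be 'integer' or 'float'")


for workload, threads, bits in [("integer", 1, 128), ("integer", 2, 128),
                                ("float", 2, 128), ("float", 2, 256)]:
    print(f"{workload:7s} x{threads} threads, {bits}-bit FPU: "
          f"{ideal_module_throughput(workload, threads, bits):.1f}x")
```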
    The first efficient and scalable Multi-threaded Pi Benchmark: http://www.xtremesystems.org/forums/...d.php?t=221773

    Quad Monitor setup for Work, Gaming, Coding: (A lanbox - small enough to hand-carry on a plane.)
    Intel Core i7 920 D0 @ 3.34 GHz (3.5 GHz Turbo Boost) - (167 x 20/21) ----- Cooler Master Hyper N520 with 2 x Delta FFB0912SH-F00 92mm Fans ----- 12 GB DDR3 1333 MHz @ 1336 MHz (167 x 8)
    Asus Rampage II GENE (Micro-ATX) --- XFX Geforce 9800 GTX+ ----- EVGA GeForce 275 ----- 1.5 TB Seagate (boot + data) --- 3 x 23in. 1080p monitors


    Fileserver and Primary Code-Testing Rig: (will probably become primary desktop in the future...)
    Intel Core i7 2600K @ 4.6 GHz (4.7 GHz prime/LinX stable) --- Corsair H50 with Scythe SY1225SL12SH "Slipstream" 120mm Fan --- 16 GB DDR3 @ 1333 MHz
    Asus P8P67 Pro --- PNY GeForce GTS 250 --- 1.5 TB Seagate (boot + data) --- 4 x 1 TB Seagate (code-testing) --- 3 TB Hitachi (swap space) --- >6 TB of externals


    Miscellaneous Workstations for Code-Testing:
    4 x AMD Opteron 8356 Barcelona @ 2.31 GHz (not OC'able) --- 8 GB DDR2 ECC Registered @ 533 MHz --- Tyan Thunder S4985 --- 80 GB Seagate --- Donated by skycrane
    2 x Intel Xeon X5482 Harpertown @ 3.2 GHz (not OC'able) --- 64 GB DDR2 FB-DIMM @ 800 MHz --- SuperMicro MBD-X7DWN+O --- Zotac GT218-ION --- 64 GB SSD --- 2 TB Seagate LP --- 8 x 2 TB Hitachi
