Page 26 of 29
Results 626 to 650 of 719

Thread: AMD cuts to the core with 'Bulldozer' Opterons

  1. #626
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by JF-AMD View Post
    Each Interlagos processor can run either 8 256-bit executions per cycle or 16 128-bit executions per cycle.

    In newer AVX-updated code, we will have the same number of executions as Intel. In older non-AVX code we will have 16 128-bit executions to their 8 128-bit executions.
    When speaking of Interlagos compared to Intel, it seems a bit silly to assume they won't avail themselves of the same trade-off that Interlagos uses to put up big throughput numbers, should they find its market compelling: taking advantage of the fact that power is non-linear with frequency, particularly at the top end. So you can drop the clock ~30%, and double the number of cores while remaining at the same power.

    Intel can produce a 12- or 16-cored low-voltage/low-power Sandy Bridge, either directly, or with the yield- and cost-effective two-die MCM approach, and thereby take advantage of this same, almost completely design-agnostic principle.

    So I think more focus on the non-MCM, baseline Zambezi/Valencia capabilities would be more likely to both illuminate the strengths and weaknesses of BD's *design* as well as let us properly gauge its true competitive position vs Intel's offerings.

    (I'll grant you that as Magny is also an MCM/(low-speed/double-core), Interlagos vs Magny changes do result from BD design features.)
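To put rough numbers on the clock-versus-cores trade-off above, here is a minimal Python sketch. It assumes per-core power scales as frequency raised to some exponent (dynamic power goes as C*V^2*f, so if voltage has to rise with frequency the exponent lands somewhere between 1 and 3) and that throughput scales linearly with cores times clock. Both are simplifying assumptions, and the function names are made up for illustration; none of this is measured data for any real chip.

```python
import math


def relative_power(freq_scale, cores, exponent):
    """Chip power relative to one full-speed core, assuming P_core ~ f**exponent."""
    return cores * freq_scale ** exponent


def relative_throughput(freq_scale, cores):
    """Ideal throughput relative to one full-speed core (perfect scaling assumed)."""
    return cores * freq_scale


def break_even_exponent(freq_scale, core_ratio):
    """Exponent at which core_ratio times the cores at freq_scale draws equal power."""
    return math.log(1.0 / core_ratio) / math.log(freq_scale)


if __name__ == "__main__":
    for exponent in (2.0, 3.0):
        power = relative_power(0.7, 2, exponent)
        print(f"exponent {exponent}: power x{power:.2f}, "
              f"throughput x{relative_throughput(0.7, 2):.2f}")
    print(f"break-even exponent: {break_even_exponent(0.7, 2):.2f}")
```

With a 30% clock drop and twice the cores, an exponent of 2 gives about 0.98x the power for 1.4x the ideal throughput, and the break-even exponent is about 1.94. So under this model the "same power" claim holds whenever power rises at least roughly quadratically with clock.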

  2. #627
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Well, here is how I understand it. We have 2 128-bit FMACs that fuse into a 256-bit AVX. Intel "double pumps" a 128-bit unit to get it to 256-bit (I think Hans has a die shot that shows the SB 256-bit being about the same size as the current 128-bit.)

    As it has been explained to me, the double pumping requires AVX, meaning that if the SW does not support AVX, then things execute at 128-bit only.

    Perhaps someone with some info could chime in.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  3. #628
    Xtreme Enthusiast
    Join Date
    Dec 2003
    Posts
    510
    Quote Originally Posted by Sn0wm@n View Post
    isn't Thuban already a win in perf/mm2/watt? So I would expect the same trend to keep going at AMD
    No, i7 Lynnfield uses less power, is faster in anything with less than 6 threads, roughly comparable with 6+ threads and has a smaller die size.

  4. #629
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by JF-AMD View Post
    Each Interlagos processor can run either 8 256-bit executions per cycle or 16 128-bit executions per cycle.
    That's 8x4 = 32 DP FLOPs, as I've said, if I understand correctly: 8 FP clusters, each capable of 4 DP FLOPs.
    In newer AVX-updated code, we will have the same number of executions as Intel. In older non-AVX code we will have 16 128-bit executions to their 8 128-bit executions.
    How many cores do you assume for Intel? Only 4?

    Core/Nehalem do 2 128-bit executions per cycle now; with 6 cores (Westmere) we have 12 per CPU, and with Beckton/Eagleton we have 16 and 20.
    This brings up another interesting point: without AVX, Interlagos won't be faster than MC in FP operations. MC can do 2 x 128-bit executions per cycle x 12 cores => 24 per CPU. Of course, that's a theoretical max, and it is likely that BD will have far higher utilization of those resources with a multithreaded front-end and a shared FP cluster.
    Interesting anyway.
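The peak figures in this post can be tallied with a few lines of Python. This is only a sketch of the arithmetic quoted above: the per-core execution counts and core counts are the ones stated in the thread, the helper names are made up, and the 8- and 10-core Intel entries are simply what 16 and 20 imply at 2 executions per core.

```python
DOUBLE_BITS = 64  # width of one double-precision value


def ops_per_cycle(ops_per_core, cores):
    """Theoretical peak SIMD executions per cycle for a whole CPU."""
    return ops_per_core * cores


def doubles_per_op(width_bits):
    """How many double-precision values fit in one SIMD execution."""
    return width_bits // DOUBLE_BITS


# Peak 128-bit executions per cycle, using the numbers quoted in the thread.
peak_128bit = {
    "Westmere (6 cores x 2)": ops_per_cycle(2, 6),
    "8-core Intel part (x 2)": ops_per_cycle(2, 8),
    "10-core Intel part (x 2)": ops_per_cycle(2, 10),
    "Magny-Cours (12 cores x 2)": ops_per_cycle(2, 12),
    "Interlagos, non-AVX (as stated by JF-AMD)": 16,
}

# The 8x4 estimate: 8 256-bit executions, each holding 4 doubles.
interlagos_dp_estimate = 8 * doubles_per_op(256)

if __name__ == "__main__":
    for name, ops in peak_128bit.items():
        print(f"{name}: {ops}")
    print(f"Interlagos 8 x 256-bit: {interlagos_dp_estimate} DP values per cycle")
```

Note that this counts one result per lane. Whether a fused multiply-add is counted as one FLOP or two changes the headline number, so peak FLOP figures are only comparable when the counting convention is the same.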
    Last edited by savantu; 08-10-2010 at 09:18 AM.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  5. #630
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Hans answered you earlier, there are 16x4, not 8x4.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  6. #631
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by JF-AMD View Post
    Well, here is how I understand it. We have 2 128-bit FMACs that fuse into a 256-bit AVX. Intel "double pumps" a 128-bit unit to get it to 256-bit (I think Hans has a die shot that shows the SB 256-bit being about the same size as the current 128-bit.)

    As it has been explained to me, the double pumping requires AVX, meaning that if the SW does not support AVX, then things execute at 128-bit only.

    Perhaps someone with some info could chime in.
    double pumping basically cuts the delay of the execution stages in half. this means double the clockspeed, so whether the code uses 128 bits of the avx registers or all 256 bits, execution will be two times faster (in theory). BD will do double the operations/clock and SB will double the clockspeed: two ways to solve the same problem.
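As a sketch of the "two ways to solve the same problem" point, the snippet below models peak datapath bits per core clock as unit width times number of units times the unit's clock multiplier. The double-pumped Sandy Bridge configuration is the speculation discussed in this thread, not a confirmed design, and the function name is made up for illustration.

```python
def bits_per_core_cycle(unit_width_bits, units, clock_multiplier=1):
    """Peak datapath bits processed per core clock cycle."""
    return unit_width_bits * units * clock_multiplier


# Bulldozer as described in the thread: two 128-bit FMACs fused for 256-bit AVX.
fused_pair = bits_per_core_cycle(128, units=2)

# Hypothetical double-pumped design: one 128-bit unit clocked at 2x the core.
double_pumped = bits_per_core_cycle(128, units=1, clock_multiplier=2)

if __name__ == "__main__":
    print(fused_pair, double_pumped)
```

Both come out to 256 bits per core clock in this idealized model. It ignores latency, power, and the open question of whether non-AVX code can use the second half of that throughput.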

    i really dont think that they will use double pumping, hyperpipelining, or whatever you want to call it in sandy bridge. it caused a lot of issues in netburst. to put it simply making alu's clock 2x faster is very complex, it takes experienced designers and many circuit simulations to assure robustness. it is much more productive having these people work on other parts of the chip, like designing a really fast L1 cache, high speed I/O, or power/clock gating.

  7. #632
    Xtreme Member
    Join Date
    Aug 2004
    Posts
    210
    Quote Originally Posted by Chumbucket843 View Post
    i really dont think that they will use double pumping, hyperpipelining, or whatever you want to call it in sandy bridge. it caused a lot of issues in netburst. to put it simply making alu's clock 2x faster is very complex, it takes experienced designers and many circuit simulations to assure robustness. it is much more productive having these people work on other parts of the chip, like designing a really fast L1 cache, high speed I/O, or power/clock gating.
    Just read Hans' post here:


    Note that the way Intel implements 256 bit AVX is somewhat of a trick to avoid bloating up the core too much. They actually used a 128 bit unit which runs at double the clock speed because they "hyper-pipelined" it.

    The FP/SSE/AVX area on the Sandy Bridge die is only slightly greater than the FP/SSE unit on Westmere. It does have consequences for the power consumption however, and Sandy Bridge will be about as large as a single Bulldozer module anyway.
    http://www.amdzone.com/phpbb3/viewto...rt=875#p185515

    It has to be hyper pipelined, because the FPU is nearly unchanged from Westmere's but - according to intel - it should be able to handle 256bits in 1 clock. Thus the FPU has to be double clocked. Plain logic.

    If you deny it, then you have to point out the 256bit units on SB's die plot.

    cheers

    P.S: Who else if not intel has "experienced designers" ? ;-)
    Last edited by Opteron146; 08-10-2010 at 10:42 AM.

  8. #633
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by Chumbucket843 View Post
    i really dont think that they will use double pumping, hyperpipelining, or whatever you want to call it in sandy bridge. it caused a lot of issues in netburst. to put it simply making alu's clock 2x faster is very complex, it takes experienced designers and many circuit simulations to assure robustness. it is much more productive having these people work on other parts of the chip, like designing a really fast L1 cache, high speed I/O, or power/clock gating.
    While this is true, what do you say to the die photo analysis? There's also been an interesting comment by an Intel guy, here:

    http://software.intel.com/en-us/foru...ad.php?t=68554

    It seems point 1) may have assumed it requires monolithic 256-bit hardware to achieve 1 cycle throughput for 256-bit AVX instructions. That's not true.
    Vague, but perhaps a hint?

  9. #634
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    The way I see it is that intel fellow indirectly confirmed what Hans already found out from SB die photo.

  10. #635
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    keep in mind i am just giving my opinion. from a risk/reward perspective it just doesnt seem like a good decision, then again it is intel.

    also better floorplanning could easily increase FETs/mm2.

    http://yfrog.com/49iacores2p
    here is a comparison of intel's cores. they are normalized by height. you can see that the datapath in the upper right keeps getting proportionally smaller until sandy bridge.
    Quote Originally Posted by Opteron146 View Post
    Just read Hans' post here:



    http://www.amdzone.com/phpbb3/viewto...rt=875#p185515

    It has to be hyper pipelined, because the FPU is nearly unchanged from Westmere's but - according to intel - it should be able to handle 256bits in 1 clock. Thus the FPU has to be double clocked. Plain logic.

    If you deny it, then you have to point out the 256bit units on SB's die plot.

    cheers

    P.S: Who else if not intel has "experienced designers" ? ;-)
    no one can point out sb's alu's. individual units have been invisible to the naked eye for many years now.

    yes, intel has experienced engineers but not enough. also they need to use them effectively due to their scarcity.

  11. #636
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by informal View Post
    The way I see it is that intel fellow indirectly confirmed what Hans already found out from SB die photo.
    It seems likely. (edit: although, hmm, chum makes an interesting point ^^ there ^^ )

    Of course, as I argued above, the real comparison should be between an 8-core Sandy Bridge and an 8-core Valencia/Zambezi, if you want to look at relative design strengths.

    If you instead insist on comparing to Interlagos, what happens to that analysis if Intel says "we'll play that same trick, too" ? (Have to give credit to JF for framing the debate this way, however. )
    Last edited by terrace215; 08-10-2010 at 11:17 AM.

  12. #637
    Registered User
    Join Date
    Sep 2009
    Posts
    77
    Quote Originally Posted by terrace215 View Post
    It seems likely. (edit: although, hmm, chum makes an interesting point ^^ there ^^ )

    Of course, as I argued above, the real comparison should be between an 8-core Sandy Bridge and an 8-core Valencia/Zambezi, if you want to look at relative design strengths.

    If you instead insist on comparing to Interlagos, what happens to that analysis if Intel says "we'll play that same trick, too" ?
    Compare at the same price, otherwise it will be meaningless. So we have to wait, or start a discussion about the manufacturing cost of 32nm.
    Last edited by superrugal; 08-10-2010 at 11:24 AM.

  13. #638
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Intel would first have to put that thing on the roadmap and then play that game. And as things are right now, there is not even a hint of such a design. With 2 years per design cycle, maybe in 2012 we can see such a product. I don't think that the current server socket that will host SB EX can support 2 SB dice linked via QPI.

  14. #639
    Xtreme Member
    Join Date
    Aug 2004
    Posts
    210
    Quote Originally Posted by Chumbucket843 View Post
    no one can point out sb's alu's. individual units have been invisible to the naked eye for many years now.
    No, you can, e.g. compare AMD's FPU from K8 -> K10. It was more or less doubled, because it was increased from 80bits -> 128bit. Very easily visible.
    The same should be the case if Intel were to double their FPU from 128 -> 256b.

    But you can't see anything like that. The new FPU is just a tiny little bit bigger than Westmere's old 128bit FPU. Definitely not enough to handle "real" 256bits.

    However the double clocked FPUs could be a good solution, too.

    Let's wait and see.

  15. #640
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by Chumbucket843 View Post
    keep in mind i am just giving my opinion. from a risk/reward perspective it just doesnt seem like a good decision, then again it is intel.

    also better floorplanning could easily increase FETs/mm2.

    http://yfrog.com/49iacores2p
    here is a comparison of intel's cores. they are normalized by height. you can see that the datapath in the upper right keeps getting proportionally smaller until sandy bridge.
    Those are.. 65nm conroe, 45nm nehalem, 32nm westmere, 32nm SB, from left to right?

    So, err... the last transition not being a shrink doesn't explain this?
    Last edited by terrace215; 08-10-2010 at 11:46 AM. Reason: 65nm part is conroe, not penryn

  16. #641
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by informal View Post
    Intel would first have to put that thing on the roadmap and then play that game. And as things are right now, there is not even a hint of such a design. With 2 years per design cycle, maybe in 2012 we can see such a product. I don't think that the current server socket that will host SB EX can support 2 SB dice linked via QPI.
    The MCM approach is only for yields / costs. And that would not take 2 years, and who's to say they aren't already at work on it?

    Just bumping the number of cores would be the more expensive, less design-work way to do it. Surely you don't think *that* would take them all that long, given the plethora of SB variants already coming?

  17. #642
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by superrugal View Post
    Compare at the same price, otherwise it will be meaningless. So we have to wait, or start a discussion about the manufacturing cost of 32nm.
    The one-time CPU cost is a small fraction of the total system-over-time cost.

  18. #643
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    But Intel does not have an 8-core client, so you can't compare an 8-core SB to an 8-core Zambezi.

    Just compare client to client, and server to server. Choose the same price points and it is off to the races.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  19. #644
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by terrace215 View Post
    The MCM approach is only for yields / costs. And that would not take 2 years, and who's to say they aren't already at work on it?

    Just bumping the number of cores would be the more expensive, less design-work way to do it. Surely you don't think *that* would take them all that long, given the plethora of SB variants already coming?
    Right now if you look at the 2P and 4P intel products (not counting itanium) you have 2 2P architectures (5600 and 6500) and you have 1 4P (7500) with a rumored second one.

    Now, throw an MCM in the mix, and even if Intel wants to build it, it will create a mess of their roadmap and the OEMs probably wouldn't want to use it.

    Just because they can build it doesn't mean that everyone will buy it.

    Too many platforms, too many sockets, too little differentiation. They're smarter than to try that.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  20. #645
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by JF-AMD View Post
    But Intel does not have an 8-core client, so you can't compare an 8-core SB to an 8-core Zambezi.

    Just compare client to client, and server to server. Choose the same price points and it is off to the races.
    Valencia, then, I can't keep track of all your crazy names. Is that one the 8-core server part?

    Price points for the server CPUs have only a little to do with things, as you know.

    ----

    As for client, has it been completely resolved whether client high-end SB will be only 6-core, or 8-core, or offered in both 6- and 8-core? I haven't heard this before.
    Last edited by terrace215; 08-10-2010 at 12:05 PM.

  21. #646
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by JF-AMD View Post
    it will create a mess of their roadmap and the OEMs probably wouldn't want to use it.

    Just because they can build it doesn't mean that everyone will buy it.

    Too many platforms, too many sockets, too little differentiation. They're smarter than to try that.
    Err, you'll have to do better than that, lol.

    Same platform, same socket, seems like a fair bit of differentiation to me...

    Yes, how on earth would the market survive if, in addition to a high-speed 8-core part, they also offered a medium-speed 12- or 14-core "continuous throughput optimized" part...

    If it wouldn't, well then clearly your Interlagos part will just ruin the market for Valencia, now won't it?

    It's the same principle: slow the cores into the most power-efficient speed-band, and add more cores on the same part. You guys used this trick with MC, and plan to continue with Interlagos. What's good for the goose is good for the gander, no?
    Last edited by terrace215; 08-10-2010 at 12:16 PM.

  22. #647
    Xtreme Addict
    Join Date
    Jan 2003
    Location
    Ayia Napa, Cyprus
    Posts
    1,354
    ** sorry 4 the off topic **

    wow just wow, we have had a good 2 pages of excellent constructive dialogue with regard to Bulldozer architecture



    Keep the love flowing, lol

    Seasonic Prime TX-850 Platinum | MSI X570 MEG Unify | Ryzen 5 5800X 2048SUS, TechN AM4 1/2" ID
    32GB Viper Steel 4400, EK Monarch @3733/1866, 1.64v - 13-14-14-14-28-42-224-16-1T-56-0-0
    WD SN850 1TB | Zotac Twin Edge 3070 @2055/1905, Alphacool Eisblock
    2 x Aquacomputer D5 | Eisbecher Helix 250
    EK-CoolStream XE 360 | Thermochill PA120.3 | 6 x Arctic P12

  23. #648
    Banned
    Join Date
    May 2006
    Location
    Brazil
    Posts
    580
    Quote Originally Posted by JF-AMD View Post
    But Intel does not have an 8-core client, so you can't compare an 8-core SB to an 8-core Zambezi.

    Just compare client to client, and server to server. Choose the same price points and it is off to the races.
    there will be 8-core SB for desktops @ socket LGA2011

    be sure intel doesnt want to share the enthusiast client market

  24. #649
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by -Sweeper_ View Post
    there will be 8-core SB for desktops @ socket LGA2011

    be sure intel doesnt want to share the enthusiast client market
    Can you confirm that this mysterious (LGA-1356) intermediate socket (between LGA-2011 and LGA-1155) has been (wisely, IMO) ditched in favor of LGA-2011 for the high-end desktop client as well?

  25. #650
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    If S1356 is gone, say hello to 400€+ boards with the top of the cream going for 500€
