Results 601 to 625 of 719

Thread: AMD cuts to the core with 'Bulldozer' Opterons

  1. #601
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by JF-AMD View Post
    Yes, I went over the slides last night. There will be data on ALU/AGU, extensions, front end, Flex FP and more.

    The slides will not have the same depth that the talk will because an engineer will be doing the voice over. We hope to get some of them to blog later, but the 20 questions blog should allow us to get some of the details out.

    People just shouldn't expect Hot Chips to be a big unveiling. It is sharing details. Just don't want people getting too hyped on it.
    I like slides, and if it's streamed live on the internet I'll be ready to watch it like everyone else.

    You can tell the guy who ordered the new ATI marketing video that it was funny, but a real mistake.

    It's funny for those of us who know what a Fermi is. Most buyers don't even know the name of the latest GeForce, or that Nvidia is the one building it. That's certainly not true of anyone on XtremeSystems, and I'd bet 80% of the last-generation GPUs here are RV870.

    Marketing is a way to sell to computer noobs... Old hands buy on perf/W/€ and won't be impressed by marketing. The problem is that there are a lot of noobs and only a few old hands...

    AMD never had a good marketing team (or the funds for one), unlike Intel and Nvidia, and even then marketing is not doing good things with that money; it doesn't help keep the company from crashing for lack of cash flow.

    The problem is that the blue guys will never stop doing huge marketing, even if they don't spend money on their dev teams; they don't care.

    Yeah, the P4D sold a lot, right? ... That was crazy, don't you agree?

  2. #602
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by madcho View Post
    The problem is that the blue guys will never stop doing huge marketing, even if they don't spend money on their dev teams; they don't care.
    You think Intel doesn't spend a lot on development? As in R&D? Really?

    Intel's R&D spend is roughly equivalent to AMD's *revenues*. For example, last quarter's R&D spend was $1.66B.

    http://sec.gov/Archives/edgar/data/5...56011e10vq.htm

  3. #603
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by terrace215 View Post
    You think Intel doesn't spend a lot on development? As in R&D? Really?

    Intel's R&D spend is roughly equivalent to AMD's *revenues*. For example, last quarter's R&D spend was $1.66B.

    http://sec.gov/Archives/edgar/data/5...56011e10vq.htm
    R&D budgets are only a single factor on the overall product of a company.
    Otherwise, we would all be buying CRISP processor based computers, running the Mach 3.0 microkernel, and using the PostScript-based NeWS (Network extensible Window System) GUI.
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  4. #604
    Xtreme Member
    Join Date
    Sep 2008
    Posts
    235
    Quote Originally Posted by JF-AMD View Post
    I have made exactly one comment on an Intel thread, about Beckton. I said "looks hot."
    I haven't seen anything yet in the discussions about the fate of Intel's
    Boxboro top server platform, currently host to Beckton and waiting to
    receive the 10 core Westmere-EX (Eagleton) in Q3-2011 and the next
    Itanium (the 32nm Poulson) in 2012.

    The bits and pieces now public and the inherent latency increase
    and bandwidth reduction from the Millbrook memory buffer-on-board
    architecture (1.066 GHz) seem to suggest that Interlagos will be
    15% faster in Integer and 50% to 60% faster in FP applications than a
    10 core 2.66 GHz Eagleton with 165W (cpu) +35W (buf) = 200W TDP.
    (even without using any AVX)

    With both having a 32nm process and about the same die size.

    I wonder what they can (and will) do to please Oracle and IBM in terms
    of performance or that they'll try to convert to Patsburg / SandyBridge EX
    as fast as they can.



    Regards, Hans

  5. #605
    Xtreme Member
    Join Date
    Feb 2010
    Posts
    138
    Quote Originally Posted by Hans de Vries View Post
    I haven't seen anything yet in the discussions about the fate of Intel's
    Boxboro top server platform, currently host to Beckton and waiting to
    receive the 10 core Westmere-EX (Eagleton) in Q3-2011 and the next
    Itanium (the 32nm Poulson) in 2012.

    The bits and pieces now public and the inherent latency increase
    and bandwidth reduction from the Millbrook memory buffer-on-board
    architecture (1.066 GHz) seem to suggest that Interlagos will be
    15% faster in Integer and 50% to 60% faster in FP applications than a
    10 core 2.66 GHz Eagleton with 165W (cpu) +35W (buf) = 200W TDP.
    (even without using any AVX)

    With both having a 32nm process and about the same die size.

    I wonder what they can (and will) do to please Oracle and IBM in terms
    of performance or that they'll try to convert to Patsburg / SandyBridge EX
    as fast as they can.



    Regards, Hans
    200 W TDP...
    I'm in India, with temps in summer creeping north of 47 degrees Celsius...

  6. #606
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Hans de Vries View Post
    I haven't seen anything yet in the discussions about the fate of Intel's
    Boxboro top server platform, currently host to Beckton and waiting to
    receive the 10 core Westmere-EX (Eagleton) in Q3-2011 and the next
    Itanium (the 32nm Poulson) in 2012.

    The bits and pieces now public and the inherent latency increase
    and bandwidth reduction from the Millbrook memory buffer-on-board
    architecture (1.066 GHz) seem to suggest that Interlagos will be
    15% faster in Integer and 50% to 60% faster in FP applications than a
    10 core 2.66 GHz Eagleton with 165W (cpu) +35W (buf) = 200W TDP.
    (even without using any AVX)
    Without AVX, how can 8 FPUs capable of 4 DP FLOPs per cycle be 50% faster than 10 FPUs also capable of 4 DP FLOPs per cycle?
    Frequency-wise, I do not expect Interlagos to clock faster than 2.5-2.7 GHz at 137 W.
    Secondly, the memory performance of Millbrook is very good considering the extra complexity: Tukwila does 50 GB/s in STREAM and Beckton does 60+ GB/s (SGI managed 69 GB/s) at the 4P level.

    http://www.cs.virginia.edu/stream/st...2010/0006.html


    I wonder what they can (and will) do to please Oracle and IBM in terms
    of performance or that they'll try to convert to Patsburg / SandyBridge EX
    as fast as they can.
    Regards, Hans
    Well Oracle is apparently pleased enough that they dropped the entire AMD based server lineup. I would guess Oracle has been testing Eagleton for quite some time now.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  7. #607
    Xtreme Member
    Join Date
    Feb 2010
    Posts
    138
    Quote Originally Posted by savantu View Post
    Frequency wise, I do not expect Interlagos to clock faster than 2.5-2.7GHz at 137w.
    AMD is doing rather well with 12 physical cores, quite comfortably at 115W of TDP. All indications are that Interlagos on a new node (32nm) would fare better than Magny Cours (45nm), TDP-wise. This is not just down to the manufacturing node, but also has a lot to do with the way the cores are designed. It makes me wonder why you would presume what you do... Would you mind sharing some information on that? Sincerely, I'm rather curiously/anxiously waiting for BD to show up... so please do share this...
    Last edited by tifosi; 08-10-2010 at 01:43 AM. Reason: typo :P

  8. #608
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by tifosi View Post
    Given that AMD is doing rather well with 12 physical cores rather comfortably at 115W of TDP. All indications are that Interlagos on a new node (32nm) would fare better than Magny Cours(40nm), TDP wise. This is not just down to the manufacturing node, but also has a lot to do with the way the cores are designed. It makes me wonder why you would preseume what you do... Would you please mind sharing some information on the same? Sincerely, i'm rather curiously/ anxiously waiting for BD to show up... so please do share this...
    Real power is 137w for the 2.3GHz bin IIRC.

    Rule of thumb for a shrink is 20% more frequency, ceteris paribus. Interlagos has 33% more cores and 15% more frequency (by my assumption) and stays in the same power envelope. Sounds reasonable to me.
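
    A minimal sketch of the arithmetic behind this estimate, using only figures quoted in this thread: 12 cores at 2.3 GHz for the Magny Cours bin, 16 cores for Interlagos, and the assumed 15% clock gain. The function names are made up for illustration, and none of this is AMD data.

```python
# Rough scaling estimate from the figures quoted in this thread.
# The 15% clock gain is the poster's assumption, not a published spec.


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


def scaled_clock_ghz(base_ghz: float, gain_percent: float) -> float:
    """Clock after applying a percentage gain to a base clock."""
    return base_ghz * (1.0 + gain_percent / 100.0)


if __name__ == "__main__":
    core_gain = percent_increase(12, 16)     # Magny Cours -> Interlagos core count
    projected = scaled_clock_ghz(2.3, 15.0)  # 2.3 GHz bin plus the assumed 15%
    print(f"core count increase: {core_gain:.1f}%")
    print(f"projected clock: {projected:.3f} GHz")
```

    Under those assumptions the projected clock falls inside the 2.5-2.7 GHz range suggested earlier in the thread.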
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  9. #609
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by savantu View Post
    Real power is 137w for the 2.3GHz bin IIRC.

    Rule of thumb for a shrink is 20% more frequency et ceteris paribus. Interlagos has 33% more cores and 15% more frequency ( by my assumption ) and stays in the same power envelope. Sounds reasonable to me.
    We know nothing about frequencies yet. Could be high IPC part at 2GHz, or low IPC part at 4GHz+. Or any combination in between.

  10. #610
    Xtreme Member
    Join Date
    Feb 2010
    Posts
    138
    Quote Originally Posted by savantu View Post
    Real power is 137w for the 2.3GHz bin IIRC.

    Rule of thumb for a shrink is 20% more frequency et ceteris paribus. Interlagos has 33% more cores and 15% more frequency ( by my assumption ) and stays in the same power envelope. Sounds reasonable to me.
    Makes a lot of sense... so the expected TDP would be 125-140W (given that the actual cores are smaller thanks to the redesign)... which is not bad, to be quite honest...

    Thanks for the help mate!

  11. #611
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Posts
    504
    Quote Originally Posted by tifosi View Post
    ...Magny Cours(45nm)...
    Fixed.
    IQ_NOT_LESS_OR_EQUAL

    outdated hardware

  12. #612
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by tifosi View Post
    200 W TDP...
    I'm in India, with temps in summer creeping north of 47 degrees Celsius...
    Then I suggest when you order one of these you order a 20,000 BTU AC for that 6x8ft room it will go into!
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  13. #613
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by savantu View Post
    Real power is 137w for the 2.3GHz bin IIRC.

    Rule of thumb for a shrink is 20% more frequency et ceteris paribus. Interlagos has 33% more cores and 15% more frequency ( by my assumption ) and stays in the same power envelope. Sounds reasonable to me.
    So you expect it to have zero IPC improvement. Good luck with that.

  14. #614
    Xtreme Member
    Join Date
    Feb 2010
    Posts
    138
    Quote Originally Posted by Sunfire View Post
    Fixed.
    Thanks! I noticed it too later on, but I'm running downloads and I didn't slow them :P so it took a while for the page to open for me. Yes, it is only a paltry 2Mbps that I have to make do with.

    Quote Originally Posted by Movieman View Post
    Then I suggest when you order one of these you order a 20,000 BTU AC for that 6x8ft room it will go into!
    LOL... Yes, perhaps I'll have to borrow a cryogenic fuel tank from NASA :P

  15. #615
    Registered User
    Join Date
    Feb 2009
    Posts
    470
    Well then, put a big no-smoking sign up in front of the tank! I guess with 200W, water-cooled racks will be very popular... something like what Google is doing with their datacenters.


    Tell it it's a :banana::banana::banana::banana::banana: and threaten it with replacement

    D_A on an UPS and life

  16. #616
    Xtreme Member
    Join Date
    Sep 2008
    Posts
    235
    Quote Originally Posted by savantu View Post
    Without AVX, how can 8 FPUs capable of 4 DP FLOPs be 50% faster than 10 FPUs also capable of 4 DP FLOPs ?
    Interlagos is double that isn't it : 16 cores x 4 DP Flops/cycle.
    More important really is the DDR3 1.866 GHz capability. Bandwidth is
    easily saturated even without using any AVX...
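
    To put the DDR3-1866 figure in perspective, here is a rough sketch of theoretical peak bandwidth per socket. It assumes 4 memory channels per socket (the channel count mentioned later in the thread for Magny Cours, assumed here for Interlagos) and a 64-bit (8-byte) channel. Measured STREAM results will land well below this ceiling.

```python
# Theoretical peak DRAM bandwidth; channel count and width are assumptions.


def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: float,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    per_socket = peak_bandwidth_gb_s(channels=4, mega_transfers_per_s=1866)
    print(f"peak per socket: {per_socket:.1f} GB/s")
```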

    Quote Originally Posted by savantu View Post
    Frequency wise, I do not expect Interlagos to clock faster than 2.5-2.7GHz at 137w.
    That sounds reasonable. I wonder if we'll see a high-end desktop/
    workstation version now, since with all the power gating it should be
    possible to have optimized single thread and few thread performance
    as well as highly threaded throughput computing.

    Quote Originally Posted by savantu View Post
    Secondly, the memory performance of Millbrook is very good considering the extra complexity: Tukwila does 50 GB/s in STREAM and Beckton does 60+ GB/s (SGI managed 69 GB/s) at the 4P level.

    http://www.cs.virginia.edu/stream/st...2010/0006.html

    Well Oracle is apparently pleased enough that they dropped the entire AMD based server lineup. I would guess Oracle has been testing Eagleton for quite some time now.
    If Boxboro had arrived in Q1 2009, it would have been fine.
    Now it has a rather short window of opportunity. Westmere EX and later
    Poulson seem to have the same fate as Dunnington.

    Kudos to AMD for switching away from the buffer-on-board approach on time.


    Regards, Hans
    Last edited by Hans de Vries; 08-10-2010 at 03:32 AM.

  17. #617
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by savantu View Post
    Tukwila does 50 GB/s in STREAM and Beckton does 60+ GB/s (SGI managed 69 GB/s) at the 4P level.

    http://www.cs.virginia.edu/stream/st...2010/0006.html
    We are getting 54GB/s in 2P configs today. And in 4P we are over 100GB/s. Not sure how you can be excited about SGI's 69GB/s for a 4P when we are 50% higher than that today.
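
    As a quick check of the comparison, here is a small sketch using only the numbers quoted here (69 GB/s for SGI's 4P result, "over 100 GB/s" for the 4P figure above). The helper names are illustrative.

```python
# Percentage comparison of the two 4P STREAM figures quoted in the thread.


def percent_higher(value: float, baseline: float) -> float:
    """How much higher value is than baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def value_for_gain(baseline: float, gain_percent: float) -> float:
    """Value needed to exceed baseline by gain_percent."""
    return baseline * (1.0 + gain_percent / 100.0)


if __name__ == "__main__":
    print(f"100 vs 69 GB/s: {percent_higher(100, 69):.1f}% higher")
    print(f"needed for +50%: {value_for_gain(69, 50):.1f} GB/s")
```

    Since 100 GB/s is stated as a lower bound, the gap is at least about 45%, and a full 50% would correspond to roughly 103.5 GB/s.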
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  18. #618
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Hans de Vries View Post
    Interlagos is double that isn't it : 16 cores x 4 DP Flops/cycle.
    ...
    You have 8 modules, each module having 2 integer clusters ( 4 ALUs each ) and a single FP cluster. That is how I understood it.
    Each FP cluster is capable of 4 DP FLOPs per cycle and 8 with AVX/FMA.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  19. #619
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by JF-AMD View Post
    We are getting 54GB/s in 2P configs today. And in 4P we are over 100GB/s. Not sure how you can be excited about SGI's 69GB/s for a 4P when we are 50% higher than that today.
    Well, I'm not excited, I said it performs ok for the kludge it is.

    Obviously, MC style direct connect of 4 DDR3 channels offers better performance and lower complexity.

    I am puzzled why Intel continuously smashes its head against the wall when it comes to memory technology. I understand the decision to equip Beckton and Tukwila with FB-DIMM IMCs was made 5 years ago, but given the modularity, why the hell did they not change it to direct connect? Strange.

    Now they're stuck with this kludge for 2 consecutive generations (Beckton, Eagleton) and for 3 with Itanium (Tukwila, Poulson, Kittson).
    Dumb as a post.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  20. #620
    Xtreme Member
    Join Date
    Sep 2008
    Posts
    235
    Quote Originally Posted by savantu View Post
    You have 8 modules, each module having 2 integer clusters ( 4 ALUs each ) and a single FP cluster. That is how I understood it.
    Each FP cluster is capable of 4 DP FLOPs cycle and 8 with AVX/FMA.
    I don't know exactly how the arbitration is but the FP cluster can handle
    four 128 bit Floating point operations/cycle (each 2 DP or 4 SP) and it can
    do so for any of the two cores.


    Regards, Hans

  21. #621
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    As I understand it, each FMAC unit is 4DP capable and there are two of those 128-bit FMAC units in the Flex FPU (or what savantu calls a single FP cluster) AMD is implementing.

  22. #622
    Xtreme Member
    Join Date
    Feb 2010
    Posts
    138
    So should I read that as a spectacular win in perf/watt for AMD?

  23. #623
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by savantu View Post
    You have 8 modules, each module having 2 integer clusters ( 4 ALUs each ) and a single FP cluster. That is how I understood it.
    Each FP cluster is capable of 4 DP FLOPs cycle and 8 with AVX/FMA.
    Each Interlagos processor can run either 8 256-bit executions per cycle or 16 128-bit executions per cycle.

    In newer AVX-updated code, we will have the same number of executions as Intel. In older non-AVX code we will have 16 128-bit executions to their 8 128-bit executions.
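
    One way to read those numbers is as double-precision lanes per cycle. The sketch below assumes an "execution" means one SIMD operation of the stated width and that a DP element is 64 bits. It counts lanes only, so FMA (which would double the FLOP count) is deliberately left out, and the Intel figure is simply the one quoted above.

```python
# DP lanes per cycle from "executions per cycle" and SIMD width.
# Interpreting "execution" as one SIMD operation is an assumption.

DP_BITS = 64  # width of one double-precision element


def dp_lanes_per_cycle(ops_per_cycle: int, width_bits: int) -> int:
    """Number of 64-bit lanes processed per cycle."""
    if width_bits % DP_BITS != 0:
        raise ValueError("SIMD width must be a multiple of 64 bits")
    return ops_per_cycle * (width_bits // DP_BITS)


if __name__ == "__main__":
    print("Interlagos, 16 x 128-bit:", dp_lanes_per_cycle(16, 128))
    print("Interlagos,  8 x 256-bit:", dp_lanes_per_cycle(8, 256))
    print("Intel as quoted, 8 x 128-bit:", dp_lanes_per_cycle(8, 128))
```

    Both Interlagos modes come out to the same lane count, which matches the point that the advantage over the quoted Intel figure shows up in older 128-bit code.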
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  24. #624
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,886
    Quote Originally Posted by tifosi View Post
    So i should read that it would be a spectacular win in perf/ watt for AMD?


    Isn't Thuban already a win in perf/mm2/watt? So I would expect the same trend to keep going at AMD.

    Quote Originally Posted by JF-AMD View Post
    Each interlagos processor can run either 8 256-bit executions per cycle or 16 128-bit executions per cycle.

    In newer AVX-updated code, we will have the same number or executions as Intel. In older non-AVX code we will have 16 128-bit executions to their 8 128-bit executions.
    So Interlagos should do some damage on server workloads? :O .. Can't wait to get a full architectural analysis.


    Oh, btw, is there going to be some sort of livestream of the Hot Chips conference online?
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD View Post
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

  25. #625
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by savantu View Post
    Real power is 137w for the 2.3GHz bin IIRC.

    Rule of thumb for a shrink is 20% more frequency et ceteris paribus. Interlagos has 33% more cores and 15% more frequency ( by my assumption ) and stays in the same power envelope. Sounds reasonable to me.
    I think with "real power" you mean TDP. Anandtech measured ~89W "real power" (at full load) for a 2.2 GHz MC (80W ACP, 115W TDP).

    So with 90W and 2.2 GHz * 1.15 ~= 2.5 GHz there's actually some headroom.
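
    A small sketch of that headroom estimate, using only the numbers given here (115 W TDP, about 89 W measured at full load, 2.2 GHz, and the 15% clock gain assumed earlier in the thread). The helper names are illustrative.

```python
# Headroom between rated TDP and measured full-load power, plus the scaled clock.


def headroom_watts(tdp_w: float, measured_w: float) -> float:
    """Unused thermal budget in watts."""
    return tdp_w - measured_w


def scaled_clock_ghz(base_ghz: float, gain_percent: float) -> float:
    """Clock after applying a percentage gain to a base clock."""
    return base_ghz * (1.0 + gain_percent / 100.0)


if __name__ == "__main__":
    spare = headroom_watts(115, 89)
    print(f"headroom: {spare:.0f} W ({spare / 115 * 100:.1f}% of TDP)")
    print(f"scaled clock: {scaled_clock_ghz(2.2, 15):.2f} GHz")
```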

    But I think there is a different problem in the assumptions about how such a processor works and how it is designed for existing TDPs. Magny Cours has neither the ability to use turbo to exploit thermal headroom, nor can it fully utilize all the units in the cores under ISA and memory + I/O bandwidth restrictions. So the given TDPs probably could never be reached. For further arguments I refer to all the past ACP/TDP discussions.

    Techniques not in use in MC, like power gating, frequency boost, unit scaling (adaptivity like flexible cache sizes), shared units (which might be at 60% avg. utilization in MC causing leakage for no work) still leave a lot of options on the table.

    From what I've seen so far, I assume the big change in Interlagos simply won't be in the 100% utilization scenario you're looking at, but, as Hans already indicated, in single- and few-thread scenarios.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/
