
Thread: AMD cuts to the core with 'Bulldozer' Opterons

  1. #651
    I am Xtreme FlanK3r's Avatar
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    I'm still thinking LGA2011 is the server socket and the high-end desktop is 1356
    ROG Power PCs - Intel and AMD
    CPUs:i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350,1x AMD FX-8320,1x AMD FX-8300, 2x AMD FX-6300,2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K,AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K,2x i7-4770K, Intel i7-3930KAMD Cinebench R10 challenge AMD Cinebench R15 thread Intel Cinebench R15 thread

  2. #652
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by Chumbucket843 View Post
    keep in mind i am just giving my opinion. from a risk/reward perspective it just doesnt seem like a good decision, then again it is intel.
    Hyperpipelining is less painful than it sounds. Simply clocking an existing unit twice as fast (without dividing it into more pipeline stages) would be painful instead.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  3. #653
    Xtreme Addict
    Join Date
    May 2005
    Posts
    1,341
    Quote Originally Posted by terrace215 View Post
    Those are.. 65nm conroe, 45nm nehalem, 32nm westmere, 32nm SB, from left to right?

    So, err... the last transition not being a shrink doesn't explain this?
    Quote Originally Posted by terrace215 View Post
    Valencia, then, I can't keep track of all your crazy names. Is that one the 8-core server part?
    right in for a good laugh
    Quote Originally Posted by Movieman View Post
    Fanboyitis..
    Comes in two variations and both deadly.
    There's the green strain and the blue strain on CPU.. There's the red strain and the green strain on GPU..

  4. #654
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by FlanK3r View Post
    I'm still thinking LGA2011 is the server socket and the high-end desktop is 1356
    Then why no recent references to 1356 anywhere? Lately, any mention of high-end desktop mentions 2011...

    FWIW, someone even updated (without reference or attribution) the wiki entry for it from 1356 to 2011 several months back.

  5. #655
    Xtreme Member
    Join Date
    Aug 2004
    Posts
    210
    Quote Originally Posted by FlanK3r View Post
    I'm still thinking LGA2011 is the server socket and the high-end desktop is 1356
    There was some news that only mentions sockets 1155 and 2011, with no 1356 or 1355. E.g.:
    http://www.engadget.com/2010/04/21/i...aving-those-p/

    http://vr-zone.com/articles/a-look-i...ay/8877-1.html

    If this is true, then I guess Intel will win in the Enthusiast desktop segment because it will be 8 real Intel cores (with SMT =16 threads) and quad channel, against 4 Zambezi Modules (with CMT = 8 Threads) with dual channel.

    However, I wonder about the costs of the required 8-layer mainboards and of the CPUs; these will also be on an "enthusiast level". Furthermore, even in 2011, 4 Zambezi modules should be enough for gaming.

    I hope that Zambezi is better than a 4core LGA1155 Sandy, then it would be in between Intel's 1155 and 2011 offerings, that should be ideal.

    But let's wait and see.

  6. #656
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by Opteron146 View Post
    I hope that Zambezi is better than a 4core LGA1155 Sandy, then it would be in between Intel's 1155 and 2011 offerings, that should be ideal.
    If it weren't, it would be a disaster. Llano is supposed to be the part positioned against SB 1155. Zambezi is to go against the high-end desktop SB.

    Although now that you mention it, I guess it is possible that in gaming and other low-thread situations, even SB 1155 might beat Zambezi.
    Last edited by terrace215; 08-10-2010 at 04:22 PM.

  7. #657
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by terrace215 View Post
    If it weren't, it would be a disaster. Llano is supposed to be the part positioned against SB 1155. Zambezi is to go against the high-end desktop SB.

    Although now that you mention it, I guess it is possible that in gaming and other low-thread situations, even SB 1155 might beat Zambezi.
    Only with a discrete card.

  8. #658
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by Hornet331 View Post
    Only with a discrete card.
    Given that Zambezi has no iGPU, I'd say that it would be quite a blowout to compare without discrete GPUs.

  9. #659
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by terrace215 View Post
    Price points for the server CPUs only have little to do with things, as you know.
    You're absolutely right, people buy platforms.

    Go to Dell's site or HP's site and configure apples to apples R710 vs. R715 or DL380 vs. DL385.

    You will find that the processor savings do pass through to the platform level and AMD platforms are 10-15% less expensive.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  10. #660
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by JF-AMD View Post
    You're absolutely right, people buy platforms.

    Go to Dell's site or HP's site and configure apples to apples R710 vs. R715 or DL380 vs. DL385.

    You will find that the processor savings do pass through to the platform level and AMD platforms are 10-15% less expensive.
    And yet, AMD lost significant server share last quarter, so either your evaluation of apples to apples is off, or there are other factors at play...
    Last edited by terrace215; 08-10-2010 at 07:28 PM.

  11. #661
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    Quote Originally Posted by terrace215 View Post
    Given that Zambezi has no iGPU, I'd say that it would be quite a blowout to compare without discrete GPUs.
    Because Zambezi surely won't be available on any platforms with integrated GPUs.

  12. #662
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by Solus Corvus View Post
    Because Zambezi surely won't be available on any platforms with integrated GPUs.
    Indeed, that would be a rather odd pairing: a chipset-integrated GPU with an enthusiast-class desktop part.

  13. #663
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    Right, because we can't currently pair AMD's enthusiast desktop parts with integrated GPU chipsets.

    Nor will AMD release any BD core products with fewer than 8 cores.

  14. #664
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by informal View Post
    The way I see it is that intel fellow indirectly confirmed what Hans already found out from SB die photo.
    Or not

    Great questions … some more details to the response Max gave.

    1) The chart is wrong, we will fix it. Sandy Bridge has true 256-bit FP execution units (mul, add, shuffle). They are on exactly the same execution ports as the 128-bit versions. You can get a 256-bit multiply (on port 0) and a 256-bit add (on port 1) and a 256-bit shuffle (port 5) every cycle. 256-bit FP add and multiply bandwidth is therefore 2X higher flops than 128. See IACA for the ports on an instruction-by-instruction basis.
    2) The chart doesn’t mention 16-byte paths. We have true 32-byte loads (i.e. each load only uses one AGU resource and we have 2 AGU’s) but only a 48-byte/cycle total is supported to the L1 each cycle. You can’t get 48 bytes per cycle to the DCU using 128-bit operations (only 2 agu’s…). This is why a simple memory-limited kernel like matrix add (load, load, add, store) measures 1.42X speedup (would have predicted 1.5X with the current architecture in the limit; vs. 1.0X if we had double pumped).
    http://software.intel.com/en-us/foru...st.php?p=97176

    What can we understand from "true 256bit EU" and denial of double pumping ?
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  15. #665
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by terrace215 View Post
    If it weren't, it would be a disaster. Llano is supposed to be the part positioned against SB 1155. Zambezi is to go against the high-end desktop SB.

    Although now that you mention it, I guess it is possible that in gaming and other low-thread situations, even SB 1155 might beat Zambezi.
    Well, I don't think that was the plan. The delay made it so. Wouldn't you agree the K10 core and its derivatives are a bit long in the tooth compared to SB, which thrashes the Nehalem generation?
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  16. #666
    Xtreme Addict
    Join Date
    May 2005
    Posts
    1,341
    Quote Originally Posted by terrace215 View Post
    Those are.. 65nm conroe, 45nm nehalem, 32nm westmere, 32nm SB, from left to right?

    So, err... the last transition not being a shrink doesn't explain this?
    Quote Originally Posted by terrace215 View Post
    Valencia, then, I can't keep track of all your crazy names. Is that one the 8-core server part?

    Price points for the server CPUs only have little to do with things, as you know.

    ----

    As for client, has it been completely resolved whether client high-end SB will be only 6-core, or 8-core, or offered in both 6- and 8-core? I haven't heard this before.
    So woodcrest -wolfdale - clover -harper - gaines - gulf - sandy bridge
    and tigerton - dunnington - delay -delay -delay - beckton -

    against
    barcelona - shanghai - istanbul - lisbon/magny cours - valencia/interlagos

    and yet you tell me that you can't keep track of the AMD codenames. I wouldn't call you an enthusiast but a pure Intel fanboy

    for those who don't understand what architecture "delay" means, it is delay => http://en.wikipedia.org/wiki/Delay (is that the good way to explain MM? )
    Quote Originally Posted by savantu View Post
    Well, I don't think that was the plan. The delay made it so. Wouldn't you agree the K10 core and its derivatives are a bit long in the tooth compared to SB, which thrashes the Nehalem generation?
    Can you give me a link where SB is totally trashing Nehalem? So you are saying that we should totally stop buying any Intel-based solution and wait until the next generation. Well, the good part is that you'll have to buy a whole new platform anyhow, as usual, thanks to Intel
    Last edited by duploxxx; 08-10-2010 at 11:11 PM.
    Quote Originally Posted by Movieman View Post
    Fanboyitis..
    Comes in two variations and both deadly.
    There's the green strain and the blue strain on CPU.. There's the red strain and the green strain on GPU..

  17. #667
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by duploxxx View Post
    Can you give me a link where SB is totally trashing Nehalem? So you are saying that we should totally stop buying any Intel-based solution and wait until the next generation. Well, the good part is that you'll have to buy a whole new platform anyhow, as usual, thanks to Intel
    I don't know... maybe by having 20-25% better int/FP performance? This was discussed here only last week.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  18. #668
    Xtreme Addict
    Join Date
    May 2005
    Posts
    1,341
    Quote Originally Posted by savantu View Post
    I don't know... maybe by having 20-25% better int/FP performance? This was discussed here only last week.

    What I have seen is some VERY theoretical benchmarks, the kind that always show the best-case increase (although according to some Intel fanboys, benchmarks like STREAM can't be used when AMD shows data because they are too theoretical; now of course they count, since it is Intel). They show no single-thread improvement (so let's assume there will be some, thanks to the "untuned setup") and about 20% in multi-thread, which reflects the enhancements to turbo and HT. So I wouldn't call it a killer: Nehalem 45nm to 32nm is also a 10% integer increase core for core and GHz for GHz (check the official integer performance results), yet the baseline used for the SB comparison is the 45nm q720, so the real enhancement will be much less, about 10%.
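The "about 10%" figure follows from dividing the two gains instead of subtracting them. A minimal sketch, taking the 20% and 10% numbers from the post at face value (they are the poster's estimates, not measurements, and the function name is hypothetical):

```python
# If SB is ~20% faster than a 45nm Nehalem baseline, and 32nm Nehalem is
# already ~10% faster than that same baseline, the gain of SB over the
# 32nm part is the ratio of the two, not the difference.
# Both input figures are the poster's estimates.

def relative_gain(gain_new: float, gain_mid: float) -> float:
    """Gain of 'new' over 'mid' when both gains are measured against one baseline."""
    return (1 + gain_new) / (1 + gain_mid) - 1

sb_over_32nm = relative_gain(0.20, 0.10)
print(f"SB over 32nm Nehalem: {sb_over_32nm:.1%}")
```

With those inputs the result is roughly 9%, in line with the "about 10%" in the post.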

    Secondly while Gulf was able to add 2 cores and stay in the same TDP level with the same ghz it has yet to be seen if SB can do the same thing, adding 2 cores without having to reduce the ghz.

    Quote Originally Posted by terrace215 View Post
    And yet, AMD lost significant server share last quarter, so either your evaluation of apples to apples is off, or there are other factors at play...
    Do you have any personal experience in the server world? I guess not. First, it takes about 6 months before huge companies shift orders to a new platform, even when they get ES samples months in advance, and they still have huge orders that will remain on the old platform.
    Secondly, the drop in AMD server sales that you mention is the Nehalem influence. OLD school IT bosses can once again fall back on their "we are standardized on INTEL ONLY" rubbish, since thanks to Nehalem the Intel CPU actually was better in most cases, while in previous generations it was very easy to show that the Opteron platform was most of the time the better buy price/performance/power-wise from an IT point of view. The MC introduction is not yet visible in server shipments. Perhaps some day you will understand that some people in this forum work not with just a few servers or desktops but with 1000s.

    Quote Originally Posted by terrace215 View Post
    If it weren't, it would be a disaster. Llano is supposed to be the part positioned against SB 1155. Zambezi is to go against the high-end desktop SB.

    Although now that you mention it, I guess it is possible that in gaming and other low-thread situations, even SB 1155 might beat Zambezi.
    and since you have all the data, you of course already know the outcome way before a platform is launched. Utter fanboy crap.
    Please stay in Intel-related topics only if that is all you can bring to the table.
    Last edited by duploxxx; 08-10-2010 at 11:29 PM.
    Quote Originally Posted by Movieman View Post
    Fanboyitis..
    Comes in two variations and both deadly.
    There's the green strain and the blue strain on CPU.. There's the red strain and the green strain on GPU..

  19. #669
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by savantu View Post
    Or not



    http://software.intel.com/en-us/foru...st.php?p=97176

    What can we understand from "true 256bit EU" and denial of double pumping ?
    Can you be so kind then and show us the extended FP unit on the die photo? The total die size difference is 7% and the FP unit block is just slightly different.

  20. #670
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by savantu View Post
    Or not

    Great questions … some more details to the response Max gave.

    1) The chart is wrong, we will fix it. Sandy Bridge has true 256-bit FP execution units (mul, add, shuffle). They are on exactly the same execution ports as the 128-bit versions. You can get a 256-bit multiply (on port 0) and a 256-bit add (on port 1) and a 256-bit shuffle (port 5) every cycle. 256-bit FP add and multiply bandwidth is therefore 2X higher flops than 128. See IACA for the ports on an instruction-by-instruction basis.
    2) The chart doesn’t mention 16-byte paths. We have true 32-byte loads (i.e. each load only uses one AGU resource and we have 2 AGU’s) but only a 48-byte/cycle total is supported to the L1 each cycle. You can’t get 48 bytes per cycle to the DCU using 128-bit operations (only 2 agu’s…). This is why a simple memory-limited kernel like matrix add (load, load, add, store) measures 1.42X speedup (would have predicted 1.5X with the current architecture in the limit; vs. 1.0X if we had double pumped).
    http://software.intel.com/en-us/foru...st.php?p=97176

    What can we understand from "true 256bit EU" and denial of double pumping ?
    Did you check the context of the double pumping statement? For me this looks to be related to loads and the cache bandwidth/AGU resources. It's also contained in point 2). I highlighted some different parts. You can also double pump cache accesses etc.
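The 1.5X prediction in point 2) can be reproduced with a very simple throughput model. This is a sketch under stated assumptions only: every load or store takes one AGU slot regardless of width, the two AGUs handle at most two memory operations per cycle, and L1 moves at most 48 bytes per cycle. The constant and function names are made up; the numbers come from the quoted Intel reply.

```python
from fractions import Fraction

# Figures taken from the quoted Intel reply; the model itself is an assumption.
AGU_OPS_PER_CYCLE = 2     # two AGUs, one memory operation each per cycle
L1_BYTES_PER_CYCLE = 48   # total L1 traffic supported per cycle
MEM_OPS_PER_RESULT = 3    # matrix add: load, load, store

def cycles_per_vector(vector_bytes: int) -> Fraction:
    """Cycles needed per result vector: the tighter of the AGU and L1 limits."""
    agu_limit = Fraction(MEM_OPS_PER_RESULT, AGU_OPS_PER_CYCLE)
    l1_limit = Fraction(MEM_OPS_PER_RESULT * vector_bytes, L1_BYTES_PER_CYCLE)
    return max(agu_limit, l1_limit)

def result_bytes_per_cycle(vector_bytes: int) -> Fraction:
    """Result bytes produced per cycle for a given vector width."""
    return vector_bytes / cycles_per_vector(vector_bytes)

speedup = result_bytes_per_cycle(32) / result_bytes_per_cycle(16)
print(f"128-bit: {cycles_per_vector(16)} cycles/vector (AGU bound)")
print(f"256-bit: {cycles_per_vector(32)} cycles/vector (L1 bound)")
print(f"predicted speedup: {float(speedup):.2f}x")
```

In this model the 128-bit version is limited by the two AGUs and the 256-bit version by the 48 bytes per cycle, which gives the 1.5X limit mentioned in the reply (1.42X was measured). The model describes the load/store side only and does not settle how the execution units themselves are implemented.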

    The first version of the chart (said to be wrong in 1)) contained "AVX LO" and "AVX HI" units, also drawn at the same width as the 128 bit units. Maybe they're even not using double pumping but other techniques like wave pipelining (less likely).

    How would you explain the nearly unchanged area of the FPU on die? Surely not by chip stacking.
    Last edited by Dresdenboy; 08-11-2010 at 11:44 PM.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  21. #671
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by informal View Post
    Can you be so kind then and show us the extended FP unit on the die photo? The total die size difference is 7% and the FP unit block is just slightly different.
    Well, you can always put the question directly to Intel in the respective thread.

    My answer : I do not know why the FPU is only 7% larger ( if we were to trust an analysis based on a low resolution photo with large margins of error ). Doubling the datapaths to 256bit causes what increase in die area ? I haven't seen an analysis on this.

    Quote Originally Posted by Dresdenboy View Post
    Did you check the context of the double pumping statement? For me this looks to be related to loads and the cache bandwidth/AGU resources. It's also contained in point 2). I highlighted some different parts. You can also double pump cache accesses etc.

    The first version of the chart (said to be wrong in 1)) contained "AVX LO" and "AVX HI" units, also drawn at the same width as the 128 bit units. Maybe they're even not using double pumping but other techniques like wavefronts (less likely).

    How would you explain the nearly unchanged area of the FPU on die? Surely not by chip stacking.
    Instead of trying to find some weirdo explanations, we could take his words at face value. The words he uses are pretty straightforward:
    "The chart is wrong, we will fix it. Sandy Bridge has true 256-bit FP execution units (mul, add, shuffle)."
    I wouldn't be surprised if some intentional misleading was done previously to deceive the competition.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  22. #672
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Quote Originally Posted by savantu View Post
    Well, I don't think that was the plan. The delay made it so. Wouldn't you agree the K10 core and its derivatives are a bit long in the tooth compared to SB, which thrashes the Nehalem generation?
    He is correct: Llano was supposed to go against SB. Yes, it has aged K10 derivatives, but the GPU is quite strong from what I heard. SB, on the other hand, has a strong CPU but a not-so-strong GPU.

    Intel tried to equalize this imbalance, and I also posted about it:

    http://www.xtremesystems.org/forums/...8&postcount=16

    It was also reported by Fud 7 months later. Well, his report is a bit, ammm, wrong; I mean it is not 100% correct.

    http://www.xtremesystems.org/forums/...59&postcount=1
    Coming Soon

  23. #673
    Xtreme Member
    Join Date
    Sep 2008
    Posts
    235
    Quote Originally Posted by Dresdenboy View Post
    Hyperpipelining is less painful than it sounds. Simply clocking an existing unit twice as fast (without dividing it into more pipeline stages) would be painful instead.

    It's no coincidence that the architects behind the very long SIMD words
    (256 bit, 512 bit and longer) are Doug Carmean and Eric Sprangle who joined
    Intel from Ross technologies.

    These are exactly the Hyperpipelining specialists at Intel:

    (1) They co-authored the original hyperpipelining paper:
    Increasing Processor Performance by Implementing Deeper Pipelines

    (2) They led the original ~60 stage hyperpipelined Nehalem project.
    http://www.theinquirer.net/inquirer/...em-slated-2005

    (3) They initiated the Larrabee project. One of the main ideas behind
    Larrabee is to achieve a theoretical maximum number of FLOPs on a
    certain die with a limited number of transistors. A fourfold hyperpipelined
    128 bit unit running at 4.8 GHz can produce 512 bit results at 1.2 GHz
    using only 25%(+a bit) of the transistors of a non hyperpipelined unit.
    ftp://download.intel.com/technology/...abee_paper.pdf
    http://www.drdobbs.com/high-performa...ting/216402188
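The arithmetic behind point (3) is easy to check. A minimal sketch, assuming an ideal unit that delivers one full-width result per clock with no stalls (an idealization, not a claim about any real design); clocks are in MHz so the numbers stay integers, and the function name is made up:

```python
# Raw result bandwidth of a vector unit: width in bits times clock rate.
# Idealized: one full-width result per clock, no stalls or bubbles.

def result_bits_per_microsecond(width_bits: int, clock_mhz: int) -> int:
    """Bits of result produced per microsecond by an ideal unit."""
    return width_bits * clock_mhz

narrow_fast = result_bits_per_microsecond(128, 4800)  # 128-bit unit at 4.8 GHz
wide_slow = result_bits_per_microsecond(512, 1200)    # 512-bit unit at 1.2 GHz
print(narrow_fast, wide_slow, narrow_fast == wide_slow)
```

Both cases deliver the same number of result bits per unit time, which is the equivalence the post relies on. The transistor savings claimed in the post are a separate matter that this sketch does not model.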


    The SIMD units are the easiest (of all units) to hyperpipeline. All instructions
    which could cause problems for hyperpipelining have been systematically
    left out of the AVX and LNI specifications. (for instance data shuffles
    crossing 128 bit boundaries)


    Regards, Hans
    Last edited by Hans de Vries; 08-11-2010 at 03:03 AM.

  24. #674
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by savantu View Post
    Well, you can always put the question directly to Intel in the respective thread.

    My answer : I do not know why the FPU is only 7% larger ( if we were to trust an analysis based on a low resolution photo with large margins of error ). Doubling the datapaths to 256bit causes what increase in die area ? I haven't seen an analysis on this.
    Intel will never discuss that info in public, you may ask all day long.
    Also, the whole die is ~7% larger, not the FPU. The FPU is just a tiny bit bigger. The AVX support may have contributed to that.
    When you look at the past, like Yonah to Merom (both done at 65nm), the core size investment was radical, going from 19mm2 to 31mm2. Some of that huge increase in core logic was due to the physical doubling of the SSE capabilities, which is the most prominent and largest performance change compared to Yonah. All this resulted in a 15-20% performance increase on average over Yonah, and in SSE code especially the jump was ~50-60%. We can't use AMD as an example since Hound was done on 65nm while RevF was 90nm, but you can see Hans' work here. As Hans showed, a single Hound core (65nm) takes up 20% less space than an old single RevF core (which was 90nm), and you can see the 2nd FP unit in Hound highlighted by Hans de Vries.

    edit: completely forgot the Brisbane core. 20.8 to 25.5mm2 is Brisbane to Barcelona (single core size). That is a 22.5% increase for various core improvements and 2x SSE throughput (in theory, due to the 2nd FP unit). This brought a very similar 15-20% performance increase on average, 50-60% in SSE code.
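For anyone who wants to redo the percentages, here is a small sketch using only the core areas quoted in this post. The areas are the poster's figures and the helper name is made up.

```python
# Percentage growth in single-core area, using the mm2 figures from the post.

def pct_increase(before_mm2: float, after_mm2: float) -> float:
    """Relative growth from 'before' to 'after', in percent."""
    return (after_mm2 - before_mm2) / before_mm2 * 100

yonah_to_merom = pct_increase(19.0, 31.0)
brisbane_to_barcelona = pct_increase(20.8, 25.5)
print(f"Yonah -> Merom: {yonah_to_merom:.1f}%")
print(f"Brisbane -> Barcelona: {brisbane_to_barcelona:.1f}%")
```

The second value lands close to the 22.5% quoted above, and the first shows how much larger the Yonah to Merom jump was than the ~7% whole-die difference discussed for Sandy Bridge.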
    Last edited by informal; 08-11-2010 at 03:10 AM. Reason: brisbane,corrected %

  25. #675
    Xtreme Member
    Join Date
    Aug 2004
    Posts
    210
    Quote Originally Posted by savantu View Post
    Doubling the datapaths to 256bit causes what increase in die area ? I haven't seen an analysis on this.
    That's an easy question: double pipelines = double die size.
    Take AMD's 80/128bit FPUs from K8 -> K10 as a comparison.
    7% is nowhere near the needed die size increase. Hyperpipelining is the only reasonable explanation.

    @Hans:
    Thx for the links.
