Page 12 of 39
Results 276 to 300 of 954

Thread: AMD's Bobcat and Bulldozer

  1. #276
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by Hornet331 View Post
    well x86 quite sucks at ipc.. 1.5 is a good value.
    That depends less on the ISA and more on the actual hardware implementation, and even more on software code quality. There are many effective techniques for optimizing code for OOO architectures (such as loop unrolling, etc.).

    Quote Originally Posted by informal
    I know it's hard to believe, but Hornet is correct. The average IPC in the SPEC2006 test suite is around 1. This is on a Core 2 class chip, which is 4(+1) wide.
    It seems they counted only arithmetic instructions (IPC < 1 does not make sense). Also, ALU utilization may vary greatly between different parts of the code.

  2. #277
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by informal View Post
    You are basing this on Paul DeMone's comment?
    You might also consider AMD's very own slide showing the relative improvement of client / server / hpc.

    Client is only 1/2 of server, and 1/3rd of hpc. They've been quite open about BD's emphasis. Only when people start asking about single-threaded performance, etc, do they get defensive and start claiming that's going to be just wonderful, too.

  3. #278
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    i cant wait for AMD to do sub 10s superpi 1M runs on air

    o wait, i dont give a crap....

  4. #279
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by kl0012 View Post


    It seems they counted only arithmetic instructions (IPC < 1 does not make sense). Also, ALU utilization may vary greatly between different parts of the code.
    They counted address and math instructions:
    Figure 2.2(a) and Figure 2.2(b) represent the instruction profile of CPU2006 and CPU2000 respectively. It is evident from the figure that a very high percentage of instructions retired consist of loads and stores. CPU2006 benchmarks like h264ref, hmmer, bwaves, lesli3d and GemsFDTD have comparatively high percentage of loads while astar, bzip2, gcc, gobmk, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and gamess have high percentage of branch instructions. On the contrary CPU2000 benchmarks like gap, parser, vortex, applu, equake, fma3d, mgrid and swim have comparatively high percentage of loads while almost all integer programs have high percentage of branch instructions.
    You can see that it was never higher than 1.8 across the whole range of SPEC suite applications. The point is that loads and stores make up a big part of the instruction mix.

    Quote Originally Posted by terrace215 View Post
    You might also consider AMD's very own slide showing the relative improvement of client / server / hpc.

    Client is only 1/2 of server, and 1/3rd of hpc. They've been quite open about BD's emphasis. Only when people start asking about single-threaded performance, etc, do they get defensive and start claiming that's going to be just wonderful, too.
    What slides? The ones from 2007 that talked about the BD version that got delayed in order to be reworked and improved?

  5. #280
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by terrace215 View Post
    They are optimizing for server application throughput, at the expense of client low-threaded performance.
    This is simply not true. No matter how many times you say it.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  6. #281
    Nerdy Powerlifter
    Join Date
    Jul 2007
    Location
    Down in the Bayou
    Posts
    4,553
    12 pages of discussion and I understand maybe half of it.

    So my AM3 board will not take Bulldozer. Glad I only invested in a low end board anyways.

    I hope this'll be like Athlon & Pentium III or the dual-core races all over again. I need an upgrade to my mini-itx already.
    You must [not] advance.


    Current Rig: i7 4790k @ stock (**** TIM!) , Zotac GTX 1080 WC'd 2214mhz core / 5528mhz Mem, Asus z-97 Deluxe

    Heatware

  7. #282
    Registered User
    Join Date
    Sep 2009
    Posts
    77
    Quote Originally Posted by terrace215 View Post
    They are optimizing for server application throughput, at the expense of client low-threaded performance. That might make sense for them, considering the initial target market is virtually all server/hpc. In client, they have Llano in the middle, and Ontario down low... so maybe they decided they couldn't be all things to all segments with BD.
    If I only read the bold part, I would think you are describing Nehalem.

  8. #283
    Xtreme Enthusiast
    Join Date
    Dec 2003
    Posts
    510
    Quote Originally Posted by superrugal View Post
    If I only read the bold part, I would think you are describing Nehalem.
    But Nehalem-based CPUs have the highest throughput and highest client performance. Magny Cours on the other hand...

  9. #284
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by JF-AMD View Post
    This is simply not true. No matter how many times you say it.
    Let's say "at the relative expense of", then.

    Unless you are contradicting your own company's slide? From the 2007 tech analyst day.

    Client perf/W up X, server up 2X, HPC up 3 TO 4X

    Clearly, design choices were made to favor server throughput improvements over client improvements, or else the little bar graph lines wouldn't be this way.

    I guess we'll be able to see a full desktop evaluation in about... oh, 15 months, unless you guys release some perf data early.

  10. #285
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Sweden
    Posts
    2,084
    Quote Originally Posted by terrace215 View Post
    Unless you are contradicting your own company's slide? From the 2007 tech analyst day.
    Nothing wrong with that, since it's for the 45 nm BD that never showed up.
    Quote Originally Posted by informal View Post
    What slides? The ones from 2007 that talked about the BD version that got delayed in order to be reworked and improved?

  11. #286
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by terrace215 View Post
    Let's say "at the relative expense of", then.

    Unless you are contradicting your own company's slide? From the 2007 tech analyst day.

    Client perf/W up X, server up 2X, HPC up 3 TO 4X

    Clearly, design choices were made to favor server throughput improvements over client improvements
    i think the only math you know is:

    single threaded perf per watt went up X
    multi threaded perf per watt went up Y
    0<X<Y

  12. #287
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by Mats View Post
    Nothing wrong with that, since it's for the 45 nm BD that never showed up.
    Somehow, I doubt the whole design philosophy changed since then.

  13. #288
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Location
    Hong Kong
    Posts
    526
    Quote Originally Posted by terrace215 View Post
    Somehow, I doubt the whole design philosophy changed since then.
    You always doubt AMD's methodology, sales, technical details, release dates and... etc


    Last edited by qcmadness; 08-25-2010 at 11:12 AM.

  14. #289
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by Manicdan View Post
    i think the only math you know is:

    single threaded perf per watt went up X
    multi threaded perf per watt went up Y
    0<X<Y
    The old "those arrows weren't meant to imply anything specific" thing again?

    I think they are qualitatively accurate...we'll have to wait, and wait, and wait... to see for certain..

  15. #290
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Yeah, 3 years from now when Bulldozer launches, right?

  16. #291
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by Hornet331 View Post
    well x86 quite sucks at ipc.. 1.5 is a good value.
    IPC isn't the same across different architectures. For example, a single SSE instruction (mulps) can do 4 multiplies on 32-bit floating point numbers, while fmul can do only one. Yes, SSE is explicitly data parallel, but that is part of the weakness of IPC measurements.

    A better example would be a sine function. You can use the Taylor series to get a good estimate. Modern x86 CPUs take ~40-100 cycles to execute the fsin instruction.

    Taylor series approximation:
    x - (x^3)/3! + (x^5)/5! - (x^7)/7!

    2 subtractions
    30 multiplies
    3 divides
    1 add

    36 arithmetic operations on a RISC processor are equal to 1 (very slow) instruction in x86. This is a select case; normally RISC uses 30% more code space.

    This algorithm actually has room for improvement. We can store powers of x in a lookup table and save many redundant multiplications, i.e. compute x^3 and then multiply by x^2 (adding the exponents) to get x^5. Eventually algebra will give you a nice shortcut.
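
    A minimal Python sketch of the idea above, not the poster's code: the same 7th-degree Taylor polynomial written once with every power and factorial recomputed, and once in Horner form, which reuses x^2 and needs only 5 multiplies if the 1/n! constants are precomputed. The function names and the `max_abs_error` helper are made up for illustration, and Horner form is just one possible "algebra shortcut".

```python
import math


def sin_taylor_naive(x):
    """x - x^3/3! + x^5/5! - x^7/7!, recomputing every power and factorial."""
    return (x
            - x**3 / math.factorial(3)
            + x**5 / math.factorial(5)
            - x**7 / math.factorial(7))


def sin_taylor_horner(x):
    """Same polynomial in Horner form: 5 multiplies, 3 subtractions."""
    x2 = x * x                               # multiply 1
    c3, c5, c7 = 1 / 6, 1 / 120, 1 / 5040    # precomputed 1/3!, 1/5!, 1/7!
    # multiplies 2-5 happen in the nested expression below
    return x * (1 - x2 * (c3 - x2 * (c5 - x2 * c7)))


def max_abs_error(f, lo=-math.pi / 4, hi=math.pi / 4, steps=1000):
    """Largest |f(x) - sin(x)| over evenly spaced samples in [lo, hi]."""
    xs = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(f(x) - math.sin(x)) for x in xs)
```

    Both versions compute the same polynomial, so they agree up to rounding. On [-pi/4, pi/4] the truncation error stays below 1e-6, which is why real math libraries first reduce the argument to a small range before evaluating a polynomial.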

  17. #292
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by terrace215 View Post
    Somehow, I doubt the whole design philosophy changed since then.
    The key people changed, esp. Chuck Moore. So why shouldn't the design philosophy change with them? Chuck even talked about improved design philosophies in some of his older presentations.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  18. #293
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by Chumbucket843 View Post
    IPC isn't the same across different architectures. For example, a single SSE instruction (mulps) can do 4 multiplies on 32-bit floating point numbers, while fmul can do only one. Yes, SSE is explicitly data parallel, but that is part of the weakness of IPC measurements.

    A better example would be a sine function. You can use the Taylor series to get a good estimate. Modern x86 CPUs take ~40-100 cycles to execute the fsin instruction.

    Taylor series approximation:
    x - (x^3)/3! + (x^5)/5! - (x^7)/7!

    2 subtractions
    30 multiplies
    3 divides
    1 add

    36 arithmetic operations on a RISC processor are equal to 1 (very slow) instruction in x86. This is a select case; normally RISC uses 30% more code space.

    This algorithm actually has room for improvement. We can store powers of x in a lookup table and save many redundant multiplications, i.e. compute x^3 and then multiply by x^2 (adding the exponents) to get x^5. Eventually algebra will give you a nice shortcut.
    Yeah, but an approximation means it's not the true result of the function, so it's a mistake to use it.


    ---------

    About BD @ Hot Chips, what was actually said? We have the slides now, but did someone just read them out, or did someone talk about BD at the same time?

    No other information?

  19. #294
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by terrace215 View Post
    Let's say "at the relative expense of", then.

    Unless you are contradicting your own company's slide? From the 2007 tech analyst day.

    Client perf/W up X, server up 2X, HPC up 3 TO 4X

    Clearly, design choices were made to favor server throughput improvements over client improvements, or else the little bar graph lines wouldn't be this way.

    I guess we'll be able to see a full desktop evaluation in about... oh, 15 months, unless you guys release some perf data early.
    Since then we have gotten MCM on the server side. That's not something you do on the client side. And the fact that servers are faster per socket is absolutely not the same thing as servers being faster at the client's expense.
    You know, you can boost one without crippling the other.

    And a Bulldozer module running one thread gives that thread many more resources than each thread gets when it runs two. The modular approach boosts single-thread performance more than multi-thread performance. The advantage for multi-thread performance is less die space.

  20. #295
    Xtreme Member
    Join Date
    Nov 2006
    Posts
    324
    Quote Originally Posted by Chumbucket843
    x - (x^3)/3! + (x^5)/5! - (x^7)/7!
    2 subtractions
    30 multiplies
    3 divide
    1 add
    It could actually be counted as 9 multiplies, with intermediate results a = x^2 and b = x^5 (and the 1/n! factors treated as precomputed constants):
    x - (x*x*x) * (1/3!) + (b = ((a = x*x) * a * x)) * (1/5!) - (a*b) * (1/7!)
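
    For anyone who wants to check that count, here is a rough Python sketch of the expression above. `CountingFloat`, `sin_taylor_shared` and `count_multiplies` are hypothetical names introduced only for this illustration; the wrapper class just counts how many times `*` is executed, under the assumption that the 1/n! factors are constants computed ahead of time.

```python
import math

# Assumed to be precomputed constants, so they cost no multiplies at run time.
INV_FACT_3 = 1.0 / math.factorial(3)
INV_FACT_5 = 1.0 / math.factorial(5)
INV_FACT_7 = 1.0 / math.factorial(7)


class CountingFloat:
    """Float wrapper that counts multiplications (illustrative helper)."""
    muls = 0

    def __init__(self, value):
        self.value = float(value)

    @staticmethod
    def _raw(other):
        return other.value if isinstance(other, CountingFloat) else float(other)

    def __mul__(self, other):
        CountingFloat.muls += 1
        return CountingFloat(self.value * self._raw(other))

    def __add__(self, other):
        return CountingFloat(self.value + self._raw(other))

    def __sub__(self, other):
        return CountingFloat(self.value - self._raw(other))


def sin_taylor_shared(x):
    """7th-degree Taylor sine with shared intermediates a = x^2, b = x^5."""
    a = x * x                            # 1 multiply
    b = a * a * x                        # 2 multiplies
    return (x
            - (x * x * x) * INV_FACT_3   # 3 multiplies
            + b * INV_FACT_5             # 1 multiply
            - (a * b) * INV_FACT_7)      # 2 multiplies


def count_multiplies(x):
    """Return (approximate sin(x), number of multiplies performed)."""
    CountingFloat.muls = 0
    result = sin_taylor_shared(CountingFloat(x))
    return result.value, CountingFloat.muls
```

    Calling `count_multiplies(0.5)` returns the approximation together with the multiply count. Writing x^3 as a*x instead of x*x*x would save one more multiply.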
    Last edited by SEA; 08-25-2010 at 11:38 AM.
    Windows 8.1
    Asus M4A87TD EVO + Phenom II X6 1055T @ 3900MHz + HD3850
    APUs

  21. #296
    Xtreme Cruncher
    Join Date
    Jul 2006
    Posts
    1,374
    Quote Originally Posted by terrace215 View Post
    Let's say "at the relative expense of", then.

    Unless you are contradicting your own company's slide? From the 2007 tech analyst day.

    Client perf/W up X, server up 2X, HPC up 3 TO 4X

    Clearly, design choices were made to favor server throughput improvements over client improvements, or else the little bar graph lines wouldn't be this way.

    I guess we'll be able to see a full desktop evaluation in about... oh, 15 months, unless you guys release some perf data early.
    It isn't a zero-sum game .

  22. #297
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    surprisingly, they didn't include any more details about decode.
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  23. #298
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by xVeinx View Post
    It isn't a zero-sum game .
    Hence "relative". However, there are trade-offs.

  24. #299
    I am Xtreme
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    terrace love only Sandy bridge
    ROG Power PCs - Intel and AMD
    CPUs: i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350, 1x AMD FX-8320, 1x AMD FX-8300, 2x AMD FX-6300, 2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K, AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K, 2x i7-4770K, Intel i7-3930K
    AMD Cinebench R10 challenge, AMD Cinebench R15 thread, Intel Cinebench R15 thread

  25. #300
    Xtreme Addict
    Join Date
    Jan 2009
    Location
    SF
    Posts
    1,070
    Quote Originally Posted by FlanK3r View Post
    terrace love only Intel
    fixed.
