Page 15 of 39
Results 351 to 375 of 954

Thread: AMD's Bobcat and Bulldozer

  1. #351
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    If I ran AMD, I would redirect the company's effort toward building a low-cost, low-power, high-density, flash-based cloud server platform around Bobcat. Intel's Justin Rattner has admitted that for certain cloud workloads, these types of high-density solutions are superior to a monolithic server chip like Xeon. So AMD should stop obsessing over netbooks and monolithic server parts—both of these amount to fighting the last war—and just jump straight into the cloud server market that ARM is set to tackle with its upcoming Eagle part.
    So the idea would be to use a massive array of Bobcat cores instead of a traditional 2P/4P BD?

  2. #352
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Dresdenboy View Post
    A quick, rough estimate of single-threaded performance for Zambezi, based on the 50% number given for Interlagos (just to show what has to be counted in, at the least):

    Relative_perf_1_thread_to_AMD_fam_10h = (Perf_Magny_Cours*1.5 * 12 / 16) * Freq_ratio_of_half_#_of_Cores * Perf_boost_single_core_in_Module * Perf_boost_single_module_on_chip

    Freq_ratio_of_half_#_of_Cores = 3.2/2.3 = 1.39
    Perf_Magny_Cours = 1
    Perf_boost_single_core_in_Module = 1.11 (while going from 90% back to 100%)
    Perf_boost_single_module_on_chip = 1.3 (some cheap turbo)

    Relative_perf_1_thread_to_AMD_fam_10h = (1 * 1.5 * 12/16) * 1.39 * 1.11 * 1.3 = 2.26

    So with some frequency scaling a Zambezi core will be about 126% faster than a core running in a 2.3GHz MC without turbo. This would equal a 5.2GHz PhII core.

    This is just speculation. Anyone is invited to check this.
    That's a very interesting prediction. It looks like 25% (1.125 x 1.11) is the core-vs-core improvement (or what some like to call IPC), while the rest is the improvement in the starting clock speed and power-gated Turbo. In any case, 5.2GHz Phenom II-level speed out of the box with Zambezi in single-threaded apps is what some might call a leapfrog performance jump.
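For anyone who wants to check the arithmetic in the estimate above, here is a minimal Python sketch. All inputs are the speculative figures from the quoted post, not measurements; the variable names are made up for illustration, and the core counts (12 for Magny-Cours, 16 for Interlagos) are read off the 12/16 term in the formula.

```python
# Speculative inputs copied from the quoted estimate (not measured data).
PERF_MAGNY_COURS = 1.0            # baseline throughput of a 12-core Magny-Cours
INTERLAGOS_THROUGHPUT_GAIN = 1.5  # the "50%" number given for Interlagos
MC_CORES = 12
INTERLAGOS_CORES = 16
FREQ_RATIO = 3.2 / 2.3            # clock with half the number of cores vs. 2.3GHz MC
SINGLE_CORE_IN_MODULE = 1.11      # going from 90% back to 100% with one core per module
SINGLE_MODULE_TURBO = 1.3         # assumed turbo with a single active module
MC_BASE_GHZ = 2.3

# Per-core throughput of Interlagos relative to a Magny-Cours core at the same clock.
per_core = PERF_MAGNY_COURS * INTERLAGOS_THROUGHPUT_GAIN * MC_CORES / INTERLAGOS_CORES

# Clock-independent part of the gain (what the reply calls "core vs core").
core_vs_core = per_core * SINGLE_CORE_IN_MODULE

# Full single-thread estimate, including the frequency and turbo factors.
relative_perf = per_core * FREQ_RATIO * SINGLE_CORE_IN_MODULE * SINGLE_MODULE_TURBO

# Clock a family 10h core would need to match that, assuming linear scaling with frequency.
equivalent_ghz = MC_BASE_GHZ * relative_perf

print(f"per-core factor:      {per_core:.3f}")
print(f"core-vs-core factor:  {core_vs_core:.2f}")
print(f"relative 1T perf:     {relative_perf:.2f}")
print(f"equivalent PhII GHz:  {equivalent_ghz:.2f}")
```

The last step assumes performance scales linearly with clock speed, which is optimistic for anything that is memory-bound.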

  3. #353
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by =SOC= Admiral View Post
    I'm a gamer too, and currently I can only use one GPU for gaming as I have AMD, but another limitation is that none of my cards are of the same generation. I have 3 cards. I will most likely be selling my 8800 and will be getting the CIVE when it launches in September. Hopefully I can get my 1055T to 4.0GHz on that board. Just a question: where is the FSB located for AMD? Is it on the mobo or the CPU?

    GTX470
    GTX275
    8800GTS 512
    There has been no FSB on AMD CPUs since the first K8 Athlon 64. They use HyperTransport links. It's like QPI but far more advanced.

  4. #354
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Sweden
    Posts
    2,084
    Quote Originally Posted by -Boris- View Post
    And power usage isn't the only limitation to high frequency. Even if you have the headroom power wise you can't just clock higher. If that were the case it would be possible to clock an Intel Atom to insane performance.

    So even if you give one module four times as much power headroom, it will not necessarily improve turbo capabilities much more than just 50% more headroom would.
    Wow. Can you be any more boring?
    I just finished soldering 48 coils and field effect transistors to my Atom board and now this?

  5. #355
    Xtreme Enthusiast
    Join Date
    Dec 2008
    Location
    Austin, Texas
    Posts
    599
    Quote Originally Posted by terrace215 View Post
    They are optimizing for server application throughput, at the expense of client low-threaded performance. That might make sense for them, considering the initial target market is virtually all server/hpc. In client, they have Llano in the middle, and Ontario down low... so maybe they decided they couldn't be all things to all segments with BD.
    The question is, where is the client user pain on low thread applications? And if there is some pain, is it better addressed by GPGPU?

    My belief is that if you take the subset of applications which are low thread count, will not benefit from GPGPU, and cause user pain - you have an empty set. And if there remain a few app classes, BD IPC and frequency serve well.

    With BD-based desktops, discrete graphics is the most likely scenario, so there will be an abundance of GPGPU and graphics performance to speed up low-thread apps.

    I think we are talking less and less about trying to solve existing app class performance problems, and more about opening new app classes.
    Last edited by 64NOMIS; 08-26-2010 at 07:59 AM.

  6. #356
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by -Boris- View Post
    Just to make some things clear.

    Deeper Pipeline != Higher Frequency
    Shorter Pipeline != Higher IPC

    I have a feeling that people have read some P4 articles and made some conclusions of their own. To achieve high frequency a deep pipe can help, but there are reasons to put more stages in a processor other than frequency. And the other way around, more stages don't automatically lower IPC.
    ummm... no. the concept of pipelining is to increase throughput at the cost of latency.

    P4 is an extremely good example of how hyperpipelining exacerbates current problems such as branch prediction and cache misses.

    simplistic logic would assume that doing more things in one clock cycle means higher ipc, thus pipelining increases ipc.

    there are dependencies in between instructions.
    http://en.wikipedia.org/wiki/Data_dependency

    if you miss a branch you have to flush all n stages of the pipeline. the probability of a misprediction increases exponentially with pipeline length. this means doubling BP accuracy will increase clockspeed linearly. that's not ideal and a waste of xtor budget.

    if you miss a cache line all dependent instructions have to wait for that result. OoO will only hide so much latency and it must keep the FIFO policy.

    we have reached a point of diminishing returns for pipelining. at every level it makes things more complex, from architecture to circuit to layout.

    it's obvious that amd knows this but saying intel did it wrong and amd did it right/better is a foolish way to look at it. a lot of decisions are based off of what the design team is good at. they are going to do things differently.
    Depending on the nature of the stages, more stages can actually improve IPC.
    frequency depends on the nature of the stage. the instructions being executed are independent of the hardware.
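The throughput-versus-latency trade-off argued over in this post can be illustrated with a toy model in Python. This is a rough sketch under stated assumptions, not a description of any real CPU: it uses the common textbook simplification that a mispredicted branch costs a number of cycles proportional to pipeline depth (a linear flush penalty, rather than the exponential growth claimed in the post), and every constant below is a made-up placeholder.

```python
# Toy model: deeper pipelines shorten the clock period but raise the
# cycles lost per branch misprediction. All numbers are hypothetical.

LOGIC_DELAY_NS = 5.0     # total logic delay of the unpipelined datapath
LATCH_OVERHEAD_NS = 0.25 # fixed per-stage latch/skew overhead
BRANCH_FRACTION = 0.2    # fraction of instructions that are branches
MISPREDICT_RATE = 0.1    # fraction of branches that are mispredicted


def cycle_time_ns(depth: int) -> float:
    """Clock period when the logic is split evenly across `depth` stages."""
    return LOGIC_DELAY_NS / depth + LATCH_OVERHEAD_NS


def cpi(depth: int) -> float:
    """Cycles per instruction, assuming a flush costs `depth` cycles."""
    return 1.0 + BRANCH_FRACTION * MISPREDICT_RATE * depth


def time_per_instruction_ns(depth: int) -> float:
    """Average time per instruction: lower is better."""
    return cpi(depth) * cycle_time_ns(depth)


if __name__ == "__main__":
    for depth in (5, 10, 20, 40, 80):
        freq_ghz = 1.0 / cycle_time_ns(depth)
        print(f"depth={depth:3d}  freq={freq_ghz:5.2f} GHz  "
              f"IPC={1.0 / cpi(depth):.2f}  "
              f"ns/instr={time_per_instruction_ns(depth):.3f}")
    best = min(range(1, 101), key=time_per_instruction_ns)
    print("best depth in this toy model:", best)
```

In this model frequency keeps rising with depth while IPC keeps falling, so time per instruction has a minimum at some intermediate depth. The model ignores the case raised elsewhere in the thread, where extra stages (for example, for branch prediction) improve IPC instead of clock speed.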

  7. #357
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by JF-AMD View Post
    Ssssshhhhhh! Don't tell terrace. He'll say "Single-threaded performance will have significant improvement"
    I'd be more likely to point out that an increase in perf/W/mm^2 , be it single-thread or throughput is more interesting to AMD & its investors than it is to any potential users / customers.

    Particularly when it coincides with a process shrink.

    Make the same claim on a slide *without* the mm^2 part, and remove the item on your other slide that talks about minimizing single-threaded losses, and we'll talk again.

    (Sometimes, engineering reveals things in their presentations that marketing can't fully spin.)
    Last edited by terrace215; 08-26-2010 at 08:35 AM.

  8. #358
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by terrace215 View Post

    Make the same claim on a slide *without* the mm^2 part, and remove the item on your other slide that talks about minimizing single-threaded losses, and we'll talk again.
    The "minimizing single-thread losses" item refers to the module design. When 2 threads run on a module, there is a 10% penalty compared to a single thread running on the module (this actually implies that single-thread performance will be better by those 10%). It has nothing to do with minimizing single-thread losses compared to the present design (aka 10h). They are simply stating that the integer core grouping in Bulldozer does not much hinder the "strength" of the single threads running in parallel on a module (10% is the penalty).
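The module arithmetic in this post can be written out as a short Python sketch. The 10% penalty is the figure stated in the post; normalizing a lone thread on a module to 1.0 and the function names are assumptions made for illustration.

```python
SHARED_PENALTY = 0.10  # per-thread loss when both cores of a module are busy (from the post)


def per_thread_perf(threads_on_module: int) -> float:
    """Per-thread performance, with a lone thread on a module normalized to 1.0."""
    if threads_on_module not in (1, 2):
        raise ValueError("a module runs one or two threads")
    return 1.0 if threads_on_module == 1 else 1.0 - SHARED_PENALTY


def module_throughput(threads_on_module: int) -> float:
    """Combined throughput of all threads running on one module."""
    return threads_on_module * per_thread_perf(threads_on_module)


# Two threads sharing a module deliver about 1.8x a lone thread's throughput.
print(module_throughput(2))
# Seen from the other side, a lone thread is about 11% faster than a shared one.
print(per_thread_perf(1) / per_thread_perf(2))
```

That second ratio, 1/0.9, is where the 1.11 factor in Dresdenboy's estimate comes from.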

  9. #359
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by mindfury View Post
    Single-threaded performance will have significant improvement.
    As posted above, unfortunately the claim is only for an improvement in:

    performance per watt per mm^2

    Shrinking thuban from 45nm to 32nm would accomplish that easily.

  10. #360
    Xtreme Member
    Join Date
    Aug 2009
    Posts
    244
    Quote Originally Posted by terrace215 View Post
    As posted above, unfortunately the claim is only for an improvement in:

    performance per watt per mm^2

    Shrinking thuban from 45nm to 32nm would accomplish that easily.
    Single-threaded performance per mm^2?

    How could you measure that?

    Your understanding skills are ridiculous.
    Last edited by mindfury; 08-26-2010 at 08:48 AM.

  11. #361
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by terrace215 View Post
    As posted above, unfortunately the claim is only for an improvement in:

    performance per watt per mm^2

    Shrinking thuban from 45nm to 32nm would accomplish that easily.
    Yeah, but that won't bring the IPC up the way BD will.
    Check out Dresdenboy's post about the possible Zambezi performance.

  12. #362
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by informal View Post
    The "minimizing single-thread losses" item refers to the module design. When 2 threads run on a module, there is a 10% penalty compared to a single thread running on the module (this actually implies that single-thread performance will be better by those 10%). It has nothing to do with minimizing single-thread losses compared to the present design (aka 10h). They are simply stating that the integer core grouping in Bulldozer does not much hinder the "strength" of the single threads running in parallel on a module (10% is the penalty).
    The module design ALSO involved making the non-shared elements leaner (max 2 ALU ops in parallel) than the current cores in some aspects.

  13. #363
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by mindfury View Post
    Single-threaded performance per mm^2?

    How could you measure that?
    Um, measure the performance. Note the power used (for the per W) part. Note the core size used to run it. Compare to previous offering.

    The "per mm^2" is from AMD's claim, btw, check the slide you posted.
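The measurement described here, and the earlier remark that a 45nm to 32nm shrink alone would improve the metric, can be sketched in Python. The performance, power, and area figures are hypothetical placeholders, not data for any real chip, and the shrink assumes ideal area scaling with the square of the feature size at unchanged performance and power, which real process shrinks do not achieve.

```python
def perf_per_watt_per_mm2(perf: float, watts: float, area_mm2: float) -> float:
    """Performance divided by power, divided by silicon area."""
    return perf / watts / area_mm2


def ideal_shrink_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Area after an idealized shrink: scales with the square of the feature size."""
    return area_mm2 * (new_nm / old_nm) ** 2


# Hypothetical placeholder numbers for a 45nm part.
PERF = 100.0      # score on some single-threaded benchmark
WATTS = 100.0     # power drawn while running it
AREA_MM2 = 300.0  # silicon area counted for the comparison

before = perf_per_watt_per_mm2(PERF, WATTS, AREA_MM2)
after = perf_per_watt_per_mm2(PERF, WATTS, ideal_shrink_area(AREA_MM2, 45, 32))
gain = after / before

print(f"metric gain from an ideal 45nm -> 32nm shrink: {gain:.2f}x")
```

Under these assumptions the metric nearly doubles with no change to the core design at all, which is the point being made about the slide's wording. Which area to count (one core, one module, or the whole die) is exactly what the posters disagree on, so `AREA_MM2` is left as a single free parameter.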

  14. #364
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by informal View Post
    Check out Dresdenboy's post about the possible Zambezi performance.
    There would be no point for anyone to choose Zambezi over octal Sandy, except price, possibly.

    Also, JF admitted elsewhere that the 50% / 33% claim for Interlagos / MC has been substantially juiced by including, in the aggregate used to measure, serial workloads that benefit from the Turbo that MC lacked. While adding Turbo is a good thing, this means the "fully-parallel" throughput improvement is considerably less than 50%. In order to make sense of the claim, you're going to need to see exactly what they've chosen to average over, and at that point, you'll likely have access to simpler zambezi benchmarks of all sorts.
    Last edited by terrace215; 08-26-2010 at 08:59 AM.

  15. #365
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by Chumbucket843 View Post
    ummm... no. the concept of pipelining is to increase throughput at the cost of latency.

    P4 is an extremely good example of how hyperpipelining exacerbates current problems such as branch prediction and cache misses.

    simplistic logic would assume that doing more things in one clock cycle means higher ipc, thus pipelining increases ipc.

    there are dependencies in between instructions.
    http://en.wikipedia.org/wiki/Data_dependency

    if you miss a branch you have to flush all n stages of the pipeline. the probability of a misprediction increases exponentially with pipeline length. this means doubling BP accuracy will increase clockspeed linearly. that's not ideal and a waste of xtor budget.

    if you miss a cache line all dependent instructions have to wait for that result. OoO will only hide so much latency and it must keep the FIFO policy.

    we have reached a point of diminishing returns for pipelining. at every level it makes things more complex, from architecture to circuit to layout.

    it's obvious that amd knows this but saying intel did it wrong and amd did it right/better is a foolish way to look at it. a lot of decisions are based off of what the design team is good at. they are going to do things differently.

    frequency depends on the nature of the stage. the instructions being executed are independent of the hardware.
    Stages can be dedicated to branch prediction, those stages don't increase frequency, and if they wouldn't increase IPC they wouldn't be there.

  16. #366
    Xtreme Member
    Join Date
    Aug 2009
    Posts
    244
    Quote Originally Posted by terrace215 View Post
    Um, measure the performance. Note the power used (for the per W) part. Note the core size used to run it. Compare to previous offering.
    The core needs other parts on the chip to work properly, so single-threaded performance per mm^2 is total nonsense.


    Quote Originally Posted by terrace215 View Post
    The "per mm^2" is from AMD's claim, btw, check the slide you posted.
    That slide only said "Single-threaded performance", not "Single-threaded performance per mm^2".

  17. #367
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by terrace215 View Post
    The module design ALSO involved making the non-shared elements leaner (max 2 ALU ops in parallel) than the current cores in some aspects.
    First of all, AMD is not disclosing all the details of the integer core organization (those AGen units could also do some math ops, for example). Second, the ALUs in 10h were hindered in many ways: uops couldn't switch lanes, the 3rd ALU was there just for a possibly better OoO execution opportunity, there was an under-utilization problem, the 3rd AGU was redundant (as AT was told), etc. Now we have a leaner and meaner integer core that can easily be 20% faster than the previous 10h integer unit. This is an end effect that counts in all the new goodies AMD brought into the Bulldozer design: greatly improved prefetching, branch prediction, full OoO loads/stores, double L/S bandwidth to the L1 cache, decoupled predict and fetch pipelines, branch fusion, a unified integer scheduler for math and address ops, shared L2, prediction-directed instruction prefetch, etc.

    PS: Note the wording: "Throughput advantages for multi threaded workloads without significant losses on serial single-threaded workload components"
    This clearly shows that they mean what I already wrote in my previous post: there is minimal loss per thread when 2 threads are executed in parallel inside a module.

  18. #368
    Xtreme Addict
    Join Date
    Mar 2005
    Location
    Rotterdam
    Posts
    1,553
    Are we still wasting 2 pages trying to debunk what terrace picked up from a couple of AMD slides and has been using as an argument against BD for 5 months straight?

    Good god..

    I'm simply gobsmacked at JF-AMD's patience to keep posting here with so much trolling around.
    Last edited by sierra_bound; 08-26-2010 at 06:13 PM.
    Gigabyte Z77X-UD5H
    G-Skill Ripjaws X 16Gb - 2133Mhz
    Thermalright Ultra-120 eXtreme
    i7 2600k @ 4.4Ghz
    Sapphire 7970 OC 1.2Ghz
    Mushkin Chronos Deluxe 128Gb

  19. #369
    Xtreme Addict
    Join Date
    Jan 2008
    Location
    milwaukee
    Posts
    1,683
    just reading the thread you would think this thread was called "101 factless reasons terrace thinks BD will suck "
    Last edited by crazydiamond; 08-26-2010 at 09:32 AM.
    LEO!!!!
    amd phenom II x6 1100T | gigabyte 990fxa-ud3 . .
    2x2gb g.skill 2133c8 | 128gb g.skill falcon ssd
    sapphire ati 5850 | x-fi xtrememusic. . .
    samsung f4 2tb | samsung dvdrw . .
    corsair tx850w | windows 7 64-bit.
    ddc3.25 xspc restop | ek ltx | mc-tdx | BIP . .
    lycosa-g9-z2300 | 26" 1920x1200 lcd .

  20. #370
    Xtreme Mentor
    Join Date
    Nov 2006
    Location
    Spain, EU
    Posts
    2,949
    Quote Originally Posted by Dresdenboy View Post
    So with some frequency scaling a Zambezi core will be about 126% faster than a core running in a 2.3GHz MC without turbo. This would equal a 5.2GHz PhII core.
    I'd toss a few GHz more in there.

    Friends shouldn't let friends use Windows 7 until Microsoft fixes Windows Explorer (link)


    Quote Originally Posted by PerryR, on John Fruehe (JF-AMD) View Post
    Pretty much. Plus, he's here voluntarily.

  21. #371
    Registered User
    Join Date
    Sep 2009
    Posts
    77
    Another diagram consolidating the available information, made by Hiroshige Goto:

    http://pc.watch.impress.co.jp/docs/c...27_389491.html

    translated: http://translate.google.com/translat...ml&sl=ja&tl=en

    Last edited by superrugal; 08-26-2010 at 10:09 AM.

  22. #372
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by mindfury View Post
    That slide only said "Single-threaded performance",not "Single-threaded performance per mm^2".
    Erm, it's a sub-bullet of "Significant improvement in Performance/Watt/mm2"

    You want to pretend that the sub-bullet is not related to its section title?

    Well, in that case, there's no mention of "significant improvement" either.

    BTW, why don't you write a letter to AMD, telling them that "Performance/Watt/mm2" does not make sense.

    You realize AMD *has* estimations of single-thread IPC comparison for both integer and fp workloads at this point, relative to Phenom II, right?
    As we're talking IPC, these don't depend on final clocks. So, if they wanted to, they could put out a bullet that says, for example: "estimated single-threaded IPC gains of 15-20% on integer workloads, compared to Ph-II."

    Now JF will tell you that he "doesn't want his competitor to know this information" yet. Do you buy that? Both BD and SB core designs are long locked-down at this point.

    I think he doesn't want YOU to know it, because the (largely client-oriented) fan base is going to be disappointed, and that would be a negative for AMD.
    Last edited by terrace215; 08-26-2010 at 10:23 AM.

  23. #373
    Xtreme Member
    Join Date
    Aug 2009
    Posts
    244
    Quote Originally Posted by terrace215 View Post
    Erm, it's a sub-bullet of "Significant improvement in Performance/Watt/mm2"

    You want to pretend that the sub-bullet is not related to its section title?

    Well, in that case, there's no mention of "significant improvement" either.
    It is related to "Significant improvement", not "Performance/Watt/mm2". That's why they only mention "Performance" instead of "Performance/Watt/mm2".

    Quote Originally Posted by terrace215 View Post
    BTW, why don't you write a letter to AMD, telling them that "Performance/Watt/mm2" does not make sense.
    The whole chip's "Performance/Watt/mm2" makes sense, but single-threaded performance/mm2 doesn't make sense in a multicore chip.

    You realize AMD *has* estimations of single-thread IPC comparison for both integer and fp workloads at this point, relative to Phenom II, right?
    As we're talking IPC, these don't depend on final clocks. So, if they wanted to, they could put out a bullet that says, for example: "estimated single-threaded IPC gains of 15-20% on integer workloads, compared to Ph-II."

    Now JF will tell you that he "doesn't want his competitor to know this information" yet. Do you buy that? Both BD and SB core designs are long locked-down at this point.

    I think he doesn't want YOU to know it, because the (largely client-oriented) fan base is going to be disappointed, and that would be a negative for AMD.
    Nonsense FUD.
    Last edited by mindfury; 08-26-2010 at 10:35 AM.

  24. #374
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by mindfury View Post
    It is related to "Significant improvement",not "Performance/Watt/mm2"
    I see, so you just pick the parts you like! Fun.


    The whole chip's "Performance/Watt/mm2" make sense,but single-threaded performance/mm2 doesn't make sense in a multicore chip.
    I suggest you consider whole_chip(Perf/W/mm2)/#threads -- presumably you think THAT makes sense, as well, since we're just dividing by a constant for each part being compared. Now, is it so hard to see why you can talk about the perf/W/mm2 for a single-threaded workload?

    Nonsense FUD.
    Ok, just don't get mad later on...
    Last edited by terrace215; 08-26-2010 at 10:52 AM.

  25. #375
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Sweden
    Posts
    2,084
    Quote Originally Posted by terrace215 View Post
    I see, so you just pick the parts you like!
    So do you.
