Page 9 of 39
Results 201 to 225 of 954

Thread: AMD's Bobcat and Bulldozer

  1. #201
    Xtreme Member
    Join Date
    Sep 2008
    Posts
    235
    Quote Originally Posted by Mats View Post
    What's the reason behind the odd die shape, do you know?
    The synthesis starts with rectangular shapes, but the logic migrates
    during the optimization process. Some pieces of one unit even end up
    in the middle of other units (typically interface logic between the two
    units). For some reason the hardware synthesizer concludes that it's
    electrically/timing-wise better to move it there.


    Regards, Hans

  2. #202
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Sweden
    Posts
    2,081
    Ok, thanks for the explanation, both of you!

  3. #203
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Sweden
    Posts
    2,081
    So the Hot Chip conference ended five minutes ago, where will we find the first reports from it?

  4. #204
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    843
    Quote Originally Posted by xlink View Post
    fewer execution units...

    those three factors coupled with other architectural improvements would need to be able to be at least 50% faster in some instances.
    Today's processors have 3 execution units that are shared between ALU/AGU. That is essentially 1.5 ALU and 1.5 AGU. With BD we get 2 AGU and 2 ALU. Much better.
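    JF's 1.5-vs-2 arithmetic can be sketched with a toy throughput model (my own illustration, not AMD's actual scheduler; the function names are made up):

    ```python
    # Toy model: 3 shared ALU/AGU pipes vs. 2 dedicated ALU + 2 dedicated AGU,
    # fed an even mix of ALU and AGU micro-ops. Purely illustrative.

    def cycles_shared(alu_ops, agu_ops, pipes=3):
        """Each cycle, each shared pipe executes one op of either type."""
        total = alu_ops + agu_ops
        return -(-total // pipes)  # ceiling division

    def cycles_dedicated(alu_ops, agu_ops, alu_pipes=2, agu_pipes=2):
        """ALU and AGU ops issue to their own pipes in parallel."""
        return max(-(-alu_ops // alu_pipes), -(-agu_ops // agu_pipes))

    # 60 ALU ops + 60 AGU ops:
    print(cycles_shared(60, 60))     # 40 cycles -> averages 1.5 ALU + 1.5 AGU per cycle
    print(cycles_dedicated(60, 60))  # 30 cycles -> sustains 2 ALU + 2 AGU per cycle
    ```

    On a balanced mix the dedicated configuration finishes in three quarters of the cycles, which is exactly the 1.5-vs-2 per-type throughput JF describes.
    
    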
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  5. #205
    Xtreme Addict
    Join Date
    Jan 2008
    Posts
    1,176
    Quote Originally Posted by god_43 View Post
    honestly wtf cares about single thread performance? geez if ppl cared about it, they would not be buying dual-cores even. the whole nature of multiple cores is for multi-threading, anything else is fail!
    Single thread performance does not equal single core purpose.

    Example:

    I play dolphin emulator with 4 cores set to individual threads.
    If the single thread performance is low, the game is not playable unless you like choppy slow motion.

    If I have a 4 GHz i3, the game is fast and smooth, despite having two fewer cores.

    Individual core performance is FAR FAR FAR FAR FAR more important for home users and gamers than core count.

    I'm not running a server; I'm running a performance machine that requires fast processing of audio threads to avoid lag or stuttering. The same goes for games, and for time-dependent applications like my SQL real-time stats heads-up display for stock analysis.

    Screw having 999 cores, just give me 4 FAST ones.

  6. #206
    Xtreme Member
    Join Date
    Nov 2008
    Posts
    117
    Quote Originally Posted by JF-AMD View Post
    Today's processors have 3 execution units that are shared between ALU/AGU. That is essentially 1.5 ALU and 1.5 AGU. With BD we get 2 AGU and 2 ALU. Much better.
    So 2 mem ops per core per cycle,
    not 3 mem ops (2 loads and 1 store) per core per cycle,
    but .............
    -------------------------------------
    FPU with:
    2x 128-bit MMX
    2x 128-bit FMAC
    --> non-SSE: 4 DP per cycle?
    --> SSE 128-bit: 8 DP per cycle?
    --> SSE 256-bit: 8 DP per cycle?
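    A back-of-envelope way to frame the DP-per-cycle question, assuming each 128-bit FMAC retires one fused multiply-add per 64-bit lane per cycle and counting an FMA as 2 FLOPs (my assumptions; the function is illustrative, not an AMD figure):

    ```python
    # Peak double-precision FLOPs/cycle for a module with two 128-bit FMAC units.
    # Assumes one fused multiply-add per lane per cycle; FMA counts as 2 FLOPs.

    def dp_flops_per_cycle(fmac_units, vector_bits, fma=True):
        lanes = vector_bits // 64          # 64-bit double-precision lanes per unit
        return fmac_units * lanes * (2 if fma else 1)

    print(dp_flops_per_cycle(2, 128))            # 8: two 128-bit FMACs at SSE width
    print(dp_flops_per_cycle(2, 128, fma=False)) # 4: multiply OR add only
    # A 256-bit AVX op would presumably occupy both 128-bit units,
    # so the 256-bit peak would stay at 8 rather than doubling.
    ```
    
    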
    Last edited by vietthanhpro; 08-24-2010 at 07:59 PM.
    When AMD had 64-bit and Intel had only 32-bit, they tried to tell the world there was no need for 64-bit. Until they got 64-bit.
    When AMD had IMC and Intel had FSB, they told the world "there is plenty of life left in the FSB" (actual quote, and yes, they had *math* to show it had more bandwidth). Until they got an IMC.
    When AMD had dual core and Intel had single core, they told the world that consumers don't need multi core. Until they got dual core.
    When intel was using MCM, they said it was a better solution than native dies. Until they got native dies. (To be fair, we knocked *unconnected* MCM, and still do, we never knocked MCM as a technology, so hold your flames.)
    by John Fruehe

  7. #207
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,173
    Quote Originally Posted by vietthanhpro View Post
    2 mem ops per core per cycle
    not 3 mem ops(2 load and 1 store) per core per cycle
    but .............
    Since 10h could do one 128-bit load and one 64-bit store per cycle (basically twice the load capability of K8) without even using the third AGU (see the AT article: the third AGU was redundant/unused but kept for other reasons, namely symmetry of the units), I don't see how AMD couldn't double this with BD. BD will have full OoO load/store capability, unlike what K8->10h brought (which was limited to loads only). There are other clues pointing to 2 loads/1 store per core, namely the GCC source code that describes BD scheduling.
    Last edited by informal; 08-24-2010 at 07:59 PM.

  8. #208
    Registered User
    Join Date
    Nov 2008
    Posts
    28
    Looks like full slides are up at anand:

    http://www.anandtech.com/show/3865/a...tations-online

    No transcripts yet. Some interesting stuff about pointers being used to prevent unnecessary data transfers in Bobcat.

  9. #209
    Xtreme Enthusiast
    Join Date
    Dec 2008
    Location
    Austin, Texas
    Posts
    599
    Quote Originally Posted by Hans de Vries View Post
    1) JF told you at the other thread that IPC is higher.

    2) If that's true then the higher frequency design comes on top of that.

    3) And last but not least: Power gating Turbo now allows much higher single core frequencies.


    Looks like a 1-2-3 speed bump for single thread performance to me....


    Regards, Hans
    I've been spending some time thinking about client loads on BD. Improving per core power consumption & max frequency required significant rethinking of the x86 pipeline and we will see these differences play out as changes in IPC performance across workloads. Outside of desktop these are necessary to improving mobile performance and battery life which in my view will remain a consumer-felt issue for decades to come. I think of the smartphone and tablet like the original IBM PC. We've just begun.

    Bobcat will bring x86 and the performance of the PC experience - multi-tasking, web experience, media processing, standards-based connectivity - to ultra mobility and appliance computing. My biggest concern for Bobcat is the application environments for some of the potential form factors. I want Bobcat and its descendants in an alarm clock, but I want an application environment well suited to an alarm clock. Voice, remote gesture, intelligent agency, handheld graphics need an application environment built around them, not extending to them. Sitting down and using this platform was an interesting experience because I found myself pushing it around in all the wrong ways - it wants to live in a small ultra-mobile device or a specialty client and to do things suited to those form factors. With novel IPC/TDP combinations I am hopeful for form factor and usage innovation.

  10. #210
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    This one is the most important slide i think :

    http://www.anandtech.com/Gallery/Album/754#16

  11. #211
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by Chumbucket843 View Post
    lol, it is the exact opposite. Hand layout is much better. Humans are better at finding Eulerian paths and coming up with clever layouts; computers can't really do that as effectively with all of the design rules and other parameters. The difference in performance is 2.6-7x with custom-designed circuits.

    Really, what happens is a coder will simulate his module and make sure it reaches the targeted timing, which is usually much tighter than the actual delay, to assure robust operation. If the logic can't reach the speed, it is either rewritten or circuit designers optimize it. In certain logic families it must be entirely custom designed.

    Circuits that are custom designed are usually things like power gating, clock distribution, and analog circuits such as PLLs, DLLs, and memory controllers/IO pads.
    I've heard humans are better at those tasks, but that was many years ago; I thought computers would be better by this point. Honestly, I don't see any reason why computers would fail at such "logical" tasks.

  12. #212
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,794
    Given the information that AMD has released today, is it possible for anyone here to make an educated guess on how much faster BD will be clock for clock over Deneb? Those slides go beyond my basic understanding of CPU design.
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  13. #213
    Registered User
    Join Date
    Jun 2009
    Location
    India
    Posts
    28

    Point to note is that L2 will run at half the speed of core.
    I think that used to be the case for past architectures; I don't know of any modern x86 core that does that. Or is there one?
    Last edited by I_no; 08-24-2010 at 11:04 PM.
    Are there more ways to love than to break a perfectly working computer?
    Don't think so.

  14. #214
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by freeloader View Post
    Given the information that AMD has released today, is it possible for anyone here to make an educated guess on how much faster BD will be clock for clock over Deneb? Those slides go beyond my basic understanding of CPU design.
    Phenom was bad at branch prediction, so AMD improved it far beyond Intel's best; just look at the die space it takes up even on the little Bobcat. It's just impressive. The ALU pipes seem to have changed from 3 shared ALU/AGU pipes that can do 1.5 of each per cycle to 2 AGU + 2 ALU that can do 2 of each per cycle. So more IPC.

    About the L1D it's not very clear; we need to know if it's inclusive or not. So it could be incredibly fast, or simply as good as the old Phenom II's; latency is said to be masked.

    L2 latency is 17 cycles, so it's not bad, and it seems to be 1 MB shared between the 2 ALU cores, which means less data will have to go through the L3 when changing cores.

    The pipeline is longer, but that's aimed at ramping up clocks, and prediction is far better to hide the bad effects of the longer pipe.

    It's gonna be the "core 2" effect i think.

  15. #215
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by I_no View Post
    Point to note is that L2 will run at half the speed of core.
    I think that used to be the case for past architectures as I don't know any x86 core that does that or is there any?
    L3 is not full speed in Phenom, I think.
    And the half-speed L2 in Bobcat helps keep power low; that's the first goal, and IPC doesn't suffer too badly.

    For BD it's full speed.

  16. #216
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,794
    The only other thing I've read that's got me confused is the following...

    http://www.rage3d.com/articles/amd_h.../index.php?p=5

    "Bulldozer on the Desktop"

    For the desktop, the Zambezi processor is good news and bad news. The good news is it's an 8 core product, the bad news is it needs a new socket - AM3r, or AM3+. This is an electrical upgrade of the AM3 platform, to provide the power phases and planes/states required by the power gating features of Zambezi. As you might have guessed from the name, this socket is backwards compatible with existing AM3 processors ...


    So BD is not compatible with AM3? Not a big deal as a new arch usually requires a new socket anyhow. I've read up to this point that BD would be compatible with AM3.
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  17. #217
    Xtreme Mentor
    Join Date
    Aug 2006
    Location
    HD0
    Posts
    2,646
    Quote Originally Posted by -Boris- View Post
    I've heard humans is better at those tasks, but that was many years ago, I thought that computers would be better at this point. Honestly I don't see any reason why computers would fail at such "logical" tasks.
    Transistor count is going up faster than performance is increasing. It's getting more and more complex overall. Computers can generate a solid generic path, which is OK, but humans still need to tweak it.

  18. #218
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by xlink View Post
    transistor count is going up faster than performance is increasing.. It's getting more and more complex overall. Computers can generate a solid generic path which is OK, but humans still need to tweak it.
    Yeah, but one might think that computers would be better at the rough overall layout, better utilization of die space and so on. And while transistor count is going up faster than performance, the human brain stays the same.

  19. #219
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Anyone willing to take bets on cache inclusiveness/exclusiveness?

    I'd bet all I've got on an inclusive cache... unless cutting the L1D down by 75% is merely for a smaller die size. But in that case it would only be logical to go inclusive, because the benefit of an exclusive cache would be lost. This again would require some reworking, but would allow far better cache latencies and bandwidth. I'd also claim that the increase in performance would be higher than with the Athlon II (no L3) vs. the Phenom II (with L3), which is around 5-6% on average, ranging from 0 to ~20%.
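    The capacity side of the inclusive-vs-exclusive bet can be put in numbers (my own sketch; the 16 KiB L1D / 1 MB L2 figures for BD are taken from the speculation in this thread, not confirmed specs):

    ```python
    # Effective unique capacity of a two-level cache hierarchy.
    # Inclusive: every L1 line is duplicated in L2, so L1 adds no unique capacity.
    # Exclusive: the levels hold disjoint lines, so capacities add.

    def effective_capacity_kib(l1_kib, l2_kib, inclusive):
        return l2_kib if inclusive else l1_kib + l2_kib

    # K8/K10-style exclusive hierarchy, 64 KiB L1D + 512 KiB L2:
    print(effective_capacity_kib(64, 512, inclusive=False))   # 576 KiB unique
    # Hypothetical inclusive BD-style hierarchy, 16 KiB L1D + 1024 KiB L2:
    print(effective_capacity_kib(16, 1024, inclusive=True))   # 1024 KiB unique
    ```

    With a 16 KiB L1D the exclusive scheme would only buy ~1.5% extra unique capacity, which is why a small L1 makes going inclusive look logical.
    
    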
    Last edited by Calmatory; 08-25-2010 at 12:28 AM.

  20. #220
    Xtreme Member
    Join Date
    May 2005
    Posts
    159
    I'm not sure if this has been posted already, but it might be useful to some.

    http://www.youtube.com/watch?v=VIs1CxuUrpc
    Quote Originally Posted by Movieman
    been lots of years since I played with an AMD and this is just an hour so bear with me..
    My first thoughts on it are that it's fast, it's smooth and it's fun.
    Quote Originally Posted by Movieman
    Yes, the i7 does have the edge in pure grunt but then again the AMD has that little something I can't quite put my finger on except to use that word 'smoother" and that will get me flamed faster than posting kiddy :banana::banana::banana::banana: on the Christian networks site.
    Main Rig: Phenom II 550 (x4) @3.9Ghz - Gigabyte 6950@6970 - Asus M4A-785D M Pro - Samsung HDs 2x2TB,1x1.5TB,2x1TB - Season X-650 | OpenCL mining rigs: 2x Phenom II 555(x4) - 1xMSI 890FXA-GD70 - 1xGB 990FXA-UD7 (SICK ) - 1xHD6990 - 1x6950@70 - 6x5850 - 2xCooler Master Silent Pro Gold 1kW

  21. #221
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by Calmatory View Post
    Anyone willing to take bets on cache inclusiveness/exclusiveness?

    I'd bet all I've got for inclusive cache. ..unless cutting the L1D down by 75 % is merely for smaller die size. But in that case it would be only logical to go inclusive, because the benefit of exclusive cache would be lost. This again would require some reworking, but would allow far greater cache latencies and bandwidth. I'd also claim that the increase in performance would be higher than with Athlon II with no L3 vs. PhII with L3, which is around 5-6%, ranging from 0 to ~20%.
    I bet for an exclusive L1-I and inclusive L1-D, and L2-L3 exclusive.

  22. #222
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,885
    Quote Originally Posted by -Boris- View Post
    I've heard humans is better at those tasks, but that was many years ago, I thought that computers would be better at this point. Honestly I don't see any reason why computers would fail at such "logical" tasks.


    Because computers can't think for themselves to implement complicated logic, maybe?
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD View Post
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

  23. #223
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by Sn0wm@n View Post
    because computers cant think for themselves to implement complicated logic maybe ????
    And you think it must be capable of thought to arrange predetermined connections in an even and efficient way?

  24. #224
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Way too much discussion for me to answer everything in detail. Some thoughts:

    Who said that each mem op in BD actually needs an AGU op? Couldn't there also be a fast-path (single addition) address generation somewhere else? Do addresses have to be calculated each time?

    3 ALUs/3 AGUs with their respective reservation stations were used in a symmetrical configuration in K8 to create OoO opportunities for execution. µOps couldn't change their "lane": if that "add rax, rdx" was in reservation station 0, which was busy, while the other RSes were free, the instruction would still have to wait -> IPC goes down.

    2 ALUs + 2 AGUs with a unified scheduler would allow these units to be used as they become available. There won't be any binding of our "add rax, rdx" to a busy ALU, so it could execute on the free one -> IPC goes up (vs. reservation stations).
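    The lane-binding effect Dresdenboy describes can be shown with a toy model (my own illustration with made-up function names, not either design's real scheduler):

    ```python
    # K8-style lane binding: a uop assigned to a busy reservation station waits
    # even if another lane is idle. A unified scheduler issues any ready uop
    # to any free unit.

    def cycles_lane_bound(queues):
        """queues: per-lane uop counts. Each lane drains 1 uop/cycle."""
        return max(queues)

    def cycles_unified(queues, alus):
        """All uops share one pool feeding `alus` identical units."""
        total = sum(queues)
        return -(-total // alus)  # ceiling division

    # 6 uops unluckily piled onto lane 0 of a 3-lane machine:
    print(cycles_lane_bound([6, 0, 0]))  # 6 cycles: lanes 1 and 2 sit idle
    print(cycles_unified([6, 0, 0], 2))  # 3 cycles, with only 2 unified ALUs
    ```

    In the skewed case, two unified ALUs beat three lane-bound ones, which is the "fewer but better-used units" argument in a nutshell.
    
    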
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  25. #225
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,885
    Quote Originally Posted by -Boris- View Post
    And you think it must be capable of thought to arrange predetermined connections in an even and efficient way?


    Computers need to be programmed by humans in order to make stuff up, but even then we don't make perfect stuff. So it would be easier for a human to design the best layout by hand than to let a robot do the work... that's all I'm saying. Some logic can be done by computers, but some complicated logic portions of a chip would indeed benefit from human intervention.
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD View Post
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

