
Thread: AMD's Bobcat and Bulldozer


  1. #1
    Registered User
    Join Date: Dec 2008 | Location: Chicago | Posts: 49
    Quote Originally Posted by Chumbucket843
    Is that a joke? That's the most ridiculous floorplan I have ever seen. It looks like a map or a cloud or something.
    Computer-synthesized; only a few things are laid out by hand.

  2. #2
    Xtreme Cruncher
    Join Date: May 2009 | Location: Bloomfield | Posts: 1,968
    Quote Originally Posted by deeperblue
    Computer-synthesized; only a few things are laid out by hand.
    Synthesizers do not floorplan. Generally, modern logic blocks are 50-100K gates, probably around 300-600K transistors. That is a rather large chunk when the core itself, including L1 & L2, is probably <20M transistors.
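The block-size argument is plain arithmetic. A minimal sketch, assuming the roughly 6 transistors per gate implied by the post's own figures (50K gates ~ 300K transistors) and taking 20M transistors as the upper bound for the core; the helper name `block_share` is hypothetical:

```python
def block_share(gates: int, transistors_per_gate: float, core_transistors: int) -> float:
    """Fraction of a core's transistor budget consumed by one synthesized block."""
    return gates * transistors_per_gate / core_transistors


# Assumption: ~6 transistors per gate, as implied by 50K gates ~ 300K transistors.
TRANSISTORS_PER_GATE = 6
CORE_BUDGET = 20_000_000  # upper bound quoted for the core including L1 & L2

for gates in (50_000, 100_000):
    share = block_share(gates, TRANSISTORS_PER_GATE, CORE_BUDGET)
    print(f"{gates:>7} gates -> {gates * TRANSISTORS_PER_GATE:>7} transistors, {share:.1%} of the core")
```

Under those assumptions each such block is roughly 1.5% to 3% of the whole core.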

  3. #3
    Registered User
    Join Date: Dec 2008 | Location: Chicago | Posts: 49
    Quote Originally Posted by Chumbucket843
    Synthesizers do not floorplan. Generally, modern logic blocks are 50-100K gates, probably around 300-600K transistors. That is a rather large chunk when the core itself, including L1 & L2, is probably <20M transistors.
    AMD says (http://www.youtube.com/watch?v=VIs1CxuUrpc)
    "Synthesizable with small number of custom arrays"
    Together with what was said before I think one of the main goals that AMD wants to achieve is to have easily customizable processors. Add a gpu core here, some cache there and another core here. From the slide it looks like lots of their process is already capable of being laid out by a computer.
    We have the caches, the integer units and the floating point units being the fixed hand optimized blocks with stuff like the x86 decode organically filling up the space in between. AMD also says it makes it easier to put the whole thing on a different process.

    I've only limited knowledge about modern synthesizing and floor planning from working with some FPGAs.
    Maybe Hans or somebody in the industry can say something about Bobcat?

  4. #4
    Xtreme Cruncher
    Join Date: Apr 2005 | Location: TX, USA | Posts: 898
    Quote Originally Posted by Chumbucket843
    Synthesizers do not floorplan. Generally, modern logic blocks are 50-100K gates, probably around 300-600K transistors. That is a rather large chunk when the core itself, including L1 & L2, is probably <20M transistors.
    Not exactly sure in which context you mean that they "do not floorplan", but they definitely allow floorplanning at some level. The first step of synthesis, RTL -> Netlist, doesn't floorplan (it only cares about standard-cell usage and timing estimates/constraints), if that's what you're getting at. However, the second step, Netlist -> Placement (the placement tool), definitely does floorplanning.

    Tools like Cadence Encounter take floorplan constraints and allow sub-modules to be partitioned; however, as the picture above shows, the results tend to look like a jumbled mess, since strict boundaries aren't adhered to.
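To make the Netlist -> Placement step a little more concrete, here is a toy placer. It is not how Encounter works internally; it is a minimal sketch of one classic technique, simulated annealing that swaps cells on a grid to minimise half-perimeter wirelength (HPWL). The grid size, cooling schedule, seed and the 9-cell chain netlist are all hypothetical, and the sketch assumes there are at least as many sites as cells.

```python
import math
import random
from typing import Dict, List, Tuple

Site = Tuple[int, int]
Placement = Dict[str, Site]


def hpwl(place: Placement, nets: List[List[str]]) -> int:
    """Half-perimeter wirelength: bounding-box width + height, summed over nets."""
    total = 0
    for net in nets:
        xs = [place[c][0] for c in net]
        ys = [place[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(cells: List[str], nets: List[List[str]], width: int, height: int,
           steps: int = 20_000, t0: float = 5.0,
           seed: int = 1) -> Tuple[Placement, int, int]:
    """Swap-based simulated annealing; returns (best placement, best cost, start cost)."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(sites)
    place = {c: sites[i] for i, c in enumerate(cells)}  # random legal start, one cell per site
    cost = start = hpwl(place, nets)
    best, best_cost = dict(place), cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-3  # linear cooling, never exactly zero
        a, b = rng.sample(cells, 2)
        place[a], place[b] = place[b], place[a]  # propose swapping two cells
        delta = hpwl(place, nets) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta  # accept: always if better, sometimes if worse
            if cost < best_cost:
                best, best_cost = dict(place), cost
        else:
            place[a], place[b] = place[b], place[a]  # reject: undo the swap


    return best, best_cost, start


cells = [f"c{i}" for i in range(9)]
nets = [[cells[i], cells[i + 1]] for i in range(8)]  # hypothetical 9-cell chain
best, best_cost, start = anneal(cells, nets, width=3, height=3)
print(f"HPWL {start} -> {best_cost}")
```

Because moves only exchange two cells, every intermediate placement stays legal (one cell per site); accepting some uphill moves early lets the search escape local minima before the temperature drops.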
    Quote Originally Posted by -Boris-
    Finally! Chips from AMD have always been nicely ordered, pointing to a mostly hand-made layout. I can imagine that it leads to uneven power usage and unnecessarily long circuits and timings, and wastes die space.
    Well, historically [x86] chips from both camps have always been mainly custom layout in the datapath with a varying amount of synthesized control logic, so seeing pictures like this is a bit of an eye-opener compared to the norm.

    Another example is Intel's Pine Trail:

    The huge purple blotch running down the middle is all synthesized logic.
    Quote Originally Posted by deeperblue
    AMD says (http://www.youtube.com/watch?v=VIs1CxuUrpc)
    "Synthesizable with small number of custom arrays"
    Together with what was said before I think one of the main goals that AMD wants to achieve is to have easily customizable processors. Add a gpu core here, some cache there and another core here. From the slide it looks like lots of their process is already capable of being laid out by a computer.
    We have the caches, the integer units and the floating point units being the fixed hand optimized blocks with stuff like the x86 decode organically filling up the space in between. AMD also says it makes it easier to put the whole thing on a different process.

    I've only limited knowledge about modern synthesizing and floor planning from working with some FPGAs.
    Maybe Hans or somebody in the industry can say something about Bobcat?
    You pretty much sum up my thoughts on the matter: it looks like they shot for a semi-custom approach, supplying some of the main datapath logic (not necessarily the whole FPU, etc., just the important chunks) and the arrays as hard macros/external IP (in-house or out-of-house, doesn't matter) while synthesizing the rest.

    While they're definitely not unique in this approach, it will certainly allow quicker adaptation to a new process, since technically only a standard cell library and select logic/array IP pieces would be needed. Granted, there's still a bit more work than just swapping libraries/IP and pressing a few buttons.
    Quote Originally Posted by Chumbucket843
    Lol, it is the exact opposite. Hand layout is much better. Humans are better at finding Eulerian paths and coming up with clever layouts; computers can't really do that as effectively with all of the design rules and other parameters. Custom-designed circuits are 2.6-7x faster.

    Really, what happens is that a coder simulates his module and makes sure it reaches the targeted timing, which is usually much higher than the actual delay, to assure robust operation. If the logic can't reach the speed, it is either rewritten or circuit designers optimize it. In certain logic families it must be entirely custom-designed.

    Circuits that are custom-designed are usually things like power gating, clock distribution, and analog circuits such as PLLs, DLLs, and memory controllers/I/O pads.
    Just pointing out that while we humans can definitely be more adept at coming up with these clever (sometimes novel) solutions for optimizing layout area/timing/congestion constraints, it's also a significant capital and time investment, so for ROI and time-to-market reasons it doesn't always work out. The case of Bobcat is obviously an example of this, and Atom for that matter.

    Honestly, I would expect the logically optimal Euler path to be much easier for a computer to solve.
    But yes computerized tools aren't very good when it comes to balancing the plethora of added constraints in a physical world, hence us restricting them to sub-optimal standard cells + wiring constraints.
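For context, the Euler-path remark refers to the classic CMOS gate-layout technique: treat each transistor as an edge between the diffusion nodes it connects, then look for one gate-input ordering that is an Euler trail in both the pull-down and the pull-up graph, so each diffusion strip can be drawn unbroken. Below is a brute-force sketch for a single small gate; the example gate F = NOT(A*B + C) and its node names are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (gate input driving the transistor, node u, node v)


def euler_trails(edges: List[Edge]) -> Set[Tuple[str, ...]]:
    """Every gate-input sequence that uses each transistor (edge) exactly once."""
    nodes = {n for _, u, v in edges for n in (u, v)}
    found: Set[Tuple[str, ...]] = set()

    def walk(node: str, used: List[bool], seq: List[str]) -> None:
        if len(seq) == len(edges):
            found.add(tuple(seq))
            return
        for i, (label, u, v) in enumerate(edges):
            if used[i] or node not in (u, v):
                continue
            used[i] = True
            seq.append(label)
            walk(v if node == u else u, used, seq)  # cross the transistor
            seq.pop()
            used[i] = False  # backtrack

    for start in nodes:
        walk(start, [False] * len(edges), [])
    return found


# Hypothetical gate F = NOT(A*B + C).
# Pull-down: A and B in series, in parallel with C.
pull_down = [("A", "OUT", "n1"), ("B", "n1", "GND"), ("C", "OUT", "GND")]
# Pull-up (the dual): A and B in parallel, in series with C.
pull_up = [("C", "VDD", "p1"), ("A", "p1", "OUT"), ("B", "p1", "OUT")]

common = sorted(euler_trails(pull_down) & euler_trails(pull_up))
print(common)  # -> [('A', 'B', 'C'), ('B', 'A', 'C'), ('C', 'A', 'B'), ('C', 'B', 'A')]
```

Exhaustive search is exponential in the transistor count, so this only makes sense for individual gates. It does illustrate that the purely graph-theoretic part is mechanical; the hard part is balancing everything else (design rules, sizing, wiring).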



  5. #5
    Xtreme Member
    Join Date: Sep 2008 | Posts: 235
    Quote Originally Posted by deeperblue
    AMD says (http://www.youtube.com/watch?v=VIs1CxuUrpc)
    "Synthesizable with small number of custom arrays"
    Together with what was said before I think one of the main goals that AMD wants to achieve is to have easily customizable processors. Add a gpu core here, some cache there and another core here. From the slide it looks like lots of their process is already capable of being laid out by a computer.
    We have the caches, the integer units and the floating point units being the fixed hand optimized blocks with stuff like the x86 decode organically filling up the space in between. AMD also says it makes it easier to put the whole thing on a different process.

    I've only limited knowledge about modern synthesizing and floor planning from working with some FPGAs.
    Maybe Hans or somebody in the industry can say something about Bobcat?
    Another nice example is the 1.9W TDP, 2GHz hard-macro version of the dual-core ARM Cortex-A9 in the TSMC 40G process (total size of only 6.7 mm²).

    http://www.arm.com/products/CPUs/Cor...ard-Macro.html



    Regards, Hans

  6. #6
    Xtreme Addict
    Join Date: Aug 2004 | Location: Sweden | Posts: 2,084
    Quote Originally Posted by Hans de Vries
    Another nice example is the 1.9W TDP, 2GHz hard-macro version of the dual-core ARM Cortex-A9 in the TSMC 40G process (total size of only 6.7 mm²).
    What's the reason behind the odd die shape, do you know?

  7. #7
    Xtreme Cruncher
    Join Date: Apr 2005 | Location: TX, USA | Posts: 898
    Quote Originally Posted by Mats
    What's the reason behind the odd die shape, do you know?
    It's still supposed to be a rectangle; the white "missing" area is reserved for external IP (mainly L2 cache SRAM, but chip-level interfacing too).



  8. #8
    Xtreme Member
    Join Date: Sep 2008 | Posts: 235
    Quote Originally Posted by Mats
    What's the reason behind the odd die shape, do you know?
    The synthesis starts with rectangular shapes, but the logic migrates during the optimization process. Some pieces of one unit even end up in the middle of other units (typically interface logic between the two units). For some reason the hardware synthesizer concludes that it's electrically/timing-wise better to move it there.


    Regards, Hans
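One way to see why interface logic drifts away from its "home" unit is quadratic placement, a standard analytical placement technique (not necessarily what the tool Hans describes uses). Minimising the sum of squared two-pin wirelengths puts every movable cell at the average position of the cells it connects to. A minimal sketch using Gauss-Seidel iteration; the pin positions and the two-cell chain between unit A and unit B are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def quadratic_place(fixed: Dict[str, Point], nets: List[Tuple[str, str]],
                    movable: List[str], iters: int = 500) -> Dict[str, Point]:
    """Minimise the sum of squared two-pin net lengths.

    Assumes every movable cell is connected, directly or indirectly, to a fixed pin.
    """
    pos: Dict[str, Point] = dict(fixed)
    for m in movable:
        pos[m] = (0.0, 0.0)  # arbitrary starting guess
    neighbours: Dict[str, List[str]] = {m: [] for m in movable}
    for a, b in nets:
        if a in neighbours:
            neighbours[a].append(b)
        if b in neighbours:
            neighbours[b].append(a)
    for _ in range(iters):
        for m in movable:
            # Optimality condition: each movable cell sits at the mean of its neighbours.
            nbrs = neighbours[m]
            pos[m] = (sum(pos[n][0] for n in nbrs) / len(nbrs),
                      sum(pos[n][1] for n in nbrs) / len(nbrs))
    return pos


# Unit A's output pin sits at x = 0, unit B's input pin at x = 10.
fixed = {"unitA_out": (0.0, 4.0), "unitB_in": (10.0, 4.0)}
nets = [("unitA_out", "sync1"), ("sync1", "sync2"), ("sync2", "unitB_in")]
placed = quadratic_place(fixed, nets, movable=["sync1", "sync2"])
for cell in ("sync1", "sync2"):
    print(cell, placed[cell])
```

The two interface cells settle at x = 10/3 and x = 20/3, spread across the gap rather than inside either unit. Real placers add cell spreading and legalisation on top of this step.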

  9. #9
    Xtreme Enthusiast
    Join Date: Oct 2008 | Posts: 678
    Quote Originally Posted by justin.kerr
    Lost Planet 2 uses 12 threads; at 4.5GHz it keeps all 12 above 70%.
    How do you know? In a true 12-core design, do you think it would be over 70%? Maybe it uses four cores, and their HT threads easily get maxed out. Four cores is around 66% of a hexa-core.

    My point is, a hexa-core being utilized at 70% can only mean that at least 4 cores are busy. It could be all 12 threads, but there is no way for you to see it, and since it isn't 100%, it seems like it isn't using all cores.
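The point about aggregate utilization can be shown in a few lines: a single overall percentage is just the mean over logical CPUs, so very different per-thread loads can produce the same number. The per-CPU loads below are hypothetical, assuming a 6-core CPU with Hyper-Threading (12 logical CPUs).

```python
from typing import List


def aggregate_utilization(per_cpu: List[float]) -> float:
    """Overall CPU usage as a monitor typically reports it: the mean over logical CPUs."""
    return sum(per_cpu) / len(per_cpu)


spread = [0.70] * 12                 # all 12 threads at 70%
packed = [1.0] * 8 + [0.1] * 4       # 4 cores x 2 HT threads pegged, the rest nearly idle
four_cores = [1.0] * 8 + [0.0] * 4   # 4 cores x 2 HT threads pegged, 2 cores idle

for name, loads in (("spread", spread), ("packed", packed), ("four_cores", four_cores)):
    print(f"{name:>10}: {aggregate_utilization(loads):.1%}")
```

Both the evenly spread and the packed loads read as 70%, and four fully loaded cores alone read as about 66.7%, so the overall figure cannot tell you how many threads the game really uses.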

    Quote Originally Posted by deeperblue
    Computer-synthesized; only a few things are laid out by hand.
    Finally! Chips from AMD have always been nicely ordered, pointing to a mostly hand-made layout. I can imagine that it leads to uneven power usage and unnecessarily long circuits and timings, and wastes die space.

  10. #10
    Xtreme Cruncher
    Join Date: May 2009 | Location: Bloomfield | Posts: 1,968
    Quote Originally Posted by -Boris-
    Finally! Chips from AMD have always been nicely ordered, pointing to a mostly hand-made layout. I can imagine that it leads to uneven power usage and unnecessarily long circuits and timings, and wastes die space.
    Lol, it is the exact opposite. Hand layout is much better. Humans are better at finding Eulerian paths and coming up with clever layouts; computers can't really do that as effectively with all of the design rules and other parameters. Custom-designed circuits are 2.6-7x faster.

    Really, what happens is that a coder simulates his module and makes sure it reaches the targeted timing, which is usually much higher than the actual delay, to assure robust operation. If the logic can't reach the speed, it is either rewritten or circuit designers optimize it. In certain logic families it must be entirely custom-designed.

    Circuits that are custom-designed are usually things like power gating, clock distribution, and analog circuits such as PLLs, DLLs, and memory controllers/I/O pads.

  11. #11
    Xtreme Enthusiast
    Join Date: Oct 2008 | Posts: 678
    Quote Originally Posted by Chumbucket843
    Lol, it is the exact opposite. Hand layout is much better. Humans are better at finding Eulerian paths and coming up with clever layouts; computers can't really do that as effectively with all of the design rules and other parameters. Custom-designed circuits are 2.6-7x faster.

    Really, what happens is that a coder simulates his module and makes sure it reaches the targeted timing, which is usually much higher than the actual delay, to assure robust operation. If the logic can't reach the speed, it is either rewritten or circuit designers optimize it. In certain logic families it must be entirely custom-designed.

    Circuits that are custom-designed are usually things like power gating, clock distribution, and analog circuits such as PLLs, DLLs, and memory controllers/I/O pads.
    I've heard humans are better at those tasks, but that was many years ago; I thought that computers would be better at this point. Honestly, I don't see any reason why computers would fail at such "logical" tasks.

  12. #12
    Xtreme Addict
    Join Date: Jun 2002 | Location: Ontario, Canada | Posts: 1,782
    Given the information that AMD has released today, is it possible for anyone here to make an educated guess on how much faster BD will be clock for clock over Deneb? Those slides go beyond my basic understanding of CPU design.
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  13. #13
    Xtreme Addict
    Join Date: Nov 2006 | Posts: 1,402
    Quote Originally Posted by freeloader
    Given the information that AMD has released today, is it possible for anyone here to make an educated guess on how much faster BD will be clock for clock over Deneb? Those slides go beyond my basic understanding of CPU design.
    Phenom was weak at branch prediction, so AMD improved it far beyond Intel's best; just look at the die space it takes even on the "little" Bobcat. It's just impressive. The ALU pipes seem to have changed from 3 ALU/AGU pipes that can do 1.5 of each per cycle to 2 AGU + 2 ALU that can do 2 of each per cycle. So more IPC.
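For readers new to branch prediction, the basic building block is a table of 2-bit saturating counters indexed by branch address. The sketch below is the textbook bimodal scheme, not AMD's actual (far more elaborate) Bobcat/Bulldozer predictor; the table size, the branch address 0x400 and the loop trace are hypothetical.

```python
from typing import List, Tuple


class TwoBitPredictor:
    """Bimodal predictor: a table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, entries: int = 1024) -> None:
        self.entries = entries
        # Counter values 0-1 predict not-taken, 2-3 predict taken; start weakly not-taken.
        self.table = [1] * entries

    def predict(self, pc: int) -> bool:
        return self.table[pc % self.entries] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc % self.entries
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor: TwoBitPredictor, trace: List[Tuple[int, bool]]) -> float:
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken  # a bool counts as 0 or 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical loop branch: taken 9 times, then falls through; the loop runs 10 times.
trace = [(0x400, i % 10 != 9) for i in range(100)]
acc = accuracy(TwoBitPredictor(), trace)
print(acc)  # -> 0.89
```

The hysteresis of the 2-bit counter means the single not-taken loop exit costs one miss per loop instead of two; better predictors add global history to catch such patterns as well.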

    The L1D isn't very clear yet; we need to know whether it's inclusive or not. So it could be incredibly faster, or simply as good as the old Phenom II; latency is said to be masked.

    L2 latency is 17 cycles, so it's not bad, and the L2 seems to be 1MB and shared between the 2 integer (ALU) cores, which means less data will have to go through L3 when a thread changes core.
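Whether a 17-cycle L2 is "not bad" depends on hit rates, which the standard average memory access time (AMAT) formula makes explicit: AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × penalty beyond L2). Only the 17-cycle L2 latency comes from the post; the L1 latency, miss rates and the penalty beyond L2 below are hypothetical placeholders.

```python
def amat(l1_hit: float, l1_miss_rate: float, l2_hit: float,
         l2_miss_rate: float, beyond_l2: float) -> float:
    """Average memory access time in cycles for a two-level cache hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * beyond_l2)


# Hypothetical inputs; only the 17-cycle L2 latency is taken from the post.
print(amat(l1_hit=4, l1_miss_rate=0.05, l2_hit=17, l2_miss_rate=0.2, beyond_l2=60))
```

With these placeholder numbers the average is about 5.45 cycles, and each extra cycle of L2 latency adds only `l1_miss_rate` (0.05) cycles to the average, which is why a moderate L2 latency can be tolerable.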

    The pipeline is longer, but it's aimed at ramping up clocks, and the far better prediction should hide the bad effect of the long pipeline.
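The trade-off in the last point fits the usual first-order CPI model: extra cycles per instruction from branches ≈ branch fraction × misprediction rate × misprediction penalty (roughly the pipeline refill depth). All numbers below are hypothetical, chosen only to show that a deeper pipeline can come out ahead if prediction improves enough.

```python
def branch_cpi(branch_fraction: float, mispredict_rate: float, penalty_cycles: int) -> float:
    """Extra cycles per instruction caused by branch mispredictions (first-order model)."""
    return branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical designs: shorter pipeline with a weaker predictor
# versus a longer pipeline with a better one.
short_pipe = branch_cpi(branch_fraction=0.2, mispredict_rate=0.08, penalty_cycles=12)
long_pipe = branch_cpi(branch_fraction=0.2, mispredict_rate=0.04, penalty_cycles=20)
print(f"short pipe: +{short_pipe:.3f} CPI, long pipe: +{long_pipe:.3f} CPI")
```

The model ignores everything except misprediction stalls; any clock-speed gain from the longer pipeline would come on top of this.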

    I think it's gonna be the "Core 2" effect.

  14. #14
    Xtreme Mentor
    Join Date: Aug 2006 | Location: HD0 | Posts: 2,646
    Quote Originally Posted by -Boris-
    I've heard humans are better at those tasks, but that was many years ago; I thought that computers would be better at this point. Honestly, I don't see any reason why computers would fail at such "logical" tasks.
    Transistor count is going up faster than performance. It's getting more and more complex overall. Computers can generate a solid generic path, which is OK, but humans still need to tweak it.

  15. #15
    Xtreme Enthusiast
    Join Date: Oct 2008 | Posts: 678
    Quote Originally Posted by xlink
    Transistor count is going up faster than performance. It's getting more and more complex overall. Computers can generate a solid generic path, which is OK, but humans still need to tweak it.
    Yeah, but one might think that computers would be better at the rough overall layout, better utilization of die space and so on. And while transistor count is going up faster than performance, the human brain stays the same.

  16. #16
    Xtreme Addict
    Join Date: Apr 2007 | Location: Canada | Posts: 1,886
    Quote Originally Posted by -Boris-
    I've heard humans are better at those tasks, but that was many years ago; I thought that computers would be better at this point. Honestly, I don't see any reason why computers would fail at such "logical" tasks.

    Because computers can't think for themselves to implement complicated logic, maybe?
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

  17. #17
    Xtreme Enthusiast
    Join Date: Oct 2008 | Posts: 678
    Quote Originally Posted by Sn0wm@n
    Because computers can't think for themselves to implement complicated logic, maybe?
    And you think it must be capable of thought to arrange predetermined connections in an even and efficient way?

  18. #18
    Xtreme Addict
    Join Date: Apr 2007 | Location: Canada | Posts: 1,886
    Quote Originally Posted by -Boris-
    And you think it must be capable of thought to arrange predetermined connections in an even and efficient way?

    Computers need to be programmed by humans in order to make stuff up... but even then we don't make perfect stuff up, so it would be easier for a human to design the best layout by hand than to let a robot do the work. That's all I'm saying. Some logic can be done by computers, but some complicated logic portions of a chip would indeed benefit from human intervention.