
Thread: AMD cuts to the core with 'Bulldozer' Opterons

  1. #51
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    City of Lights, The Netherlands
    Posts
    2,381
    Quote Originally Posted by haylui View Post

    If BD is JUST 10~30% faster than current K10.5, then I'm afraid that it could be another i7 vs K10.5 when facing SB
    A single BD module will be 10 to 30% faster than a dual core K10.5, but keep in mind that this single BD module is about the same size as a single Sandy Bridge core and maybe even smaller than this. Just my guess though. For multithreaded software this means that a single BD module will be 120 to 160% faster than a single K10.5 core. Quite a leap in performance.
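    To spell out the arithmetic behind that 120 to 160% figure: it only works if you assume a dual-core K10.5 delivers exactly 2x the multithreaded throughput of a single K10.5 core. A minimal Python sketch under that assumption (the function name and the perfect-scaling default are made up for illustration, not taken from any AMD material):

```python
def module_vs_single_core(gain_over_dual_core, dual_core_scaling=2.0):
    """Percent by which one BD module would beat one K10.5 core.

    gain_over_dual_core: claimed module advantage over a dual-core K10.5 (0.10 = 10%).
    dual_core_scaling: ASSUMED throughput of two K10.5 cores relative to one core.
    """
    module_throughput = dual_core_scaling * (1.0 + gain_over_dual_core)
    return (module_throughput - 1.0) * 100.0


low = module_vs_single_core(0.10)
high = module_vs_single_core(0.30)
print(f"{low:.0f}% to {high:.0f}% faster than one K10.5 core")
```

    If real dual-core scaling is lower than 2x, pass a smaller dual_core_scaling and the range shrinks accordingly.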
    Quote Originally Posted by Manicdan View Post
    ^Nice. I wish they had done a bit more to compare the die space to other chips, though, instead of just to itself.
    I agree with you, but we can always do some guesswork.

    We should judge how this chip performs once it's launched though, not from what we think we know so far.
    "When in doubt, C-4!" -- Jamie Hyneman

    Silverstone TJ-09 Case | Seasonic X-750 PSU | Intel Core i5 750 CPU | ASUS P7P55D PRO Mobo | OCZ 4GB DDR3 RAM | ATI Radeon 5850 GPU | Intel X-25M 80GB SSD | WD 2TB HDD | Windows 7 x64 | NEC EA23WMi 23" Monitor |Auzentech X-Fi Forte Soundcard | Creative T3 2.1 Speakers | AudioTechnica AD900 Headphone |

  2. #52
    Xtreme Enthusiast
    Join Date
    Jun 2009
    Location
    Singapore
    Posts
    560
    From my limited understanding, a BD module would be AMD's answer to Intel's single core with Hyper-Threading, right? So if Intel releases an 8-core SB with HT, then AMD's answer to it would be an 8-module BD, which would have 16 cores... That would be awesome. Time to save up and splash out when BD comes to town.
    Phenom Monsta - Gallery
    AMD Phenom II X6 1055T | MSI 790FX-GD70 | Dominator 1600 C8 8GB | 4770 CF | 2xWD640GB Raid0 | 2xWD1.5TB Raid1 | Corsair HX850 |Lian-Li PC-7FW
    Enzotech Luna Rev.A | 2 x MCW60 | MCP-350 | XSPC Dual DDC Res | TFC Monsta 420/360 Limited Edition


    Canon EOS 7D | EF-S 17-55 F2.8 IS | Nissin Di866 | D-Lite4 | 17" MiniSoft | 53" Midi-Octa | 7" Reflector + 20º Grid | Explorer XT SE | Crumpler 6MDH

  3. #53
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    I seriously hope that AMD will deliver the same number of BD modules as Intel has cores, but I think this is not possible.

    Why? Because BD cores are a lot bigger than the older cores, and even 32 nm will not be enough for that. But a BD module is better than one Intel core with Hyper-Threading; that's easy to understand, without any doubt.

    I hope that AMD will go for 4 modules fast. That could be competitive with Intel's 6 cores with Hyper-Threading: 8 threads on real cores vs 12 threads with Hyper-Threading.

    AMD's way is beautiful, real brute force in integer, which was the big lack in K8 and the big problem of K10, to my mind. My best hope is that AMD will make it possible, with SSE5, to use the GPU's FPUs for CPU calculations. That's maybe why the FPU on BD is weaker than on K10 (weaker for the same number of threads), even if you consider that BD is a new core.

    4 pipelines for integer is the most beautiful thing, and I would have loved to see it at the start of K10. Very sad that AMD didn't do it.

  4. #54
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    City of Lights, The Netherlands
    Posts
    2,381
    Quote Originally Posted by madcho View Post
    I seriously hope that AMD will deliver the same number of BD modules as Intel has cores, but I think this is not possible.

    Why? Because BD cores are a lot bigger than the older cores, and even 32 nm will not be enough for that. But a BD module is better than one Intel core with Hyper-Threading; that's easy to understand, without any doubt.
    A single BD module will probably be close to the same size as a single Intel Sandy Bridge core, when both are made on their respective 32 nm process nodes. A BD module might even end up being smaller than Sandy Bridge. One limiting factor in core scaling, though, is that AMD has apparently designed its cache structure so that only 4 modules can share an L3 cache. For AMD this means they would have 2 separate L3 cache pools if they put 8 modules on a single die, and I don't think we will be seeing 8-module single-die BD CPUs in their first generation of BD chips. I could be wrong on this one though, I have to read up on it again.
    Quote Originally Posted by madcho View Post
    I hope that AMD will go for 4 modules fast. That could be competitive with Intel's 6 cores with Hyper-Threading: 8 threads on real cores vs 12 threads with Hyper-Threading.

    AMD's way is beautiful, real brute force in integer, which was the big lack in K8 and the big problem of K10, to my mind. My best hope is that AMD will make it possible, with SSE5, to use the GPU's FPUs for CPU calculations. That's maybe why the FPU on BD is weaker than on K10 (weaker for the same number of threads), even if you consider that BD is a new core.

    4 pipelines for integer is the most beautiful thing, and I would have loved to see it at the start of K10. Very sad that AMD didn't do it.
    Bulldozer's FPU is more than twice as fast as K10.5's FPU. One of the biggest improvements it has is support for single cycle FMA, although I'm sure they will have made many other improvements as well. Not to mention support for AVX and, as a result, full support for all current SSE extensions.
    "When in doubt, C-4!" -- Jamie Hyneman

    Silverstone TJ-09 Case | Seasonic X-750 PSU | Intel Core i5 750 CPU | ASUS P7P55D PRO Mobo | OCZ 4GB DDR3 RAM | ATI Radeon 5850 GPU | Intel X-25M 80GB SSD | WD 2TB HDD | Windows 7 x64 | NEC EA23WMi 23" Monitor |Auzentech X-Fi Forte Soundcard | Creative T3 2.1 Speakers | AudioTechnica AD900 Headphone |

  5. #55
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    I would say FMA is the best advantage over Sandy Bridge that one can see on paper. That is something Intel can't bring out in time and most likely will with Ivy Bridge....
    Coming Soon

  6. #56
    Banned
    Join Date
    Sep 2009
    Posts
    97
    Quote Originally Posted by Helmore View Post
    I don't think we will be seeing 8-module single-die BD CPUs in their first generation of BD chips. I could be wrong on this one though, I have to read up on it again.
    Interlagos? 8 modules right there.

  7. #57
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    City of Lights, The Netherlands
    Posts
    2,381
    Quote Originally Posted by LesGrossman View Post
    Interlagos? 8 modules right there.
    Isn't that a dual die solution?
    "When in doubt, C-4!" -- Jamie Hyneman

    Silverstone TJ-09 Case | Seasonic X-750 PSU | Intel Core i5 750 CPU | ASUS P7P55D PRO Mobo | OCZ 4GB DDR3 RAM | ATI Radeon 5850 GPU | Intel X-25M 80GB SSD | WD 2TB HDD | Windows 7 x64 | NEC EA23WMi 23" Monitor |Auzentech X-Fi Forte Soundcard | Creative T3 2.1 Speakers | AudioTechnica AD900 Headphone |

  8. #58
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Helmore View Post
    Isn't that a dual die solution?
    Yeah MCM of two 4 module MPUs.

  9. #59
    Xtreme Addict
    Join Date
    Jan 2006
    Posts
    1,321
    Quote Originally Posted by Helmore View Post
    Isn't that a dual die solution?
    I doubt MCM will be that bad. Even FSB was usable with quad cores, and AMD's HT is a very fast link.
    Core i7 920 3849B028 4.2ghz cooled by ek hf | 6gb stt ddr3 2100 | MSI HD6950 cf cooled by ek fc | Evga x58 e760 Classified | 120gb G.Skill Phoenix Pro | Modded Rocketfish case + 1200w toughpower | mcp 655 pump + mcr 320 + black ice pro II

  10. #60
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by haylui View Post
    How did AnandTech get such information???
    AnandTech is press; AMD communicates with the press as part of public relations.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my clothes looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  11. #61
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by ridney View Post
    From my limited understanding, a BD module would be AMD's answer to Intel's single core with Hyper-Threading, right? So if Intel releases an 8-core SB with HT, then AMD's answer to it would be an 8-module BD, which would have 16 cores... That would be awesome. Time to save up and splash out when BD comes to town.
    No, it is actually a different design philosophy. AMD believes that based on the current technologies, the best way to solve multi-threaded problems is with more threads over more discrete cores.

    In every architecture there will be shared and discrete components (look at L3 caches and memory controllers today). Within the integer core you can make resources either shared or discrete.

    Shared resources need to be wide enough to allow for more throughput without bottlenecks or contention.

    The challenge with hyperthreading (or SMT in the more generic sense) is that its philosophy is about "filling the pipeline when one thread stalls" and not about driving better efficiency. In a perfectly efficient system, SMT would not be needed because there would be no gaps in the pipeline. (This world does not exist.)

    Think of SMT like carpooling. It may appear to be more efficient for 2 people to carpool to work and save money, but that depends on how far they live from each other and how far they live from work. Clearly, if they live 3 miles apart and work is only 1 mile away, carpooling becomes less efficient.

    Having separate cars may appear to be less efficient, but if the cars are hybrids and the carpool car is an SUV, suddenly the math starts to make sense.

    The key with our architecture is that there are always cores available (up to 16 per CPU). You won't find a case where you have 16 cores but you can only run 8 threads because the others are waiting for a chance to "jump in."

    Long term, over time, you want to drive to greater CPU efficiency. Every time you increase efficiency with real cores, you have the potential to get more overall throughput. Every time you increase efficiency on SMT you may simply "squeeze the balloon." More efficiency in the primary thread means less opportunity for the SMT thread to "jump in" so you get a net zero gain in throughput.
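
    The "squeeze the balloon" point can be put into numbers with a toy model. This is only a sketch under made-up assumptions: one core, a primary thread that is stalled for some fraction of cycles, and a second SMT thread that can use a given share of exactly those stalled cycles. The function name and both parameters are hypothetical, and real SMT behaviour is far messier than this.

```python
def throughput(stall_fraction, fill_efficiency=1.0):
    """Return (without_smt, with_smt) as fractions of the core's peak throughput.

    stall_fraction: ASSUMED share of cycles the primary thread spends stalled.
    fill_efficiency: ASSUMED share of those stalled cycles the SMT thread can use.
    """
    without_smt = 1.0 - stall_fraction
    with_smt = without_smt + stall_fraction * fill_efficiency
    return without_smt, with_smt


# Make the primary thread progressively more efficient (fewer stalls).
for stall in (0.40, 0.20, 0.05):
    base, smt = throughput(stall)
    print(f"stall={stall:.2f}  no SMT={base:.2f}  with SMT={smt:.2f}  "
          f"SMT gain={smt / base - 1:.0%}")
```

    In this model the throughput without SMT rises as stalls go down, but the total with SMT stays pinned at the same ceiling, so the relative benefit of the second thread shrinks toward zero.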

  12. #62
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    JF-AMD welcome to XS

    Nice first post...

    EDIT: As you said, with SMT the resources are shared, but in a situation where the code is broken and not very efficient, SMT does help, does it not?

    Not only that, but old code also profits from it, and so do synthetic benchmarks.

    Also, what's preventing AMD from doing a 3-4 core CMT? Will AMD keep using 2-core CMT in the future, or would 3-4 core CMT show up?
    Last edited by ajaidev; 01-05-2010 at 10:53 AM.
    Coming Soon

  13. #63
    Xtreme Mentor
    Join Date
    Jun 2008
    Location
    France - Bx
    Posts
    2,601
    Yeah, right ! Welcome to XS JF-AMD !

    Really informative first post

  14. #64
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by JF-AMD View Post
    In a perfectly efficient system, SMT would not be needed because there would be no gaps in the pipeline. (this world does not exist).
    ....
    More efficiency in the primary thread means less opportunity for the SMT thread to "jump in" so you get a net zero gain in throughput.
    I'm not sure I understand you. The ability to increase instruction-level parallelism is very limited. So the only way I see to increase core efficiency is to decrease the number of pipelines and/or execution units per core. Is that what will happen in Bulldozer?

  15. #65
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    It's about cache efficiency. Today, when there is a cache miss, the thread stalls while the core waits for the data to be fetched from memory. While that thread is stalled, SMT will dump the cache, insert a new thread, run that, then return the cache contents for the old thread (that just got the memory data.)

    I know that is a REALLY simplistic description but should help you visualize.

    HT originally came about in P4 because they had a very long pipeline and one cache miss had lots of penalty associated. But as they shortened the pipeline (i.e. Core2) they tossed out HT because they no longer needed that band-aid.

    If you take that same logic and extend it, as a microarchitecture, you should always be striving to reduce cache misses as much as possible. As you reduce the misses, you increase the efficiency. That is good. But the cache misses give you the "opportunity" that you need for SMT to work. So as primary core efficiency goes up, the SMT efficiency generally goes down.

    The ability for parallelism to increase has more to do with the OS schedulers for the most part. OS's deployed 3 years ago were written when single cores ruled the earth. OS's deployed today were focused more on dual core and even to a small extent quad core, so they do a better job of scheduling. OS's that you will use in 3 years will do much better than today's. It is all a progression. Saying you don't need more cores in the future because today's OS's don't utilize all of the cores is like saying that a 1TB drive is too big. Give people enough storage space and they will fill it. Give them enough cores and they will figure out how to use them.

    My notebook probably has 50 different services running (and 3-4 actual programs). There is always a use for more cores, the OS just needs to come along for the ride - and that will be happening.

  16. #66
    Xtreme Addict
    Join Date
    Mar 2008
    Posts
    1,163
    Quote Originally Posted by JF-AMD View Post
    It's about cache efficiency. Today, when there is a cache miss, the thread stalls while the core waits for the data to be fetched from memory. While that thread is stalled, SMT will dump the cache, insert a new thread, run that, then return the cache contents for the old thread (that just got the memory data.)

    I know that is a REALLY simplistic description but should help you visualize.

    HT originally came about in P4 because they had a very long pipeline and one cache miss had lots of penalty associated. But as they shortened the pipeline (i.e. Core2) they tossed out HT because they no longer needed that band-aid.

    If you take that same logic and extend it, as a microarchitecture, you should always be striving to reduce cache misses as much as possible. As you reduce the misses, you increase the efficiency. That is good. But the cache misses give you the "opportunity" that you need for SMT to work. So as primary core efficiency goes up, the SMT efficiency generally goes down.

    The ability for parallelism to increase has more to do with the OS schedulers for the most part. OS's deployed 3 years ago were written when single cores ruled the earth. OS's deployed today were focused more on dual core and even to a small extent quad core, so they do a better job of scheduling. OS's that you will use in 3 years will do much better than today's. It is all a progression. Saying you don't need more cores in the future because today's OS's don't utilize all of the cores is like saying that a 1TB drive is too big. Give people enough storage space and they will fill it. Give them enough cores and they will figure out how to use them.

    My notebook probably has 50 different services running (and 3-4 actual programs). There is always a use for more cores, the OS just needs to come along for the ride - and that will be happening.
    You certainly sound like someone who knows a lot, yet you use crappy PR metaphors instead of talking straight.

    And BTW, I bet that your 50 services could easily use just 1 core and have plenty of spare power. What matters is programs, and I'm pretty sure you know this.

  17. #67
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    The key with "our" architecture is that there are always cores available (up to 16 per CPU). You won't find a case where you have 16 cores but you can only run 8 threads because the others are waiting for a chance to "jump in."

    Freudian slip?


    nice explanation though

    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  18. #68
    Xtreme Addict
    Join Date
    Jun 2007
    Location
    Thessaloniki, Greece
    Posts
    1,307
    Quote Originally Posted by god_43 View Post
    Freudian slip?


    nice explanation though

    No, he works for AMD

    Oh, and welcome to XS
    Last edited by BrowncoatGR; 01-06-2010 at 05:00 AM.
    Seems we made our greatest error when we named it at the start
    for though we called it "Human Nature" - it was cancer of the heart
    CPU: AMD X3 720BE@ 3,4Ghz
    Cooler: Xigmatek S1283(Terrible mounting system for AM2/3)
    Motherboard: Gigabyte 790FXT-UD5P(F4) RAM: 2x 2GB OCZ DDR3 1600Mhz Gold 8-8-8-24
    GPU:HD5850 1GB
    PSU: Seasonic M12D 750W Case: Coolermaster HAF932(aka Dusty )

  19. #69
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by BrowncoatGR View Post
    No, he works for AMD
    oh....nvm then.
    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  20. #70
    Xtreme Member
    Join Date
    Apr 2008
    Location
    Hiding under a blanky with a flash light
    Posts
    192
    Quote Originally Posted by Helmore View Post
    Bulldozer's FPU is more than twice as fast as K10.5's FPU. One of the biggest improvements it has is support for single cycle FMA, although I'm sure they will have made many other improvements as well. Not to mention support for AVX and, as a result, full support for all current SSE extensions.
    How exactly do we know this to be true?

  21. #71
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    I am not seeing the advantages of FMA other than higher accuracy. It sounds more like PR to me.

  22. #72
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    City of Lights, The Netherlands
    Posts
    2,381
    Quote Originally Posted by BatteryOperated View Post
    How exactly do we know this to be true?
    I don't know this, it's just a guess based on the information we have thus far. What we know thus far is that Bulldozer can do two 128-bit FMA FPU operations concurrently, while Barcelona/Shanghai can only do one 128-bit FPU operation. That gives Bulldozer an FPU execution unit that is more than twice as fast as Barcelona.
    Quote Originally Posted by Chumbucket843 View Post
    I am not seeing the advantages of FMA other than higher accuracy. It sounds more like PR to me.
    IIRC FMA allows you to do more work per clock and it gives you higher accuracy. That is because with FMA Bulldozer can do the calculation: a = a + a*b in one cycle and it will do the rounding afterwards. On Barcelona this calculation would take 2 clock cycles and it would do the rounding in between. Correct me if I'm wrong though, as I'm not completely sure about what I just said.
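
    The accuracy half of that claim is easy to check without FMA hardware. The sketch below emulates a fused multiply-add in Python by computing a*b + c exactly with fractions.Fraction and rounding to a double only once; fma_emulated is a made-up helper for illustration, not a real hardware or library call, and the inputs are picked so the intermediate rounding matters. It uses the general form a*b + c rather than a + a*b.

```python
from fractions import Fraction


def fma_emulated(a, b, c):
    """Emulate a fused multiply-add: exact a*b + c, rounded to a double once.

    Works for finite inputs only (Fraction cannot represent inf or nan).
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: the product is rounded to a double first,
# then the sum is rounded again.
unfused = a * b + c

# Fused: a single rounding at the very end.
fused = fma_emulated(a, b, c)

print("mul then add:", unfused)
print("fused       :", fused)
```

    Here the exact product is 1 - 2**-60, which does not fit in a double and rounds to 1.0, so the separate multiply and add return 0.0, while the fused version keeps the true answer of -2**-60.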
    "When in doubt, C-4!" -- Jamie Hyneman

    Silverstone TJ-09 Case | Seasonic X-750 PSU | Intel Core i5 750 CPU | ASUS P7P55D PRO Mobo | OCZ 4GB DDR3 RAM | ATI Radeon 5850 GPU | Intel X-25M 80GB SSD | WD 2TB HDD | Windows 7 x64 | NEC EA23WMi 23" Monitor |Auzentech X-Fi Forte Soundcard | Creative T3 2.1 Speakers | AudioTechnica AD900 Headphone |

  23. #73
    Xtreme Mentor
    Join Date
    Nov 2006
    Location
    Spain, EU
    Posts
    2,949
    JF-AMD, can you comment on Bulldozer's single thread performance? Or at least on the approach you've taken.
    Friends shouldn't let friends use Windows 7 until Microsoft fixes Windows Explorer (link)


    Quote Originally Posted by PerryR, on John Fruehe (JF-AMD) View Post
    Pretty much. Plus, he's here voluntarily.

  24. #74
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Welcome to XS JF!
    Last edited by Mechromancer; 01-05-2010 at 07:49 PM.
    Core i7 2600K@4.6Ghz| 16GB G.Skill@2133Mhz 9-11-10-28-38 1.65v| ASUS P8Z77-V PRO | Corsair 750i PSU | ASUS GTX 980 OC | Xonar DSX | Samsung 840 Pro 128GB |A bunch of HDDs and terabytes | Oculus Rift w/ touch | ASUS 24" 144Hz G-sync monitor

    Quote Originally Posted by phelan1777 View Post
    Hail fellow warrior albeit a surat Mercenary. I Hail to you from the Clans, Ghost Bear that is (Yes freebirth we still do and shall always view mercenaries with great disdain!) I have long been an honorable warrior of the mighty Warden Clan Ghost Bear the honorable Bekker surname. I salute your tenacity to show your freebirth sibkin their ignorance!

  25. #75
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    CPUs and GPUs today use MADD, which is an independent add and mul unit. They can execute two flops per cycle. With FMA the multiplication and addition logic are in one FPU, so if you can't do both a multiply and an add then you have to insert a constant, i.e. (a*1)+b or (a*b)+0. This is wasteful. A lot of algorithms can use FMA, but my problem is that it does not reduce latency enough relative to power and area.
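
    To make the "insert a constant" point concrete, here is a small Python sketch of what a plain add and a plain multiply look like when the only operation available is a fused multiply-add. The fma function is a software stand-in (exact arithmetic with one final rounding), and the helper names are hypothetical; this says nothing about how Bulldozer's FPU is actually wired.

```python
from fractions import Fraction


def fma(a, b, c):
    """Stand-in for a hardware FMA: exact a*b + c with one final rounding.

    Finite inputs only; signed zeros are not modelled.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def add_via_fma(a, b):
    # (a*1)+b: the multiplier is occupied but does no useful work.
    return fma(a, 1.0, b)


def mul_via_fma(a, b):
    # (a*b)+0: the adder is occupied but does no useful work.
    return fma(a, b, 0.0)


x, y = 0.1, 0.7
print(add_via_fma(x, y) == x + y)
print(mul_via_fma(x, y) == x * y)
```

    The results match ordinary addition and multiplication, but each call ties up the whole fused unit for one flop instead of two, which is the waste being described.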
