Page 1 of 11
Results 1 to 25 of 262

Thread: Dresdenboy's blog: AMD Bulldozer - Patent based research

  1. #1
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663

    Dresdenboy's blog: AMD Bulldozer - Patent based research

    Dresdenboy's Blog

    AMD has a slew of patents from the last couple of years that point in the direction AMD is going with their upcoming microarchitecture codenamed Bulldozer (the Interlagos CPU). A bright German going by the screen name Dresdenboy has been following these patents for quite some time and putting together a picture of what Bulldozer will look like. His blog is very informative and insightful as to the possible inner workings of AMD's future CPU.

    Here is the most recent diagram from August 21st:


    An interesting part of his research came when he explored whether or not AMD will implement SMT:
    More details on Bulldozer's multi-threading and single thread execution

    by Dresdenboy @ 2009-07-07 - 11:36:01 am

    Unfortunately I had neither enough time nor enough details (some things would have been guesses) to create the promised architecture diagram. However, the missing details can now be found in newly published patent applications. I think that will help me get back to the task. But now I'll switch to another topic: Will Bulldozer have SMT or not?

    AMD's John Fruehe recently said in an AMDZone forum thread that AMD will not do SMT in the next few years. That could be understood to mean that the architecture revealed here will not be able to execute more than one thread per core. However, that does not follow, because no such statement was made: John only said that AMD would not implement SMT. In my eyes it was a smart move to mention SMT, just to be able to deny it. However, this is still speculation.

    Instead, we saw the term "cluster-based multi-threading" (also known as clustered multi-threading, CMT) years ago in an AMD presentation. If you look at Chuck Moore's slide below, you see that SMT is the least attractive multi-threading variant for AMD. So far they have been operating in the CMP part of this diagram, and it just seems logical to move from there to the much greener CMT area, even more so since they explicitly state an 80% throughput gain for a 50% area investment. They already had this view four years ago, with the first patents covering the new architecture being filed just two years later. If Bulldozer had been ready for 2009 or 2010, these time frames would seem OK to me. And even the four-year gap from patent filing dates to 2011 fits well with what we know from older architectures.
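The area/throughput trade-off mentioned above can be checked with simple arithmetic. This is only a sketch: the CMT figures (+50% area, +80% throughput) are the ones quoted from AMD's slide, while the CMP baseline (a second full core with perfect scaling) is an illustrative assumption, not an AMD number.

```python
# Relative throughput per unit of area, both measured against one single-threaded core.
# CMT figures (+80% throughput for +50% area) are the ones quoted from AMD's slide;
# the CMP line assumes a second full core with perfect scaling (illustrative assumption).

def throughput_per_area(throughput_gain: float, area_gain: float) -> float:
    """Return (1 + throughput gain) / (1 + area gain)."""
    return (1.0 + throughput_gain) / (1.0 + area_gain)

designs = {
    "single core": (0.0, 0.0),
    "CMP, two full cores (ideal)": (1.0, 1.0),
    "CMT, AMD slide figures": (0.8, 0.5),
}

for name, (tp_gain, area_gain) in designs.items():
    ratio = throughput_per_area(tp_gain, area_gain)
    print(f"{name:28s} throughput x{1 + tp_gain:.1f}  area x{1 + area_gain:.1f}  "
          f"throughput/area {ratio:.2f}")
```

Under those assumptions CMT yields about 1.2x the throughput per unit of area (1.8 / 1.5), versus 1.0x for simply duplicating cores, which is the "greener" region the slide points to.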



    So we find the new arch again in:
    20090164758 - System and method for performing locked operations
    20090172359 - Processing pipeline having parallel dispatch and method thereof
    20090172362 - Processing pipeline having stage-specific thread selection and method thereof
    20090172370 - Eager execution in a processing pipeline having multiple integer execution units

    Most of these patent applications now give much more detail on how the threads are executed and the like. Most of it fits well with what Hans de Vries already described in his detailed post on aceshardware.

    These patent applications describe ways to execute a single thread on both clusters. This could be done by having a thread run ahead to issue early memory prefetches, or by executing both paths of a branch in parallel and discarding the wrong one after branch resolution. A different variant is the parallel execution of the same code to gain reliability by comparing the results afterwards.
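To make the two single-thread variants concrete, here is a minimal software analogy, assuming a "state" is just a dictionary of register values. It is not how the patents implement anything: both branch paths run on private copies (standing in for the two clusters), the correct one is committed after the condition resolves, and the reliability variant runs the same code twice and compares. Run-ahead prefetching is not modeled, and all function names are hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, int]
Path = Callable[[State], None]

def run_path(path: Path, state: State) -> State:
    """Execute one code path on a private copy of the state (the original is untouched)."""
    scratch = dict(state)
    path(scratch)
    return scratch

def eager_branch(state: State, cond: Callable[[State], bool], taken: Path, not_taken: Path) -> State:
    """Eager execution: run both sides of a branch, then keep only the correct one."""
    # In hardware each side could run on its own integer cluster; here they simply run in turn.
    outcomes = {True: run_path(taken, state), False: run_path(not_taken, state)}
    # Branch resolution: commit the results of the correct path, discard the other.
    return outcomes[cond(state)]

def redundant_execute(path: Path, state: State) -> State:
    """Reliability variant: execute the same code twice and compare the results."""
    first, second = run_path(path, state), run_path(path, state)
    if first != second:
        raise RuntimeError("redundant executions disagree")
    return first

# Usage example
def taken(s: State) -> None:
    s["y"] = s["x"] * 2

def not_taken(s: State) -> None:
    s["y"] = -1

state = {"x": 5, "y": 0}
committed = eager_branch(state, lambda s: s["x"] > 3, taken, not_taken)
print(committed)  # {'x': 5, 'y': 10}
print(redundant_execute(taken, state))  # {'x': 5, 'y': 10}
```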

    Some of the mentioned patent applications also state that the 4-way decoders could decode more than 4 instructions per cycle if there are both a microcoded and a fastpath instruction (of different threads) in one decoding path.

    Another interesting and related topic is the way future general-purpose and graphics processing units could be combined. This is covered in the following patent applications:
    20090164726 - Programmable address processor for graphics applications
    20090160863 - Unified processor architecture for graphics and general processing workload
    CMT?! The above is just one of the entries on his blog. The rest are just as interesting, especially the entry called "Faster adaption to ISA Extensions". I thought his blog was newsworthy and deserved a bit of healthy discussion.

    EDIT: Updated CPU Diagram as of August 27th. Check out Dresdenboy's blog for details on the changes!
    Last edited by Mechromancer; 08-27-2009 at 06:13 PM.
    Core i7 2600K@4.6Ghz| 16GB G.Skill@2133Mhz 9-11-10-28-38 1.65v| ASUS P8Z77-V PRO | Corsair 750i PSU | ASUS GTX 980 OC | Xonar DSX | Samsung 840 Pro 128GB |A bunch of HDDs and terabytes | Oculus Rift w/ touch | ASUS 24" 144Hz G-sync monitor

    Quote Originally Posted by phelan1777
    Hail fellow warrior albeit a surat Mercenary. I Hail to you from the Clans, Ghost Bear that is (Yes freebirth we still do and shall always view mercenaries with great disdain!) I have long been an honorable warrior of the mighty Warden Clan Ghost Bear the honorable Bekker surname. I salute your tenacity to show your freebirth sibkin their ignorance!

  2. #2
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Totally good... I used to read the blog but forgot about it some months ago. BD seems very, very different from K10.

    The core is a cluster and reminds me of the semi-modular form of the Nehalem architecture. The architecture is of course something entirely new. K10/10.5 is not so complex; I don't know about Nehalem though. The cores seem to have higher inter-core dependency and communication. Sandy Bridge seems more of an evolution of Nehalem/Clarkdale than a fully new architecture.
    Last edited by ajaidev; 08-23-2009 at 01:01 PM.

  3. #3
    Xtreme Enthusiast
    Join Date
    Mar 2008
    Location
    Dallas, TX
    Posts
    965
    the picture says "AMD 2005 Analyst Day" in the bottom right-hand corner?

    definitely some interesting stuff though.
    "fightoffyourdemons"


  4. #4
    Xtreme Addict
    Join Date
    Dec 2008
    Location
    Sweden, Linköping
    Posts
    2,034
    Bulldozer looks very promising, if this doesn't save AMD, nothing will.
    SweClockers.com

    CPU: Phenom II X4 955BE
    Clock: 4200MHz 1.4375v
    Memory: Dominator GT 2x2GB 1600MHz 6-6-6-20 1.65v
    Motherboard: ASUS Crosshair IV Formula
    GPU: HD 5770

  5. #5
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Quote Originally Posted by Chruschef
    the picture says "AMD 2005 Analyst Day" in the bottom right-hand corner?

    definitely some interesting stuff though.
    It sure does. Read his blog though. He states that both previously published and more recently published patents indicate that AMD is on track with this tech for the time frame in which Bulldozer will be launched, 2011. AMD seems to be on point with what they told the world in 2005, which should make everybody grin. This has me wondering if Intel is going down the CMT route as well when AVX comes on the scene.

  6. #6
    Xtreme Member
    Join Date
    Oct 2008
    Location
    New Hampshire
    Posts
    242
    Quote Originally Posted by Smartidiot89
    Bulldozer looks very promising, if this doesn't save AMD, nothing will.
    AMD is actually making strides to get back into the game. It needs a solid launch of BD chips come 2011. The company isn't really in need of "saving", though. They do have some folks in position to get the company back on track, and the stock value has doubled in the last few months: ~$3.50/share, up from under $2.00/share. If BD is a huge success they will be able to pick up substantial market share, as long as it's priced right and Intel hasn't dropped a monster in their playground by then.

  7. #7
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    omg this made me jizz my pants. thank you mechromancer. and let's all hope this thing has better retirement. some of the pieces are still missing, but it's nice to get some news on this.

  8. #8
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by Mechromancer
    This has me wondering if Intel is going down the CMT route as well when AVX comes on the scene.
    If AMD's CMT is what I think it is, Intel already walks that path with Terascale and Mitosis (speculative threading), but it's still some time away.

    If AMD can release such an architecture with Bulldozer, we'll see quite a nice speedup on multithreaded workloads. I wonder if they will include speculative threading as well.

    Also, I couldn't resist smirking when I read this:
    Some of the mentioned patent applications also state, that the 4 way decoders could decode more than 4 instructions per cycle if there are both a micro coded and a fastpath instruction (of different threads) in one decoding path
    Macro-OPs Fusion anyone?
    Last edited by Hornet331; 08-23-2009 at 03:19 PM.

  9. #9
    Xtreme Enthusiast
    Join Date
    Apr 2007
    Posts
    772
    About time AMD went to a 4-issue core. This will put the heat on Intel.

    It would be great to see a new "CPU race" come out of this.

  10. #10
    Xtreme Member
    Join Date
    Dec 2008
    Posts
    285
    He's right though: whether K11 matters depends as much on timely execution as on promising new technologies. I got a Phenom 9500 as a review sample a while back and was quite disappointed, but I remember a few AMD crazies defending K10 on the grounds that it was a 'native quad core' and so 'more advanced' than Kentsfield. Which is of course irrelevant, because it was later, slower and worse than the competition. Given AMD's track record, bits and pieces like these are interesting, even promising, but we shouldn't start getting excited; I'll start getting excited as soon as people are smuggling the things out and the results appear on HKEPC.
    Core i7 920, Gigabyte x58-USB3, Radeon 5850 [CF coming soon], 6GB OCZ Platinum, Corsair 40GB Force, 3x 2TB Spinpoint F4, Silverstone OP1000, Dell XPS Studio Case.

    Alienware M11x.

  11. #11
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    oh, I remember this stuff.. looks like it's come a little way since we were all fighting about 4GHz P4's..

    All along the watchtower the watchmen watch the eternal return.

  12. #12
    Xtreme Addict
    Join Date
    Jun 2007
    Location
    Thessaloniki, Greece
    Posts
    1,307
    Quote Originally Posted by Hornet331
    If AMD's CMT is what I think it is, Intel already walks that path with Terascale and Mitosis (speculative threading), but it's still some time away.

    If AMD can release such an architecture with Bulldozer, we'll see quite a nice speedup on multithreaded workloads. I wonder if they will include speculative threading as well.
    It's not what you think it is
    Seems we made our greatest error when we named it at the start
    for though we called it "Human Nature" - it was cancer of the heart
    CPU: AMD X3 720BE@ 3,4Ghz
    Cooler: Xigmatek S1283(Terrible mounting system for AM2/3)
    Motherboard: Gigabyte 790FXT-UD5P(F4) RAM: 2x 2GB OCZ DDR3 1600Mhz Gold 8-8-8-24
    GPU:HD5850 1GB
    PSU: Seasonic M12D 750W Case: Coolermaster HAF932(aka Dusty )

  13. #13
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    so what exactly is CMT?
    and is it really 4-issue? that's how I read the drawing as well...

  14. #14
    Xtreme Member
    Join Date
    May 2009
    Location
    Calgary, Alberta
    Posts
    115
    Nice find sir!

  15. #15
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Location
    Singapore
    Posts
    970
    cluster, the next small thing
    Main Rig:
    Processor & Motherboard:AMD Ryzen5 1400 ' Gigabyte B450M-DS3H
    Random Access Memory Module:Adata XPG DDR4 3000 MHz 2x8GB
    Graphic Card:XFX RX 580 4GB
    Power Supply Unit:FSP AURUM 92+ Series PT-650M
    Storage Unit:Crucial MX 500 240GB SATA III SSD
    Processor Heatsink Fan:AMD Wraith Spire RGB
    Chassis: Thermaltake Level 10GTS Black

  16. #16
    Xtreme Addict
    Join Date
    Dec 2008
    Location
    Sweden, Linköping
    Posts
    2,034
    Quote Originally Posted by mstp2009
    About time AMD went to a 4-issue core. This will put the heat on Intel.

    It would be great to see a new "CPU race" come out of this.
    Like actually pushing pressure on Intel to deliver something new and breathtaking again just like Conroe?

  17. #17
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by BrowncoatGR
    It's not what you think it is
    So if it's not one of these approaches, tell me what it is...

  18. #18
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Who can explain to me what is so exciting about this homemade scheme? (Let's put aside possible errors in the scheme.) What level of performance is expected, and based on what? And what is the point of predicting future hardware from a handful of patents? Intel alone gets 60-70 patents a month. Imagine what an Intel CPU would look like with all of Intel's patents implemented at once.
    BTW, Intel's patent about clustered multithreading:
    http://www.patents.com/Multithreaded...7478198/en-US/

  19. #19
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    instead of talking about what this is NOT, can we please move the discussion towards what it is and what it does?
    thx

  20. #20
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Quote Originally Posted by kl0012
    Who can explain to me what is so exciting about this homemade scheme? (Let's put aside possible errors in the scheme.) What level of performance is expected, and based on what? And what is the point of predicting future hardware from a handful of patents? Intel alone gets 60-70 patents a month. Imagine what an Intel CPU would look like with all of Intel's patents implemented at once.
    BTW, Intel's patent about clustered multithreading:
    http://www.patents.com/Multithreaded...7478198/en-US/
    Two things you should do: read Dresdenboy's blog and look at that diagram carefully. He has sorted through AMD's patents to find the ones that point to an actual upcoming uarch. He is also mindful of the time it takes for a filed patent to show up in a part, around 4 years. He's been analyzing patents from 2007 to now, so we'll possibly see a lot of the ones he highlights in his blog between 2011 and 2013.

    The really exciting part is that AMD has been coming out with patents concerning per-core multithreading since the 1990s. It is finally looking like we will see it in the next processor revision, in a more efficient form than Intel's SMT. AMD said they set this goal in 2005 (when they had cash money), so it looks like they actually did use their time at the top to go forward with a radically new uarch. A development time frame of 6 years (2005 to 2011) fits that theory. Judging by the CPU diagram he made, this chip could run 4 threads per core!

    A few other interesting tidbits are these:
    1)
    There is also a new inventor name: Nhon Quach. Among other companies, he previously worked for Intel on Itanium's RAS and system architecture. During that time he also worked on a reliable architecture with two cores which don't share resources (see his patent no. 6,615,366, with a typo in the abstract BTW). So now he is at AMD doing similar stuff, which fits nicely with the clustered architecture.
    2) Hans de Vries is the person who analyzed some patents at aceshardware and came up with the details about how CMT may work with this core configuration:
    Bulldozer's clustered multiprocessor architecture

    I've always interpreted AMD's clustered multiprocessing, which they
    claimed as adding 80% performance with 50% extra transistors, as
    something like the following:

    A 2-way superscalar processor can reach 80%-100% of the performance
    of a 3-way for lots of applications. Only a subset of programs really
    benefits from going to a 3-way. A still smaller subset benefits from going
    to a 4-way superscalar.

    Now, if you still want to have the benefits of a 4-way core but also
    want to have the much higher efficiency of the 2-way cores then you
    can do as follows:

    Design a 4-way processor which has a pipeline which can be split
    up into two independent 2-way pipes. In this case both threads have
    their own set of resources without interfering with each other.

    Part of the pipeline would not be split. Wide instruction decoding would
    be alternating for both threads.

    The split would be beneficial however for the integer units and the
    read/write access units to the L1 data cache. The total 4-way core
    could have more read/write ports which should certainly improve
    IPC for a substantial subset.

    The 128 bit SSE/FP units could be modified partly in connection
    with the read/write ports. There was some improvement but not
    that much when AMD almost doubled the SSE2/FP hardware going
    from 64 bit units in K8 to 128 bit units in the K10.

    There is a lot of efficiency to be gained by using two K8-like SSE/FP units
    which can operate independently in 2-way mode and which can operate
    together as a single 128 bit unit in 4-way mode. Other similar tricks
    can be beneficial as well.

    Part of the higher IPC of Itanium is due to its multiple read/write
    ports to cache and its 64-bit FP units which can work independently
    instead of in a "dumb" 2x64 way mode. The two independent FP units
    of the Itanium can be fed directly from cache due to all these read
    ports (and they can write directly to cache as well).

    Something like this is what you would gain in the 4-way mode while
    the 2-way modes bring the efficiency in throughput computing.


    Regards, Hans
    2-way to 4-way processing = Clustered Multi-threading, possibly. I think Sun's Niagara (II) from 2005 (4 years ago!) uses a similar scheme, as it is 4 threads per core. The Bulldozer uarch could simply be an x86 version of Sun's tech, just like K7 was very similar to Sun's Alpha.
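A toy model of the split-pipeline idea Hans describes, under a loudly simplifying assumption: a thread's sustained IPC is capped by both the issue width it gets and its own instruction-level parallelism (ILP), and nothing else (no cache misses, no shared-decoder contention). The function names are hypothetical; this only illustrates why splitting a 4-wide core into two 2-wide halves raises throughput when per-thread ILP is low, while the unified 4-way mode helps single threads with high ILP.

```python
def thread_ipc(width: int, ilp: float) -> float:
    """Assumed model: sustained IPC is limited by issue width and by the thread's ILP."""
    return min(width, ilp)

def four_way_single(ilp: float) -> float:
    """One thread owns the whole 4-wide pipeline (4-way mode)."""
    return thread_ipc(4, ilp)

def two_by_two_way(ilp_a: float, ilp_b: float) -> float:
    """Two threads, each on its own independent 2-wide half; returns total throughput."""
    return thread_ipc(2, ilp_a) + thread_ipc(2, ilp_b)

for ilp in (1.5, 2.0, 3.0, 4.0):
    print(f"ILP {ilp}: 4-way single-thread IPC {four_way_single(ilp):.1f}, "
          f"2x2-way total throughput {two_by_two_way(ilp, ilp):.1f}")
```

With an ILP of 1.5 or 2, the split mode doubles throughput; with an ILP of 3 or more, a single thread gains from having all four slots, which matches the "benefits of a 4-way core plus efficiency of 2-way cores" argument.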

  21. #21
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Just to add, Sun has published the specs for their UltraSPARC T2 processor. Notice: 8 threads per core, 2 instruction pipelines + 1 floating point unit + 1 stream processing unit (cryptographic). In Dresdenboy's diagram, I see 4 instruction pipelines. Given the differences between SPARC and x86, AMD may very well be getting 4 threads out of a core (welcome to the 21st century, AMD).

    If anything, we can at least expect 4 instructions per cycle like Intel's Core 2 and Nehalem, up from K10's 3.
    Last edited by Mechromancer; 08-24-2009 at 06:25 AM.

  22. #22
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    I doubt the core(s) would be able to execute 4 threads (maybe Mechromancer meant 4 instructions per clock?). The patents simply describe a 4-way design (4-way as in 4-way decoding; in K8/K10 we are stuck at 3-way). The new approach is flexible, as it allows the superior efficiency of 2-way decoding and combines these 2 int pipelines to extract as much efficiency as possible from integer code (as close as possible to ideal 4-way execution). Similarly, in the FP/SSE case we have a 4-way, although "single", SSE unit which is "super" wide (256-bit and supporting the AVX and FMA4 extensions) and which is capable of splitting in various ways (1x256, 2x128 or even 4x64-bit), similar to the "Itanium way", which in turn could make it much more efficient than the present, as Hans calls it, "dumb" way of doing things.
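A hypothetical illustration of why a splittable FP/SSE unit can beat one that handles a single operation per cycle. Assumptions, none of them from the patents: a 256-bit unit that can take any mix of 64/128/256-bit operations adding up to 256 bits per cycle, operations packed greedily in program order, no dependencies, and one cycle per operation. It only counts issue cycles; real scheduling, latencies and ports are ignored.

```python
from typing import List

UNIT_BITS = 256                 # datapath width taken from the post
ALLOWED_WIDTHS = (64, 128, 256)  # operation widths assumed for this sketch

def cycles_one_op_per_cycle(op_widths: List[int]) -> int:
    """'Dumb' unit: every operation occupies the whole unit for one cycle."""
    return len(op_widths)

def cycles_splittable(op_widths: List[int]) -> int:
    """Splittable unit: in program order, pack operations into 256 bits per cycle."""
    cycles, used = 0, UNIT_BITS  # start "full" so the first operation opens a new cycle
    for w in op_widths:
        if w not in ALLOWED_WIDTHS:
            raise ValueError(f"unsupported operation width: {w}")
        if used + w > UNIT_BITS:  # does not fit: start a new cycle
            cycles += 1
            used = 0
        used += w
    return cycles

ops = [64, 64, 128, 256, 64, 64, 64, 64]
print(cycles_one_op_per_cycle(ops), cycles_splittable(ops))  # 8 3
```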

    The described design would still receive many more improvements to other aspects of the core and uncore parts, but the underlying microarchitecture is presented pretty well in Dresdenboy's blog. The described design should be very efficient in both multithreading and single-threading, relying on the split int pipelines for great efficiency and even possible speculative execution (which could be useful for branch prediction and data reliability). There are also many patents on improvements in the areas of power management, GPU/CPU integration (2nd-gen Fusion in ~2012), etc.
    Also, the design can always be extended in the future by adding one more int "cluster", thus making a possible efficient 3x2-way integer "super cluster" (a truly unified 6-way design, i.e. the natural extension of what we have today, would be a power hog and much less efficient). The FP/SSE part would need appropriate rework, and this could be a challenge since in present-day patents the SSE unit is still unified and not split into smaller clusters.
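One rough way to see why a unified 6-way design would be a "power hog" compared with a 3x2-way cluster arrangement is the operand bypass network, whose size is commonly estimated to grow roughly with the square of issue width. This sketch assumes one forwarding path per producer/consumer unit pair, full bypass inside a cluster, and no cross-cluster forwarding; it is a back-of-the-envelope proxy, not a power model.

```python
def bypass_paths_unified(width: int) -> int:
    """Full bypass: every execution unit forwards its result to every unit (width^2 paths)."""
    return width * width

def bypass_paths_clustered(clusters: int, width_per_cluster: int) -> int:
    """Bypass only inside each cluster; cross-cluster forwarding is assumed absent."""
    return clusters * bypass_paths_unified(width_per_cluster)

print(bypass_paths_unified(6), bypass_paths_clustered(3, 2))  # 36 12
print(bypass_paths_unified(4), bypass_paths_clustered(2, 2))  # 16 8
```

Under this crude count the 3x2-way "super cluster" needs a third of the forwarding paths of a unified 6-way core, at the cost of extra latency whenever values must move between clusters.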

  23. #23
    I am Xtreme FlanK3r's Avatar
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    How current is it? 2005... it could be different today.
    ROG Power PCs - Intel and AMD
    CPUs: i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350, 1x AMD FX-8320, 1x AMD FX-8300, 2x AMD FX-6300, 2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K, AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K, 2x i7-4770K, Intel i7-3930K | AMD Cinebench R10 challenge | AMD Cinebench R15 thread | Intel Cinebench R15 thread

  24. #24
    Xtreme Member
    Join Date
    Jan 2009
    Posts
    169
    Quote Originally Posted by Mechromancer
    The Bulldozer uarch could simply be an x86 version of Sun's tech, just like K7 was very similar to Sun's Alpha.
    Digital Equipment Corporation is calling. They want their Alpha back.

    XmX

  25. #25
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Quote Originally Posted by Mechromancer
    Just to add, Sun has published the specs for their UltraSPARC T2 processor. Notice: 8 threads per core, 2 instruction pipelines + 1 floating point unit + 1 stream processing unit (cryptographic). In Dresdenboy's diagram, I see 4 instruction pipelines. Given the differences between SPARC and x86, AMD may very well be getting 4 threads out of a core (welcome to the 21st century, AMD).

    If anything, we can at least expect 4 instructions per cycle like Intel's Core 2 and Nehalem, up from K10's 3.
    I think the POWER7 tech may have a deeper impact on BD than SPARC... It was long rumored that POWER7 and future AMD server processors would share the same socket. Also, SPARC uses SMT to get 8 threads per core, and AMD already said they don't like it; I even asked AMD a question about SMT and the reply was pretty anti-SMT, to say the least.
    Last edited by ajaidev; 08-24-2009 at 07:22 AM.
