Dresdenboy's Blog
AMD has a slew of patents from the last couple of years that point in the direction AMD is taking with its upcoming microarchitecture codenamed Bulldozer (the Interlagos CPU). A bright German going by the screen name Dresdenboy has been following these patents for quite some time and putting together a picture of what Bulldozer will look like. His blog is very informative and insightful as to the possible inner workings of AMD's future CPU.
Here is the most recent diagram from August 21st:
An interesting part of his research came when he explored whether or not AMD will implement SMT:
CMT?!?!?!?! The entry quoted in this post is just one of the entries on his blog. The rest are just as interesting, especially the entry called "Faster adaption to ISA Extensions". I think his blog is newsworthy and deserves a bit of healthy discussion.
More details on Bulldozer's multi-threading and single thread execution
by Dresdenboy @ 2009-07-07 - 11:36:01 am
Unfortunately I had neither enough time nor enough details (some things had to be guessed) to create the promised architecture diagram. However, the missing details can now be found in newly published patent applications. I think that will help me get back to the task. But now I'll switch to another topic: will Bulldozer have SMT or not?
AMD's John Fruehe recently said in an AMDZone forum thread that AMD will not do SMT in the next few years. That could be understood to mean that the architecture revealed here will not be able to execute more than one thread per core. However, this is not the case, because no such statement has been made. So far, John has only said that AMD would not implement SMT. In my eyes it was a smart move to mention SMT - just to be able to deny it. However, this is still speculation.
Instead we saw the term "cluster-based multi-threading" (also known as clustered multi-threading, CMT) years ago in an AMD presentation. If you look at Chuck Moore's slide below, you see that SMT is the least attractive multi-threading variant to AMD. So far they have been in the CMP part of this diagram, and it seems logical to move to the much greener CMT area from there - even more so since they explicitly state an 80% throughput gain for a 50% area investment. They already held this view four years ago, with the first patents covering the new architecture being filed just two years later. If Bulldozer was to have been ready by 2009 or 2010, these time frames seem OK to me. And even the four-year gap from the patent filing dates to 2011 fits well with what we know from older architectures.
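To put the quoted slide figures in perspective, here is a small back-of-the-envelope sketch. It assumes the 50% and 80% numbers are relative to a single baseline core, and it assumes an idealized CMP case where a full second core doubles both area and throughput; that CMP case is my assumption for comparison, not a figure from the slide.

```python
def throughput_per_area(extra_area, extra_throughput):
    """Relative throughput per unit area versus one baseline core (1.0 / 1.0)."""
    return (1.0 + extra_throughput) / (1.0 + extra_area)

# CMT figures as quoted from the slide: 50% more area, 80% more throughput.
cmt = throughput_per_area(extra_area=0.5, extra_throughput=0.8)

# Hypothetical ideal CMP: a full second core, perfect scaling (assumption).
cmp_ideal = throughput_per_area(extra_area=1.0, extra_throughput=1.0)

print(f"CMT: {cmt:.2f}x throughput per area")
print(f"CMP (ideal): {cmp_ideal:.2f}x throughput per area")
```

Under these assumptions the second cluster yields about 20% more throughput per unit of die area than simply adding another full core, which would explain why the CMT area of the slide looks so attractive.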
So we find the new arch again in:
20090164758 - System and method for performing locked operations
20090172359 - Processing pipeline having parallel dispatch and method thereof
20090172362 - Processing pipeline having stage-specific thread selection and method thereof
20090172370 - Eager execution in a processing pipeline having multiple integer execution units
And most of these patent applications now give much more detail on how the threads are executed and the like. Most of it fits well with what Hans de Vries already described in his detailed post on aceshardware.
These patent applications describe ways to execute a single thread on both clusters. This could be done by having a thread run ahead for early memory prefetches, or by executing both paths of a branch in parallel and scrapping the wrong path after branch resolution. A different variant is the parallel execution of the same code to gain reliability of the results by comparing them afterwards.
Some of the mentioned patent applications also state that the 4-way decoders could decode more than 4 instructions per cycle if there are both a microcoded and a fastpath instruction (of different threads) in one decoding path.
Another interesting and related topic is how future general-purpose and graphics processing units could be combined. This is covered in the following patent applications:
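As a rough software analogy only (the patents describe hardware, and none of the names below come from them), two of these ideas can be sketched with a pair of worker threads standing in for the two integer clusters. The function names `eager_execute` and `redundant_execute` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def eager_execute(resolve_branch, taken_path, not_taken_path):
    """Start both branch paths, then keep only the one the branch selects."""
    with ThreadPoolExecutor(max_workers=2) as clusters:
        taken = clusters.submit(taken_path)
        not_taken = clusters.submit(not_taken_path)
        # The branch is resolved after both paths have already been started.
        keep = taken if resolve_branch() else not_taken
        # The result of the other path is computed but never used.
        return keep.result()

def redundant_execute(work):
    """Run the same work on both 'clusters' and compare the results."""
    with ThreadPoolExecutor(max_workers=2) as clusters:
        first = clusters.submit(work)
        second = clusters.submit(work)
        a, b = first.result(), second.result()
    if a != b:
        raise RuntimeError("cluster results disagree")
    return a

if __name__ == "__main__":
    print(eager_execute(lambda: 3 > 2, lambda: "taken", lambda: "not taken"))
    print(redundant_execute(lambda: sum(range(10))))
```

The trade-off is visible even in this toy: eager execution always spends the work of the discarded path, so it only pays off when the second cluster would otherwise sit idle.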
20090164726 - Programmable address processor for graphics applications
20090160863 - Unified processor architecture for graphics and general processing workload
EDIT: Updated CPU Diagram as of August 27th. Check out Dresdenboy's blog for details on the changes!