The guy says JF is wrong. That's crazy; he may not be an engineer, but he knows his work deeply.
I'm not sure if this has been posted before:
Bulldozer at ISSCC 2011 - The Future of AMD Processors
Some analysis of ISSCC from PC Perspective; it's worth the read.
This I thought was interesting
Quote:
Clock gating, which turns off individual components such as execution units, has been much more thoroughly implemented. There is something like 30,000 clock enables throughout the design, and it should allow an unprecedented amount of power savings (and heat reduction) even when the CPU is at high usage rates. Even though a processor might be at 100% utilization, not all functional units are being used or need to be clocked. By having a highly granular control over which units can be gated, overall TDP and heat production can be reduced dramatically even at high utilization rates.
If I'm not mistaken, at Hot Chips AMD said that Bulldozer will power-gate the entire module and not individual components. Have I missed something here?
Power gating != Clock gating; ;)
Power gating involves completely turning off large portions of the design so that no power (well, extremely close to nil) is dissipated. Given the physical complexity and analog characteristics involved (takes time to turn power on/off), this is usually limited to something such as a whole core/module, like you're talking about.
Clock gating is done at a finer granularity and just involves turning off the clock signal to smaller portions of the design that aren't being used, essentially turning off the flip-flops in a piece of logic that doesn't need to change state/be used anyways. Because of this it can be integrated into a design at such an extent.
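To put the difference in numbers, here is a minimal sketch of a toy power model. The wattages are invented purely for illustration and say nothing about real Bulldozer figures; the only point is that clock gating removes switching power while leakage remains, whereas power gating removes (nearly) both.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """A block of logic with hypothetical power figures (not real data)."""
    name: str
    dynamic_w: float   # switching power while the clock toggles
    leakage_w: float   # static power while the unit is still powered


def unit_power(unit: Unit, clock_gated: bool = False, power_gated: bool = False) -> float:
    """Power drawn by one unit under the given gating state."""
    if power_gated:
        return 0.0                 # supply cut off: no switching, (almost) no leakage
    if clock_gated:
        return unit.leakage_w      # clock stopped: flip-flops hold state, leakage remains
    return unit.dynamic_w + unit.leakage_w


fpu = Unit("fpu", dynamic_w=4.0, leakage_w=1.0)   # made-up numbers
active = unit_power(fpu)
clock_gated = unit_power(fpu, clock_gated=True)
power_gated = unit_power(fpu, power_gated=True)
print(active, clock_gated, power_gated)
```

In this toy model the clock-gated unit still burns its leakage, which is why power gating is worth the slow on/off transition for a whole idle module.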
I thought it was the same, ok thx ;)
I've noticed some stuff. The whole module approach is to save die space; the 2 ALUs and shared FPU are all about performance per mm² and performance per watt. At the same time they made a tradeoff where they sacrificed die space for higher clocks. They have aggressive turbo and energy-saving features, maybe the most effective ever.
In other words: they've done everything they can to get high performance per watt and per mm², then they've done everything they can to translate this advantage into higher clocks and turbo.
Sure, the 4 BD pipes will certainly be better than the 6 old ones of K8/K10. Nevertheless, it is a mistake to count just 3 pipelines for the latter, as was done in the schematic.
Congratulations, you understood BD's design philosophy ;-)
I have for some time, but every time new information is posted it fits this pattern. :) Small, effective, cool cores, made for high clocks, and then turbo and power saving. This enables clock speeds far above the rated frequency, which until now has always been bound by a theoretical and unrealistic 100% usage.
The entire processor seems to be built around their new turbo technology. I wonder if this new turbo technology has something to do with the decision to scrap 45nm BD in favor of an enhanced 32nm BD.
heh, this is nice! Thx to yuri from CZ for the link....
Interlagos !!
Bulldozer can work on AM3 ?
http://translate.google.fr/translate...oard%2FNews%2F
This is getting annoyingly repetitive.
Either they found a workaround that does not need 900-series chipsets,
or we're still being lied to.
I think it's quite possible to fake compatibility at the loss of some features, but I wonder whether that loss is going to be major (like a >10% efficiency loss due to a simplistic turbo, or the complete lack of one).
We're not being lied to. AMD said that it is possible to make a BD for AM3, but it would be different from those made for AM3+, so they would need two different designs, which would cost AMD more than it would be worth. Enthusiasts probably account for a single-digit percentage of overall sales, and many (including me) are upgrading from AM2 and need a new motherboard anyway, while others would go for AM3+ even if they have AM3 because of the performance advantage. The AM3 version would have a very small market share once you count OEMs, AM2 users and AM3+ upgraders.
If they went for the fully AM3-compatible design only, they would suffer a significant performance loss on all models, even if you put them in AM3+ boards. And since we not only have much more aggressive power-saving features, but also a new kind of turbo which uses this new headroom efficiently, I think it's a safe bet that we would lose much of the turbo functionality, including the boost on all cores. And it seems like the whole turbo business will play a major role in BD's final performance.
Basically AMD has 3 choices.
1: AM3 design only. Suffer a heavy performance loss on all processors, no matter which socket you put them in.
2: Two different designs. Much more expensive, and the enthusiasts that upgrade from AM3 would be too few for this to be economically viable.
3: AM3+ only. More performance on all models, cheaper. Probably adds some badly needed extra cash from increased motherboard sales too.
No, no and no! It's simple, guys, no need to think a lot about it... The AM3 socket is not compatible with AM3+ CPUs, because AM3+ CPUs have more pins than the AM3 socket! Remember it! But this doesn't rule out existing hybrid AM3+ socket boards with the older 800-series chipset!
This story is going on and on and on ... and nobody knows anything.
AMD stated last year in a telephone conference that BD won't be AM3 compatible.
Now MSI launched the GD65 a few weeks back, which comes with a mysterious AM3+ marking.
Explanation from MSI: There will be AM3+ (yes, AM3+) CPUs that will fit in the board's socket (which is an AM3 socket).
However, we know that the AM3b socket has 1 more pinhole. That means an AM3b CPU that uses that pin won't mechanically fit into AM3.
The solution now is that AMD will do the same as with Deneb: first launch BD in the old AM3 socket, as back then with the Phenom II 920 & 940, and a bit later the real deal with AM3+.
In any case, the (desktop) marketing would have been abysmally bad: lots of people didn't buy AM3 last year because AM3 was supposed to be a dead end like 1156, so they bought the product with the better performance, i.e. Intel.
hehe, then maybe ,-) (secret pin from AMD for beauty appearance :D)
Great find, Olivion. It all makes sense now.
Curious what this pin does, though...
Bulldozer, if I remember correctly, uses HT3.1, which is basically just a simple clock increase from 2.6 to 3.2GHz, meaning 5200MT/s versus 6400MT/s.
AM3 processors work with HT1.0 chipsets... so until the function of that extra pin gets revealed (if there is one), it's just the same cattle manure that Intel pulled by going from LGA1156 > LGA1155. :down:
I know companies need their profit, but in my eyes that pin's sole purpose is to make people buy a new motherboard with a new chipset if they want the shiny new architecture.
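For anyone wondering where the 5200 and 6400 figures come from: HyperTransport transfers data on both clock edges, so the transfer rate is simply twice the link clock. A quick sketch of that arithmetic (the function name is just something made up for this post):

```python
def ht_transfer_rate_mts(clock_mhz: float) -> float:
    """HyperTransport is double data rate: two transfers per clock cycle."""
    return 2 * clock_mhz


ht30 = ht_transfer_rate_mts(2600)   # top HT3.0 link clock, in MHz
ht31 = ht_transfer_rate_mts(3200)   # top HT3.1 link clock, in MHz
increase = ht31 / ht30 - 1          # relative gain from the clock bump alone

print(f"HT3.0: {ht30:.0f} MT/s, HT3.1: {ht31:.0f} MT/s, gain: {increase:.1%}")
```

So the bump from 2.6 to 3.2GHz is worth roughly 23% more link bandwidth at the same link width, nothing more.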
It's already been explained that BD is a totally different architecture than Deneb or other Phenom/Athlon based chips. It's made in such a way so that AM3 chips (like Phenom II's) will slot right into AM3+ motherboards, but BD variants (AM3+) will not be able to fit into existing AM3 motherboards.
That's the reasoning that I can tell. Don't change the socket type unless you have to, and in this case, AMD doesn't have to, but BD won't work with existing architectures. Simple.
What part of "Bulldozer is different" do you guys not understand?
We had some marketing info:
http://www.planet3dnow.de/cgi-bin/ne...?id=1282840508
Quote:
"When we initially set out on the path to Bulldozer we were hoping for AM3 compatibility, but further along the process we realized that we had a choice to make based on some of the features that we wanted to bring with Bulldozer. We could either provide AM3 support and lose some of the capabilities of the new Bulldozer architecture or, we could choose the AM3+ socket which would allow the Bulldozer-base Zambezi to have greater performance and capability.
The majority of the computer buying public will not upgrade their processors, but enthusiasts do. When we did the analysis it was clear that the customers who were most likely to upgrade an AM3 motherboard to a Bulldozer would want the features and capability that would only be delivered in the new AM3+ sockets. A classic Catch-22.
Why not do both you ask? Just make a second model that only works in AM3? First, because that would greatly increase the cost and infrastructure of bringing the product to market, which would drive up the cost of the product (for both AMD and its partners). Secondly, adding an additional product would double the time involved in many of the development steps.
So in the end, delivering an AM3 capability would bring you a less featured product that was more expensive and later to market. Instead we chose the path of the AM3+ socket, which is a path that we hope will bring you a better priced product, with greater performance and more features - on time.
When we looked at the market for AM3 upgrades, it was clear that the folks most interested in an AM3-based product were the enthusiasts. This is one set of customers that we know are not willing to settle for second best when it comes to performance, so we definitely needed to ensure that our new architecture would meet their demanding needs, for both high performance and overclockability. We believe they will see that in AM3+."
I really wonder why AMD should have changed that decision now. Maybe the MSI people were smoking something weird ...
We have AMD saying that they could do it, but the performance disadvantage would be too big since some features wouldn't work. That's the factual part; the speculative part is my thought that these are power-saving features enabling the new turbo technology.
It probably has nothing to do with HyperTransport. Nothing points in that direction. My guess is that it limits performance because the power-saving features, and thus the new turbo, won't work as well. If so, AM3 would lower performance in pretty much all workloads.
Thing is, for one, disabling turbo altogether doesn't mean the CPU should not work at all.
I was referring to hardware/engineering obstacles that would prevent BD chips from working on the AM3 socket.
Yes, there was a blurb about their "choices", but no concrete information as to why this wasn't possible.
And Boris, your whole post was speculative. We don't know what the performance hit could be, or why, or whether there would be any, because there is no HARD FACT info about it. You just made many assumptions based on a marketing blurb, which may or may not be true.
Going from AM2 to AM2+, in reality, the performance loss was negligible. Yes, there was some feature loss, but in the end it worked almost as well when setting up the platform manually.
@Mad pistol
Think about it for a second: if it WAS such a strong departure from existing infrastructure as you say it is, it would not work with basically the same socket (sans different keying) that works with current chips using the same chipsets and type of bus. And moreover, it would not be a DROP-IN replacement in the server arena.
Boris seems to be closer to the truth, some voltage/turbo related feature that needs more advanced power distribution seems more likely, however it would imply the cpu itself should work just fine without it on regular AM3.
UPDATE 3:
Hiroshige Goto has a nice photo of Bulldozer, maybe more accurate than ever. (Article is in Japanese :confused:)
http://pc.watch.impress.co.jp/docs/c...01_430044.html
http://pc.watch.impress.co.jp/img/pc.../430/044/1.jpg
None of the info on AM3 w/BD is from AMD.
If this turns out to not be true, which I suspect is the case, don't get mad. I doubt we would have said it doesn't work if it actually did. I am guessing that because this is from a third party document that is translated from the original.
Is there empty die space on the BD core, or is it just me?
omg !
That is where we are hiding the flux capacitor. Don't tell anyone.
Actually, I would get mad, if it would work, because you stated earlier last year that it does not.
Ahh come on ... don't hide something useless like a flux capacitor, better hide something useful like a dedicated directory cache ;-)
JF, I was wondering, since the Bulldozer architecture is modular and previous slides have shown the next-gen Bulldozer to have up to 10 cores:
Why do a 10-core, when it would make more sense (in my eyes) to do a 12-core, so that a rectangular die can be cut out instead of an odd shape?
10 core I assume would be:
[x][x][x]
[x][x]
and a 12 core would be
[x][x][x]
[x][x][x]
There is a power limit at the socket level, and a power limit that your microarchitecture and process allow for a certain class of parts. They will manage to increase core count by 25% next year, with 20C of improved Bulldozer cores compared to the 16C Interlagos we will soon have on the market. The improved cores will probably be faster, with better power characteristics. You will also have 25% more headroom to run your Turbo up, which will be good for poorly threaded server workloads. The same applies to Komodo on the desktop, if it ships with 10 cores next year: 25% more cores, which should be "enhanced", within a 125W TDP envelope and with a bit more TDP headroom for Turbo. All this on the same process node ;).
Hmmm, dual module BD with 4MB L3 ??
They already have some empty space between two of the modules. If you squeeze another module in between them and add some empty space between the other modules next to the NB, it will be around the same amount of dead space.
xx=module
e= Empty space
n= North Bridge
8X:
[xx][e][xx]
[xx][n][xx]
10X:
[xx][xx][xx]
[xx][en][xx]
12X:
[xx][e][xx][xx]
[xx][n][xx][xx]
Or the small empty area could be used for extra cache:
http://bildr.no/thumb/834422.jpeg
EDIT:
There, fixed it.
Click to open larger pic. A 10-core with 10MB L3 and 10MB L2 is just 8.8% larger, and I haven't even squeezed it. ;)
EDIT2:
Added an extra row of modules in the same way to make a 12-core. It was 40% larger than the 8-core. Hardly worth it for the 50% increase in cores. A 9% larger die to get 25% more cores looks much more promising.
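To make the comparison explicit, here is a small sketch that turns those estimates into die area per core and area growth per added module. The relative areas are only the rough numbers from the edited die photo above (8.8% and 40% larger than the 8-core), not measured or official figures, and the helper names are made up.

```python
# Relative die areas are estimates from an edited die photo, not real data.
# Each entry is (cores, die area relative to the 8-core baseline).
designs = {
    "8-core": (8, 1.000),
    "10-core": (10, 1.088),   # "8.8% larger"
    "12-core": (12, 1.400),   # "40% larger"
}

base_cores, base_area = designs["8-core"]


def area_per_core_vs_base(cores: int, area: float) -> float:
    """Die area per core, relative to the 8-core baseline."""
    return (area / cores) / (base_area / base_cores)


def growth_per_added_module(cores: int, area: float) -> float:
    """Extra die area (as a fraction of the baseline) per module added."""
    added_modules = (cores - base_cores) // 2   # one module = two cores
    return (area - base_area) / added_modules


ten_per_core = area_per_core_vs_base(*designs["10-core"])
twelve_per_core = area_per_core_vs_base(*designs["12-core"])
ten_growth = growth_per_added_module(*designs["10-core"])
twelve_growth = growth_per_added_module(*designs["12-core"])

print(f"10-core: {ten_per_core:.3f}x area per core, {ten_growth:.1%} area per added module")
print(f"12-core: {twelve_per_core:.3f}x area per core, {twelve_growth:.1%} area per added module")
```

With these estimates both layouts need less area per core than the 8-core, but the fifth module costs about 9% of the baseline die while each of the two extra modules in the 12-core layout costs about 20%.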
I mean that it will be made with only 2x 2MB L3; it will not be an 8MB part with 4MB disabled. That's why there is empty space between the two modules.
No thanks John, I prefer Family Guy's spin off in "Black to the Future"
http://www.youtube.com/watch?v=LIAYLxaAed0
What about integrating the NB in the next gen?
Unlikely. Harvesting methods like this are not used for high-priced chips.
They would try to sell the full chip in any case. The defective ones could still be sold in the desktop segment.
Maybe AMD is planning this: Komodo is labeled as an 8-core CPU in the roadmap pictures, i.e. one module is probably deactivated there.
The integration of PCIe will use up some die space; just have a look at the PCIe area of Intel's 1155/1156 chips. Now imagine that AMD packs 32 lanes into the chip, not only 16 as in Lynnfield / Sandy Bridge, and we have quite a big area.
The only thing not to like is that a 65W quad BD might turbo very little compared to an 8-core 125W beast.
If your game uses 4 threads, you have 16.25W per thread in the 65W envelope, and the 125W envelope would have 31.25W per thread. Assuming the idle cores need a few watts, let's just say it's ~28W per thread: still about 72% more headroom.
While this means nothing for us overclockers, it could mean that reviews show 8-core chips just crushing the 4-core versions across the board.
Which then makes the quads worth less, and thus cheaper and easier for us to get our hands on :D
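The per-thread budget above is easy to check. A minimal sketch, assuming the whole TDP is split evenly across the 4 busy threads and that the idle cores on the 8-core part eat about 13W in total (that 13W is only the value that makes the ~28W guess come out; it is not a known figure):

```python
def watts_per_thread(tdp_w: float, threads: int, idle_overhead_w: float = 0.0) -> float:
    """Evenly split the TDP left after idle overhead across the busy threads."""
    return (tdp_w - idle_overhead_w) / threads


quad_65w = watts_per_thread(65, 4)                               # 4-core part, all cores busy
octo_125w_raw = watts_per_thread(125, 4)                         # 8-core part, idle cores free
octo_125w = watts_per_thread(125, 4, idle_overhead_w=13.0)       # assumed idle cost
extra_headroom = octo_125w / quad_65w - 1

print(quad_65w, octo_125w_raw, octo_125w, f"{extra_headroom:.1%}")
```

Under those assumptions the 125W part works out to roughly 72% more per-thread budget than the 65W quad in a 4-thread load.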
... what? Where do you think AMD gets its 8-core Magny Cours Opterons??
Or Intel its 4 and 6-core Nehalem-EXs?
Code:
Processor        Cores  Clock speed  L3 cache  ACP    Price
Opteron 6176 SE  12     2.3 GHz      12 MB     105 W  $1,386
Opteron 6174     12     2.2 GHz      12 MB     80 W   $1,165
Opteron 6172     12     2.1 GHz      12 MB     80 W   $989
Opteron 6168     12     1.9 GHz      12 MB     80 W   $744
Opteron 6136     8      2.4 GHz      12 MB     80 W   $744
Opteron 6134     8      2.3 GHz      12 MB     80 W   $523
Opteron 6128     8      2.0 GHz      12 MB     80 W   $266
Opteron 6164 HE  12     1.7 GHz      12 MB     65 W   $744
Opteron 6128 HE  8      2.0 GHz      12 MB     65 W   $523
Opteron 6124 HE  8      1.8 GHz      12 MB     65 W   $455
Harvesting methods are most definitely used for server chips.
Code:
Processor   Cores/threads  Speed     L3 cache  TDP   Price
Xeon X7560  8/16           2.26 GHz  24 MB     130W  $3,692
Xeon X7550  8/16           2.00 GHz  18 MB     130W  $2,729
Xeon E7540  6/12           2.00 GHz  18 MB     105W  $1,980
Xeon E7530  6/12           1.86 GHz  12 MB     105W  $1,391
Xeon E7520  4/8            1.86 GHz  18 MB     95W   $856
Xeon L7555  8/16           1.86 GHz  24 MB     95W   $3,157
Xeon L7545  6/12           1.86 GHz  18 MB     95W   $2,087
Xeon X7542  6/6            2.66 GHz  18 MB     130W  $1,980
Xeon X6550  8/16           2.00 GHz  18 MB     130W  $2,461
Xeon E6540  6/12           2.00 GHz  18 MB     105W  $1,712
Xeon E6510  4/8            1.73 GHz  12 MB     105W  $744
There are all sorts of things Intel and AMD haven't done that they could start doing if they so desire.
And how exactly do you make a rectangular region that fits 5 modules but not 6? Why do you think there have been no 3-module dies proposed?
Have you seen Bob Colwell's lecture for Stanford's EE380 colloquium, where he talks about how shrinking the Pentium's FPU had no impact on die size? "If you take out Kansas, North Dakota and Texas are still the same distance apart!"
Like this:
http://bildr.no/thumb/834422.jpeg
Less than 9% larger than an 8-core.
A 12-core is 40% larger if you add 2 modules to an 8-core.
A 10-core is good business since it wastes less space. A 12-core isn't; it's a lot harder to produce.
Not if you get higher yields on the 12-core, since only 5 out of 6 modules = 83% need to work.
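The yield argument can be sketched with a simple binomial model. This assumes every module fails independently with the same probability, uses a made-up 90% per-module yield, and ignores defects in the L3, northbridge and I/O as well as the fact that a bigger die means fewer candidates per wafer, so treat it as an illustration only.

```python
from math import comb


def sellable_fraction(modules: int, required: int, module_yield: float) -> float:
    """P(at least `required` of `modules` are defect-free), independent defects."""
    return sum(
        comb(modules, k) * module_yield**k * (1 - module_yield) ** (modules - k)
        for k in range(required, modules + 1)
    )


p = 0.90   # hypothetical per-module yield, not a real process figure

five_of_five = sellable_fraction(5, 5, p)   # native 5-module die, no spare
five_of_six = sellable_fraction(6, 5, p)    # 6-module die sold with 5 working

print(f"5 of 5 good: {five_of_five:.3f}")
print(f"at least 5 of 6 good: {five_of_six:.3f}")
```

With these made-up numbers the die with one spare module yields a sellable 10-core far more often, which is the point being made; whether that outweighs the larger die is exactly what this toy model leaves out.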
How would they route the cache traffic between the five L3 blocks? I doubt that 5-module photoshop is a feasible design. It at least would need more space for the crossbar than the 4-module design. Your numbers don't look likely to me.
They might not even have 5 cache blocks; they may settle for 4 anyway. Try to fit 6 modules and cache blocks in a die less than 40% larger.
And we have official roadmaps mentioning 10-core BDs. We don't have any roadmap mentioning 12 cores. So let's see how it turns out.
Nice design if photoshop is your CAD tool :ROTF:
However I suspect there are at least a handful of reasons why it's not such a simple matter :)
1. Just because those areas "look" like empty space doesn't necessarily mean they are, there's a strong chance a bit of it is synthesized logic, which hides itself well in the black & white photos. Compare the top part of the colored core to the rest of the chip to see an example (unless there's additional intentional obfuscation).
2. The photo doesn't show the interconnect/wiring track congestion at the higher levels, which could negate the ability to use such space.
3. [More nitpicky] L3 cache accesses appear to go through the crossbar (as opposed to Intel's ring bus), so who's to say how much additional wiring complexity/area would result from adding another core following the same methodology? The impact on latency would certainly be a factor too.
I strongly suspect that the module-L3 interconnect is a set piece of design that is done once and then replicated for each module. If they dissociate the number of L3 blocks from the number of modules, they would have to special-case route each module's interconnect, which would blow up the engineering effort required.
And there are latency issues involved too, as rcofell mentioned. They might have to redesign the module to tolerate variable distances to the closest L3 block, or worse, use two types of modules.
Yes, let us see. :p:
I did it in paint. ;)
But yes, I understand all that. But adding 2 modules won't make the problem easier. My picture is only a suggestion, and since AMD talks about 10 cores and not 12 cores, I can't be too far off. Let me be the first to say that I would be a bit surprised if the 10-core looked like my picture. But I think some resemblance is possible.