(Best apple video ever.)
Lies, iflea > all :)
http://www.youtube.com/watch?v=UvPj22jANDw
Andy Glew has a few things to say about Bulldozer, and considering his background and experience, you may want to listen to what he says. Lots of details about the inner workings of Intel and AMD projects:
http://groups.google.com/group/comp....736a56?q&pli=1
Quote:
BRIEF:
AMD's Bulldozer is an MCMT (MultiCluster MultiThreaded)
microarchitecture. That's my baby!
DETAIL:
Thursday was both a very good day and a very bad day for me. Good,
because my MCMT ideas finally seem to be going into a product. Bad,
because I ended up driving 4 hours from where I work with IV in the
Seattle area back to Portland, to my wife who was taken to a hospital
emergency room. The latter is personal. The former is, well, personal
too, but also professional.
I can't express how good it feels to see MCMT become a product. It's
been public for years, but it gets no respect until it is in a product.
It would have been better if I had stayed at Intel to see it through.
I know that I won't get any credit for it. (Except from some of the guys
who were at AMD at the time.) But it feels good nevertheless.
The only bad thing is that some guys I know at AMD say that Bulldozer is
not really all that great a product, but is shipping just because AMD
needs a model refresh. "Sometimes you just gotta ship what you got." If
this is so, and if I deserve any credit for CMT, then I also deserve
some of the blame. Although it might have been different, better, if I
had stayed.......
Let's hope his contacts at AMD are wrong and the product lives up to the hype. Otherwise, there would be quite a few facepalms around here. Read his other posts in the thread; it's fascinating how many different paths AMD took and ended up with the current K10, the lousiest of all.
I have no idea who Andy Glew is, but he left the company several years ago.
I work with engineering teams and the general feeling is that future prospects today are far better than they have ever been, so I am not sure what his agenda is.
The world seems happy to declare bulldozer dead before anyone ever has silicon in their hand. As someone on the inside, I will say that as I look at our products, there is a pretty strong belief that Magny Cours will be a significant game changer in the server business and that Bulldozer will have a similar, if not greater impact on the market.
So now you have 2 opinions, one from someone no longer connected to the project and one from someone who is connected. You decide for yourself which you want to believe.
Since 2007 there have been significant structural changes to both the teams and the process, and we are finding more efficiency.
Shanghai was 3 months early with higher frequency than expected.
Istanbul was 5 months early with higher frequency than expected.
Magny Cours will be 1 quarter earlier than expected with higher frequency.
We are executing strongly these days, problems from the past are just that - from the past.
things are looking up for AMD.
intel= cpu only
nvidia= gpu only
only one company has the complete CPU+GPU designs to leverage a complete computing solution.
AMD.
I am pretty sure the older guys from the CPU teams know who Andy Glew is.
Umh, none. The guy is simply an uber geek who happens to have quite a reputation in the CPU world for coming up with revolutionary new stuff.
Quote:
I work with engineering teams and the general feeling is that future prospects today are far better than they have ever been, so I am not sure what his agenda is.
Besides, he is extremely happy that AMD is incorporating his idea of a cluster uarch; the bad news comes from AMD people themselves, if you read his post.
What you feel <> what architects inside the company feel.
How is Magny-Cours going to be a game changer when a 6-core Istanbul loses on all major benchmarks to a 4-core Nehalem, at roughly similar frequency? For MC, you double the core count in an inefficient way and drop the frequency to 2.1-2.3GHz. Scaling would definitely suffer.
Quote:
The world seems happy to declare bulldozer dead before anyone ever has silicon in their hand. As someone on the inside, I will say that as I look at our products, there is a pretty strong belief that Magny Cours will be a significant game changer in the server business and that Bulldozer will have a similar, if not greater impact on the market.
OTOH, Nehalem EX addresses exactly that with an innovative ring bus uarch for connecting the cores and the L3 (same as GPUs) and a few other goodies.
Considering this, will AMD's position in the market improve? I doubt it. At best it will stay the same. Bulldozer can change that, but it needs to deliver.
Why not? They offer similar performance levels to current IGPs, be it from ATI or NVIDIA. Secondly, they are more than adequate for their role; given there is no charge for them, why should anyone complain?
If you care about gaming, you buy a discrete card.
As for Larrabee, the project is ongoing and more and more resources are poured in. Fortunately for Intel, they can afford it without interfering with the main CPU teams.
The 1st version will arrive as a development vehicle this year, allowing developers to get a feel for what it is about. The 32nm version due to ship next year is probably the make-or-break part for the project. This we will have to wait and see.
Nvidia can make really good ARM-based processors. The A9 and A8 designs are just reference designs; a big enough company can change the architecture or make a new one if they want.
A9 is already OoO and Nvidia can improve on it; just sticking a better IGP on it does not do it for me. I do expect Tegra 3 to be more than an ARM A10 + GF100.
Intel has also invested quite a bit in LRB, and the tech may be used first in Haswell, or maybe a discrete card will be available before that, who knows.
Does anyone have any questions they'd like me to pose? I will have access to some very high-level engineers, and I should be able to get some detailed answers :)
Apart from obvious ones?
- Will Bulldozer incorporate improved interconnects for MCM packaging?
- What will be the maximum memory speed supported by the IMC?
- How many DIMMs will the IMC be able to drive (single die, not MCM)?
- Which extensions will be supported (SSE variants such as SSSE3, SSE4, SSE4.1 and SSE4.2, plus AVX, 3DNow!, etc.), and to what degree?
Just a few :)
What are the anticipated frequencies of bulldozer?
russian: I think you're the man who had a Deneb C0 revision at home? :)
I think you've got the wrong thread? ;)
This is the correct one: http://www.xtremesystems.org/forums/...d.php?t=243714
#1 is already known. AMD (in a blog, if I recall) has stated that the interconnect in their MCM product uses an even faster version of the normal HT link used in MP systems. Yes, Istanbul has a snoop filter. What isn't known is how it will apply. Given the extra apparent bandwidth, MC may conceivably not use any filter between the two modules. It would be completely unnecessary for two CPUs in one package, just as it is for current 2P systems. (CPU0 goes, "Hey, I don't have that memory address. Since there are only two of us, it must be at CPU1.") I would imagine how that works inside the package will be dictated by how they've rigged up the newly combined (quad-channel DDR3) memory controller. If each chip in the MCM is set up as its own NUMA node, I suspect we'll see the snoop filter potentially operating between both dies on the same package when there are 2+ CPUs installed. I think it makes the most sense, though, for them to be treated as one. Otherwise it wouldn't really be "quad-channel," as they've been marketing it, so much as "dual dual-channel." There's a real difference there.
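The "only two of us" argument comes down to counting probes. Below is a minimal sketch, assuming a toy model where a miss either broadcasts to every other node or, with a filter, probes only the nodes recorded as holding the line. The function names are hypothetical and this is not AMD's HT Assist implementation; it just shows that with 2 nodes a broadcast already costs at most one probe, while the filter's savings grow with node count.

```python
# Toy model: broadcast snooping vs. a snoop filter (directory) in an
# N-node system. Hypothetical helpers, not AMD's HT Assist logic.

def broadcast_probes(num_nodes: int) -> int:
    """Probes sent on a miss when every other node must be snooped."""
    return num_nodes - 1


def filtered_probes(requester: int, sharers: set[int]) -> int:
    """Probes sent when a filter tracks which nodes may cache the line."""
    return len(sharers - {requester})


for n in (2, 4, 8):
    print(f"{n} nodes: broadcast sends {broadcast_probes(n)} probe(s) per miss")

# Line cached only on node 1; node 0 misses on it.
print(filtered_probes(requester=0, sharers={1}))    # 1, same as a 2-node broadcast
# Line not cached anywhere else: the filter avoids the probe entirely.
print(filtered_probes(requester=0, sharers=set()))  # 0
```

In the 2-node case the filter can only save the single probe for lines nobody else holds, which is why the benefit looks marginal there.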
And that's the question I would like an answer to :)
AMD might have something more than an Istanbul-like snoop filter.
They might also decide to upgrade the HT links inside the MCM to full 16-bit width and double the frequency compared to the external links, etc... There are a lot of things that can be done if they give enough benefit in the targeted usage patterns.
Is it me, or is that drawing confusing? Or, incomplete.
Xoulz, each Sao Paulo die inside Magny Cours has 4 HT links, 3 of which are cache coherent (the 4th one connects the CPU to the chipset).
Both dies are connected by one and a half HT links (x16 + x8), and each die has another one and a half HT links to connect to one of the dies on the other Magny Cours (in a 2P system).
According to a presentation, the internal links work at 6.4 GT/s. Their way of connection looks like this:
http://www.planet3dnow.de/vbulletin/...1&d=1259239385
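For a rough sense of what the x16 + x8 die-to-die connection provides, here is a back-of-the-envelope sketch, assuming both links run at the 6.4 GT/s figure from the presentation. Raw bandwidth per direction is transfers per second times link width in bytes; real sustained bandwidth will be lower because of HT packet and protocol overhead. The helper name is hypothetical.

```python
# Raw HyperTransport bandwidth for one x16 plus one x8 link, assuming
# both run at 6.4 GT/s. Ignores packet/protocol overhead.

def ht_link_gbps_per_direction(width_bits: int, gt_per_s: float) -> float:
    """Raw GB/s in one direction: transfers/s times bytes per transfer."""
    return gt_per_s * width_bits / 8


RATE_GT_S = 6.4
links = {"x16": 16, "x8": 8}

per_dir = {name: ht_link_gbps_per_direction(w, RATE_GT_S) for name, w in links.items()}
total_per_dir = sum(per_dir.values())

for name, bw in per_dir.items():
    print(f"{name}: {bw:.1f} GB/s per direction")
print(f"die-to-die total: {total_per_dir:.1f} GB/s per direction, "
      f"{2 * total_per_dir:.1f} GB/s both directions")
```

Under these assumptions that is 12.8 + 6.4 = 19.2 GB/s per direction between the two dies.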
Thanks for that. It does seem that Lisbon (this is the MCM Lisbon, right?) is quite different compared to Istanbul...
http://hothardware.com/newsimages/It...-cpu-die-1.jpg
Also, since Thuban is supposed to be based on Lisbon, this is quite exciting. Where is the L3, by the way, on the slide?
I thought this thread was about Bulldozer, not something we've already seen >_>?
I don't get this bulldozer anymore.
Lisbon will be very similar to Istanbul, but it is a new stepping with some new features not found in Istanbul. Thuban is a desktop die that has features you won't find in server.
Snoop filter is enabled in all server chips (assuming 2P or greater combinations.)
1. All communications between the die happen at package speed.
2. That is not being disclosed yet.
3. For server we are supporting 12 DIMMs on MC and 6 DIMMs on LS (platforms for DB are the same, but could be greater)
4. Not sure on that, would need to look it up.
Oouu, JF is here so I can't talk freely ;-)
How accurate is that information ?
Last time you commented on desktop products was something like "there wont be any desktop 6 core CPU" :)
I appreciate all your server information, but it seems to me that your desktop knowledge is limited. Has anything happened that changed this since the last 6core comment ?
Thanks
Edit:
I hope he is referring to more than unbuffered & registered memory ;)
regards
Opteron146
Well, since I am in the server world, I don't comment on desktop plans. I have personally said in the past that I didn't see a need for a 6-core desktop processor. There were requests to take the Istanbul die and turn it into a Phenom and I said this would not happen because Istanbul is a server-only die. That continues to be true.
As a server guy I know little about the client world.
And bulldozer will support both unbuffered and registered memory. Those DIMM counts are for registered memory.
You just did :)
Ahh .. that is too much bean counter / marketing slang. Yes - Istanbul is a server die, because you marketing guys named it like that in the server segment / Socket-F package. Let's use the more practical engineering term "Hydra". A Hydra would fit nicely into an AM3 package; AMD's designs are very flexible, you should know that better than me :)
Quote:
There were requests to take the Istanbul die and turn it into a Phenom and I said this would not happen because Istanbul is a server-only die. That continues to be true.
Whether it would be a great success in the desktop segment without a Turbo mode is another question.
Anyway, I guess your desktop colleagues will come up with a nice "desktop die" code name very soon, and of course that desktop die would be desktop-only and never be usable in servers.
It would have been the same as with the Greyhound die, which was used exclusively as the Deneb desktop die. Once upon a time there were plans for special server dies named Shanghai and Suzuka. However, it was not possible :D ;)
Of course, because Bulldozer is not the exclusive server die code name. I told you that AMD designs are flexible, didn't I? ;)
Quote:
And bulldozer will support both unbuffered and registered memory. Those DIMM counts are for registered memory.
However, we spoke about Lisbon and Thuban before. Those are again the "special" desktop / server die names. I wonder if the difference is more than the memory support.
Thanks for the reply !
Yeah, I know that you're a server guy.:)
According to the "tradition" (since K8) AMD always uses its server dies on (high end) desktop.
Italy = Toledo
Santa Rosa = Windsor
Barcelona = Agena
Shanghai = Deneb
The difference is the package and the desktop versions don't support the registered DIMMs.
Thanks for clarifying that, JF-AMD.
Any hint on the interconnect of the Bulldozer variation of Magny Cours? :)
Thanks JF-AMD for clearing some of my questions!
Much appreciated.
Desktop die, server die, what is the difference? We start out with the same base design, for instance Shanghai was the same as Deneb (I think, again I don't know my desktop code names). The cores are identical.
The memory controllers are the same; you could support unbuffered or registered memory on the same controller (though not at the same time). It is the memory validation that drives the platform. Desktop would probably never validate for registered memory because a.) nobody would probably want that and b.) it eats up a lot of validation and support cycles (OEMs don't like it either because it causes more work for them.) We don't validate for unbuffered memory even though we could support it, because nobody wants a server limited to 8GB of memory.
When it came to Istanbul, that was a server only design. Desktop never asked for a version, so when we did the design, all considerations were for server. We added things like APML and HT Assist that desktop would never use. It was, and it still is, a server-only design. Lisbon is the same way.
Thuban, as I understand it, is probably based on the same general design, they probably took out server features and added desktop features.
Once you punch out a wafer of Istanbuls or Lisbons, you could put them in any package, but they would still be Istanbul or Lisbon. There is a feature set that makes them different from desktop. It's not a fusing recipe during APM that makes it a server die or a desktop die, it is the actual design.
JF: Are there already first samples of server Bulldozer? Thx.
http://translate.googleusercontent.c...1M32ea2kxmQ_kA
Very interesting read.... the approach is quite conventional "hehe get it"
Interesting read
Anyway, Bulldozer is confirmed to have lower single-threaded int performance compared to K10.
But multi-threaded and ILP apps would benefit greatly.
Hopefully AMD can fuse a GPU into the Bulldozer module concept and throw the FPU load to the GPU, so that AMD could spend more CPU transistors on int performance.
Lol confirmed by whom?? Hiroshige Goto's ranting? Yeah right. :rolleyes:
You cannot have higher multithreaded performance (i.e. QC Orochi vs. QC K10) if single-thread performance is lower in Orochi's case. It's simple logic at work. Even AT got the 35% higher integer performance, on average, in the QC vs. QC case (yeah, that's 2-module BD vs. QC K10, btw). Add in that BD will come in a 4-module version to desktop, so that's another 2x on top of the 35%. Add in that it will definitely have better IPC per core and per clock than 10h (quote me on that, I'll admit I was wrong if not true), add in that it will have an aggressive Turbo mode, add in that it will have a 2x as powerful FPU thanks to the FMAC units inside the dual-thread SIMD unit, add in that it will work at a higher clock to begin with, and you end up in a much better place than the one Mr. Goto is portraying in his blog.
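The module-count part of that argument is simple proportion. A minimal sketch, assuming (optimistically) that integer throughput scales linearly with module count; the 1.35 factor is the ~35% figure quoted above for a 2-module Bulldozer vs. a quad-core K10, and the function name is hypothetical. This is arithmetic, not a benchmark result.

```python
# Ideal-scaling sketch: N-module Bulldozer vs. a quad-core K10,
# anchored on the quoted ~35% advantage of the 2-module part.

BASELINE_MODULES = 2    # 2-module (4-core) Bulldozer
GAIN_VS_QC_K10 = 1.35   # quoted ~35% higher integer throughput


def relative_to_qc_k10(modules: int) -> float:
    """Throughput relative to a QC K10, assuming perfectly linear scaling."""
    return GAIN_VS_QC_K10 * modules / BASELINE_MODULES


for m in (2, 4):
    print(f"{m} modules: ~{relative_to_qc_k10(m):.2f}x a quad-core K10 (ideal scaling)")
```

Real scaling would land below the ideal ~2.7x because of shared front ends, memory bandwidth and clock differences.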
Ehh .. now you contradict yourself:
If I use your argument from the first paragraph then it would be:
The cores are identical - Lisbon or Thuban.
If I reuse arguments from the 2nd paragraph to Shanghai / Deneb:
Once you punch out a wafer of Shanghais, you could put them in any package, but they would still be Shanghais, no Denebs. There is a feature set that makes them different from desktop. It's not a fusing recipe during APM that makes it a server die or a desktop die, it is the actual design.
The only way to explain that is, that you really mean the Thuban and Istanbul / Lisbon will be based on different designs and different masks.
That would be the 1st time in the whole AMD history. I can hardly believe that ... but I will trust you on that til Thuban's presentation.
If there are still 4 Hypertransport links on the Thuban die, be prepared for my complaint :D
Have a nice Sunday
I don't see anything there.
If you refer to the 2-pipe design, that does not mean much. K10 has 3 pipes, yes, but its front end is inferior to Bulldozer's, i.e. K10's 3 pipes are seldom all used.
If I have to choose between a 2-pipe design with an average load of 90% and a 3-pipe design with a useless 3rd pipe, I would choose the first one ;-)
It would save die space and power, too. Furthermore, the simpler design could allow more clock headroom.
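The 2-pipe vs. 3-pipe trade-off can be put in numbers. A minimal sketch, assuming "effective pipes" = number of pipes x average utilization; the 90% load is the hypothetical from the post, and the breakeven is simply the utilization a 3-pipe design would need to match it. Helper names are hypothetical.

```python
# Issue-width arithmetic: pipes x average utilization.

def effective_pipes(pipes: int, avg_utilization: float) -> float:
    return pipes * avg_utilization


two_pipe = effective_pipes(2, 0.90)   # work per cycle of a 2-pipe core at 90% load
breakeven_3pipe = two_pipe / 3        # average utilization a 3-pipe core needs to match

print(f"2 pipes @ 90%: {two_pipe:.2f} effective pipes")
print(f"3-pipe design must average {breakeven_3pipe:.0%} utilization to match")
```

So a 3-pipe core whose front end cannot keep its pipes above roughly 60% busy on average does no better than a well-fed 2-pipe core.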
Actually I think that is BD's biggest advantage. Because of the shared Front-End, it could be more sophisticated and complex than usual.
That should yield better utilization of the back-end.
So far, with the little information available, the design looks quite streamlined to me; I like it. It looks efficient and fast. I just wonder how much of the "information" will be true in the end.
I wouldn't necessarily say that the integer comment is correct. For instance, MC has 12 cores, Interlagos has 16 cores. 33% more cores but more than 33% greater performance. That sounds like faster to me.
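The logic here is that if total throughput grows faster than the core count, per-core throughput must have gone up. A minimal sketch of that arithmetic; the 1.40 and 1.30 performance ratios are hypothetical placeholders, not measured results, and the helper name is made up for illustration.

```python
# Per-core arithmetic for 12 cores (MC) -> 16 cores (Interlagos).

def per_core_ratio(perf_ratio: float, old_cores: int, new_cores: int) -> float:
    """Per-core throughput of the new part relative to the old one."""
    return perf_ratio / (new_cores / old_cores)


core_ratio = 16 / 12
print(f"core count increase: {core_ratio - 1:.0%}")  # 33%

# Any total-performance ratio above 16/12 implies faster cores:
print(per_core_ratio(1.40, 12, 16) > 1)  # True
print(per_core_ratio(1.30, 12, 16) > 1)  # False
```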
That is the whole idea of the architecture: flexibility for multiple designs. I'm not necessarily sure that the FPU goes by the wayside, at least not until the software situation changes. There will still be regular applications that need to access floating point for a cycle or two, and the FPU will be necessary for that. However, I would expect that, over time, a GPU can plug into the architecture the same way as a Bulldozer module.
JF: Is Bulldozer planned for several CPU generations? Does that mean roughly the next 3-4 years? First gen Zambezi, etc.?
Generally speaking we are 12 months from tapeout to final product. That is why commenting on things is very dangerous because it signals where we are in the process. That can have an impact (positively or negatively) on stock price, which is why I have to be careful.
When we give tapeout or sampling information we release through coordination with investor relations to make sure that we are in full compliance with the law.
Talking about BD tapeout or samples would be considered "material" and I could land in hot water (or legal issues) for making statements in public that were not cleared.
this means lower clockspeed and higher latency which is bad for floating point and simd which is the wrong direction.
if you read quarterly reports you will see R&D for nvidia is similar to ati, don't really see how that's relevant to this thread though. amd is outspent in r&d by about 4x by intel, but a lot of that could be manufacturing process.
I'm just happy that AMD doesn't have to put R&D dollars into fab processes now; GLOFO will do that. :up: That's extra money that can be used for processor/GPU creation.
it is still going to be a joint project when it comes to process technology, it is just that amd will only have to bear like 30% or less of the cost.
only time will tell if hector ruiz's legacy is favorable or not ;p.
strategically it would be stupid to completely spin them off this soon after they were formed. until they understand their new relationship, and global foundries becomes a more competent player in the fab market.....amd is going nowhere fast.
edit: i should say "most likely" heh even i don't know the future.
That's the idea. In the past the competition could attack in 2 areas: product and process, and to compete you have to spend in both (fyi, process is a lot more expensive).
Now we only have to spend on product; someone else can spend on process. It was a strategic move more than a financial move.
WOW! So GF is giving you guys the chips for free? I know you want to make it sound like going fabless is all positive, but the reality is very different, since you are paying per wafer, die, whatever, and at a premium, and GF will now have its own priorities that may not line up with AMD's, so they could push process shrinks out if they see it in their best interest. I have to wonder how much you are really saving by losing control of your manufacturing, and I would love to hear what your old CEO/founder Mr. Sanders feels about AMD's new fabless model.
No, but I would imagine it to be common sense that they will put their interests in front of AMD's interests. No? Is it that hard to believe that they could push out a shrink for a quarter or two to milk the current process? BTW, I'm not speaking of AMD's business model but GF's business model.
I may have no clue about the AMD deal, but I do understand how the industry works. Anyway, there is always another side of the story, and I'm thinking that the fabless deal is not a win-win situation done merely for strategic purposes. Giving up control of manufacturing matters more than you think, especially when it comes to the cutting-edge tech that processors require. http://www.reuters.com/article/idCNLDE60J1TQ20100120
To stay competitive in business, GloFo must continue at the same pace of process development, if not an even more intense one. So there's no real worry about AMD falling behind in the process race!
And regarding the cost, it's not like AMD had free-of-charge wafers and chips in the past! There's no reason they should pay more for them than before!
Not to mention that AMD will be GloFo's primary customer...
Not necessarily. Foundries do not need to make high-end products to make a profit. A foundry by nature is a high-volume, low-margin semis manufacturer, and the cost to shrink is high, so it's not in GF's best interest to compete with Intel on process shrinks, for the following reason. As you know, the older the process, the cheaper it is to manufacture, since most of the kinks have already been worked out of it. Recipes are shortened and process steps are combined, WIP turns and die yields are high, and of course margins are at their peak. So why not stay at the older process longer while charging the same per chip/wafer and keeping margins high? Not trying to discredit Mr. Fruehe, I'm just providing a counterpoint to his win-win comment.
With AMD no longer having a say in GF decisions, there is the possibility that GF could shift its focus away from AMD; no matter how much AMD keeps GF full, GF will still make decisions based on what's best for GF. Some of their interests will be the same, like full capacity, but others, like shrink cycles and the cost that comes with them, might not be. Again, I'm not saying that this will happen, but losing control of manufacturing has its negatives, and this is one of them.
Exactly. AMD designs and produces one of the most complex MPUs out there, and the cost is proportionally high. Not to mention that the netbook business is on the rise and AMD's Bobcat is going to be a big hit with OEMs (you can quote me on this). Not only that, but with the clear modular approach introduced with BD cores, AMD will be able to address all the segments in the near future, which guarantees GloFo will keep churning out wafers for its most valuable customer. Last but not least, the ATi division is moving to GloFo's SOI (Llano) and bulk (Northern Islands) silicon with the 28nm node, and this will additionally strengthen AMD's place in GloFo's customer priority list!
And here is another problem. How much is Amd going to have to pay for what GF is calling "custom" manufacturing?
http://globalfoundries.com/technolog...nced_tech.aspx
As you see from the link, it looks like they are going to be doing a lot of processes at the same time.