Seriously guys. It's just an IGP on the CPU. AMD calls it an 'APU', but that's just marketing to make it sound fancier than it really is. Same as how the graphics cards have 1600 and 512 'cores'.
Maybe Bulldozer will have a big leap in FPU performance due to the integrated GPU.....
Zambezi is project Bulldozer: Bulldozer is the project codename (K11), Orochi is the CPU's nickname and Zambezi is the core (similarly to K10-Phenom-Deneb).
Just FYI, AMD stopped using Kx for their cores/designs some time ago, so K10 was in fact Family 10h. K8 was the last design whose name actually started with a K. Bulldozer is the internal name for the uarch project, but I know there is one other name AMD uses for BD that is not known publicly (I don't know what they call it, though).
The interesting thing about Bulldozer is that the design was originally scheduled and targeted for the 45nm node and 2009. But some things got in the way (die size at that node, the AVX specs emerging in 2008, core rework in order to be competitive with Intel's 2010/2011 cores, etc.), so they postponed the core launch some 15 months from the original plan. That's not so bad, since they had a backup plan in the form of the San Marino platform and Magny-Cours MPUs to tide them over until they "shrink" BD to 32nm. So essentially BD is coming much improved and at a smaller node in 2011, instead of at 45nm in 2009.
For a software developer it is MUCH more than "just an IGP". The magic happens here: "on the CPU".
If a CPU can manage, for example, 50 GFLOPS with two FPUs, an integrated GPU with 120 SPs could do 80 GFLOPS. That alone would give HUGE floating point performance; the FPU part of the CPU could be dropped and all the floating point arithmetic moved to be handled by the APU.
...OR use both the FPU and the APU: voila, 130 GFLOPS on a (mainstream) CPU. The thing that interests me is the interface between the APU and the OS. Is it just a graphics processor which needs drivers, or can the OS see it as a co-processor? Can the OS see it at all? What kind of ISA?
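Just to show where numbers like those could come from, here is a back-of-the-envelope sketch of the peak-FLOPS math. The core counts, SIMD widths and clocks are purely illustrative assumptions, not real Llano/Bulldozer specs.
Code:
/* Back-of-the-envelope sketch of the "FPU + APU" peak-FLOPS argument above.
 * All unit counts and clocks are illustrative assumptions, not real AMD specs. */
#include <stdio.h>

/* peak GFLOPS = execution units * FLOPs per unit per cycle * clock in GHz */
static double peak_gflops(int units, int flops_per_cycle, double ghz) {
    return units * flops_per_cycle * ghz;
}

int main(void) {
    /* hypothetical CPU: 4 cores, 4-wide SSE add + mul per cycle (8 FLOPs), 1.6 GHz */
    double cpu = peak_gflops(4, 8, 1.6);     /* ~51 GFLOPS */
    /* hypothetical IGP: 120 SPs, one MAD each per cycle (2 FLOPs), 0.35 GHz */
    double gpu = peak_gflops(120, 2, 0.35);  /* ~84 GFLOPS */
    printf("CPU %.0f + GPU %.0f = %.0f GFLOPS combined peak\n", cpu, gpu, cpu + gpu);
    return 0;
}
Peak numbers only, of course; sustained throughput depends on how well the workload actually maps onto those units.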
No new driver model, obviously. So the whole thing is done the simplest way possible? Just a GPU moved to the same package and that's it?
For a few reasons it sounds somewhat strange. As the GPU has a huge advantage over a CPU's integrated FPU for parallelized workloads, what exactly stops AMD from extending the ISA (for these parallel workloads) and using the GPU for floating point arithmetic? Of course, there is the mismatch between the x86 and RVxxx ISAs, and defining a standard ISA for integrated GPUs would mean that future GPUs could not be used as the APU due to ISA differences. And as far as I can tell, the ISA would have to be the same between Intel and AMD integrated GPUs for a standard ISA extension to happen.
Apart from that, in theory it should be very possible to run floating point operations on the dedicated cores of the GPU, granted that the ISA is extended for this.
Even if nothing like that happens, there's still ATI Stream on the APU and all of GPGPU that way.
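For what it's worth, the GPGPU route doesn't even need a new ISA, just the existing driver stack. Below is a minimal sketch of what that looks like through the standard OpenCL API (which the ATI Stream SDK exposes); the only assumption is that an OpenCL-capable GPU/APU driver is installed, and error checking is left out for brevity.
Code:
/* Minimal sketch: offload a float vector add to the GPU via standard OpenCL. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void vadd(__global const float *a, __global const float *b,\n"
    "                   __global float *c) {\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* pick the first GPU device the driver reports */
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* copy inputs to device buffers, allocate the output buffer */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    /* the driver compiles the kernel for the GPU's native ISA at run time */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);
    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    printf("c[10] = %f\n", c[10]);   /* expect 30.0 */
    return 0;
}
The point being: the host code stays plain x86, and the kernel gets compiled for whatever the GPU's native ISA happens to be, so no common CPU/GPU ISA extension is required.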
Whatever about whatever, but Zambezi is a very cool name for it! :p:
AMD needs a new (higher IPC) uarch badly. We all know this; it should be job one for the next uarch. If BD isn't on par with INTC's latest, it's a huge problem. And the apparent delays to BD, Orochi or whatever are killing this company. I don't believe AMD can withstand another Barcelona fiasco.
RussC
So AM3r2 (AM3+ or whatever you want to call it) probably won't work on an AM2+ socket board like the MSI K9A2 Plat? I would really enjoy it if it did, but who knows if it will support DDR2 at that point. That would most likely be the real issue, that and BIOS support (I hope MSI steps up if that actually occurs).
I would rather stick with AVX for SIMD on the CPU. Putting a GPU on die is not that revolutionary.
I can't know this, but I'm 99% sure...
They don't need to change the machine code of their GPU; if they translate things efficiently from x86 to GPU machine code, that'll work fine... in the end, pretty much all x86 CPUs do is translate x86 into RISC-like internal ops... so yes, that would work... but then how do you use the GPU as a GPU? You'd have to talk x86 to get through to it... and that won't work, unless you come up with a compiler that translates GPU machine calls to x86... that's no weekend project! :D
In theory you could have two interfaces for the GPU, one through the x86 decoder and one through the traditional GPU call-to-machine-code decoder... that would work, though you couldn't use the GPU for both at the same time, or you'd have to find some way to balance and prioritize the work... but yes, this would work...
But while this is all possible, it won't JUST happen...
You have to walk before you run... AMD is standing right now...
The first step is moving the GPU onto the CPU package, the second is moving it into the CPU silicon, the third is properly merging them, and that actually takes several steps until it works really well, I think...
While it's possible to skip step 1 and go integrated into the CPU directly, it's VERY unlikely that AMD, with all their castrated budgets and way overworked engineers, jumps from step 0 to step 3 of having the GPU integrated and merged with the CPU...
Thuban is barely an improvement over the current Phenom II. It has two extra cores, but it still has the same 6MB L3 cache and it's still 45nm.
And Thuban is simply a desktop version of AMD's Istanbul, which according to reviews performed worse than quad-core Xeons based on the Nehalem architecture.
I guess AMD will use river names for BD CPUs (Zambezi and Llano are also cities, but that wouldn't make sense).
Purely speculating: If BD actually needs more than the 938 pins (31 x 31 - 23) in AM3, is it possible that AM3r2 has more than that?
Let's say the AM3r2 socket had 35 x 35 holes, minus somewhat more than 23 keyed-out ones (quick math sketched below).
- An AM3 CPU would fit in the center of the socket, surrounded by two rows of empty holes on each side.
- An AM3r2 CPU would have a larger package and all ~1200 pins, and use the whole socket.
Is this too crazy? The biggest problem would be people who don't understand how to mount the CPU.
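Quick sanity check of those hole counts; the 35 x 35 grid and the 25 keyed-out holes are purely my own guesses to land near ~1200 pins:
Code:
/* Pin-count sanity check for the socket speculation above.
 * The AM3r2 grid size and keyed-out hole count are speculative. */
#include <stdio.h>

int main(void) {
    int am3   = 31 * 31 - 23;   /* 938 pins, the known AM3 count          */
    int am3r2 = 35 * 35 - 25;   /* 1200 pins, if 25 holes were keyed out  */
    printf("AM3: %d pins, hypothetical AM3r2: %d pins\n", am3, am3r2);
    return 0;
}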
I'm not saying that BD must have more pins, but it's quite strange that AM3 (K10) CPUs will get the newer socket too, according to the link.
Why would they do that? That's the weirdest piece of info in that link. Unless Google Translate got it all wrong...
The R6XX uarch is a VLIW design iirc
What for? There's no game that benefits from double precision performance, AFAIK...
And good point about the added pins...
They will need at least some extra pins for the video out signal...
Hmmm, but maybe there are enough unused pins in an AM3 socket for that?
Does anybody know how many unused NC pins there are in an AM3 socket?
Can the APU replace the x86 FPU? Can the APU execute SSEx instructions?
What is the point of this? All the goodness of SSEx (the legacy x87 FPU ISA isn't used in 64-bit OSes any more) is that it is executed inside the CPU's pipeline (speculatively and out of order). If you send those instructions to an external execution unit you will probably run into serious performance penalties. Even if this external unit can execute many more instructions per cycle than the internal FPU, it doesn't really help, because of the very limited ability to parallelize the code in current apps.
In my opinion the APU is something like Cell's SPEs: a small processor with its own simple ISA and a wide vector unit (or an array of execution units). The purpose of the main core would then be to send tasks to those APUs, and programmers would need to deal with writing small programs using some proprietary ISA for those APUs (not really a beloved job for Cell programmers). Or it could be x86 compatible (something like Larrabee integrated into the CPU core).
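To make the first point concrete, this is what SSEx looks like from the programmer's side today: a minimal sketch using standard SSE intrinsics, which the compiler turns into ordinary instructions the core schedules speculatively and out of order in its own pipeline, with no driver or external unit involved.
Code:
/* Minimal sketch: packed single-precision adds via standard SSE intrinsics.
 * These compile to plain SSE instructions executed inside the CPU pipeline. */
#include <stdio.h>
#include <xmmintrin.h>

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);        /* load 4 floats            */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);     /* 4 adds in one instruction */
    _mm_storeu_ps(c, vc);

    printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);
    return 0;
}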
Here's the translation I was talking about.
Too early to see the whole picture here, but since they have everything from low-end Llano CPUs to Zambezi, I just can't understand why they're keeping the K10 AND giving it the new socket, if it already works as it is in AM3r2?
Quote:
In 2011, AMD plans to bring power users the Scorpius platform, built around the 32 nm Zambezi-codenamed processor featuring four or more cores. At the same time, the current AM3 socket will apparently be updated to a second-generation "r2" version.
There was a similar situation when the 65 nm Agena showed up. But back then, AMD needed their old K8s since they didn't have any new K10 dual cores yet. Still, they never gave the K8s the AM2+ specs, and nobody missed it anyway.