Quote: Originally posted by Hornet331
Well, I said I didn't know how to compare it to others, so if the GTX 280 hits 370 GFLOPS in practice, it doesn't look that bad. Any numbers on the HD4xxx/5xxx?
Well, I did a little Google research and found that, despite all the "stream computing" hype from AMD, there are almost no official performance numbers. Still, there are some 4870 numbers floating around on the AMD forums. Here they are:
Up to 200 GFLOPS using OpenCL:
http://forums.amd.com/devforum/textt...readid=120413&
540 GFLOPS using IL/Brook+ (from an AMD official):
http://forums.amd.com/forum/messagev...hreadid=105221
Some guy stated he was able to extract 880 GFLOPS using the L1 texture caches, but there is no confirmation that this method is usable in general:
http://cerberus.fileburst.net/showthread.php?t=54842
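To put those three figures in perspective, here is a rough sketch that compares them with the card's theoretical single-precision peak. The specs used (800 stream processors, 750 MHz core clock, a multiply-add counted as 2 FLOPs) are the commonly published HD 4870 numbers, not something taken from the threads above, so treat them as an assumption. The function and variable names are made up for illustration.

```python
# Assumed HD 4870 specs (commonly published, not from the linked threads).
STREAM_PROCESSORS = 800
CORE_CLOCK_GHZ = 0.75
FLOPS_PER_CLOCK = 2  # one multiply-add counted as two floating-point ops


def peak_gflops(units, clock_ghz, flops_per_clock):
    """Theoretical peak: every unit issues a multiply-add every clock."""
    return units * clock_ghz * flops_per_clock


def fraction_of_peak(measured_gflops, peak):
    """Share of the theoretical peak that a measured number represents."""
    return measured_gflops / peak


PEAK = peak_gflops(STREAM_PROCESSORS, CORE_CLOCK_GHZ, FLOPS_PER_CLOCK)

# Figures quoted in this post, in GFLOPS.
reported = {
    "OpenCL": 200,
    "IL/Brook+": 540,
    "L1 texture cache trick": 880,
}

if __name__ == "__main__":
    for name, gflops in reported.items():
        share = fraction_of_peak(gflops, PEAK)
        print(f"{name}: {gflops} GFLOPS, {share:.0%} of {PEAK:.0f} GFLOPS peak")
```

Under those assumptions the peak works out to 1200 GFLOPS, so the OpenCL number is roughly a sixth of peak and the IL/Brook+ number a bit under half.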
Also, a nice summary by an AMD guy:
Although 7XX has multiple methods to access memory (a lot more than 2 if you read the ISA doc), OpenCL currently has only one, as the OpenCL programming model is pointer based, so all data has to be fully coherent (this is ignoring images, which are read_only or write_only, not both). This does not allow the use of the texture unit in the same way that Brook+/IL can use the texture unit. Brook+ does not allow you to alias pointers (unless you explicitly allow it), and in IL you do so at your own risk. Writing to memory and reading from that same memory with the texture unit does not produce deterministic behavior. OpenCL requires that all writes and reads to global memory are coherent, so this approach is not feasible. This is a performance hit compared to a streaming model because the GPU is natively a streaming device. There is another performance hit for the R7XX since it was not designed with OpenCL in mind; our new HD5XXX series was.
One of the goals of the Stream SDK is to provide a full software stack for many different types of programmers.
That means if you want performance, AMD provides CAL/IL to do that. If you want ease of programming to the streaming model, we also provide Brook+ to do that. If you want to program in the same language across multiple devices from the same source, OpenCL.
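The aliasing point in that quote can be illustrated with a small CPU-side analogy. This is not GPU code and not how the AMD hardware works internally; it is only a sketch in which the loop order stands in for the unspecified order in which GPU threads run. All names here are hypothetical.

```python
def step_in_place(data, order):
    """Reads and writes alias the same buffer, so each result depends on
    which neighbours have already been overwritten."""
    buf = list(data)
    for i in order:
        left = buf[i - 1] if i > 0 else 0
        buf[i] = buf[i] + left
    return buf


def step_streaming(data, order):
    """Streaming style: read only from the input, write only to the output,
    so the processing order cannot change the result."""
    out = [0] * len(data)
    for i in order:
        left = data[i - 1] if i > 0 else 0
        out[i] = data[i] + left
    return out


data = [1, 2, 3, 4]
forward = [0, 1, 2, 3]
backward = [3, 2, 1, 0]

if __name__ == "__main__":
    print("in place, forward: ", step_in_place(data, forward))
    print("in place, backward:", step_in_place(data, backward))
    print("streaming, forward: ", step_streaming(data, forward))
    print("streaming, backward:", step_streaming(data, backward))
```

The in-place version gives different answers for the two orders, while the streaming version gives the same answer for both. That is roughly why a model with separate read and write buffers can skip the coherence work that a pointer-based model has to pay for.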
I think that the biggest advantage of Larrabee is its ISA, so you don't need to deal with various proprietary APIs. Also, its memory model (coherent caches, general-purpose memory hierarchy) allows much higher flexibility in code development.