The ATI Stream Team is proud to make available the fourth beta release of ATI Stream SDK v2.0, which provides the first complete OpenCL™ development platform. This release is certified fully compliant with OpenCL 1.0 by the Khronos Group and is supported on a wide range of AMD GPUs as well as any x86 multi-core CPU supporting SSE3. AMD offers the market both high-performance CPU and GPU technology, and we are using that unique position to deliver an OpenCL platform that lets developers create applications that run the way they were meant to be run: on all the available processors in the system! The beta is available for immediate download as part of our ATI Stream SDK beta program, and we encourage you to take a look. For an introduction, see Simon Solotko on the impact of open, parallel computing.
Dealing With Reality | The Introduction | ATI Stream Technology and OpenCL | Part 1 by Simon Solotko
And for anyone who has written in C or C++: I thought there was a gap between the deep-dive technical material and the glossy material, so I interviewed AMD's Ben Sander, and we explain how OpenCL works and some of the basic development methodologies.
Dealing With Reality | The Introduction | ATI Stream Technology and OpenCL | Part 2 by Simon Solotko
Edit - Good discussion point that explains how some of this works:
Brace yourselves. The Kernel is not compiled until just prior to execution, when the application directs it to be built/compiled. (Hardware-specific native Kernels seem unlikely in practice, since the whole idea is to be hardware independent, but the HPC guys might do this.) The compilation step is hardware dependent; if the application is executing on a platform with an x86 processor and an ATI GPU, the Kernel will have either path available for compilation, potentially even determined at runtime! So if the GPU is busy, your application could compile the Kernel for the CPU, execute it, and at a later time compile it for the GPU and execute it! You could even hold a race within the application, compiling separate instances for both, one for x86 and one for the GPU, and running them concurrently! I describe some of this in the interview: http://links.amd.com/openinterview
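To make that control flow concrete, here is a minimal sketch in plain Python. It is a toy model, not the OpenCL API: Python's built-in compile() stands in for the runtime build step, and the names Device, pick_device, run_kernel and race are hypothetical, invented purely for illustration. It only shows the shape of the idea: the kernel stays as source text until a device is chosen, it is built per device on first use, and two builds can be run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-in for kernel source text. In real OpenCL this would be OpenCL C;
# here it is a Python expression so the sketch runs anywhere.
KERNEL_SOURCE = "a + b"


class Device:
    """Hypothetical stand-in for one compute device (e.g. an x86 CPU or a GPU)."""

    def __init__(self, name, busy=False):
        self.name = name
        self.busy = busy
        self.build_count = 0   # number of builds performed for this device
        self._binaries = {}    # source text -> build result for this device

    def build(self, source):
        # Build lazily: nothing is compiled until a kernel is about to run
        # on this device, and each source is built at most once per device.
        if source not in self._binaries:
            self._binaries[source] = compile(source, "<kernel:%s>" % self.name, "eval")
            self.build_count += 1
        return self._binaries[source]


def pick_device(devices):
    """Return the first idle device, or the first device if all are busy."""
    for device in devices:
        if not device.busy:
            return device
    return devices[0]


def run_kernel(device, source, a, b):
    """Build the kernel for the given device if needed, then run it element-wise."""
    binary = device.build(source)
    return [eval(binary, {}, {"a": x, "b": y}) for x, y in zip(a, b)]


def race(devices, source, a, b):
    """Run the same kernel on every device concurrently.

    Returns (name of the device that finished first, {device name: result}).
    """
    results = {}
    winner = None
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        futures = {pool.submit(run_kernel, d, source, a, b): d.name for d in devices}
        for future in as_completed(futures):
            name = futures[future]
            if winner is None:
                winner = name
            results[name] = future.result()
    return winner, results
```

With a busy "gpu" device and an idle "cpu" device, pick_device returns the CPU, and run_kernel builds the kernel for the CPU only. Once the GPU is marked idle again, the same source is built a second time, for the GPU. Which device wins race depends on scheduling, so the sketch makes no promise about the winner, only that both devices produce the same result.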