
Thread: AMD does reverse GPGPU, announces OpenCL SDK for x86


  1. #11
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816
    Quote Originally Posted by Talonman View Post
    AMD does reverse GPGPU, announces OpenCL SDK for x86

    By Jon Stokes | Last updated August 6, 2009 6:15 AM CT

    http://arstechnica.com/hardware/news...dk-for-x86.ars

    "AMD has announced the release of the first OpenCL SDK for x86 CPUs, and it will enable developers to target x86 processors with the kind of OpenCL code that's normally written for GPUs. In a way, this is a reverse of the normal "GPGPU" trend, in which programs that run on a CPU are modified to run in whole or in part on a GPU.

    Why would you want to run GPU programs on a CPU? Debugging is one reason, if you don't have access to an OpenCL-compliant GPU. And for now, that's essentially what developers will be doing, since the new SDK doesn't appear to be able to target GPUs yet. But eventually, developers will be able to write in OpenCL and target multicore x86 CPUs alongside GPUs from NVIDIA, AMD, and Intel. Of course, when you can write once and target a variety of parallel hardware types, the fact that Larrabee runs x86 will be irrelevant; so Intel had better be able to scale up Larrabee's performance, because its x86 support will not be a selling point (at least for Larrabee as a GPU, though an HPC coprocessor might be a different story).

    Note that you can already write once, run anywhere for GPUs and multicore x86, but you'd have to use RapidMind's proprietary middleware layer. Because it's more than just an API (the middleware does just-in-time compilation targeting whatever hardware is in the system, dynamic load-balancing, and real-time optimization), an OpenCL vs. RapidMind comparison is a little bit apples-to-oranges, but only just a bit.

    In reality, few workloads are such that you can break them up in the design phase into parallel chunks so that a middleware layer can dynamically map them to hardware resources at run-time. Certainly there are some problem domains that this works for—finance is one that comes to mind at the moment—but these are very specialized (though profitable) niches. Most of the stuff that ordinary developers will want to do with GPGPU in the medium-term is more mundane and application-specific, like using the GPU to speed up some specific part of a common application in order to give a performance boost vs. the CPU alone. In other words, these common apps don't solve data-parallel, compute-intensive problems—rather, they have specific parts that need acceleration, and if there's a capable GPU available then they can use OpenCL to hand off that part to it.

    Note that Snow Leopard will come with an OpenCL implementation that works on both CPUs and GPUs. Ars will have a review when it launches, so stay tuned."
    __________________________________________________________

    I find this odd...

    Could this mean that ATI is having issues getting OpenCL to run on their GPUs?

    Note the last comment...
    "Snow Leopard will come with an OpenCL implementation that works on both CPUs and GPUs."
    It is because of the GPGPU vs. CPU paradox ... I tried to explain it in my blog ...
    At OpenCL: http://www.khronos.org/opencl/
    When you look at the OpenCL API, you'll understand very quickly that the CPU has some skills today that the GPU does not have yet. With many cores, you need a lot of cache to store the data between the steps of your OpenCL procedure calls; if you go off the socket, or off your GPU, your performance sucks. The tricks that NVIDIA uses with CUDA only work if you touch your data once: NV uses the thread scheduler to freeze a thread when it stalls on a memory access, move on to the next thread, and come back to the first one when the memory request is done. They use a hardware scheduler.
    This works when your algorithm is not f(n-1), i.e. when loop iteration n has no dependence on iteration (n-1).
    Whatever you have seen come out of CUDA so far is a bunch of corner-case algorithms. To see that it is not generic: you cannot get any version of SPECint or SPECfp to run on those GPUs, while some parts of SPEC are "very parallelizable".
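    The f(n-1) point above can be sketched in plain C (function names are illustrative): the first loop's iterations are independent and map naturally onto GPU work-items, while the second carries a result from iteration to iteration and cannot be split up the same way.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Independent iterations: out[i] depends only on in[i].
     * Each iteration could become one GPU work-item. */
    void scale(const float *in, float *out, size_t n, float k) {
        for (size_t i = 0; i < n; i++)
            out[i] = k * in[i];
    }

    /* Loop-carried dependency: iteration i needs the result of
     * iteration i-1 (a running sum), so the loop is inherently
     * serial without a restructured (e.g. tree-based) algorithm. */
    void prefix_sum(const float *in, float *out, size_t n) {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++) {
            acc += in[i];
            out[i] = acc;
        }
    }

    int main(void) {
        float in[4] = {1, 2, 3, 4}, a[4], b[4];
        scale(in, a, 4, 2.0f);
        prefix_sum(in, b, 4);
        assert(a[3] == 8.0f);   /* 2 * 4 */
        assert(b[3] == 10.0f);  /* 1+2+3+4 */
        return 0;
    }
    ```

    The hardware scheduler trick described above hides latency in the first shape of loop; it does nothing for the second, which is why "very iterative" workloads stay on the CPU.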

    AMD found that x86 is very flexible for 128-bit vectors, with many loops using each other's results ... and that is the base of x86 (via SSE2)! I am sure they will add more support for the GPU when it makes sense, when the workload is massively parallel and takes advantage of GPU acceleration. With AVX coming, it will be a really big challenge to beat the processor at 256-bit float and double processing.

    The other part is the memory size limit. The GPUs today are seriously limited, and the places where you need TFLOPS require a lot of memory ... right now, the GPGPUs are using their GDDR5 as a cache to the main memory ... PCI Express is a very poor cache protocol ...

    The GPUs are in a world that requires the programmer to make the code 100% perfect: loads have to be 100% aligned, stores too, and vectorization has to be done perfectly ... It is a world of pain, and if you don't have a programmer with a PhD in parallelism or SIMD, you will not see the end of the project ... If you want OpenCL to perform well on a GPU, you still have to plan for all of those tricks.
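    The alignment pain exists on the CPU's SIMD side too, just in milder form. A sketch of the constraint, assuming a C11 compiler on x86 (the function name is illustrative): the fast aligned load/store intrinsics demand 16-byte-aligned pointers, and handing them anything else is undefined behavior.

    ```c
    #include <assert.h>
    #include <emmintrin.h>
    #include <stdlib.h>

    /* Scale n floats (n a multiple of 4) by k using the *aligned*
     * load/store forms; buf must be 16-byte aligned, or this is
     * undefined behavior -- exactly the kind of constraint the
     * post is complaining about. */
    void scale_aligned(float *buf, size_t n, float k) {
        __m128 vk = _mm_set1_ps(k);
        for (size_t i = 0; i < n; i += 4) {
            __m128 v = _mm_load_ps(buf + i);   /* requires 16-byte alignment */
            _mm_store_ps(buf + i, _mm_mul_ps(v, vk));
        }
    }

    int main(void) {
        /* aligned_alloc (C11) guarantees the 16-byte alignment. */
        float *buf = aligned_alloc(16, 8 * sizeof(float));
        for (int i = 0; i < 8; i++) buf[i] = (float)i;
        scale_aligned(buf, 8, 3.0f);
        assert(buf[3] == 9.0f);  /* 3.0f * 3 */
        free(buf);
        return 0;
    }
    ```

    On a GPU of this era the equivalent rules (coalesced, aligned accesses) decide whether you get the advertised bandwidth at all, which is why the post calls it a world of pain.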

    So, AMD will probably increase their support for GPGPU over time; they have their own stable starting point, and it is always x86.
    You will see the CPU winning all the very iterative workloads on OpenCL.

    Francois
    Last edited by Drwho?; 08-10-2009 at 07:44 AM.
    DrWho, The last of the time lords, setting up the Clock.
