OpenCL is a software layer with drivers ... that has consequences ... and overhead. Assembly code is known to be the fastest code on any architecture.
A simple layer between the new LrB instructions and the programmer will do. We all know that as soon as a desktop application needs performance, it goes down to ASM.
Last word ... drivers seem to be a problem, and there is a lot of empirical evidence about this ... a healthy PC uses fewer drivers in the long run.
Before people make fun of the Intel drivers, some should clean up in front of their own doors ... ;-)
http://media.arstechnica.com/news.me...stacrash-1.jpg