CUDA 3.1 has just been released.
http://developer.nvidia.com/object/c...downloads.html

Support for 16-way concurrency allows up to 16 different kernels to run at the same time on Fermi architecture GPUs.
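
For anyone wondering how to take advantage of this, here is a rough sketch, not taken from the release notes. Kernels can only overlap if they are launched into different non-default streams, so the sketch creates 16 streams and launches one small kernel into each. The kernel `scale`, the buffer size, and the launch configuration are made up for illustration. Whether the launches actually overlap depends on the device and on how many resources each kernel uses, so treat this as a starting point only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: scales a small buffer in place.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    const int kNumStreams = 16;  // the 16-way limit mentioned above
    const int kN = 1024;         // assumed size; kept small so kernels can share the GPU
    const int kThreads = 256;

    // concurrentKernels is nonzero when the device can run kernels concurrently.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("concurrentKernels = %d\n", prop.concurrentKernels);

    cudaStream_t streams[kNumStreams];
    float *buffers[kNumStreams];

    // One stream and one device buffer per kernel launch.
    for (int s = 0; s < kNumStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc((void **)&buffers[s], kN * sizeof(float));
        cudaMemset(buffers[s], 0, kN * sizeof(float));
    }

    // Launches into different streams are allowed (not guaranteed) to overlap.
    for (int s = 0; s < kNumStreams; ++s) {
        scale<<<kN / kThreads, kThreads, 0, streams[s]>>>(buffers[s], kN, 2.0f);
    }

    // Wait for each stream to finish, then release its resources.
    for (int s = 0; s < kNumStreams; ++s) {
        cudaStreamSynchronize(streams[s]);
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If `concurrentKernels` prints 0, the launches simply run one after another; the code is still correct, just without the overlap.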