Quote Originally Posted by Deimos
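Therefore, it doesn't matter if you've got Fermi, G200, or G92: if "only" 90% of the code can be parallelized, the speedup tops out at 10x no matter how many SPs or CPUs you throw at it. Thus with 512 "cores", getting anywhere near full use of them requires the serial part to shrink to about 1/512 of the work, i.e. roughly 99.8% parallelization.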
The proportion is also affected by the speed of the CPU, i.e. the faster the CPU, the smaller the serial part's share of total run time becomes. Yes, it's just a fancy way of saying that as you remove the CPU bottleneck, the proportion of parallelizable work tends toward 100% of the total. But even on the GPU a lot of stuff is serialized, geometry setup for example. Hence the multiple setup engines in Fermi. The more stuff you parallelize, the easier it is to scale performance.
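To put rough numbers on this, here is a quick Python sketch of Amdahl's law. The function names are my own, and it assumes the idealized model where the parallel part scales perfectly across units with no overhead, which real GPUs won't quite manage.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Amdahl's law: speedup over a single unit when a fixed fraction
    of the work can be spread across n_units (idealized, no overhead)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


def parallel_share_after_serial_speedup(parallel_fraction, serial_speedup):
    """Share of run time spent in parallel work once the serial part
    (e.g. the CPU side) runs serial_speedup times faster."""
    serial_time = (1.0 - parallel_fraction) / serial_speedup
    return parallel_fraction / (parallel_fraction + serial_time)


if __name__ == "__main__":
    # 90% parallel code on more and more units: the speedup creeps toward 10x.
    for n in (10, 512, 1_000_000):
        print(n, round(amdahl_speedup(0.90, n), 2))

    # 99.8% parallel code on 512 units.
    print(round(amdahl_speedup(0.998, 512), 1))

    # Same 90% workload after the serial part gets twice as fast.
    print(round(parallel_share_after_serial_speedup(0.90, 2.0), 4))
```

Under this model, 90% parallel code gets roughly 5.3x on 10 units and roughly 9.8x on 512, so the extra units are nearly wasted. Even at 99.8% parallel, 512 units give only about 253x, around half of the ideal. Doubling the speed of the serial part moves the parallel share from 90% to about 94.7%, which is the "faster CPU" effect described above.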