Quote Originally Posted by trinibwoy
The proportion is also affected by the speed of the CPU, i.e. the faster the CPU, the smaller the serial part becomes. Yes, it's just a fancy way of saying that as you remove the CPU bottleneck, the proportion of parallelizable work tends toward 100% of the total. But even on the GPU a lot of stuff is serialized, geometry setup for example. Hence the multiple setup engines in Fermi. The more stuff you parallelize, the easier it is to scale performance.
Well, good, at least nVidia is working on parallelizing more of the pipeline.
Serial would be better, but clocks have been stuck in the 600-700 MHz range for half a decade, so as with CPUs, the only way around it, to ensure the GPU is not starved, is parallel setup engines... and all the coherency issues associated with them.
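
To make the Amdahl's-law point above concrete, here's a quick back-of-the-envelope in Python. The serial fractions and unit counts are made-up illustrative numbers, not measurements of any real GPU:

# Back-of-the-envelope Amdahl's law: how much a shrinking serial part
# (e.g. geometry setup spread over more setup engines) helps scaling.
def amdahl_speedup(serial_fraction, parallel_units):
    # Classic Amdahl's law: 1 / (s + (1 - s) / N)
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_units)

# Illustrative figures only: 10% serial vs. 2.5% serial (as if the setup
# work were split across four engines).
for frac in (0.10, 0.025):
    for units in (16, 64, 256):
        print(f"serial={frac:5.3f}  units={units:3d}  "
              f"speedup={amdahl_speedup(frac, units):5.1f}x")

Shrinking the serial fraction is worth far more than adding units once you're past a few dozen of them, which is the whole argument for parallelizing setup.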

Quote Originally Posted by Chumbucket843
Not much. GPU applications like graphics are embarrassingly parallel, and this parallelism increases with problem size, i.e. if you double the pixel count you double the parallelism. This law sounds very grim, but truthfully it's not. GPUs are already running thousands of threads to hide latency.
How easy is it to scale from thousands of threads to millions? Buffers and register files are already enormous.
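
A rough sense of why millions of resident threads is hard, in Python; the registers-per-thread figure is an assumption for illustration, not the spec of any particular part:

# Rough on-chip register state needed to keep N threads resident at once.
def register_file_bytes(threads, regs_per_thread=32, bytes_per_reg=4):
    return threads * regs_per_thread * bytes_per_reg

for threads in (20_000, 200_000, 2_000_000):
    mib = register_file_bytes(threads) / 2**20
    print(f"{threads:>9,} resident threads -> ~{mib:7.1f} MiB of registers")

Going from tens of thousands of resident threads to millions takes the register file from a few MiB into the hundreds of MiB, which is why you can't just keep throwing threads at the latency problem.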

The problem is that graphics is assumed to be infinitely parallel, but you can only work on about 2 million pixels at a time before starting the next frame. What happens with a 20-million-triangle tessellation demo running at 800x600? That's about 40 triangles per pixel on average, which you can't distinguish. Of course, we're still some years away from that.
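
Spelling out the arithmetic behind that figure (same numbers as above):

# Average triangles per pixel for a 20M-triangle scene at 800x600.
triangles = 20_000_000
pixels = 800 * 600            # 480,000 pixels
print(triangles / pixels)     # ~41.7 triangles per pixel on average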

But perhaps the greatest challenge is the efficiency ceiling. A shader applied to a block with 15 of 16 pixels live is ~94% resource efficient, but extrapolate to a dozen shader passes and 0.94^12 ≈ 0.48 doesn't look so good. GPUs already use an enormous number of optimizations, from non-sqrt Z calculation, to buffer compression, to normal/bump/stencil maps, mip maps and LOD. It seems only natural that it's getting more and more difficult to improve this finely tuned model.
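
And the compounding behind that efficiency-ceiling point, using the 15-of-16-pixel figure from above (the pass counts are just examples):

# A pass that fills only 15 of 16 pixels in a block is ~93.75% efficient;
# chaining a dozen such passes drops combined efficiency below half.
per_pass = 15 / 16
for passes in (1, 4, 8, 12):
    print(f"{passes:2d} passes -> {per_pass ** passes:6.2%} combined efficiency")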