Well, good, at least nVidia is working on parallelizing more of the pipeline.
Serial would be better, but clocks have been stuck in the 600-700 MHz range for half a decade, so as with CPUs, the only way to keep the GPU from being starved is parallel setup engines, with all the coherency issues associated with them.
How easy is it to scale from thousands of threads to millions? Buffers and register files are already enormous.
The problem is that graphics is assumed to be infinitely parallel, but you can only work on about 2 million pixels at a time before starting the next frame. What happens with a 20 million triangle tessellation demo running at 800x600? That's roughly 40 triangles per pixel on average, which you can't distinguish anyway. Of course we're still some years away from that.
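Just to show where that ~40 figure comes from, a quick back-of-the-envelope in Python (the triangle and resolution numbers are only the ones from this example, not from any real benchmark):

# Back-of-the-envelope: average triangles per pixel for the tessellation example.
triangles = 20_000_000
width, height = 800, 600
pixels = width * height                # 480,000 visible pixels per frame
tris_per_pixel = triangles / pixels    # ~41.7 sub-pixel triangles on average
print(f"{tris_per_pixel:.1f} triangles per pixel")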
But perhaps the greatest challenge is the efficiency ceiling. A shader applied to a block where 15 of 16 pixels do useful work is ~94% resource efficient, but extrapolate to a dozen shader passes and 0.94^12 doesn't look so good. GPUs already use an enormous number of optimizations, from non-sqrt Z calculation to buffer compression, normal/bump/stencil maps, mip maps and LOD. It seems only natural that it's more and more difficult to improve such a finely tuned model.
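To put a number on that compounding (purely illustrative, assuming the same 15-of-16 utilization on every pass, which real workloads won't hold to):

# Sketch of how per-pass utilization compounds across a dozen shader passes.
per_pass = 15 / 16                 # ~93.75% of lanes doing useful work in one pass
passes = 12
overall = per_pass ** passes       # drops to roughly 46% effective utilization
print(f"single pass: {per_pass:.2%}, after {passes} passes: {overall:.2%}")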