Quote:
But perhaps the greatest challenge is the efficiency ceiling. A shader that keeps 15 of a block of 16 pixels busy is ~94% resource efficient. But extrapolate to dozens of shaders, and 0.94^12 doesn't look so good. GPUs already use an enormous number of optimizations, from non-sqrt Z calculation, to buffer compression, to normal/bump/stencil maps, mip maps and LOD. It seems only natural that it's more and more difficult to improve this finely tuned model.
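The compounding in the quote can be sketched quickly. This is a toy calculation, not a hardware model: it just assumes each shader pass independently keeps 15 of 16 lanes useful and multiplies the per-pass utilization across passes.

```python
# Toy sketch of the quote's arithmetic: per-pass lane utilization of 15/16,
# compounded over multiple shader passes (an idealized assumption, not
# measured hardware behavior).
def compounded_efficiency(per_pass: float, passes: int) -> float:
    """Overall utilization after chaining `passes` shader stages."""
    return per_pass ** passes

per_pass = 15 / 16  # ~0.9375, the "~94%" figure in the quote
one_pass = compounded_efficiency(per_pass, 1)    # 0.9375
twelve = compounded_efficiency(per_pass, 12)     # drops to roughly 0.46
print(f"1 pass: {one_pass:.4f}, 12 passes: {twelve:.4f}")
```

So a dozen passes at ~94% each leaves only about 46% of the lanes doing useful work, which is the "doesn't look so good" the quote refers to.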
They have already fixed that problem. Pixel shaders run in 32-wide vectors on both NVIDIA and ATI (although ATI can do 5 MADDs per pixel compared to 1 on NVIDIA). You don't have to double the vector width, and the width is actually invisible to the programmer: all they have to do is double the number of SIMDs.
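The scaling argument above can be shown with a toy throughput model. The numbers below (8 or 16 SIMD units, 32-wide vectors) are illustrative assumptions, not any specific GPU's configuration; the point is only that doubling the SIMD count doubles issue slots while the programmer-visible vector width stays fixed.

```python
# Hedged toy model: shader throughput scales with the number of SIMD units,
# while the 32-wide vector (warp/wavefront) size each thread group sees
# stays the same. Unit counts here are made up for illustration.
def madd_slots_per_clock(num_simds: int, vector_width: int,
                         madds_per_lane: int = 1) -> int:
    """Total MADD issue slots per clock across all SIMD units."""
    return num_simds * vector_width * madds_per_lane

base = madd_slots_per_clock(num_simds=8, vector_width=32)      # 256 slots
doubled = madd_slots_per_clock(num_simds=16, vector_width=32)  # 512 slots
print(base, doubled)  # throughput doubles, vector width unchanged
```

Because the per-block divergence cost is tied to the 32-wide vector, adding SIMDs raises throughput without making the efficiency-ceiling problem in the quote any worse.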