Quote: Originally posted by Solus Corvus
Actually this slide makes the point even better:



Basically, the FP unit can process two 128-bit instructions per cycle, whether they are FMA, SSE, or AVX (yes, AVX has 128-bit instructions too). A 256-bit instruction, however, ties up both 128-bit pipes of the module's FP unit.
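A toy model can make that issue rule concrete. The sketch below assumes what the paragraph above describes: two 128-bit pipes per cycle, with each 128-bit op taking one pipe and each 256-bit op taking both. It also assumes strict in-order issue, which real hardware does not do, and `fp_cycles` is a hypothetical helper, not any vendor's tool.

```python
from typing import Iterable

# Assumptions for this toy model (not vendor specs): two 128-bit FP pipes.
PIPES_PER_CYCLE = 2
PIPE_WIDTH_BITS = 128


def fp_cycles(op_widths: Iterable[int]) -> int:
    """Count cycles needed to issue FP ops strictly in program order.

    A 128-bit op needs one pipe; a 256-bit op needs both pipes
    in the same cycle.
    """
    cycles = 0
    free = 0  # pipes still free in the current cycle
    for width in op_widths:
        if width not in (128, 256):
            raise ValueError(f"unsupported width: {width}")
        need = width // PIPE_WIDTH_BITS
        if need > free:
            # The current cycle cannot hold this op, so open a new one.
            cycles += 1
            free = PIPES_PER_CYCLE
        free -= need
    return cycles


if __name__ == "__main__":
    print(fp_cycles([128] * 4))        # four 128-bit ops
    print(fp_cycles([256] * 2))        # two 256-bit ops
    print(fp_cycles([128, 256, 128]))  # mixed widths, in order
```

Four 128-bit ops and two 256-bit ops both move 512 bits in two cycles, so peak throughput is the same either way. The mixed case takes three cycles in this in-order model because the 256-bit op cannot share a cycle with a 128-bit op; an out-of-order scheduler could pair the two 128-bit ops and finish in two.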
Given that few apps will take advantage of 256-bit AVX, why would you dedicate lots of die space and power budget to it, especially for client workloads?

You'll see 256-bit AVX in some HPC apps and a few financial apps. Most of those will be custom applications, not commercial ones.

128-bit AVX will be far more common because 128-bit pipes are more likely to be kept full. Today's 128-bit SSE is probably only partially filled on most cycles (when it is used at all), so the idea that applications, especially on the consumer side, will suddenly fill 256-bit operations is a bit far-fetched.
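To put rough numbers on "partially filled", here is a hypothetical helper that computes the fraction of vector lanes carrying useful data. It assumes 32-bit single-precision elements and a fixed number of independent elements available per step; the 3-element case (say, an x/y/z vector) is an illustrative example, not a measurement.

```python
import math


def lane_utilization(useful_elems: int, vector_bits: int, elem_bits: int = 32) -> float:
    """Fraction of vector lanes doing useful work.

    Assumes `useful_elems` independent elements of `elem_bits` each are
    available per step; they are packed into as few vector ops as possible
    and any leftover lanes in the last op sit idle.
    """
    if useful_elems <= 0:
        raise ValueError("useful_elems must be positive")
    lanes = vector_bits // elem_bits
    ops = math.ceil(useful_elems / lanes)  # vector ops needed for the step
    return useful_elems / (ops * lanes)


if __name__ == "__main__":
    print(lane_utilization(3, 128))  # x/y/z floats in a 128-bit register
    print(lane_utilization(3, 256))  # same data in a 256-bit register
    print(lane_utilization(8, 256))  # enough independent floats to fill it
```

With only three useful floats, a 128-bit op is 75% full and a 256-bit op only 37.5% full. Wider registers only help when the code exposes enough independent elements to fill them.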

Today, 90% of the processing in most environments is integer, not FP. That leaves the FP unit with a lot of empty cycles.