Like I said, duplicating all functionality to allow AVX on both halves simultaneously would be a waste of die space and power. IMO, it's better this way.
Actually this slide makes the point even better:
Basically, the FP unit can process two 128-bit instructions per cycle, whether they are FMA, SSE, or AVX (yes, there are some 128-bit AVX instructions). A 256-bit instruction, on the other hand, requires the whole module.
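
To put rough numbers on that, here is a toy model of the issue rule described above. It is only a sketch under my own simplifying assumptions: the FP unit is treated as two 128-bit slots per cycle, a 256-bit instruction takes both slots, and instructions are packed greedily in order. The function name `fp_cycles` is made up for illustration, and the model ignores latency, dependencies, and whatever the real scheduler does.

```python
def fp_cycles(widths):
    """Count issue cycles for a stream of FP instruction widths (128 or 256).

    Toy model (assumption, not the real scheduler): two 128-bit slots per
    cycle, a 256-bit instruction occupies both slots, in-order greedy packing.
    """
    cycles = 0
    free_slots = 0  # 128-bit slots still unused in the current cycle
    for width in widths:
        if width not in (128, 256):
            raise ValueError(f"unsupported width: {width}")
        needed = width // 128  # 1 slot for 128-bit, 2 slots for 256-bit
        if needed > free_slots:
            # Not enough room left in this cycle, so start a new one.
            cycles += 1
            free_slots = 2
        free_slots -= needed
    return cycles


if __name__ == "__main__":
    print(fp_cycles([128, 128]))            # two 128-bit ops share a cycle
    print(fp_cycles([256, 256]))            # each 256-bit op needs its own cycle
    print(fp_cycles([128, 128, 128, 128]))  # same total bits as the line above
```

In this model, four 128-bit instructions and two 256-bit instructions both take two cycles, which is the point: the same amount of data moves per cycle either way, so duplicating the hardware for 256-bit would mostly buy unused capacity.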