
Originally Posted by
Hans de Vries
It's not that "crippled", not by a factor 2 (=256/128). For example:
If an SIMD FP add takes 4 clock cycles then:
128 bit: A+B+C takes 8 clock cycles.
256 bit: A+B+C takes 9 clock cycles. (using pipelined 128 bit hardware)
128 bit: A+B+C+D takes 9 clock cycles.
256 bit: A+B+C+D takes 11 clock cycles. (using pipelined 128 bit hardware)
Bookmarks