
Originally Posted by
Chumbucket843
IPC isnt the same across different architectures. for example a single SSE instruction can do 4 multiplies on 32bit floating point numbers in one instruction (mulps). fmul can do only one. yes, sse is explicitly data parallel but that is part of the weakness of ipc measurements.
a better example would be a sine function. you can use the taylor series to get a good estimate. modern x86 cpu's take ~40-100 cycles to execute the fsin instruction.
Bookmarks