Quote Originally Posted by Chumbucket843
IPC isn't the same across different architectures. For example, a single SSE instruction (mulps) can do four multiplies on 32-bit floating-point numbers, while the x87 fmul can do only one. Yes, SSE is explicitly data-parallel, but that is part of the weakness of IPC measurements.
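To make the mulps point concrete, here is a minimal sketch, assuming an x86 target with SSE and the standard `<xmmintrin.h>` intrinsics header. `_mm_mul_ps` normally compiles to a single mulps that multiplies four floats at once, while the scalar loop does four separate multiplies. The function names `mul4_sse` and `mul4_scalar` are hypothetical, chosen only for illustration. Note that on x86-64 a compiler will usually emit SSE mulss (not x87 fmul) for the scalar loop, and may even auto-vectorize it, so the instruction count depends on compiler and flags.

```c
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Hypothetical helper: multiply four pairs of floats with one packed multiply (mulps). */
void mul4_sse(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);           /* load 4 floats (unaligned load is fine) */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(va, vb)); /* 4 multiplies in one instruction */
}

/* Hypothetical helper: the same work done one multiply at a time. */
void mul4_scalar(const float *a, const float *b, float *out) {
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i];              /* one multiply per iteration */
}
```

Both functions produce identical results; the difference is how many instructions retire to do the same work, which is exactly why raw IPC figures are hard to compare across instruction sets.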

A better example would be a sine function. You can use a Taylor series to get a good estimate. Modern x86 CPUs take roughly 40-100 cycles to execute the fsin instruction.
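A minimal sketch of the Taylor-series approach, assuming plain C and double precision. The function name `taylor_sin` and its `terms` parameter are hypothetical. The input is first reduced into [-pi, pi], because the series converges slowly for large |x|; after that, about 11 or 12 terms are enough to reach roughly double precision. Each term is derived from the previous one, since the ratio of consecutive terms of x - x^3/3! + x^5/5! - ... is -x^2 / ((2k)(2k+1)). This is an illustration only: production math libraries typically use more careful range reduction and fitted polynomials rather than the raw Taylor series.

```c
/* Constants written out to avoid relying on the non-standard M_PI. */
static const double PI_D     = 3.141592653589793;
static const double TWO_PI_D = 6.283185307179586;

/* Hypothetical helper: approximate sin(x) with the first `terms` terms of
   the Taylor series around 0. Assumes |x / (2*pi)| fits in a long long. */
double taylor_sin(double x, int terms) {
    /* Range reduction: strip whole periods, then fold into [-pi, pi]. */
    x -= TWO_PI_D * (double)(long long)(x / TWO_PI_D);
    if (x > PI_D)  x -= TWO_PI_D;
    if (x < -PI_D) x += TWO_PI_D;

    double term = x;   /* k = 0 term: x^1 / 1! */
    double sum  = x;
    for (int k = 1; k < terms; k++) {
        /* next term = previous * (-x^2) / ((2k)(2k+1)) */
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
}
```

Each term costs only a few multiplies and one divide, so a dozen terms plus range reduction is a comparable amount of work to what the quoted cycle count for fsin suggests.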
Don't modern FPUs implement trigonometric functions like sine with internal lookup tables?