IPC isn't comparable across different architectures. For example, a single SSE instruction (mulps) can do 4 multiplies on 32-bit floating point numbers at once. fmul can do only one. Yes, SSE is explicitly data parallel, but that is part of the weakness of IPC measurements: counting instructions says nothing about how much work each instruction does.
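To make the mulps point concrete, here is a rough sketch in C. The function names (mul4_scalar, mul4_packed) are made up for illustration, and the SSE path assumes a compiler that defines __SSE__ and ships xmmintrin.h; otherwise it falls back to the plain loop. Whether the scalar loop really compiles to four separate multiplies depends on the compiler and flags.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */
#endif

/* Scalar version: one multiply per element, four multiplies in total. */
void mul4_scalar(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i];
}

/* Packed version: _mm_mul_ps maps to a single mulps that multiplies
   all four 32-bit floats at once (hypothetical helper, not from the post). */
void mul4_packed(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);               /* load 4 floats, no alignment needed */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(va, vb));    /* 4 multiplies, 1 instruction */
#else
    mul4_scalar(a, b, out);                    /* fallback when SSE is not available */
#endif
}
```

Both functions produce the same four results, but the packed one retires far fewer multiply instructions, so an IPC number alone would make it look like less work got done.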
A better example would be a sine function. You can use the Taylor series to get a good estimate for inputs near zero. Modern x86 CPUs take ~40-100 cycles to execute the fsin instruction.
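Taylor series approximation (first four terms):
sin(x) ≈ x - (x^3)/3! + (x^5)/5! - (x^7)/7!
Counted naively, with every power and factorial rebuilt from scratch (n multiplies for each x^n and each n!):
2 subtractions
30 multiplies
3 divides
1 add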
Those 36 arithmetic operations on a RISC processor are equal to 1 (very slow) instruction on x86. This is a select case, though. Normally RISC code uses about 30% more code space.
This algorithm actually has room for improvement. We can store each power of x as we compute it (a small lookup table) and save many redundant multiplications, i.e. compute x^3, then multiply by x^2 to get x^5, since the exponents add. Eventually algebra will give you a nice shortcut.
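One way the algebra can work out is to factor the polynomial (Horner-style) so that x^2 is computed once and reused. The sketch below is only an illustration: the function names are invented here, the naive version counts operations the way the list above does, and neither version does the range reduction a real sine routine would need.

```c
/* Naive helpers: rebuild every power and factorial from scratch. */
static double power(double x, int n)
{
    double p = 1.0;
    for (int i = 0; i < n; i++)
        p *= x;              /* n multiplies for x^n */
    return p;
}

static double factorial(int n)
{
    double f = 1.0;
    for (int i = 1; i <= n; i++)
        f *= i;              /* n multiplies for n! */
    return f;
}

/* x - x^3/3! + x^5/5! - x^7/7!, evaluated term by term. */
double sin_taylor_naive(double x)
{
    return x
         - power(x, 3) / factorial(3)
         + power(x, 5) / factorial(5)
         - power(x, 7) / factorial(7);
}

/* Same polynomial, factored:
   x * (1 - x^2/6 * (1 - x^2/20 * (1 - x^2/42)))
   since 6 * 20 = 120 = 5! and 120 * 42 = 5040 = 7!.
   Cost: 4 multiplies, 3 divides, 3 subtractions. */
double sin_taylor_factored(double x)
{
    double x2 = x * x;       /* computed once, reused three times */
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)));
}
```

The factored version needs 10 arithmetic operations instead of 36, and the three divides by constants could be turned into multiplies by precomputed reciprocals. For x = 0.5 both versions agree with the true sine to within about 1e-8.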



