Quote Originally Posted by Chumbucket843 View Post
IPC isn't the same across different architectures. For example, a single SSE instruction can do 4 multiplies on 32-bit floating-point numbers in one instruction (mulps), while fmul can do only one. Yes, SSE is explicitly data-parallel, but that is part of the weakness of IPC measurements.

A better example would be a sine function. You can use the Taylor series to get a good estimate. Modern x86 CPUs take ~40-100 cycles to execute the fsin instruction.

Taylor series approximation:
x - (x^3)/3! + (x^5)/5! - (x^7)/7!

2 subtractions
30 multiplies
3 divides
1 add

36 arithmetic operations on a RISC processor are equal to 1 (very slow) instruction on x86. This is a select case; normally RISC uses ~30% more code space.

This algorithm has room for improvement, actually. We can store the value of x raised to a power and save many redundant multiplications with a lookup table, i.e. compute x^3 once, then multiply by x^2 (or just add the exponents). Eventually algebra will give you a nice shortcut.
Yeah, but an approximation means it's not the exact result of the function, so it's a mistake to use it.


---------

About BD @ Hot Chips, what was said there? Now we have the slides, but has anyone read them, or did anyone talk about BD at the same time?

No other information?