Just to formalize what you two are getting at:
- Speedup = Time_Old / Time_New
- Time = CPI * Instr_in_prog / Clk_freq
(CPI = cycles per instruction, the inverse of IPC)
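To put numbers on those two formulas, here's a minimal sketch in C. The function names (`exec_time`, `speedup`) are my own, made up for illustration, and the clock frequency is assumed to be in Hz.

```c
/* Execution time in seconds:
 * Time = CPI * Instr_in_prog / Clk_freq (Clk_freq assumed to be in Hz). */
double exec_time(double cpi, double instr_in_prog, double clk_freq_hz) {
    return cpi * instr_in_prog / clk_freq_hz;
}

/* Speedup = Time_Old / Time_New.
 * A result above 1.0 means the new version is faster. */
double speedup(double time_old, double time_new) {
    return time_old / time_new;
}
```

As a made-up example: at 1 GHz with CPI = 1, a program of 1.0e9 instructions takes 1.0 s. If new instructions cut the count to 0.8e9 with CPI and clock unchanged, it takes 0.8 s, for a speedup of 1.25. In practice a new instruction can also change CPI, so both terms need checking.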
To start from the beginning, additions to the ISA will almost always require updates to the compiler and hence a recompile of the code. With a suitable optimizing compiler (like Intel's ICC or AMD's x86 Open64), simply recompiling with the new instructions enabled MIGHT be enough to exploit them, depending heavily on how the original code was written.
I'm not quite an expert when it comes to modern compilers and what the programmer commonly has to prod at... but in the case of FMA4 (and FMA3), the target programs should at least already be written using intrinsics (a way to express SIMD operations to the compiler) and already have the relevant operations (a multiply feeding an add) as part of the algorithm. This applies mostly to software using image/signal processing methods.
To summarize, a compiler that has been updated to use (and hopefully optimize for) the new instructions is all that's needed to use them with your code. However, it's usually the case that the code has to be written (from an algorithmic and, to some extent, syntactic perspective) such that the compiler knows what to do.







