I hope Intel implements FMA in Haswell the right way, keeping a throughput of two instructions per cycle (FMA+FMA or FMA+MUL/ADD). A single FMA per cycle has no advantage over separate MUL and ADD (except slightly better accuracy, since the result is rounded only once), but it has some serious disadvantages.


