This is simple:
Without using FMA, Zambezi's theoretical maximum 128-bit FMUL+FADD throughput per clock is the same as that of an X4. On that basis alone it should perform lower. But CB is not a synthetic benchmark (an FMUL+FADD loop); it depends on a lot of other components. It is also known not to depend much on memory throughput, thanks to data locality, so Zambezi's IMC shouldn't have much influence here.
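
To put rough numbers on the per-clock claim, here is a back-of-the-envelope sketch. The unit counts are my assumptions, not figures from the post: Zambezi as 4 modules with two shared 128-bit FMAC pipes each, and the X4 as a 4-core K10 part with one 128-bit FADD pipe and one 128-bit FMUL pipe per core. The function name is made up for illustration.

```python
# Assumed issue resources (not stated in the post):
#   Zambezi: 4 modules, 2 shared 128-bit FMAC pipes per module
#   X4 (K10): 4 cores, 1 FADD + 1 FMUL pipe per core, both 128-bit

def peak_128b_ops_per_clock(units, pipes_per_unit):
    """Peak number of 128-bit FP ops the whole chip can issue per clock."""
    return units * pipes_per_unit

# Without FMA, each FMAC pipe issues one FMUL or one FADD per clock.
zambezi = peak_128b_ops_per_clock(units=4, pipes_per_unit=2)
# Each K10 core can issue one FADD and one FMUL per clock.
x4 = peak_128b_ops_per_clock(units=4, pipes_per_unit=2)

print("Zambezi:", zambezi, "X4:", x4)
```

Under these assumptions both chips peak at the same number of 128-bit ops per clock, so any difference in CB would have to come from clock speed and the rest of the pipeline, not from raw FMUL+FADD width.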




