Quote Originally Posted by Dresdenboy
This is simple:
Theoretical max. 128b FMUL+FADD throughput of Zambezi w/o using FMA is the same per clock as that of an X4, so on that basis alone it should perform lower. But CB is not a synthetic benchmark (an FMUL+FADD loop); it depends on a lot of other components. And, as is known, it isn't very dependent on memory throughput thanks to data locality, so Zambezi's IMC shouldn't have much influence here.
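The per-clock claim in the quote can be checked with a back-of-the-envelope count. This is only a rough sketch: the unit counts are my assumptions, not figures from the thread. I assume a 4-module Zambezi with one shared FPU per module that has two 128-bit FMAC pipes, each doing either an FMUL or an FADD per clock when FMA isn't used. I also assume a K10 X4 with 4 cores, each with one 128-bit FADD pipe and one 128-bit FMUL pipe.

```python
# Rough peak single-precision FLOPs per clock with 128-bit FMUL/FADD only (no FMA).
# The unit counts passed in below are ASSUMPTIONS for illustration.

def peak_sp_flops_per_clock(fp_clusters: int, pipes_per_cluster: int, lanes_per_pipe: int = 4) -> int:
    """Each 128-bit pipe issues one FMUL or FADD per clock on four 32-bit lanes."""
    return fp_clusters * pipes_per_cluster * lanes_per_pipe

# Assumed: 4 Zambezi modules, each FPU with two 128-bit FMAC pipes (FMUL or FADD, no FMA).
zambezi = peak_sp_flops_per_clock(fp_clusters=4, pipes_per_cluster=2)

# Assumed: 4 K10 cores, each with one 128-bit FADD pipe and one 128-bit FMUL pipe.
x4 = peak_sp_flops_per_clock(fp_clusters=4, pipes_per_cluster=2)

print(zambezi, x4)  # 32 32
```

Under these assumptions the peak per clock is the same for both chips. Any real Cinebench gap would therefore have to come from clock speed or from the other parts of the core the benchmark exercises, not from raw FMUL+FADD width.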
Thanks for the input. If that's the case, why does the C10 version behave differently? In that test we see a massive gain, and I doubt the Maxon guys completely rewrote the benchmark code. If you look at the link I posted (HW Canucks), you can see that any performance difference between, say, the 2600K and the i7-875 carries over from C10 to C11.5 almost digit for digit (25%). I would expect similar behavior from Bulldozer too.
But who knows, maybe C11.5 is hitting some limitation in Bulldozer, which would explain its behavior in that test.
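The "carries over digit for digit" argument comes down to comparing the relative gap between two CPUs in each Cinebench version. Here is a small sketch of that check. The scores are made-up placeholders, not results from the HW Canucks review, and the 2-point tolerance is an arbitrary assumption.

```python
# Does the A-vs-B performance gap stay the same across two benchmark versions?
# Scores below are HYPOTHETICAL placeholders, not measured results.

def relative_gap(score_a: float, score_b: float) -> float:
    """Percentage by which CPU A outscores CPU B (higher score = faster)."""
    return (score_a / score_b - 1.0) * 100.0

def gap_carries_over(old: tuple[float, float], new: tuple[float, float], tol_pct: float = 2.0) -> bool:
    """True if the gap in the old version is within tol_pct points of the gap in the new one."""
    return abs(relative_gap(*old) - relative_gap(*new)) <= tol_pct

# (CPU A, CPU B) scores in each version; the absolute scales differ between versions.
c10 = (25000.0, 20000.0)
c115 = (7.5, 6.0)

print(relative_gap(*c10), relative_gap(*c115), gap_carries_over(c10, c115))  # 25.0 25.0 True
```

Because only ratios are compared, the different point scales of C10 and C11.5 don't matter. If Bulldozer's gap over an X4 shrinks sharply between versions while Intel's gaps stay put, `gap_carries_over` would return False for that pair. That would be the kind of version-specific limitation speculated about above.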