I don't know if you have followed the Bulldozer threads, but Bulldozer actually has the same throughput in all three modes: legacy SSE, 128-bit AVX and 256-bit AVX. This is because of the way AMD designed its FPU (or FlexFP, as they call it). You have 8 of these FMACs in an 8-core chip, and all of them are 128 bits wide. 128-bit AVX usually brings very little or no performance benefit over standard SSE (think 5-10%). This can even be seen in the leaked SiSoft numbers for Zambezi:
Attachment 119979
As you can see, it is 11% faster in 256-bit AVX mode than in legacy SSE (128-bit) mode.
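For anyone who wants to check the 11% figure, here is a minimal sketch using the two leaked Sandra scores quoted further down in this post (132 Mpix/s SSE, 147 Mpix/s AVX, Zambezi @ 2.8 GHz). The helper name `percent_gain` is just made up for illustration.

```python
def percent_gain(new_score: float, base_score: float) -> float:
    """Relative speedup of new_score over base_score, in percent."""
    return (new_score / base_score - 1.0) * 100.0

# Leaked SiSoft Sandra scores for Zambezi @ 2.8 GHz, as quoted in this post (Mpix/s).
SSE_SCORE = 132.0
AVX_SCORE = 147.0

print(f"AVX-256 over SSE: {percent_gain(AVX_SCORE, SSE_SCORE):.1f}%")  # prints: AVX-256 over SSE: 11.4%
```

147 / 132 works out to about 1.114, so the "11%" is the rounded gain.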
With Bulldozer, when you go to 256-bit AVX you may even incur a small penalty, but this is not the norm (compiler patches state up to a 3% penalty, and AMD encourages developers to use 128-bit AVX instead of 256-bit).
So the point is: AVX (both 128- and 256-bit) brings nothing or close to nothing, since Bulldozer has the same peak FLOPS in all three modes I listed.
The only difference is FMA-recompiled software, which can bring an additional 2x performance over 128-bit AVX. At least, this is what AMD listed in its HPC documents from last year. I can't find the PDF, but I can link to a recent presentation that included a slide on FlexFP. A picture is worth a thousand words:
Attachment 119978
As you can see, it's the same peak FLOPS in all three cases. I rest my case.
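To show why the peak numbers come out the same, here is a simplified back-of-the-envelope model of one module's shared FlexFP unit. The assumptions are mine, not AMD's: 2 FMAC pipes per module, each 128 bits wide and accepting one 128-bit op per clock; a 256-bit AVX op split into two 128-bit halves; an FMA counted as 2 FLOPs per lane; single precision only. The constant and function names are hypothetical.

```python
# Simplified peak-throughput model of Bulldozer's shared FlexFP unit.
# ASSUMPTIONS (illustrative, not measured): 2 FMAC pipes per module, each 128 bits
# wide, one 128-bit op per pipe per clock; a 256-bit op is split into two 128-bit
# halves; an FMA counts as 2 FLOPs (multiply + add) per lane.
PIPES_PER_MODULE = 2
PIPE_WIDTH_BITS = 128
LANE_BITS = 32            # single precision
MODULES_PER_CHIP = 4      # 8-core chip = 4 modules = 8 FMACs

def sp_flops_per_clock(op_width_bits: int, fused: bool) -> int:
    """Peak single-precision FLOPs per clock for one module."""
    halves_per_op = op_width_bits // PIPE_WIDTH_BITS    # 1 for 128-bit, 2 for 256-bit
    ops_per_clock = PIPES_PER_MODULE // halves_per_op   # 2 ops/clk at 128-bit, 1 at 256-bit
    lanes_per_op = op_width_bits // LANE_BITS           # 4 lanes at 128-bit, 8 at 256-bit
    flops_per_lane = 2 if fused else 1                  # FMA = multiply + add
    return ops_per_clock * lanes_per_op * flops_per_lane

MODES = {
    "legacy SSE": (128, False),
    "AVX-128": (128, False),
    "AVX-256": (256, False),
    "FMA4 (128-bit)": (128, True),
}

for name, (width, fused) in MODES.items():
    per_module = sp_flops_per_clock(width, fused)
    print(f"{name:15s} {per_module:3d} SP FLOPs/clk/module, "
          f"{per_module * MODULES_PER_CHIP:3d} per 8-core chip")
```

Under these assumptions SSE, AVX-128 and AVX-256 all land on 8 SP FLOPs per clock per module (32 per 8-core chip), and only FMA doubles it to 16 (64 per chip). A 256-bit op does twice the work but ties up both 128-bit pipes, so it cancels out.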
BTW, the leak that I linked above showed that Zambezi @ 2.8 GHz scored 132 Mpix/s in SSE and 147 Mpix/s in AVX. I already showed that Opterons score better than this (10% higher than Zambezi). Mind you, there is no Turbo in heavy FP/SIMD mode. If you use the 132 score as the base rather than 147 (the AVX one), you get for 3.6 GHz: 132 x 3.6 / 2.8 ≈ 170 Mpix/s vs 115 for the 1100T. That is 48% better, and it is based on the Zambezi leak (not the Opteron score). 1.48 x 5.91 pts (Thuban score) ≈ 8.74 pts. This is still miles ahead of what you claim and what the Chinese results show. Again, remember that these numbers are based on the SSE score I linked above (so legacy SSE code, which Cinebench uses too).
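The extrapolation above can be reproduced with a short sketch. It uses only the numbers quoted in the post and makes the same two assumptions the post does: the score scales linearly with clock speed, and the Sandra ratio carries over to Cinebench. The function and constant names are hypothetical.

```python
def scale_by_clock(score: float, from_ghz: float, to_ghz: float) -> float:
    """Scale a score linearly with clock speed (ASSUMES perfect 1:1 scaling)."""
    return score * to_ghz / from_ghz

ZAMBEZI_SSE_2_8GHZ = 132.0   # Mpix/s, leaked SSE score at 2.8 GHz
PHENOM_1100T = 115.0         # Mpix/s, 1100T score quoted in the post
THUBAN_CB_POINTS = 5.91      # Thuban Cinebench points quoted in the post

zambezi_3_6 = scale_by_clock(ZAMBEZI_SSE_2_8GHZ, 2.8, 3.6)  # projected score at 3.6 GHz
speedup = zambezi_3_6 / PHENOM_1100T                        # Zambezi vs 1100T ratio
projected_points = speedup * THUBAN_CB_POINTS               # ratio applied to Cinebench

print(f"{zambezi_3_6:.1f} Mpix/s, {speedup:.3f}x, {projected_points:.2f} pts")
```

Without intermediate rounding this gives about 169.7 Mpix/s, a 1.476x ratio and roughly 8.72 pts. The 8.74 in the post comes from rounding 169.7 up to 170 before dividing by 115. Either way it lands at about 48% over Thuban.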