The 4.7x to 5x multithreaded speedup is simple to explain. In the single-threaded test you have one core using one 256-bit FPU. In the MT test you have 8 cores sharing 4 256-bit FPUs, one per module. Going from one FPU to four gives you a 4-5x speedup. It's more than 4x because running two threads on each FlexFP (SMT mode) adds roughly 20% on top of the 4x, and 4 × 1.2 ≈ 4.8x.
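Here is the same back-of-the-envelope arithmetic as a minimal Python sketch. The function name `expected_mt_speedup` is hypothetical, and the 20% SMT gain is the rough estimate from above, not a measured value.

```python
def expected_mt_speedup(shared_fpus: int, smt_gain: float) -> float:
    """Rough MT speedup over one thread that has a single FPU to itself.

    shared_fpus: number of FlexFP units (one per module, each shared by 2 cores)
    smt_gain:    assumed extra throughput when two threads share one FPU
                 (0.20 means +20%; an estimate, not a measurement)
    """
    # Each busy FPU delivers 1x, plus the assumed SMT bonus from its second thread.
    return shared_fpus * (1.0 + smt_gain)


# 8 cores = 4 modules = 4 shared 256-bit FlexFP units, ~20% SMT gain assumed.
print(round(expected_mt_speedup(4, 0.20), 2))  # 4.8
```

This lands at about 4.8x, inside the observed 4.7-5x range. That supports the idea that the MT scaling is about what you'd expect, so the odd part is the per-FPU speed.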
The problem is that one double-width 256-bit FlexFP is slower than an old Thuban core, at least in these benchmarks and under these conditions. That's what's illogical.