The strange real-world performance of BD seems to come from major changes in instruction throughput. Some instructions are much (much) faster than on Thuban, others are much (much) slower. So depending on which instructions a benchmark or application uses, you will get very different results. Use FDIV and you'll see a major gain; use FCOS and it will suck more than a Prescott. Something is wrong with µop decoding for many instructions.
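To make the "it depends on the instruction mix" point concrete, here is a rough sketch of the arithmetic. The cycle counts are invented placeholders, not measurements of BD, Thuban, or any real chip, and `cpu_a`, `cpu_b`, `estimated_cycles` and `speedup` are hypothetical names used only for illustration. It also assumes total cost is a simple sum of count times cycles per instruction, which ignores pipelining and everything else a real core does.

```python
# Hypothetical cycles-per-instruction figures, invented purely for illustration.
# They are NOT measurements of Bulldozer, Thuban, or any other real CPU.
CYCLES_PER_INSTRUCTION = {
    "cpu_a": {"fdiv": 20.0, "fcos": 50.0},   # the "older" chip in this toy model
    "cpu_b": {"fdiv": 10.0, "fcos": 150.0},  # faster divide, much slower cosine
}


def estimated_cycles(instruction_mix, cycles_per_instruction):
    """Sum count * cycles for every instruction in the mix (naive model)."""
    return sum(
        count * cycles_per_instruction[name]
        for name, count in instruction_mix.items()
    )


def speedup(instruction_mix, baseline, candidate):
    """Baseline cycles divided by candidate cycles; above 1 means candidate wins."""
    return estimated_cycles(instruction_mix, baseline) / estimated_cycles(
        instruction_mix, candidate
    )


# Two made-up workloads that differ only in which instruction dominates.
div_heavy = {"fdiv": 990, "fcos": 10}
cos_heavy = {"fdiv": 10, "fcos": 990}

if __name__ == "__main__":
    a = CYCLES_PER_INSTRUCTION["cpu_a"]
    b = CYCLES_PER_INSTRUCTION["cpu_b"]
    for label, mix in (("div-heavy", div_heavy), ("cos-heavy", cos_heavy)):
        print(f"{label}: cpu_b vs cpu_a speedup = {speedup(mix, a, b):.2f}")
```

With these placeholder values the same chip comes out ahead on the divide-heavy mix and well behind on the cosine-heavy one, which is the pattern described above. Real per-instruction figures would have to come from measured instruction tables.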