
Originally Posted by kl0012
In fact, a Bulldozer module's maximum floating-point instruction throughput on current software (without AVX and FMA) is equal to that of one Phenom core - two 128-bit FP ops per cycle. The Bulldozer module is more flexible - it can start any combination of ops per cycle (such as MUL+MUL or ADD+ADD), while a Phenom core is tied to MUL+ADD. On the other hand, Bulldozer has higher latencies for FP ops, and the various FP pack/blend/copy ops are executed on one of its two FP pipes, while Phenom has a special unit (fp-misc) for that type of instruction. So it is possible that a 6-core Bulldozer (three modules, each sharing one FPU between two cores) will perform the same as a 3-core Phenom at the same frequency in apps with a lot of FPU code.
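To put rough numbers on that comparison, here is a minimal back-of-the-envelope sketch. It only counts peak 128-bit FP issue slots per cycle and deliberately ignores the latency and fp-misc differences mentioned above, so treat it as an upper-bound estimate, not a benchmark. The function name and parameters are made up for illustration.

```python
def peak_fp_ops_per_cycle(cores, cores_per_fpu, ops_per_fpu_cycle=2):
    """Peak 128-bit FP ops per cycle for a whole chip (hypothetical helper).

    cores_per_fpu: 2 for a Bulldozer module (two cores share one FPU),
                   1 for Phenom (each core has its own FPU).
    ops_per_fpu_cycle: two 128-bit FP ops per cycle, as stated in the post.
    """
    fpus = cores // cores_per_fpu
    return fpus * ops_per_fpu_cycle


# 6-core Bulldozer = 3 modules; 3-core Phenom = 3 independent FPUs.
bulldozer = peak_fp_ops_per_cycle(cores=6, cores_per_fpu=2)
phenom = peak_fp_ops_per_cycle(cores=3, cores_per_fpu=1)

print("Bulldozer 6-core:", bulldozer, "FP ops/cycle")
print("Phenom 3-core:   ", phenom, "FP ops/cycle")
```

Both chips come out at the same peak figure, which is the whole point of the argument: at equal clocks and without AVX/FMA, the number of FPUs matters, not the number of advertised cores.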