Bulldozers first screens

**informal** · 04-27-2011, 03:05 AM

Originally Posted by kl0012

In fact, a Bulldozer module's maximum floating-point instruction throughput on current software (without AVX and FMA) is equal to that of one Phenom core: two 128-bit FP ops per cycle. The Bulldozer module is more flexible: it can start any combination of ops per cycle (such as MUL+MUL or ADD+ADD), while a Phenom core is tied to MUL+ADD. On the other hand, Bulldozer has higher latencies for FP ops, and various FP pack/blend/copy ops are executed on one of the two FP pipes, while Phenom has a special unit (fp-misc) for that type of instruction. So it is possible that a 6-core Bulldozer will perform on par with a 3-core Phenom at the same frequency in apps with a lot of FPU code.
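The issue-flexibility point in the quote can be illustrated with a toy model. This is only a sketch: it counts issue slots for a stream of independent 128-bit MUL and ADD ops, ignores latencies and the pack/blend/copy ops mentioned above, and the function names are made up for illustration.

```python
from math import ceil

def phenom_core_cycles(n_mul: int, n_add: int) -> int:
    # Assumed model: one MUL pipe and one ADD pipe, so each op type
    # can start at most once per cycle.
    return max(n_mul, n_add)

def bulldozer_module_cycles(n_mul: int, n_add: int) -> int:
    # Assumed model: two pipes, each able to start either op type,
    # so any two ops can start per cycle.
    return ceil((n_mul + n_add) / 2)

if __name__ == "__main__":
    for n_mul, n_add in [(50, 50), (100, 0), (80, 20)]:
        print(n_mul, n_add,
              phenom_core_cycles(n_mul, n_add),
              bulldozer_module_cycles(n_mul, n_add))
```

With a balanced mix both models need the same number of cycles; the flexible pipes only pull ahead when the mix is lopsided (for example, all MULs).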

My prediction is that one FMAC will have around 20-30% higher performance than one Thuban core in non-recompiled MT software. Single-thread FP performance should be a lot higher than that (2x FMAC in this case). In FMA-optimized code there should be a substantial jump, maybe up to 50%.
You can see from the leaked donanimhaber slide that the 8-core model (probably <3.5 GHz) has approx. 1.88x the performance of the 1100T in Cinebench 11.5. That's a non-recompiled legacy FP workload in which you have 8 128-bit FMACs working versus 6 Thuban cores (each of which is MUL+ADD). This roughly corresponds to 1.3x the FP power of a Thuban core, at roughly the same clock.
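As a rough check of the per-unit arithmetic, the sketch below divides the slide's aggregate 1.88x figure by the unit-count ratio (8 FMACs against 6 Thuban cores). It assumes perfect multithreaded scaling and equal clocks, neither of which the slide confirms, and the function name is hypothetical.

```python
def per_unit_ratio(aggregate_ratio: float, new_units: int, old_units: int) -> float:
    # Aggregate speedup spread over the change in unit count:
    # (new_total / old_total) * (old_units / new_units).
    return aggregate_ratio * old_units / new_units

if __name__ == "__main__":
    # 1.88x in Cinebench 11.5, 8 FMACs vs 6 Thuban cores (figures from the post).
    print(round(per_unit_ratio(1.88, 8, 6), 2))
```

Under those assumptions the raw per-unit ratio comes out near 1.4x, somewhat above the roughly 1.3x quoted above; that estimate presumably allows for factors this sketch leaves out, such as a clock difference.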

edit: Someone asked about the stepping or revision of the BD ES in question. It has W8K44 at the end, so it is a B0 for sure. Since Charlie wrote that pre-B1 was useless for benchmarking, you can now see why the scores are the way they are.
