Originally Posted by Drwho?
I suggest you take a simple linear algorithm, run it on one thread, count the number of instructions retired, and divide by the number of clock ticks... You'll be surprised ;-)
(Make sure your code is purely compute-bound, with dependencies only 1 to 2 instructions apart...)
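A minimal sketch of the arithmetic described above. It assumes you already have the two counter values, for example from a profiler such as Linux `perf stat`, which reports `instructions` and `cycles`. The function name and the sample counts are hypothetical and only illustrate the division.

```python
def ipc(instructions_retired: int, clock_ticks: int) -> float:
    """Instructions per clock: retired instructions divided by elapsed core cycles."""
    if clock_ticks <= 0:
        raise ValueError("clock_ticks must be positive")
    return instructions_retired / clock_ticks


# Hypothetical counter readings from a single-threaded, compute-bound run.
retired = 6_000_000_000
cycles = 3_000_000_000
print(f"IPC = {ipc(retired, cycles):.2f}")  # IPC = 2.00
```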
PowerPoint slides are one thing, but measuring and checking for yourself is much better... Otherwise, at 4.2 GHz, how would you explain the poor performance of BD (Bulldozer) on SuperPI? Low IPC... Then ask yourself: if you measure the IPC of each thread, why does it never go above 2 on a single thread? Please experiment before trying to correct me. I did my homework ;-)
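The clock-speed argument above rests on a simple relation: single-thread throughput is roughly IPC times clock frequency. A minimal sketch, assuming that simplified model. The two cores and their IPC values are hypothetical, not measured figures for any real CPU.

```python
def instructions_per_second(ipc: float, clock_hz: int) -> float:
    """Single-thread throughput estimate: instructions per clock times clocks per second."""
    return ipc * clock_hz


# Hypothetical cores, not measurements: a higher clock cannot make up
# for a large enough IPC deficit.
high_clock = instructions_per_second(ipc=1.0, clock_hz=4_200_000_000)
high_ipc = instructions_per_second(ipc=1.5, clock_hz=3_400_000_000)
print(f"{high_clock / 1e9:.1f} vs {high_ipc / 1e9:.1f} billion instructions/s")
# prints "4.2 vs 5.1 billion instructions/s"
```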
Then, in your Intel diagram, you forgot to count macro-fusion... Sandy Bridge is 4-wide plus fusion... That gives you up to 5!
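The "4 plus fusion gives up to 5" count can be sketched as follows. The model is an assumption: each decode slot yields one uop, and a fused pair (Intel calls this macro-fusion, e.g. a compare followed by a conditional jump) occupies one slot but stands for two x86 instructions. Allowing one fused pair per cycle follows the post's own count; the function name is hypothetical.

```python
def peak_x86_per_cycle(decode_slots: int, fused_pairs: int) -> int:
    """Peak x86 instructions decoded per cycle when some slots hold fused pairs.

    Unfused slots carry one instruction each; fused slots carry two, so the
    total is (decode_slots - fused_pairs) + 2 * fused_pairs.
    """
    if not 0 <= fused_pairs <= decode_slots:
        raise ValueError("fused_pairs must be between 0 and decode_slots")
    return decode_slots + fused_pairs


# Per the post: 4 wide, plus (assumed) one fused pair per cycle.
print(peak_x86_per_cycle(decode_slots=4, fused_pairs=0))  # 4
print(peak_x86_per_cycle(decode_slots=4, fused_pairs=1))  # 5
```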
We've seen a lot of PowerPoint slides, but the measurements don't match what the slides show. Sorry, but you assume the marketing slides are correct, and that is where the gap is. I looked everywhere and could not find anything that clearly says it will decode more than 2 instructions per thread, backed by ASM code that actually achieves more than 2 IPC. Did you try?
Hehe...
Francois