Originally posted by haylui:
How did you find that?
Since BD shares 4 instruction fetch/decode units across the 2 cores in a module, and the fetch/decoder is shared on alternating clocks, each core effectively gets just 4/2 of the fetch/decoder's peak performance.
By experimenting with a couple of synthetic benchmarks and assembly programs.

Here is the complete list:
AAA - ASCII adjust AL after addition
AAD - ASCII adjust AX before division
AAM - ASCII adjust AX after multiplication
AAS - ASCII adjust AL after subtraction

Ironically, these are only valid in compatibility mode and legacy mode (not in 64-bit mode); they are rarely used and thus probably a non-issue.
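
For anyone who has not run into the four instructions listed above, here is a rough Python sketch of what each one computes on AX. It follows the operation pseudocode in Intel's manual as I understand it. The function names and the af parameter are my own for illustration, flags other than carry are ignored, and this is not what I used for the measurements.

```python
# Rough emulation of the x86 ASCII-adjust instructions on a 16-bit AX value.
# Hypothetical helper names; only AX and the carry flag are modeled.

def aaa(ax, af=False):
    """AAA: adjust AL after adding two unpacked BCD digits. Returns (ax, carry)."""
    if (ax & 0x0F) > 9 or af:
        ax = (ax + 0x106) & 0xFFFF   # AL += 6 and AH += 1 in one 16-bit add
        carry = True
    else:
        carry = False
    return ax & 0xFF0F, carry        # keep AH, clear the high nibble of AL

def aas(ax, af=False):
    """AAS: adjust AL after subtracting two unpacked BCD digits. Returns (ax, carry)."""
    if (ax & 0x0F) > 9 or af:
        ax = (ax - 6) & 0xFFFF       # AX -= 6
        ax = (ax - 0x100) & 0xFFFF   # AH -= 1
        carry = True
    else:
        carry = False
    return ax & 0xFF0F, carry

def aad(ax, base=10):
    """AAD: fold two unpacked digits (AH, AL) into a binary value in AL before DIV."""
    ah, al = ax >> 8, ax & 0xFF
    return (al + ah * base) & 0xFF   # AH becomes 0

def aam(ax, base=10):
    """AAM: split the binary value in AL into two unpacked digits after MUL."""
    al = ax & 0xFF
    # base == 0 raises ZeroDivisionError, loosely mirroring the #DE fault
    return ((al // base) << 8) | (al % base)
```

For example, after multiplying 7 by 9 you have AL = 0x3F, and aam(0x003F) gives 0x0603, i.e. the digits 6 and 3.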

Now testing to see whether it is a Bulldozer-specific issue or a general x86 one.