Well, actually, let's readjust this one... SSE has been 128 bits since the beginning, but since AMD only had 64 bits, they got the /flavor:AMD code path to compile with 64-bit loads and stores for the Athlon 64. When they moved to Phenom, they became victims of their own short-sightedness: the 128-bit execution unit is only half used (because of their code path detection). AMD is now using the Intel flavor.
On a few tests, you can expect Phenom II performance to increase a little once AMD is done fixing the mess they built for themselves.
Using movhps and movlps instead of movaps was not very smart. It was a nice way to slow down the Pentium 4, but that's about it.
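To make the difference concrete, here is a rough sketch in C with SSE intrinsics, assuming an x86 compiler that provides xmmintrin.h. The function names (load_full, load_halves, loads_match) are made up for illustration, and this is not the code any vendor actually shipped. _mm_load_ps typically maps to a single movaps, while _mm_loadl_pi plus _mm_loadh_pi typically map to a movlps/movhps pair.

```c
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* One aligned 128-bit load; compilers typically emit a single movaps. */
static __m128 load_full(const float *p) {
    return _mm_load_ps(p);
}

/* Two 64-bit half loads; compilers typically emit movlps + movhps. */
static __m128 load_halves(const float *p) {
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)p);        /* low two floats  */
    v = _mm_loadh_pi(v, (const __m64 *)(p + 2));  /* high two floats */
    return v;
}

/* Hypothetical helper: returns 1 if both strategies load the same data. */
static int loads_match(void) {
    _Alignas(16) float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float a[4], b[4];
    _mm_storeu_ps(a, load_full(src));
    _mm_storeu_ps(b, load_halves(src));
    return memcmp(a, b, sizeof a) == 0;
}
```

Both versions end up with the same register contents, so the only difference is how many load instructions it takes to get there: one for the full-width load, two for the split one.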
Give them credit for losing a few % with Phenom II because of this. It is not going to change the overall picture, but know that the processor (Phenom I and II) is not responsible for the shortcomings on SSE; it is AMD's software enabling that messed up.
Next time, when an instruction set is 128 bits, don't use 64.
This is my personal opinion.