SSE4.1 and SSE4a are different instruction sets.
Intel has SSE4.1.
AMD has SSE4a.
In my opinion, there's nothing in the SSE4a instruction set that is useful for this program.
EDIT:
And even if it did, I don't have access to a K10 machine with enough RAM to properly test it. (An SSE4a version won't run on my Xeon workstation.)
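To make the difference concrete: the two extensions are reported by different CPUID bits, so a program can check each one separately before picking a code path. SSE4.1 is ECX bit 19 of CPUID leaf 1, and SSE4a is ECX bit 6 of extended leaf 0x80000001. Below is a rough sketch in C, assuming GCC or Clang with `<cpuid.h>`. The function names (`has_sse41`, `has_sse4a`, `report_sse4_support`) are made up for illustration and are not taken from the program discussed here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Only pull in the CPUID helper when building for x86 with GCC/Clang. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID 1
#endif

/* SSE4.1 is reported in CPUID leaf 1, ECX bit 19. */
bool has_sse41(uint32_t leaf1_ecx) {
    return ((leaf1_ecx >> 19) & 1u) != 0;
}

/* SSE4a is reported in CPUID extended leaf 0x80000001, ECX bit 6. */
bool has_sse4a(uint32_t ext1_ecx) {
    return ((ext1_ecx >> 6) & 1u) != 0;
}

/* Hypothetical helper: print what the CPU running this code reports. */
void report_sse4_support(void) {
#ifdef HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        printf("SSE4.1: %s\n", has_sse41(ecx) ? "yes" : "no");
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        printf("SSE4a:  %s\n", has_sse4a(ecx) ? "yes" : "no");
#else
    puts("CPUID is not available on this build target");
#endif
}
```

Calling `report_sse4_support()` on an Intel machine such as my Xeon would be expected to show no SSE4a, which is why an SSE4a-only binary can't be run there.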
EDIT 2:
The AMD-optimized (Kasumi) binary was also tested for correctness on my Xeon workstation, along with all the other x64 binaries.
For correctness testing, it doesn't matter that I'm using an Intel machine to test an AMD binary.
Only the performance tuning had to be done on an AMD machine - which was a Phenom II X3 unlocked to 4 cores.
And yes, Happy New Year!!!!



