Not just Core i7.
There will be 5 binaries in this version.
x86
x86 SSE3
x64 SSE3
x64 SSE4.1 ~ Ushio (tuned for my LanBox)
x64 SSE4.1 ~ Nagisa (tuned for my workstation)
So there are two SSE4.1 versions, tuned for Core i7 and Harpertown.
x64 SSE3 has been re-tuned for a smaller cache than the previous versions.
This will help out AMD chips and any non-12MB cache Core 2 Quads.
Prior to v0.4.3, x64 and x64 SSE3 were both tuned for a 12MB cache (for my workstation).
Turns out this was the culprit that was hurting virtually all other processors, including all AMD chips, since nothing else had 3MB of cache per thread. And not surprisingly, it hurt i7 the most.
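To put rough numbers on the cache-per-thread point, here is a quick back-of-the-envelope calculation. The 12MB figure and the 3MB/thread result are from above (12MB shared by 4 threads). The i7 line is an assumption for illustration: it takes an 8MB L3 shared by 8 hardware threads, as on the first Core i7 parts.

```python
def cache_per_thread_mb(total_cache_mb, threads):
    """Cache available to each thread if the total is split evenly."""
    return total_cache_mb / threads

# 12MB workstation (4 threads): the case the old x64 builds were tuned for.
print(cache_per_thread_mb(12, 4))  # 3.0

# Assumed Core i7 configuration: 8MB L3, 4 cores with Hyper-Threading = 8 threads.
print(cache_per_thread_mb(8, 8))   # 1.0
```

With a working set sized for 3MB per thread, a chip that only has about 1MB per thread spills out of cache far more often, which fits i7 being hit the hardest.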
EDIT 1: The two SSE4.1 versions are fully compatible with each other and should theoretically run on Bulldozer as well. The only difference between them is the tuning.
EDIT 2:
The speedup via SSE4.1 is very small (a fraction of a %). So non-12MB Yorkfields will use x64 SSE3 instead of x64 SSE4.1 ~ Nagisa because of the more favorable tuning.
And it is possible to override whatever the auto-selector chooses.
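For illustration only, here is a minimal sketch of how a selector along these lines could pick between the five binaries. This is not the actual auto-selector: the function name, its parameters, the `is_core_i7` flag and the "12MB or more" threshold are all hypothetical, and falling back to the x86 binary on a 64-bit chip without SSE3 is an assumption.

```python
def select_binary(is_x64, has_sse3, has_sse41,
                  cache_mb=0, is_core_i7=False, override=None):
    """Return the name of the binary to run (hypothetical selector)."""
    # A manual override always wins over the automatic choice.
    if override is not None:
        return override

    if is_x64 and has_sse41:
        if is_core_i7:
            return "x64 SSE4.1 ~ Ushio"
        if cache_mb >= 12:
            return "x64 SSE4.1 ~ Nagisa"
        # Non-12MB Yorkfield: SSE4.1 gains a fraction of a percent,
        # so the small-cache tuning of x64 SSE3 matters more.
        return "x64 SSE3"

    if is_x64 and has_sse3:
        return "x64 SSE3"
    if has_sse3:
        return "x86 SSE3"
    # Assumption: anything without SSE3 falls back to the plain x86 build.
    return "x86"
```

For example, `select_binary(True, True, True, cache_mb=6)` returns `"x64 SSE3"` for a 6MB Yorkfield, while passing `override="x64 SSE4.1 ~ Nagisa"` forces the Nagisa build regardless of what was detected.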




