I'm getting speedups of up to 30% using numactl interleave on Linux.

The speedup depends on the size - for smaller sizes, it sometimes backfires.
I'll accept any settings as valid benchmarks. Tuning the OS like this is part of the game, lol.
On the Windows side, as far as I can tell, only Windows Server has any sort of NUMA-awareness support. But you need Win Server anyway to get more than 2 sockets in the first place.
There might be a setting somewhere in the OS that can be set to force interleaved memory allocation. If there isn't, maybe there's a special WinAPI malloc() function that does interleaved allocation. If I find it, I'll try it out.
But in any case, interleaved memory allocation isn't a "solution" to NUMA.
It lets the program get the full memory bandwidth, but it doesn't get rid of the interconnect contention and latency.
The program will still need to be redesigned to run well on NUMA.
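To put a rough number on that, here's a toy model (not a measurement). It assumes interleaving places pages round-robin across nodes and that a thread touches every page equally often. The function names are made up for illustration.

```python
def interleave_pages(num_pages, num_nodes):
    """Toy round-robin placement: page i lives on node i % num_nodes."""
    return [page % num_nodes for page in range(num_pages)]


def remote_fraction(placement, thread_node):
    """Fraction of pages that live on a node other than the thread's own.

    Assumes the thread accesses every page equally often.
    """
    remote = sum(1 for node in placement if node != thread_node)
    return remote / len(placement)


if __name__ == "__main__":
    nodes = 4  # hypothetical 4-socket box
    placement = interleave_pages(num_pages=1024, num_nodes=nodes)
    for node in range(nodes):
        # Each thread sees the same share of remote pages: (nodes - 1) / nodes
        print(f"thread on node {node}: {remote_fraction(placement, node):.2f} remote")
```

Under those assumptions, every thread on a 4-node box goes over the interconnect for 3 out of 4 accesses, no matter where it runs. The load is spread evenly across all memory controllers, but most traffic is still remote, which is why the program itself has to be made NUMA-aware.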
EDIT:
Got my B3 stepping today. I finally get my SATA ports back.

(I still don't get how I managed to kill them in only 2 months...)
Gonna have to re-test my OC tomorrow after the TIM sets. That thermal pad on the H50 was great, but it's only good for the first use. Afterwards it's too uneven, so I had to scrape it all off and use some of my leftover Arctic Silver.