Been playing with Linux and NUMA for some time now on the rig Skycrane sent me.

If anyone (with a NUMA machine) is interested, you might be able to get a speedup if you install the NUMA control package (numactl) and play with the settings.

Try running y-cruncher like this:

numactl --interleave=all "./x64 SSE3.out"

or whichever binary is compatible with your CPU.
I haven't checked if there is a Windows equivalent.

It doesn't "make" the program NUMA aware, but it does seem to distribute the memory traffic more evenly across the NUMA nodes.
y-cruncher has a problem where a huge amount of memory/interconnect traffic is concentrated on the "primary" NUMA node. Interleaving the memory seems to spread it out a bit.
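
If you want to script this, here's a rough sketch of a wrapper that uses interleaving when it's available and just runs the program normally when it isn't. The function name run_interleaved is my own invention (not part of numactl or y-cruncher), and the probe with "true" is only an assumption about how to detect a box where numactl is missing or the kernel refuses the memory policy.

```shell
#!/bin/sh
# run_interleaved: hypothetical helper, not part of numactl or y-cruncher.
# Runs the given command with memory interleaved across all NUMA nodes,
# or runs it unchanged if numactl cannot apply that policy here.
run_interleaved() {
    # Probe first: is numactl installed, and can it set the policy at all?
    if command -v numactl >/dev/null 2>&1 &&
       numactl --interleave=all true >/dev/null 2>&1; then
        numactl --interleave=all "$@"
    else
        echo "numactl not usable here; running without interleaving" >&2
        "$@"
    fi
}

# Example (substitute whichever y-cruncher binary your CPU supports):
#   run_interleaved "./x64 SSE3.out"
```

Running "numactl --hardware" beforehand will list the nodes and how much memory each one has, which is a quick way to confirm the machine actually exposes more than one node.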