Please, please, please tell me all (or at least some) of this testing is with 64-bit applications on a 64-bit OS.

For the application I am working on, the 64-bit version is 40% faster than the 32-bit version. There are twice as many general-purpose registers in 64-bit mode, and my application makes heavy use of 64-bit integers and no floating point. It is like Fritz, except that it is a breadth-first search instead of a depth-first search. It will use all the memory you can throw at it and thrash it hard.

It wouldn't be that hard to add a dialog box to let you set the threads to run on the desired cores.
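As a rough sketch of what such a dialog would do behind the scenes: it collects the core numbers the user ticked and turns them into an affinity setting. The example below assumes Python purely for illustration, and the names `cores_to_mask` and `pin_current_thread` are made up for this post, not taken from any real application. `os.sched_setaffinity` only exists on some platforms (Linux, for instance), so the sketch checks for it first. On Windows the equivalent step would be passing the bitmask to the Win32 `SetThreadAffinityMask` call.

```python
import os


def cores_to_mask(cores):
    """Return a bitmask with bit n set for every core index n in `cores`."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def pin_current_thread(cores):
    """Try to restrict the caller to `cores`; return True if it worked."""
    if not hasattr(os, "sched_setaffinity"):
        # Assumption: the platform simply has no such call (e.g. Windows, macOS).
        return False
    try:
        # pid 0 means "the caller"; the OS rejects cores that do not exist.
        os.sched_setaffinity(0, set(cores))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    chosen = [0, 2]  # hypothetical selection coming from the dialog box
    print("mask:", bin(cores_to_mask(chosen)))
    print("pinned:", pin_current_thread(chosen))
```

Keeping the mask calculation separate from the OS call means the dialog code can be checked without touching the scheduler at all.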

My next machine was going to be a dual-socket C32 machine, since I need the memory slots more than I need processing cores. But after reading Anand's "Rendering and HPC Benchmark Session Using Our Best Servers", I have concerns about the memory performance of BD on dual-socket boards.