Quote: Originally Posted by SexyMF
I'm not really seeing any detailed rebuttal to the Real World Tech article within the blog. The RWT at least had some analysis to show the use of x87 instructions.

The rebuttal alludes to other bottlenecks which limit the effectiveness of using SSE over x87, but doesn't start down the path of identifying these bottlenecks or what is involved to overcome them. Or did I miss that?
The premise of the original RWT article is that, after some profiling, a definite pattern emerged -- PhysX was using legacy x87 as opposed to SIMD SSEx for the heavy math lifting. This is besides the fact that PhysX is also not multithreaded on multicore CPUs by default. Kanter then goes on to question why this is the case, and builds his point around the fact that both Intel and AMD are deprecating x87 and focusing on improving/developing SSE.

There are two points of contention -- a) Kanter claims that using properly compiled and vectorized SSE can speed up throughput by up to a theoretical 4x and probably, in reality, 2x, and b) Kanter makes an argument that there is no technical reason why PhysX should not be compiled with SSE rather than x87. The implication is that nVidia did this intentionally in order to make PhysX on the GPU look that much better -- this seems to have pissed a few people off.
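
For anyone wondering where the theoretical 4x comes from: a 128-bit SSE register holds four single-precision floats, so one packed instruction does the work of four scalar ones. Here is a rough sketch of the idea in C. To be clear, this is not PhysX code; the function names (axpy_scalar, axpy_sse) and the multiply-add loop are made up for illustration. The scalar loop only turns into x87 code if the compiler is told to target it (with gcc on 32-bit x86, for example, -mfpmath=387), and the SSE path falls back to the plain loop when SSE is not available.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics (packed single-precision) */
#endif

/* Hypothetical scalar kernel: one multiply-add per loop iteration. */
void axpy_scalar(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical SSE kernel: four multiply-adds per loop iteration. */
void axpy_sse(float a, const float *x, float *y, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    __m128 va = _mm_set1_ps(a);               /* broadcast a into all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);      /* unaligned load of 4 floats */
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    /* Scalar tail for leftover elements (or everything, without SSE). */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The gap between the theoretical 4x and a realistic 2x is plausibly down to things this sketch ignores, such as memory bandwidth, data layout, and code that does not vectorize cleanly.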