I'm not really seeing any detailed rebuttal to the Real World Tech article within the blog. The RWT article at least had some analysis to show the use of x87 instructions.

The rebuttal alludes to other bottlenecks which limit the effectiveness of using SSE over x87, but doesn't start down the path of identifying these bottlenecks or what is involved in overcoming them. Or did I miss that?