Just for clarification: SSE3 always has 128 bit instructions. On earlier platforms they might need more cycles to execute, but the 128 bit instructions as such are present in any processor supporting SSE3.
Originally Posted by agenda2005
I don't think it is quite correct to say that two 64 bit SSE operations will now require half the time. This is only the case when ...
- ... there are indeed independent units to compute. Doing two instructions at the same time requires that neither depends on the outcome of the other. That is frequently not the case.
- ... and, unless it is hand-coded assembly, the compiler has to be sure about the previous fact. It can be nontrivial for the compiler to figure this out in a bulletproof way. If the compiler is not entirely sure, it will default to being conservative.
- ... to be most effective, the compiler has to be able to reorder instructions so that it can merge two 64 bit operations into one 128 bit one even if they are at different places in the source code. Proving that this is safe is nontrivial, too, in particular in languages like C/C++ where there is a lot of aliasing going on.
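To make the dependency and aliasing points a bit more concrete, here is a rough sketch in plain C (C99). The function names are made up for illustration, and what a given compiler actually does with each loop depends on the compiler and its flags, so take the comments as the general idea rather than a guarantee.

```c
#include <stddef.h>

/* dst and src are plain pointers, so they may overlap. A store to dst[i]
   could change a later src[j]. Before pairing two adjacent 64 bit adds
   into one 128 bit add, the compiler has to prove there is no overlap,
   add a run-time overlap check, or stay with scalar code. */
void add_may_alias(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* Same loop, but restrict is the programmer's promise that dst and src
   do not overlap. The iterations are then independent, which is what
   the compiler needs to know to pair them up. */
void add_no_alias(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* A genuine dependency: every element needs the result of the previous
   one, so two neighbouring adds cannot simply be done at the same time. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
```

Calling add_may_alias(a + 1, a, n - 1) on a single array is legal and behaves exactly like prefix_sum, which is why the compiler cannot just assume the first loop is safe to vectorize. Making the same call with add_no_alias would break the restrict promise and is undefined behaviour.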