Originally Posted by JF-AMD
It's about cache efficiency. Today, when there is a cache miss, the thread stalls while the core waits for the data to be fetched from memory. While that thread is stalled, SMT lets the core pull in instructions from a second thread and run those, then pick the first thread back up once its data arrives from memory.
I know that is a REALLY simplistic description, but it should help you visualize what's going on.
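To make that concrete, here is a minimal C sketch (my own illustration, not something from the post) of the kind of workload SMT is built for: a pointer chase through a randomly shuffled buffer, where every load depends on the previous one and almost every load misses cache. Run alone, the core spends most of its time stalled; run as two copies on sibling logical CPUs of one physical core, the second thread can soak up those stall cycles. The choice of logical CPUs 0 and 1 as SMT siblings is an assumption; on Linux, check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list before trusting the numbers.

[code]
/*
 * Hypothetical demo (not from the original post): pointer chasing that
 * misses cache on nearly every load, i.e. the stall-heavy workload SMT
 * is meant to hide.  Assumes Linux + glibc and that logical CPUs 0 and 1
 * are SMT siblings of the same physical core.
 *
 * Build: gcc -O2 -pthread smt_chase.c -o smt_chase
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NODES (1u << 22)   /* ~4M nodes x 64 B = 256 MiB, far bigger than any cache */
#define STEPS (1u << 25)   /* dependent loads per traversal */

typedef struct { size_t next; char pad[56]; } node_t;   /* one node per 64-byte line */

static double now_s(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Link the nodes into one random cycle so each load depends on the last. */
static node_t *build_chain(void) {
    node_t *n    = malloc((size_t)NODES * sizeof *n);
    size_t *perm = malloc((size_t)NODES * sizeof *perm);
    if (!n || !perm) { perror("malloc"); exit(1); }
    for (size_t i = 0; i < NODES; i++) perm[i] = i;
    for (size_t i = NODES - 1; i > 0; i--) {             /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < NODES; i++)
        n[perm[i]].next = perm[(i + 1) % NODES];
    free(perm);
    return n;
}

struct arg { node_t *chain; int cpu; };

static void *chase(void *p) {
    struct arg *a = p;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(a->cpu, &set);
    sched_setaffinity(0, sizeof set, &set);              /* pin to one logical CPU */
    volatile size_t idx = 0;
    for (size_t i = 0; i < STEPS; i++)
        idx = a->chain[idx].next;                        /* load misses; core sits idle */
    return NULL;
}

int main(void) {
    struct arg a0 = { build_chain(), 0 };
    struct arg a1 = { build_chain(), 1 };
    pthread_t t1;
    double start;

    start = now_s();
    chase(&a0);                                          /* one chase, one hardware thread */
    printf("1 thread : %.2f s\n", now_s() - start);

    start = now_s();                                     /* two chases on sibling logical CPUs */
    pthread_create(&t1, NULL, chase, &a1);
    chase(&a0);
    pthread_join(t1, NULL);
    printf("2 threads: %.2f s (twice the work; compare against 2x the 1-thread time)\n",
           now_s() - start);

    free(a0.chain);
    free(a1.chain);
    return 0;
}
[/code]

If CPUs 0 and 1 really are SMT siblings, the two-thread run should come in well under twice the single-thread time, because the second thread issues during the first thread's miss stalls. If they land on separate physical cores you are just measuring ordinary multicore scaling instead.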
HT originally came about in the P4 because it had a very long pipeline, so a single cache miss carried a big penalty. But as Intel shortened the pipeline (e.g. Core 2), they tossed out HT because they no longer needed that band-aid.
If you take that same logic and extend it: as a microarchitecture, you should always be striving to reduce cache misses as much as possible. As you reduce the misses, you increase the efficiency. That is good. But the cache misses are what give you the "opportunity" that SMT needs to work. So as primary core efficiency goes up, the SMT efficiency generally goes down.
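That tradeoff is easy to see with a toy utilization model (my own illustration; these numbers are not from the post): assume a lone thread keeps the core busy a fraction (1 - s) of cycles, where s is the fraction lost to miss stalls, and that a second SMT thread can only fill the idle cycles. The ideal combined throughput is then capped at min(1, 2*(1 - s)), so the SMT speedup collapses toward 1x as the miss rate drops:

[code]
/*
 * Toy model (illustrative only): a lone thread keeps the core busy
 * (1 - s) of the time, where s is the fraction of cycles lost to miss
 * stalls, and a second SMT thread can only fill those idle cycles.
 * Ideal combined throughput is therefore capped at min(1, 2*(1 - s)).
 */
#include <stdio.h>

int main(void) {
    printf("stall fraction s | 1-thread utilization | ideal SMT speedup\n");
    for (int pct = 60; pct >= 10; pct -= 10) {
        double s   = pct / 100.0;
        double one = 1.0 - s;                              /* lone thread's useful throughput */
        double two = (2.0 * one < 1.0) ? 2.0 * one : 1.0;  /* two threads, capped by the core */
        printf("      %.2f       |        %.2f          |      %.2fx\n", s, one, two / one);
    }
    return 0;
}
[/code]

Under this model, at a 50 percent stall fraction the second thread nearly doubles throughput, while at 10 percent it buys only about 11 percent, which is exactly the "core efficiency up, SMT payoff down" trade described above.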
The ability to increase parallelism has more to do with the OS schedulers than anything else. OSes deployed 3 years ago were written when single cores ruled the earth. OSes deployed today were focused more on dual core, and to a small extent quad core, so they do a better job of scheduling. The OSes you will use in 3 years will do much better than today's. It is all a progression. Saying you don't need more cores in the future because today's OSes don't utilize all of the cores is like saying that a 1TB drive is too big. Give people enough storage space and they will fill it. Give them enough cores and they will figure out how to use them.
My notebook probably has 50 different services running (and 3-4 actual programs). There is always a use for more cores; the OS just needs to come along for the ride - and that will be happening.