Quote: Originally posted by ajaidev
Ahhhh you should also do a little checking....

The thread scheduler issues priority levels, yes, but those 32 can overflow quite easily, and then what? The second core is used; that is one of the reasons HT has a negative impact on some programs. The virtual core has no resources of its own, so it shares the real core's resources. When a certain amount of those resources is in use, the virtual thread cannot be initialized until resources are free.

Do you have some data showing where Nehalem HT actually has a negative impact? And I don't mean single-threaded or pseudo-threaded game engines.

First and foremost, HT allows an increase in throughput. Say a core with HT disabled does 100 work units per thread in a given time frame. You enable 2-thread HT and now you do 70 work units per thread in the same amount of time.

Does HT have a negative impact? From a thread point of view, yes: you're 30% slower per thread. But from a workload point of view? No, you've done 40% more work (2 x 70 = 140 work units versus 100).
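
To make the arithmetic explicit, here is a minimal sketch in Python. The 100 and 70 figures are just the illustrative numbers from this post, not measurements, and the function names are made up for the example.

```python
def total_work(units_per_thread, threads):
    """Total work units a core completes in one time frame."""
    return units_per_thread * threads


def relative_change(new, old):
    """Fractional change from old to new (0.4 means +40%)."""
    return (new - old) / old


# Illustrative numbers from the post, not benchmark results.
HT_OFF_PER_THREAD = 100  # one thread per core
HT_ON_PER_THREAD = 70    # each of two threads sharing the core

work_ht_off = total_work(HT_OFF_PER_THREAD, threads=1)
work_ht_on = total_work(HT_ON_PER_THREAD, threads=2)

per_thread_change = relative_change(HT_ON_PER_THREAD, HT_OFF_PER_THREAD)
workload_change = relative_change(work_ht_on, work_ht_off)

print(f"per-thread change: {per_thread_change:+.0%}")
print(f"workload change:   {workload_change:+.0%}")
```

With these inputs the per-thread figure comes out negative and the workload figure positive, which is the whole point: which one matters depends on whether you care about a single thread's latency or the total work done.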

Especially in Nehalem, HT is a definite plus. (In the P4 there were not that many execution units to start with, so HT's main task there was to hide memory latency.)

AMD's approach in BD is totally different. A BD module is basically a souped-up core with double the INT units or, put the other way around, a module with 2 INT cores and a shared FP unit.
Their idea is that it's not worth tinkering with the core itself; you simply cram more cores (or clusters, if you want) onto the same die. Improvements in process tech allow you to put 6-10 cores in a reasonable die area, the next process gives 12-16, then 30 and so on. Why bother with SMT when you'll end up with dozens of "real" cores, as many as the number of threads today? When your resources are more limited, it's not worth doing SMT. Simply do CMT: copy and paste as many cores as possible onto a die and you're done.