This particular example is of course an ideal case for 256-bit hardware,
and here Sandy Bridge's 48 bytes/cycle versus Llano's 32 bytes/cycle
of L1 cache bandwidth is what will determine the throughput.
(Note that in cases like this Sandy Bridge gains nothing from HT,
since a single thread already utilizes 100% of the resources.)
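
To put the bandwidth argument into numbers, here is a rough back-of-the-envelope sketch. It assumes the loop is limited purely by L1 traffic and treats the quoted peak figures as one aggregate number, ignoring the split between loads and stores. The workload (1024 floats, two loads and one store per element) and the function names are made up for illustration; they are not taken from the example discussed above.

```python
# Peak L1 bandwidth in bytes/cycle, as quoted in the post.
L1_BYTES_PER_CYCLE = {"Sandy Bridge": 48, "Llano": 32}


def min_cycles(total_bytes, bytes_per_cycle):
    """Lower bound on cycles if L1 traffic is the only limit (assumption)."""
    return total_bytes / bytes_per_cycle


def bandwidth_bound_speedup(fast, slow):
    """Ratio of peak L1 bandwidths: the best-case speedup under this model."""
    return L1_BYTES_PER_CYCLE[fast] / L1_BYTES_PER_CYCLE[slow]


# Hypothetical workload: 1024 floats, each needing two 4-byte loads
# and one 4-byte store, i.e. 12 bytes of L1 traffic per element.
total_bytes = 1024 * 12

for cpu, bandwidth in L1_BYTES_PER_CYCLE.items():
    print(cpu, min_cycles(total_bytes, bandwidth), "cycles at best")

print("speedup bound:", bandwidth_bound_speedup("Sandy Bridge", "Llano"))
```

Under these assumptions the most Sandy Bridge can gain over Llano is the ratio of the two bandwidths, 48/32 = 1.5x, no matter how wide the execution units are. Real code will land below that bound.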
Regards, Hans



