Fixed that for you.
The problem of parallelism has been around a long time. CPU manufacturers had to decide whether to keep their chips scalar and simply make the execution units faster, or go superscalar and add redundant execution units. Obviously the latter course has won historically for general-purpose CPUs. But they can't go wild adding tons of redundant units, because there are practical limits and you hit diminishing returns. Sometimes there is simply only so much instruction-level parallelism (ILP) to be extracted from a given segment of code. It also takes quite a bit of extra silicon to check dependencies between the instructions issued together, and that cost grows rapidly (roughly with the square of the issue width) as you add units.
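To make the ILP and dependency-checking points concrete, here is a toy Python model. It is a sketch under heavy simplifying assumptions, not how any real core schedules: the `Instr` type and the `ideal_cycles` and `dependency_comparators` helpers are hypothetical, every instruction is assumed to take one cycle, and register renaming is assumed perfect so only true (read-after-write) dependencies cause stalls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction: one destination register, some source registers."""
    dest: str
    srcs: tuple


def ideal_cycles(program, width):
    """Cycles an idealized out-of-order core needs to run `program`.

    Toy-model assumptions: unlimited instruction window, perfect renaming
    (only read-after-write dependencies stall), 1-cycle latency for every
    instruction, and at most `width` instructions issued per cycle.
    """
    ready = {}   # register -> cycle at which its newest value is available
    issued = {}  # cycle -> number of instructions already issued that cycle
    finish = 0
    for ins in program:
        # Earliest cycle at which all source operands are available.
        cycle = max((ready.get(r, 0) for r in ins.srcs), default=0)
        # Slide forward until there is a free issue slot in that cycle.
        while issued.get(cycle, 0) >= width:
            cycle += 1
        issued[cycle] = issued.get(cycle, 0) + 1
        ready[ins.dest] = cycle + 1
        finish = max(finish, cycle + 1)
    return finish


def dependency_comparators(width, srcs_per_instr=2):
    """Rough count of register comparisons needed to check dependencies
    among `width` instructions issued in the same cycle: each source of
    instruction i is compared against the destination of every older
    instruction in the group, giving srcs * width * (width - 1) / 2."""
    return srcs_per_instr * width * (width - 1) // 2


# A serial chain (each op needs the previous result) vs. 8 independent ops.
chain = [Instr("r1", ("r1",)) for _ in range(8)]
independent = [Instr(f"r{i}", ("a",)) for i in range(8)]

for width in (1, 2, 4, 8):
    print(width, ideal_cycles(chain, width), ideal_cycles(independent, width),
          dependency_comparators(width))
# prints:
# 1 8 8 0
# 2 8 4 2
# 4 8 2 12
# 8 8 1 56
```

The serial chain takes 8 cycles no matter how wide the core is, the independent ops speed up with width, and the comparator count grows roughly quadratically. That is the diminishing-returns problem in miniature.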
Instead of fighting a game of diminishing returns with Intel to extract maximum ILP, AMD seems to be refocusing on thread-level parallelism. Even Intel, with its renowned branch prediction, out-of-order (OoO) engine, Hyper-Threading, etc., still leaves execution units idle for a significant portion of the time. So AMD removed one of the ALUs and all of the expensive logic required to check dependencies. They can spend the freed transistor/power budget on branch prediction, cache, etc. to keep the remaining units fed. IPC could still increase despite the core being less superscalar than AMD's previous architecture, and the upshot is that the previously redundant execution units are freed to work on separate threads.
I think both AMD and Intel are working toward eager execution. In essence, instead of trying to predict which path to take when you come to a branch, you take both. IMO, this is the next leap both companies need to make in single-threaded performance, and Bulldozer would give AMD a significant lead in developing it.
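A back-of-the-envelope Python model of that trade-off: a predicting core pays the misprediction penalty on the fraction of branches it guesses wrong, while an eager core runs both paths and pays in extra execution slots instead. The function names, the single-branch framing, the flat penalty, and the assumption that both paths fit in otherwise idle units are all hypothetical simplifications, not a description of Bulldozer or any shipping design.

```python
def predicted_cost(path_len, mispredict_rate, penalty):
    """Expected cycles for one branch when the core guesses a single path.

    Simplification: a wrong guess costs a flat `penalty` of extra cycles
    to flush and refetch.
    """
    return path_len + mispredict_rate * penalty


def eager_cost(path_a_len, path_b_len):
    """Cycles when both paths run side by side and the wrong one is discarded.

    Assumes there are enough idle execution units to run both paths at full speed.
    """
    return max(path_a_len, path_b_len)


def eager_work(path_a_len, path_b_len):
    """Execution slots consumed by eager execution: both paths are executed."""
    return path_a_len + path_b_len


print(predicted_cost(10, 0.125, 16))  # 12.0
print(eager_cost(10, 10))             # 10
print(eager_work(10, 10))             # 20
```

With a 10-cycle path, a 1-in-8 misprediction rate, and a 16-cycle penalty, prediction averages 12 cycles, while eager execution finishes in 10 cycles but burns 20 slots of work. That only pays off if those slots would otherwise sit idle, which is exactly the situation described above for today's cores.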
For one thing, AMD didn't make much of a dent in Intel's market share even when it did have the faster chip in both single- and multi-threaded apps. So there is indeed more going on than simply who has the faster chip for any given application.
Of course, my scenario doesn't ring true, because it hasn't been the case in recent memory. Before Conroe, AMD had the stronger single- and multi-threaded performance. Since Conroe, AMD hasn't had the faster chip for either, and in that time it lost the small market-share gains it had made. But if AMD does end up with the better multi-threaded performance, I expect it to regain ground in the segments I mentioned, or at least in servers. Single-thread performance doesn't mean much of anything in the server world; what matters is total throughput and the power consumed while delivering it. Why do you think the market was salivating at the thought of clustered ARM chips for server applications?




