Originally Posted by
informal
I doubt the core(s) would be able to execute 4 threads (maybe Mechromancer meant 4 instructions per clock?). The patent simply describes a 4-way design (4-way as in 4-way decoding; in K8/K10 we are stuck at 3-way). The new approach is flexible: it keeps the superior efficiency of 2-way decoding and combines the 2 int pipelines to get as much efficiency as possible out of integer code (as close as possible to ideal 4-way execution). Similarly, in the FP/SSE case we have a 4-way, although "single", SSE unit which is "super" wide (256-bit, supporting the AVX and FMA4 extensions) and which is capable of splitting in various ways (1x256, 2x128 or even 4x64-bit), similar to the "Itanium way", which in turn could make it much more efficient than the present, as Hans calls it, dumb way of doing things.
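To make the 1x256 / 2x128 / 4x64 splitting a bit more concrete, here is a toy Python sketch of the data partitioning only. It is not taken from the patent and says nothing about how the hardware actually schedules work; the names `UNIT_WIDTH_BITS` and `split_lanes` are made up for illustration, and the only assumption carried over from the post is the 256-bit total width.

```python
# Toy model: treat the FP/SSE unit's datapath as one 256-bit value and
# view it as 1, 2 or 4 independent lanes. Purely illustrative.
UNIT_WIDTH_BITS = 256  # width quoted in the post


def split_lanes(value: int, lane_bits: int) -> list[int]:
    """Split a 256-bit value into equal lanes, lowest lane first."""
    if lane_bits <= 0 or UNIT_WIDTH_BITS % lane_bits != 0:
        raise ValueError("lane width must evenly divide the unit width")
    if not 0 <= value < (1 << UNIT_WIDTH_BITS):
        raise ValueError("value does not fit in the unit width")
    mask = (1 << lane_bits) - 1
    lane_count = UNIT_WIDTH_BITS // lane_bits
    return [(value >> (i * lane_bits)) & mask for i in range(lane_count)]


# The same 256-bit payload seen as 1x256, 2x128 and 4x64.
payload = (3 << 64) | 5
as_1x256 = split_lanes(payload, 256)
as_2x128 = split_lanes(payload, 128)
as_4x64 = split_lanes(payload, 64)
```

The point is just that the same wide register can be carved up at different granularities; which split is chosen, and when, is exactly the part the patent would have to describe.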
The described design would still receive many more improvements to other aspects of the core and uncore parts, but the underlying uarchitecture is pretty well presented in dresdenboy's blog. It should be very efficient in both multithreaded and single-threaded workloads, relying on the split int pipelines for great efficiency and possibly even speculative execution (which could be useful for branch prediction and data reliability). There are also many patents on improvements in the area of power management, GPU/CPU integration (2nd gen. of Fusion in ~2012), etc.
Also, the design can always be extended in the future by adding one more int "cluster", making a potentially efficient 3x2-way integer "super cluster" (a real unified 6-way design, i.e. the natural extension of what we have today, would be a power hog and much less efficient). The FP/SSE part would need appropriate rework, and this could be a challenge since in present-day patents the SSE unit is still unified and not split into smaller clusters.