Unfortunately, there is only wishful thinking in your post.
You are basically saying that, compared to big Fermi, the mid-range Kepler GK104 drops die size by about a third (ca. 510mm^2 is 50% more than ca. 340mm^2), drops TDP by around a third (ca. 270W real peak TDP is 50% more than ca. 180W real peak TDP), increases the transistor count only a little (the rumours mention 3.5 billion transistors, maybe up to 4 billion), and performance still goes up by 50%??? I can see only three ways this could happen:
1) Fermi is the least efficient chip ever: hot, broken, unfixable. Kepler is a heavily reworked architecture that picks up some low-hanging fruit.
2) nV engineers put physics and TSMC's engineers to shame by getting double the gains from the move to 28nm that TSMC would ever admit were possible.
3) What you say is wrong.
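To put a number on how much the claim asks for, here is a quick back-of-the-envelope sketch in Python. It uses only the rounded rumour figures quoted above (ca. 510mm^2 / 270W for big Fermi, ca. 340mm^2 / 180W for GK104, and the claimed 1.5x performance); the variable and function names are mine, and none of the inputs are measured values.

```python
# Rough figures quoted in this post (rumours / approximations, not measurements).
fermi = {"die_mm2": 510.0, "tdp_w": 270.0, "perf": 1.0}   # big Fermi as baseline
gk104 = {"die_mm2": 340.0, "tdp_w": 180.0, "perf": 1.5}   # claimed +50% performance


def efficiency_gain(new, old, cost_key):
    """Ratio of (performance per unit of cost) between two chips."""
    return (new["perf"] / new[cost_key]) / (old["perf"] / old[cost_key])


perf_per_watt_gain = efficiency_gain(gk104, fermi, "tdp_w")
perf_per_mm2_gain = efficiency_gain(gk104, fermi, "die_mm2")

# Both come out at about 2.25x: 1.5x the performance on 2/3 of the power and area.
print(f"perf/W gain:    {perf_per_watt_gain:.2f}x")
print(f"perf/mm^2 gain: {perf_per_mm2_gain:.2f}x")
```

In other words, the claim implies roughly 2.25x better performance per watt and per mm^2 in a single generation, which is why only the three explanations above are left.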
So what is going on, if there really are 3x as many CUDA cores? Where does that 50% more performance come from? Dropping the hot clock looks almost certain by now, but there still seems to be a lot more raw GFLOPS available. Will that translate into more performance?
It depends. My educated guess is ... there will be no SFUs in an SM anymore. That's where the space to fit all the extra CCs will come from. Special functions will be done on the CCs over multiple clock cycles, just like... in GCN! There are a few advantages to this approach. First, you can reduce data movement inside an SM, since registers can be kept closer to the SIMDs. Moving data is expensive, so you save power by avoiding it. Furthermore, SFUs do nothing for Linpack numbers; they don't increase the FLOP count. And nVidia promised to deliver 3 times more GFLOPS/Watt with Kepler, and HPC is a very important market for them. So if you swap "useless" SFUs for shaders, and save some power by doing so, this goal becomes achievable!
So if you look at artificial, "canned" benchmarks that rely on raw GFLOPS... yes, there is going to be 50% more performance. But if you look at others, ones that require special functions... performance will start to tank, likely to below the level of a GTX580. Games require a mix of both, so who knows how it will balance out. Overall, faster than a GTX580 is very likely IMO, but not by much.
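A toy model of that trade-off, as a Python sketch. Everything in it is an assumption made for illustration: the 1.5x raw throughput is the rumoured figure discussed above, the cost of 4 core cycles per special function is a made-up placeholder, and the model pretends that on the GTX580 the dedicated SFUs handle special functions without costing the CCs anything. It is not a prediction, just a way to see how the instruction mix moves the result.

```python
def relative_speedup(raw_ratio, sf_fraction, sf_cycles):
    """Speedup over the baseline chip for a given instruction mix.

    raw_ratio   -- raw ALU throughput of the new chip vs. the baseline (assumed)
    sf_fraction -- share of instructions that are special functions (0..1)
    sf_cycles   -- core cycles needed to emulate one special function
                   (hypothetical; the real cost is unknown)

    Simplification: the baseline runs special functions on dedicated SFUs
    at no cost to its cores, so its cost per instruction is normalised to 1.
    """
    cost_per_instruction = (1.0 - sf_fraction) + sf_fraction * sf_cycles
    return raw_ratio / cost_per_instruction


# Sweep a few instruction mixes with hypothetical parameters.
for fraction in (0.0, 0.05, 0.10, 0.20, 0.50):
    speedup = relative_speedup(raw_ratio=1.5, sf_fraction=fraction, sf_cycles=4)
    print(f"{fraction:4.0%} special functions -> {speedup:.2f}x vs. baseline")
```

With these placeholder numbers, the full 1.5x only shows up at 0% special functions, and the new chip drops to parity with the baseline once about one instruction in six is a special function. Change `sf_cycles` and the break-even point moves accordingly.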