It helps to understand how the Fermi and Kepler architectures developed and where they differ.
Essentially, Fermi was die-size limited, which meant NVIDIA had to add performance without increasing the transistor count. To do this, they ran the shader domain at double the speed of the other processing stages. Unfortunately, doing so increased heat production and power consumption.
GK104, meanwhile, doesn't have the same limitation, partly due to the 28nm manufacturing process and partly due to optimizations within the architecture that reduce the number of transistors needed for certain processing stages.
This has allowed for a drastic increase in the core count, but to keep power consumption at reasonable levels, NVIDIA is now running the shader clock at a 1:1 ratio with the other processing stages. There are some other changes, like PolyMorph Engine reductions (even though they do run at close to double the speed) and caching changes, that also contribute to the non-linear performance increase from a Fermi GPC to a Kepler GPC.
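
To put rough numbers on the clock-ratio trade-off, here is a minimal sketch of the usual theoretical peak throughput arithmetic (cores x shader clock x 2 FLOPs per clock, counting one fused multiply-add as two operations). The function name and the example cards are my own choices for illustration: I am assuming the public reference specs of the GTX 580 (a Fermi part, 512 cores, 772 MHz core with a 2:1 shader clock) and the GTX 680 (GK104, 1536 cores, 1006 MHz base with a 1:1 shader clock). Treat it as a back-of-the-envelope estimate, not a measure of real game performance.

```python
def peak_gflops(cores, core_clock_mhz, shader_ratio):
    """Theoretical single-precision peak in GFLOPS.

    Assumes each core retires one fused multiply-add (2 FLOPs) per shader clock.
    """
    shader_clock_mhz = core_clock_mhz * shader_ratio
    return cores * shader_clock_mhz * 2 / 1000.0


# Illustrative figures assumed from public reference specs, not from this post.
fermi = peak_gflops(cores=512, core_clock_mhz=772, shader_ratio=2)     # GTX 580
kepler = peak_gflops(cores=1536, core_clock_mhz=1006, shader_ratio=1)  # GTX 680

print(f"Fermi  (2:1 shader clock): {fermi:.0f} GFLOPS")
print(f"Kepler (1:1 shader clock): {kepler:.0f} GFLOPS")
print(f"Core count ratio: {1536 / 512:.1f}x, peak throughput ratio: {kepler / fermi:.2f}x")
```

Under these assumptions, three times the cores works out to a bit under twice the theoretical peak, because dropping the doubled shader clock gives back part of what the extra cores add. That is one reason core counts alone can't be compared directly between the two architectures.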
If you have any particular questions, let me know.
