(Contact : leedaeguen [at] kaist.ac.kr)


AMD and NVIDIA have been locked in an evenly matched fight since Q4 2013 with their newest flagship chips, Hawaii and GK110. This standoff may last a couple more quarters, because the transition to the 20nm fabrication process, which is possibly a key to next-generation flagship GPUs such as Maxwell, is being delayed at TSMC. So it is more likely that each company's dual-GPU SKU, rather than a next-generation SKU, will be the nearest 'balance breaker' in this competition. Yesterday, NVIDIA threw its first hook for that 'balance breaker': GeForce GTX 790, a dual-GK110 SKU replacing NVIDIA's current king, GTX 690.

According to VideoCardz, GTX 790 includes:
- 2 x 2496 CUDA Cores (13 SMX per GK110 GPU, 1 fewer than TITAN / 1 more than GTX 780)
- 2 x 320-bit GDDR5 memory interface (strongly implies that each GPU has only 40 ROPs, not 48)
- 10GB of memory
(Source : http://goo.gl/pwiybI)
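
As a quick sanity check on the figures above, here is a minimal Python sketch of the arithmetic. It assumes 192 CUDA cores per Kepler SMX and the GK110 pairing of 8 ROPs with each 64-bit memory controller that I mention later in this post; the function names are mine and purely illustrative.

```python
CORES_PER_SMX = 192      # CUDA cores in one Kepler SMX
ROPS_PER_CLUSTER = 8     # ROPs tied to one memory controller in GK110
BITS_PER_CLUSTER = 64    # width of one memory controller


def cuda_cores(smx_count: int) -> int:
    """CUDA core count for a GPU with the given number of enabled SMX units."""
    return smx_count * CORES_PER_SMX


def rops_from_bus(bus_bits: int) -> int:
    """ROP count implied by the memory bus width (one ROP block per 64 bits)."""
    clusters, remainder = divmod(bus_bits, BITS_PER_CLUSTER)
    if remainder:
        raise ValueError("bus width must be a multiple of 64 bits")
    return clusters * ROPS_PER_CLUSTER


print(cuda_cores(13))      # cores per GPU with 13 SMX
print(rops_from_bus(320))  # ROPs per GPU with a 320-bit interface
```

With 13 SMX this gives the 2496 cores listed above, and a 320-bit interface works out to 5 clusters, hence 40 ROPs.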

In this post I'll estimate its speculative performance with the 'VGA calculator' I designed. The calculator is really just a simple multivariable fractional equation whose variables are the SP/TMU/ROP counts and the GPU/VRAM frequencies. Each term represents the 'simulated' GPU's shader, texture mapping, or rendering performance, and the terms are combined harmonically (meaning each term takes a '1/n' form), so we can easily see past a specification-wise bluff to a GPU's real performance.

For example, Radeon HD 5830 has more SPs/TMUs than 4890 and the same number of ROPs, while their clock rates are 800 and 850MHz respectively, so it is quite natural to expect 5830 to be faster than 4890. But it's not. This is because 5830's ROP partition is (a bit) slower than 4890's, which hurts gaming performance despite its complete dominance in SP/TMU. With this calculator, however, I successfully speculated that 5830 wouldn't overwhelm 4890. Indeed, more than 3 years on, it still works for recent GPUs such as the Volcanic Islands/Kepler families: before Hawaii's release I speculated that Hawaii would be faster than GTX TITAN/780 but would not compete with a full-blown GK110, which has now been released as GTX 780 Ti. (See: http://udteam.tistory.com/535)
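
To show what a harmonic ('1/n') combination does, here is a minimal Python sketch. It is not my actual calculator: the equal weighting, the omission of the VRAM term, the `GpuSpec` and `relative_score` names, and the two example parts are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    sp: int          # shader processor count
    tmu: int         # texture mapping unit count
    rop: int         # render output unit count
    gpu_mhz: float   # GPU clock in MHz


def relative_score(gpu: GpuSpec, base: GpuSpec) -> float:
    """Harmonic mean of the shader, texture and ROP throughput ratios vs. base.

    Hypothetical formula: equal weights, no memory term.
    """
    ratios = (
        (gpu.sp * gpu.gpu_mhz) / (base.sp * base.gpu_mhz),
        (gpu.tmu * gpu.gpu_mhz) / (base.tmu * base.gpu_mhz),
        (gpu.rop * gpu.gpu_mhz) / (base.rop * base.gpu_mhz),
    )
    return len(ratios) / sum(1.0 / r for r in ratios)


# Made-up parts, not real GPUs: 'wide' has 50% more SP/TMU but 20% fewer ROPs.
base = GpuSpec(sp=1000, tmu=50, rop=20, gpu_mhz=1000.0)
wide = GpuSpec(sp=1500, tmu=75, rop=16, gpu_mhz=1000.0)

print(round(relative_score(wide, base), 3))
```

In this toy setup the 'wide' part scores only about 1.16 times the baseline despite its 50% larger SP/TMU array, because the harmonic form lets the weakest component drag the total down. That is the kind of behavior that keeps a spec sheet from bluffing.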


Well, enough of the lecture. The results are below:
(Assume that the GPU/VRAM clock rates remain unchanged. The GPU clock reflects the maximum boost frequency.)



▲ The new card clearly beats all of its predecessors, including GTX 690, a dual-GK104 SKU. Roughly, its margin over GTX 690 is about 25%, and it is almost half again as fast as GTX 780 Ti. In terms of SLI scaling, however, it is a bit disappointing when we compare it with actual SLI configurations. Let's look at a 2 x GTX 780 Ti config.



▲ See what I mean? GTX 790 doesn't actually exploit the full potential of its two monstrous GPUs. Let's work out why, component by component. First, let's compare one half of GTX 790 with the other single-GK110 SKUs.



▲ It seems that one half of GTX 790 can't even keep up with GTX 780, even though it has 1 more SMX. The only other difference is the ROP count and the memory interface, which are bound together (in GK110, 8 ROPs and a 64-bit interface form one block), so it is reasonable to attribute the performance gap to them. Let's try to prove this.



▲ The result above comes from one half of GTX 790 plus 1 more ROP-IMC cluster (that is, +8 ROPs and a +64-bit GDDR5 memory controller). The extra ROP-IMC cluster yields an almost 12% increase in performance, enough to overtake GTX 780 and come very close to GTX TITAN.
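
For a sense of scale, the short Python sketch below works out how much ROP and bus width one extra cluster adds to a half of GTX 790. The `add_clusters` helper is a hypothetical name, and the 40 ROP / 320-bit starting point is the rumored configuration, not a confirmed one.

```python
def add_clusters(rop: int, bus_bits: int, extra: int = 1) -> tuple[int, int]:
    """Return (rop, bus_bits) after enabling extra ROP-IMC clusters.

    Each GK110 cluster contributes 8 ROPs and a 64-bit memory controller.
    """
    return rop + 8 * extra, bus_bits + 64 * extra


rop, bus_bits = add_clusters(40, 320)   # half of the rumored GTX 790
rop_growth = rop / 40 - 1               # relative increase in ROP count

print(rop, bus_bits, f"{rop_growth:.0%}")
```

So one more cluster means 20% more ROPs and bus width, which the calculator turns into the almost 12% gain mentioned above.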

Let's look at the opposite case: a full-blown SP/TMU array (the same amount as GTX 780 Ti) paired with the cut-down ROP-IMC.



▲ It becomes obvious that the ROP-IMC part affects performance more than the SP/TMU part does.

So, my conclusions are as follows:
- GTX 790 will take the crown, but it won't be faster than an SLI config of today's highest-end single-GPU SKU.
- GTX 790 will actually be slower even than 2 x GTX 780, never mind 2 x 780 Ti or TITAN, because of its shortage of ROPs.

Well, the post is over. Thanks for reading. Have a nice day :-)