I was thinking about the new dual-core design (two Graphics engines) and the 4-way VLIW architecture in Cayman.
Starting with the front end, the ILP (Instruction Level Parallelism) must be as good as or better than on Barts (6800 series), and the new dual-core design looks better at TLP (Thread Level Parallelism). Doubling the Graphics engines in order to have a second Tessellator unit will increase the transistor count, and that means more die area. So the new front end will be bigger than the one in Barts.
Because the new 4-way VLIW doesn't have a specialized T (Transcendental) unit, its job is done by 3 of the 4 shaders inside each SP in the new VEC-4. I believe that when calculating transcendentals (sin, cos, etc.) the new 4-way VLIW will be slower than the previous 5-way VLIW by 10-20%. FP performance per SP could be almost the same as VEC-5 (5-way VLIW), but in some instances it could be up to 10% slower. The new VEC-4 saves them 10% space per SP, so in theory they could fit roughly 10% more SPs in the same area.
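To put a number on that last point, here is a quick sketch of the area arithmetic. It assumes the 10% per-SP saving from above and that the total area spent on SPs stays fixed; the function name is just something I made up for illustration.

```python
def extra_sps_at_equal_area(area_saving_per_sp: float) -> float:
    """Fractional increase in SP count when each SP shrinks by
    `area_saving_per_sp` and the total SP area is held constant."""
    # Each SP now costs (1 - saving) of its old area, so the same
    # area holds 1 / (1 - saving) times as many SPs.
    return 1.0 / (1.0 - area_saving_per_sp) - 1.0


gain = extra_sps_at_equal_area(0.10)  # assumed 10% saving per SP
print(f"{gain:.1%} more SPs in the same area")
```

So a 10% saving per SP works out to slightly more than 10% extra SPs (about 11%), which is close enough to the round figure I used.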
One more problem with size will be the Texture units. Because the Tex Units are part of each SIMD, and we will have more SIMDs than Cypress (20), they will increase the size of the die. I have no idea how big the Tex Units are, but if Cayman has 30 SIMDs (120 TMUs) then they will increase the die size a lot.
Another problem will be the need to connect all those SIMDs to each other, and that's the job of the Crossbar. With 30 SIMDs and 1920 shaders, Cayman would have 20% more shaders than Cypress, and it would need more connecting lines and a bigger crossbar. That will increase both the complexity and the size of the chip.
One more side effect of this complexity is that it could affect the yields, because more broken lines could occur in manufacturing, and more copper could affect the thermal characteristics of the chip.
From my point of view 120 TMUs are too many, and if AMD wants to keep the die size close to Cypress then they will need to cut the SIMD count to 24 SIMDs and 96 TMUs. If the new 4-way VLIW has the same performance as the old VEC-5 and they increase the ILP and TLP in the front end, then with 24 SIMDs and a dual-core architecture for 2x Tessellation performance, they will have a small chip at 360-380mm² with 20% more performance in DX9/DX10 but 2x in DX11 Tessellation.
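For anyone who wants to check the unit counts, here is a small sketch comparing the configurations discussed in this post. It assumes 16 SPs and 4 TMUs per SIMD, which is what the numbers above imply (1920 shaders and 120 TMUs across 30 SIMDs of 4-way VLIW); the class and constant names are my own and the Cayman entries are speculation, not confirmed specs.

```python
from dataclasses import dataclass

SPS_PER_SIMD = 16   # assumption: implied by 1920 shaders / 30 SIMDs / 4 lanes
TMUS_PER_SIMD = 4   # assumption: implied by 120 TMUs / 30 SIMDs


@dataclass(frozen=True)
class GpuConfig:
    name: str
    simds: int
    vliw_width: int  # shader lanes per SP (5 for VEC-5, 4 for VEC-4)

    @property
    def shaders(self) -> int:
        return self.simds * SPS_PER_SIMD * self.vliw_width

    @property
    def tmus(self) -> int:
        return self.simds * TMUS_PER_SIMD


cypress = GpuConfig("Cypress", simds=20, vliw_width=5)
cayman_30 = GpuConfig("Cayman, 30 SIMDs (speculative)", simds=30, vliw_width=4)
cayman_24 = GpuConfig("Cayman, 24 SIMDs (speculative)", simds=24, vliw_width=4)

for cfg in (cypress, cayman_30, cayman_24):
    ratio = cfg.shaders / cypress.shaders
    print(f"{cfg.name}: {cfg.shaders} shaders, {cfg.tmus} TMUs, "
          f"{ratio:.2f}x Cypress shader count")
```

Under these assumptions the 24-SIMD version ends up with 1536 shaders, slightly fewer than the 1600 in Cypress, so the 20% gain I expect in DX9/DX10 would have to come from the front-end improvements and better utilization of the 4-way units rather than from raw shader count.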