The 14nm Vega 10 GPU allegedly offers up to 64 NCUs and as much as 12 TFLOPS of single precision and 750 GFLOPS of double precision compute performance. Half precision performance is twice that of FP32 at 24 TFLOPS (which would be good for things like machine learning). In other words, the NCUs allegedly run FP16 at 2x and DPFP at 1/16 of the FP32 rate. If each NCU has 64 shaders like Polaris 10 and other GCN GPUs, then we are looking at a top-end Vega 10 chip having 4096 shaders, which matches the shader count of Fiji. Further, Vega 10 supposedly has a TDP of up to 225 watts.
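As a quick sanity check on the leaked figures, the arithmetic above can be sketched in a few lines of Python. The NCU count, FP32 figure, and precision ratios are the ones quoted from the slide; the 64 shaders per NCU value is the same assumption made in the text, not a confirmed spec, and the function name is purely illustrative.

```python
# Figures quoted from the leaked slide
NCUS = 64
FP32_TFLOPS = 12.0
FP16_RATE = 2.0        # FP16 runs at 2x the FP32 rate
FP64_RATE = 1 / 16     # double precision runs at 1/16 the FP32 rate

# Assumption (not confirmed): 64 shaders per NCU, as on Polaris 10 and earlier GCN parts
SHADERS_PER_NCU = 64


def derived_specs(ncus=NCUS, fp32_tflops=FP32_TFLOPS):
    """Return the shader count and the FP16/FP64 throughput implied by the ratios."""
    return {
        "shaders": ncus * SHADERS_PER_NCU,
        "fp16_tflops": fp32_tflops * FP16_RATE,
        "fp64_gflops": fp32_tflops * FP64_RATE * 1000,
    }


if __name__ == "__main__":
    for name, value in derived_specs().items():
        print(f"{name}: {value}")
```

The results line up with the slide: 4096 shaders, 24 TFLOPS of FP16, and 750 GFLOPS of FP64, so the quoted numbers are at least internally consistent.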
For comparison, the 28nm 8.9 billion transistor Fiji-based R9 Fury X ran at 1050 MHz with a TDP of 275 watts and had a rated peak compute of 8.6 TFLOPS. While we do not know the clock speeds of Vega 10, the numbers (12 TFLOPS versus 8.6 TFLOPS from what would be the same shader count) suggest that AMD has been able to clock the GPU much higher than Fiji while still using less power (and thus putting out less heat). This is plausible given the move to the smaller process node, though I do wonder what yields will be like at first for the top-end (and highest clocked) versions.
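To put a rough number on "much higher", here is a back-of-the-envelope sketch. It assumes peak FP32 throughput is shaders x 2 FLOPs per clock (one fused multiply-add) x clock speed, which is how GCN peak figures are normally quoted, and it assumes the unconfirmed 4096 shader count for Vega 10. The function names are made up for illustration.

```python
def peak_tflops(shaders, clock_mhz, flops_per_clock=2):
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * flops_per_clock * clock_mhz * 1e6 / 1e12


def implied_clock_mhz(tflops, shaders, flops_per_clock=2):
    """Clock speed (MHz) needed to reach a given peak TFLOPS figure."""
    return tflops * 1e12 / (shaders * flops_per_clock) / 1e6


if __name__ == "__main__":
    # Check the formula against the known Fiji numbers (4096 shaders at 1050 MHz)
    print(f"Fury X peak: {peak_tflops(4096, 1050):.2f} TFLOPS")

    # Assumption: Vega 10 has 4096 shaders (64 NCUs x 64 shaders each)
    vega_clock = implied_clock_mhz(12.0, 4096)
    print(f"Implied Vega 10 clock: {vega_clock:.0f} MHz")
    print(f"Clock increase over Fury X: {vega_clock / 1050 - 1:.0%}")

    # Rated peak compute per watt of TDP
    print(f"Vega 10: {12.0 / 225 * 1000:.1f} GFLOPS/W, Fury X: {8.6 / 275 * 1000:.1f} GFLOPS/W")
```

Under those assumptions the formula reproduces Fiji's 8.6 TFLOPS rating and puts a 12 TFLOPS Vega 10 somewhere around 1.46 GHz. If the real shader count differs, the implied clock changes accordingly.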
Vega 10 will be paired with two stacks of HBM2 memory on package, which will offer 16GB of memory with 512 GB/s of memory bandwidth. Hitting that bandwidth with only two stacks, while quadrupling capacity, is thanks to the move to HBM2 from HBM (Fiji needed four HBM stacks to hit 512 GB/s and had only 4GB).
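The stack math can be sketched as follows. The 1024-bit interface per stack and the per-pin data rates (1 Gbps for first-generation HBM, 2 Gbps for HBM2) are not from the slide; they are the commonly cited HBM figures and are included here as assumptions. The helper name is hypothetical.

```python
def stack_bandwidth_gbs(pin_rate_gbps, bus_width_bits=1024):
    """Bandwidth of one HBM stack in GB/s (bus width in bits x per-pin rate / 8)."""
    return bus_width_bits * pin_rate_gbps / 8


# Assumed per-pin data rates: 1 Gbps for HBM (Fiji), 2 Gbps for HBM2
HBM1_STACK = stack_bandwidth_gbs(1.0)   # GB/s per stack
HBM2_STACK = stack_bandwidth_gbs(2.0)   # GB/s per stack

fiji_total = 4 * HBM1_STACK     # four HBM stacks on Fiji
vega10_total = 2 * HBM2_STACK   # two HBM2 stacks on Vega 10

if __name__ == "__main__":
    print(f"Fiji:    4 x {HBM1_STACK:.0f} GB/s = {fiji_total:.0f} GB/s, {4 / 4:.0f} GB per stack")
    print(f"Vega 10: 2 x {HBM2_STACK:.0f} GB/s = {vega10_total:.0f} GB/s, {16 / 2:.0f} GB per stack")
```

Both configurations land on 512 GB/s, but HBM2 gets there with half the stacks and eight times the capacity per stack.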
The slide also hints at a "Vega 10 x2" in the second half of the year, which is presumably a dual GPU product. The slide states that Vega 10 x2 will have four stacks of HBM2 (1TB/s), though it is not clear if they are simply adding up the two stacks per GPU to claim the 1TB/s number or if each GPU will have four stacks. The latter is unlikely, as there does not appear to be room on the package for two more stacks each, and I am not sure they could make the package big enough to make room for them either. Even if we assume that they really mean 2x 512 GB/s for memory bandwidth, that is 512 GB/s per GPU (and maybe they can get more out of that in specific workloads across both), the doubling of cores and of at least potential compute performance will be big. This is going to be a big number crunching and machine learning card, as well as a gaming card of course. Clock speeds will likely have to be much lower compared to the single GPU Vega 10 (especially with the stated TDP of 300W) and workloads won't scale perfectly, so potential compute performance will not be quite 2x, but it should still be a decent per-card boost.
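For a feel of how the dual GPU numbers might shake out, here is a minimal sketch. The 300W TDP, 512 GB/s per GPU, and 12 TFLOPS figures come from the slide; the clock ratio and scaling efficiency in the example are placeholder values picked purely for illustration, not leaked or measured numbers, and the function names are hypothetical.

```python
def per_gpu_power_w(card_tdp_w, gpus=2):
    """Naive per-GPU power budget if the card TDP is split evenly."""
    return card_tdp_w / gpus


def aggregate_bandwidth_gbs(per_gpu_gbs, gpus=2):
    """Summed memory bandwidth; each GPU still only sees its own stacks."""
    return per_gpu_gbs * gpus


def dual_gpu_tflops(single_tflops, clock_ratio, scaling_efficiency, gpus=2):
    """Effective compute with reduced clocks and imperfect multi-GPU scaling."""
    return gpus * single_tflops * clock_ratio * scaling_efficiency


if __name__ == "__main__":
    budget = per_gpu_power_w(300)
    print(f"Per-GPU power budget: {budget:.0f} W ({budget / 225:.0%} of the single GPU TDP)")
    print(f"Aggregate bandwidth: {aggregate_bandwidth_gbs(512):.0f} GB/s")

    # Placeholder assumptions: clocks at 80% of the single GPU card, 90% scaling
    estimate = dual_gpu_tflops(12.0, clock_ratio=0.8, scaling_efficiency=0.9)
    print(f"Illustrative effective compute: {estimate:.1f} TFLOPS")
```

Splitting 300W evenly leaves each GPU with about two thirds of the single card's 225W budget, which is why lower clocks seem likely. The summed 1024 GB/s figure also matches the "1TB/s" on the slide if it is simply two stacks per GPU added together.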