You're mixing up theoretical FLOP/S and real FLOP/S. In fact, theoretical SP perf for LRB should be around 2 TFLOP/S (32 cores * 32 ips * 2 GHz). But in real life you're sometimes limited by other factors (such as mem bandwidth), especially in apps such as matrix multiplication.
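
To make the arithmetic concrete, here is a rough sketch in Python. It reads "32 ips" as 32 SP operations per core per clock, which is my interpretation of the figure above. The bandwidth (100 GB/s) and arithmetic intensity (0.25 FLOP per byte) are made-up placeholders, not LRB specs, and the simple min() bound is just a roofline-style estimate.

```python
def peak_flops(cores, ops_per_cycle, clock_hz):
    """Theoretical peak: every core issues its full width every cycle."""
    return cores * ops_per_cycle * clock_hz


def attainable_flops(peak, mem_bandwidth_bytes_per_s, flops_per_byte):
    """Roofline-style bound: the lower of the compute and memory limits."""
    return min(peak, mem_bandwidth_bytes_per_s * flops_per_byte)


# Figures from the post: 32 cores, 32 ops per cycle (assumed reading), 2 GHz.
peak = peak_flops(cores=32, ops_per_cycle=32, clock_hz=2e9)
print(f"theoretical peak: {peak / 1e12:.3f} TFLOP/S")

# Hypothetical numbers for illustration only, not measured LRB values.
bandwidth = 100e9   # bytes per second (assumption)
intensity = 0.25    # FLOP per byte moved (assumption)
real = attainable_flops(peak, bandwidth, intensity)
print(f"memory-bound estimate: {real / 1e9:.1f} GFLOP/S")
```

With those placeholder numbers the memory limit, not the compute limit, sets the ceiling, which is the gap between theoretical and real FLOP/S I'm talking about.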




