Originally Posted by
hlopek
Thank you Hans, nice work :up: (seems like your PS is using GPU acceleration, judging by the tiling on the zoomed core :D)
Looking at your numbers, with the L3 cache taking only 3.85mm²/MB, I must say I'm pleasantly surprised by how little space the L3 needs per MB.
But then I'm in fact wondering why AMD decided to go with only 4x 2MB L3 cache, when 4x 4MB L3 cache on an already huge 320mm² die wouldn't take up much more space (~355mm²) nor require much more power. These chips probably already have a >140W TDP, and the extra 8MB might raise that number by only 8-13W while giving more breathing room to the IMC.
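A quick back-of-the-envelope check of that area guess, as a rough sketch only. It uses just the figures quoted here (3.85mm²/MB, 320mm² die, 4x 2MB vs 4x 4MB) and assumes the extra L3 scales linearly with capacity, ignoring any extra routing or tag overhead; the function and variable names are made up for illustration.

```python
# Rough die-area estimate for a larger L3, using only the numbers from the post.
# Assumption: L3 area scales linearly with capacity (no routing/tag overhead).

L3_AREA_PER_MB = 3.85      # mm^2 per MB of L3, from Hans' measurement
BASE_DIE_AREA = 320.0      # mm^2, die size quoted in the post
BASE_L3_MB = 4 * 2         # 4 slices x 2MB (current design)
PROPOSED_L3_MB = 4 * 4     # 4 slices x 4MB (proposed)


def estimated_die_area(base_area, base_l3_mb, new_l3_mb, area_per_mb):
    """Return the die area after growing the L3 from base_l3_mb to new_l3_mb."""
    extra_mb = new_l3_mb - base_l3_mb
    return base_area + extra_mb * area_per_mb


area = estimated_die_area(BASE_DIE_AREA, BASE_L3_MB, PROPOSED_L3_MB, L3_AREA_PER_MB)
growth_pct = (area - BASE_DIE_AREA) / BASE_DIE_AREA * 100

print(f"Estimated die area: {area:.1f} mm^2 (+{growth_pct:.1f}%)")
```

Under these assumptions the extra 8MB adds about 30.8mm², for roughly 350.8mm² in total, so the ~355mm² guess leaves a few mm² of slack for overhead.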
16MB L3 + 8MB L2 cache seems easily reachable for a 32nm, 300mm²+ die, and in server loads, which is what Bulldozer is designed for, an extra 8MB (or even just an extra 4MB) of L3 could be of much use. I'm guessing that in some special cases 16MB vs 8MB of L3 could provide more than a 20% boost while, yet again, consuming only a modest amount of power. In server loads that could be the better trade-off if it keeps cores from stalling while they wait for data to come from a few hops away.
And a 12MB-16MB L3 cache could "present in specs" as a more competitive product against the server Sandy Bridges with their 20MB L3 caches and enormous 384mm² dies (if I read that correctly somewhere).
Do these new distributed L3 caches also support the L3 power-down feature?
And does Bulldozer come with yet another separate power plane, with the IMC disconnected from the L3, so that we now have independent IMC and L3 power planes? If that is supported now, I really don't see a reason why AMD went with only 8MB of L3 cache.
Or maybe the reason is a better "directory table" (DT) and the complexity of seamlessly relaying data over a dedicated HT link between the separated L3 caches? That's why I'm asking why there is such a small amount of L3 cache: 1MB of L3 for the dedicated HT link DT, plus 1MB for swapping and the needs of the local core.
Simply too small. 1MB reserved for the HT link DT, 1MB for swap/relaying-routing and at least 1MB of local core "reservation" would be a much saner approach imho.
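To put numbers on that split, here is a tiny sketch of the per-slice budget. The 1MB figures and the four slices are the ones from this post; the partition names are just labels made up for illustration, not anything AMD has documented.

```python
# Hypothetical per-slice L3 budget, using the 1MB figures suggested in the post.
# The partition names are illustrative labels, not documented AMD features.

SLICES = 4                 # 4 distributed L3 slices
CURRENT_SLICE_MB = 2       # current design: 2MB per slice

proposed_partitions_mb = {
    "ht_link_directory_table": 1,   # reserved for the HT link DT
    "swap_relay_routing": 1,        # swapping / relaying between slices
    "local_core_reservation": 1,    # minimum kept for the local cores
}

proposed_slice_mb = sum(proposed_partitions_mb.values())
proposed_total_mb = proposed_slice_mb * SLICES
shortfall_per_slice_mb = proposed_slice_mb - CURRENT_SLICE_MB

print(f"Proposed: {proposed_slice_mb}MB per slice, {proposed_total_mb}MB total")
print(f"Current slices are {shortfall_per_slice_mb}MB short of that budget")
```

So the "saner" split needs at least 3MB per slice, i.e. 12MB in total, which is the low end of the 12MB-16MB range mentioned earlier.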