On K10.5, the 6MB L3 cache takes up almost the same die area as the four cores together with their dedicated 512kB L2 caches.
Bulldozer cores will be lighter (maybe a smaller L1 cache per "core" that totals the same 64+64kB inside a dual-core module as in the K7-K10.5 releases), and L2 will stay the same or be 1024kB per module. So they could easily squeeze in up to 6 modules of that kind, double the L3 cache again to 12MB (originally they mentioned that 16MB of L3 was projected for the Bulldozer CPU, afair), and still stay within the same die size as previous CPU generations, K10@65nm/K10.5@45nm, ~250mm². On top of all that there is HKMG, which supposedly should allow a huge clock jump, and they already manage to squeeze 4x3.4GHz inside a 90W TDP on the current 45nm process.
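To put rough numbers on that guess, here is a quick back-of-the-envelope sketch in Python. It only adds up the cache sizes quoted above; the Bulldozer figures (6 modules, 1024kB L2 per module, 12MB L3) are the speculation from this post, not confirmed specs, and the function name is made up for illustration.

```python
# Total on-die cache in kB, using only the figures quoted in the post.
# The Bulldozer numbers are speculation, not confirmed specifications.

def total_cache_kb(units, l1_per_unit_kb, l2_per_unit_kb, l3_kb):
    """Sum L1 + L2 over all cores/modules and add the shared L3."""
    return units * (l1_per_unit_kb + l2_per_unit_kb) + l3_kb

# K10.5 quad-core: 64+64kB L1 and 512kB L2 per core, 6MB shared L3.
k10_5 = total_cache_kb(units=4, l1_per_unit_kb=64 + 64,
                       l2_per_unit_kb=512, l3_kb=6 * 1024)

# Guessed Bulldozer: 6 modules, 64+64kB L1 and 1024kB L2 per module, 12MB L3.
bulldozer_guess = total_cache_kb(units=6, l1_per_unit_kb=64 + 64,
                                 l2_per_unit_kb=1024, l3_kb=12 * 1024)

print(f"K10.5 quad-core:   {k10_5} kB")
print(f"Bulldozer (guess): {bulldozer_guess} kB")
print(f"ratio:             {bulldozer_guess / k10_5:.2f}x")
```

So the guessed layout would carry a bit more than twice the cache of the K10.5 quad-core, which is why the die-size claim leans on the move to a smaller process.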
Yep, two separate cores are always better than two threads inside one core in terms of power/performance ratio and utilization, and a simpler core is easier to optimize for than Intel's proprietary HyperThreading, which evolved from HTT(1) in P4-HT to HTT(2) in Nehalem, and probably to some variation of HTT(3) in Sandy Bridge. So previous optimizations usually don't work, and you need to recompile your work yet again and optimize for HTT on top of the native SSE/AVX code optimizations. But in the end SMT should serve Intel as well as CMT serves AMD. It's just that CMT has a brighter future on the power side (according to AMD's bragging).
SSE5 is part of the CPU "module", and until the GPU part of the CPU gets inside the "module" it won't serve for GPU optimization, and that will probably never happen. The integrated GPU (which is not part of Bulldozer, btw) will communicate over PCIe (I hoped for an HT/HTX bus), so it will only serve for better integration and a better HTPC (low-end server design?). The only performance boost that "SSE5" could give would be some kind of packing that shrinks the bandwidth needed for PCIe communication or something, but that would benefit any device connected to that PCIe (3.0?) bus (e.g. a discrete GPU card), and I don't think they even considered that kind of tweak when they designed SSE5.
Originally Posted by madcho
And what about 6-core revisions of K10(.5) CPUs? Couldn't every core use the on-die L3 cache, and it's still 48 bits wide as in the quad-core (Deneb, PII X4)?
I think L3 sharing is pretty easy to extend to more than 4 cores, as long as the TLB works properly in the first place (the famous TLB bug in pre-B3 K10 revisions).
Excellent, two hits w/o a miss. Hope more of it will come; it's refreshing to see someone on the forums who knows the real matter behind all the HT mix-ups.