Bulldozers to the FX-8120P [B2 latest version of the core differences between the measured Full CPU_NB]
http://forum.coolaler.com/showthread.php?t=273986
Extremely poor L3 cache performance. That's why we should take all these pre-release benchmarks with a grain of salt...
You cannot blame the L3 cache for that... the L2 cache is even slower. Actually, if it didn't have better latency than main memory, they would be far better off removing those caches.
(So yeah, I agree: these samples are crippled to the bone or something is terribly wrong...)
As xsecret said, the BD samples we have seen have serious cache issues. Even Brazos has higher L2 bandwidth at half the core speed...?
Last edited by flyck; 09-19-2011 at 11:01 PM.
And check out the L1 write. If those caches were where they should be, the performance would be much better.
That is totally ok, BD's L1 is write through, i.e. writes to the L1 go directly to the L2, thus the L1 and L2 write performance should be more or less the same.
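To put a rough number on the write-through point, here is a minimal sketch that counts how many writes reach the L2 under each policy. The 64-byte line size, the function names and the "L1 never evicts during the run" simplification are my assumptions for illustration, not anything measured on BD.

```python
LINE_BYTES = 64  # assumed cache line size


def l2_writes_write_through(store_addrs):
    """Write-through L1: every store is forwarded to the L2."""
    return len(store_addrs)


def l2_writes_write_back(store_addrs):
    """Idealized write-back L1: no evictions during the run, so each
    dirty line is written to the L2 exactly once at the end."""
    return len({addr // LINE_BYTES for addr in store_addrs})


# Sixteen 16-byte stores at addresses 0, 16, ..., 240 (four cache lines).
stores = [i * 16 for i in range(16)]

print("write-through L2 writes:", l2_writes_write_through(stores))
print("write-back    L2 writes:", l2_writes_write_back(stores))
```

With write-through, the L2 sees every store, so an L1 write benchmark ends up measuring the L2 write path.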
However, I wonder what is happening with the L2 read performance; for some strange reason it seems to depend on the uncore clock:
2.0GHz: 11.9 GB/s
2.2GHz: 35.8 GB/s
2.4GHz: 12.5 GB/s
2.6GHz: 36.8 GB/s
That's a big difference ...
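Just to quantify it, a quick sketch that compares the four figures above against the slowest one and against what plain linear scaling with the uncore clock would predict. The variable names are mine; the numbers are the ones quoted in this post.

```python
# L2 read bandwidth (GB/s) per uncore clock (GHz), as quoted above.
l2_read_gbs = {2.0: 11.9, 2.2: 35.8, 2.4: 12.5, 2.6: 36.8}

base_clk = 2.0
base_bw = l2_read_gbs[base_clk]

ratios = {}
linear = {}
for clk, bw in l2_read_gbs.items():
    ratios[clk] = bw / base_bw              # measured speed-up vs 2.0 GHz
    linear[clk] = base_bw * clk / base_clk  # what linear scaling would give
    print(f"{clk:.1f} GHz: {bw:5.1f} GB/s  "
          f"x{ratios[clk]:.2f} measured, {linear[clk]:.1f} GB/s if linear")
```

The 2.2 and 2.6 GHz results are about three times the 2.0 and 2.4 GHz ones, which is nowhere near what a 10 to 30 percent clock bump would explain on its own.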
Looks like a bug. And I guess there will be some interesting results if you start playing with the core frequency as well. Could it be a problem with the sync?
And the whole write through idea looks like crap to me. What is the advantage? I think I can see some drawbacks.
I don't think so, it would hold back execution way too much. There is a buffer here called coalescing cache, which seems to be disabled here.
http://www.realworldtech.com/page.cf...2610181333&p=9
"To alleviate the write-through bandwidth requirements on the L2, each Bulldozer module includes a write coalescing cache (WCC), which is considered part of the L2. At present, AMD has not disclosed the size and associativity of the WCC, although it is probably quite small. Stores from both L1D caches go through the WCC, where they are buffered and coalesced. The purpose of the WCC is to reduce the number of writes to the L2 cache, by taking advantage of both spatial and temporal locality between stores. For example, a memcpy() routine might clear a cache line with four 128-bit stores, the WCC would coalesce these stores together and only write out once to the L2 cache."
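A toy model of the coalescing described in that quote, to show why a disabled WCC would hurt. Everything structural here is a guess on my part: the 4-entry size, the LRU replacement, the 64-byte line and the function name are all hypothetical, since AMD has not disclosed the real WCC organization. Stores are assumed not to straddle a line.

```python
from collections import OrderedDict

LINE_BYTES = 64   # assumed cache line size
WCC_ENTRIES = 4   # hypothetical: the real WCC size is undisclosed


def l2_writes_with_wcc(stores, entries=WCC_ENTRIES):
    """Count line writes reaching the L2 when stores pass through a small
    LRU coalescing buffer. `stores` is a list of (address, size) pairs."""
    buffer = OrderedDict()  # line index -> True, ordered oldest to newest
    l2_writes = 0
    for addr, _size in stores:
        line = addr // LINE_BYTES
        if line in buffer:
            buffer.move_to_end(line)        # coalesce into the buffered line
        else:
            if len(buffer) == entries:
                buffer.popitem(last=False)  # evict oldest line: one L2 write
                l2_writes += 1
            buffer[line] = True
    return l2_writes + len(buffer)          # drain what is still buffered


# The example from the quote: four 128-bit (16-byte) stores to one line.
four_stores = [(0, 16), (16, 16), (32, 16), (48, 16)]
print("with coalescing:   ", l2_writes_with_wcc(four_stores))
print("without coalescing:", len(four_stores))
```

In this model the four stores collapse into a single L2 write, whereas with the buffer bypassed every store would hit the L2 on its own.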