Quote: Originally Posted by Opteron146
That is totally OK. BD's L1 is write-through, i.e. writes to the L1 go directly to the L2, so L1 and L2 write performance should be more or less the same.
I don't think so; that would hold back execution way too much. There is a buffer in between called the write coalescing cache, which seems to be disabled here.

To alleviate the write-through bandwidth requirements on the L2, each Bulldozer module includes a write coalescing cache (WCC), which is considered part of the L2. At present, AMD has not disclosed the size and associativity of the WCC, although it is probably quite small. Stores from both L1D caches go through the WCC, where they are buffered and coalesced. The purpose of the WCC is to reduce the number of writes to the L2 cache by taking advantage of both spatial and temporal locality between stores. For example, a memcpy() routine might clear a cache line with four 128-bit stores; the WCC would coalesce these stores and write out only once to the L2 cache.
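To make the coalescing idea concrete, here is a toy model in Python that counts how many L2 writes a stream of stores generates with and without a WCC. It is only a sketch, not AMD's design: the 64-byte line, the 4-line capacity, and the oldest-first eviction are all assumptions (AMD hasn't disclosed the WCC's size or associativity), and the names `WriteCoalescingCache` and `l2_writes_without_wcc` are made up for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per cache line (assumed)
WCC_LINES = 4    # hypothetical capacity; the real WCC size is undisclosed


class WriteCoalescingCache:
    """Toy model: buffers stores per cache line, writes each line to L2 once."""

    def __init__(self, capacity=WCC_LINES, line_size=LINE_SIZE):
        self.capacity = capacity
        self.line_size = line_size
        self.pending = OrderedDict()  # line index -> set of dirty byte offsets
        self.l2_writes = 0

    def store(self, addr, size):
        line, offset = divmod(addr, self.line_size)
        if offset + size > self.line_size:
            raise ValueError("sketch assumes stores do not cross a line boundary")
        if line not in self.pending:
            if len(self.pending) == self.capacity:
                # Buffer full (assumed policy): evict the oldest line with one L2 write.
                self.pending.popitem(last=False)
                self.l2_writes += 1
            self.pending[line] = set()
        # Coalesce: merge this store's bytes into the already-buffered line.
        self.pending[line].update(range(offset, offset + size))

    def flush(self):
        # Drain everything still buffered: one L2 write per pending line.
        self.l2_writes += len(self.pending)
        self.pending.clear()


def l2_writes_without_wcc(stores):
    # Plain write-through with no coalescing: every store goes straight to L2.
    return len(stores)


# Four 128-bit (16-byte) stores filling one 64-byte line, as in the memcpy() example.
stores = [(addr, 16) for addr in range(0, 64, 16)]

wcc = WriteCoalescingCache()
for addr, size in stores:
    wcc.store(addr, size)
wcc.flush()

print("write-through L2 writes:", l2_writes_without_wcc(stores))  # 4
print("with WCC L2 writes:", wcc.l2_writes)                       # 1
```

The four stores hit the same line while it sits in the buffer, so they cost one L2 write instead of four. With a tiny buffer and stores scattered over many lines, entries get evicted before they can merge and the benefit shrinks, which is why the (unknown) WCC size matters.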
http://www.realworldtech.com/page.cf...2610181333&p=9