AMD Zambezi news, info, fans !

**Oese** · 09-13-2011, 10:25 AM

Originally Posted by xsecret

WCC is a joke in the current BD implementation and is not able to make up for the massive loss that comes from the L1D. The entire caching system is lowering the performance of the µarch. The L3 is a non-inclusive victim cache (L2 data are evicted to the L3), with data transferred from the L3 to the L1D of the expected core without being copied to the L2. That means high snoop traffic in order to keep the coherency correct. And snoop traffic is something really unwanted from a bandwidth/performance POV. There is a paradox here: the L1 is write-through, but you're not sure that data not in the L2 is not in the L1D of another core.

Interesting reads here, but 3 questions:

1.) Why would AMD do such a thing? (You wrote "for better clock-scaling"; does that mean higher clocks or better scaling?)

2.) Can it be optimized further?

3.)

Write-through means that every write to the cache causes a synchronous write to the backing store. Because L2 is slower than L1, L1 must wait for L2 to write out data.

this contradicts somewhat

The L3 is a non-inclusive victim cache (L2 data are evicted to the L3), with data transferred from the L3 to the L1D of the expected core without being copied to the L2.

Or do you mean this can be due to WCC? I am not a professional, but I would guess WCC doesn't take that long to preserve coherence... If anything, I'd guess the L1D waiting for the L2 can be a problem, but if stuff is written to the L1D fast and WCC then writes it to the L2 some cycles later, I guess there will only be rare cases where

The L1 is write-through, but you're not sure that data not in the L2 is not in the L1D of another core.

really matters...
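
To make my guess a bit more concrete, here is a toy sketch (not a model of the real hardware) comparing how many writes reach the L2 with plain write-through versus with a small write-coalescing buffer in between. The function names, the 64-byte line size and the 4-line buffer are my own assumptions for illustration, not Bulldozer's actual parameters.

```python
def l2_writes_write_through(stores):
    """Plain write-through: every store is forwarded to L2, one L2 write each."""
    return len(stores)


def l2_writes_with_coalescing(stores, line_size=64, buffer_lines=4):
    """Toy write-coalescing buffer (sizes are assumed, not real hardware values).

    Stores to a line that is already buffered are merged for free.
    A line is written to L2 only when it is evicted (oldest first)
    or when the buffer is flushed at the end.
    """
    buffered = []   # line numbers currently held, oldest first
    l2_writes = 0
    for addr in stores:
        line = addr // line_size
        if line in buffered:
            continue                # merged into an already-buffered line
        if len(buffered) == buffer_lines:
            buffered.pop(0)         # evict the oldest line to L2
            l2_writes += 1
        buffered.append(line)
    return l2_writes + len(buffered)  # final flush of whatever is left


# Seven byte-address stores touching only two cache lines
stores = [0, 8, 16, 24, 64, 72, 0]
print(l2_writes_write_through(stores))
print(l2_writes_with_coalescing(stores))
```

If the stores have good locality, the buffer hides most of the L2 writes; if they are scattered over many lines, it evicts constantly and gets close to plain write-through again.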

I guess

data transferred from the L3 to the L1D of the expected core without being copied to the L2.

will first speed up things, then

snoop traffic in order to keep the coherency correct

will need some time. To my eyes it all depends on how this stuff is used: it can be faster in one case and slower in the other... Mhm, we came to that conclusion somewhere before.

Maybe the problem is because of

The L1 is write-through, but you're not sure that data not in the L2 is not in the L1D of another core.

, one core might need to wait for WCC to complete to ensure coherency?
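
Here is how I picture that situation, as a minimal sketch based only on what the quoted post describes. The `Core` class, `fill_from_l3` and `needs_l1_probe` are hypothetical names I made up, and removing the line from the L3 on a hit is my assumption about how the victim cache behaves.

```python
class Core:
    """One core with its own L1D and L2, each modelled as a set of line addresses."""

    def __init__(self):
        self.l1d = set()
        self.l2 = set()


def fill_from_l3(core, l3, line):
    """L3 hit: the line goes straight to the core's L1D and skips its L2."""
    l3.discard(line)     # assumption: the victim cache gives the line up on a hit
    core.l1d.add(line)


def needs_l1_probe(core, line):
    """True if the line sits in the L1D but not in the L2 of this core,
    so a coherency check that looks only at the L2 would miss it."""
    return line in core.l1d and line not in core.l2


core0 = Core()
l3 = {0x40}
fill_from_l3(core0, l3, 0x40)
print(needs_l1_probe(core0, 0x40))   # the L2 alone cannot answer for this line
```

If that picture is right, another core cannot rule the line out by looking at core0's L2 alone, which would be where the extra snoop traffic comes from.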

Ahhhh btw... CONGRATZ FOR NEW WR
