I'm not saying the central L3 isn't worthwhile, and certainly, to have a central pool and still keep the individual cores fed properly, the L2 contents had to be replicated in it. But it seems to me there is too little room left in the L2 to handle even a single core's chores (to keep the example simple). You could have doubled or quadrupled the L2 without a large hit on cost or complexity, since you've already designed the basic working format. Even then you'd still have room left over in the L3 pool; each core would work faster out of its L2 and still have excess L3 to tap for the next tasks assigned to it.

How I would have envisioned the layout:
1 MB L2 per core
8 MB of L3 shared
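
To put rough numbers on the layout above, here is a back-of-the-envelope sketch. It assumes four cores and an L3 that holds a copy of everything in each L2; both are my assumptions for illustration, and the function name is made up.

```python
# Rough sketch only: the core count and the "L3 mirrors every L2" behavior
# are assumptions for illustration, not confirmed figures.
def l3_headroom_mb(cores, l2_per_core_mb, l3_shared_mb):
    """Return (MB of L3 spent mirroring the L2s, MB of L3 left over)."""
    mirrored = cores * l2_per_core_mb
    if mirrored > l3_shared_mb:
        raise ValueError("total L2 exceeds L3; a mirroring L3 could not hold it")
    return mirrored, l3_shared_mb - mirrored

# The layout proposed above: 1 MB L2 per core, 8 MB shared L3, 4 cores assumed.
mirrored, free = l3_headroom_mb(cores=4, l2_per_core_mb=1, l3_shared_mb=8)
print(f"{mirrored} MB mirrored, {free} MB free")
```

Under those assumptions, half of the shared pool would still be free after mirroring, which is the "room left over" I have in mind.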

The cost saved by using a smaller L2 than the prior series had would have covered the addition of the L3, and probably a memory controller of significant capability in the bargain.

Now, I say this, but I have to believe you gave this a shot. I honestly don't think you drew up the current ratios without experimenting. So yes, I'll wait for the tests, but it "feels" to me like Intel took the L2 down more for cost-to-benefit reasons than for anything purely performance oriented. Meaning there was an advantage to the additional L2 (as surely there would be), but it was judged that the added benefit didn't warrant the cost involved. I'm not baiting you to argue this, just explaining a "feeling" on the topic; any commentary you feel is worthwhile is fine, and none is fine too.

Given where the 9770 landed in cost, perhaps bringing things under control wasn't all bad. And perhaps what was sacrificed won't be all that noticeable. So I'll wait and see. But I do feel "some" additional L2 would have yielded a slightly better chip.