Flanker, thanks for posting the relevant news, mate. I'm still reading it; looks interesting. Will comment later.
edit:
Wow, BSN found a massive gold mine of info, some of which we haven't seen before.
What Flanker quoted above was unknown before.
The document lists the following changes to improve instructions per clock (IPC):
Store to load forwarding optimization <- big improvement (store handling sucked in BD/PD)
Dispatch and retire up to 2 stores per cycle <- same as above
Improved memfile, from last 3 stores to last 8 stores, and allow tracking of dependent stack operations. <- complements the above
Load queue (LDQ) size increased to 48, from 44. <- solid improvement to the load subsystem
Store queue (STQ) size increased to 32, from 24. <- complements the store subsystem changes above
Increase dispatch bandwidth to 8 INT ops per cycle (4 to each core), from 4 INT ops per cycle (4 to just 1 core). 4 ops per cycle per core remains unchanged. <- massive improvement in MT workloads
Accelerate SYSCALL/SYSRET. <- I have no idea how much faster this makes SYSCALL/SYSRET, probably a noticeable improvement
Increased L2 BTB size from 5K to 10K and from 8 to 16 banks. <- solid improvement
Improved loop prediction. <- solid improvement (don't know how good, though)
Increase PFB from 8 to 16 entries; the 8 additional entries can be used either for prefetch or as a loop buffer. <- prefetch was already solid in BD/PD, making it better cannot hurt
Increase snoop tag throughput. <- no clue
Change from 4 to 3 FP pipe stages. <- don't know what to think of this. It's listed as an improvement, so fewer stages is good (a shorter pipeline usually means better IPC).
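
To put the size bumps in the list above into perspective, here is a quick sketch that works out the relative increase for each before/after pair. The numbers are copied straight from the list; the dictionary labels and the helper name relative_increase are just my own for illustration. Keep in mind that a bigger structure says nothing by itself about the actual IPC gain.

```python
# Before/after figures copied from the list above, as (old, new) pairs.
# The labels are informal descriptions, not official names.
changes = {
    "memfile stores tracked": (3, 8),
    "LDQ entries": (44, 48),
    "STQ entries": (24, 32),
    "INT dispatch ops/cycle (both cores)": (4, 8),
    "L2 BTB entries (K)": (5, 10),
    "L2 BTB banks": (8, 16),
    "PFB entries": (8, 16),
}


def relative_increase(old, new):
    """Return the growth from old to new as a percentage of old."""
    return (new - old) / old * 100.0


for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new} (+{relative_increase(old, new):.0f}%)")
```

So the LDQ grows by roughly 9% while the STQ grows by roughly 33%, which fits with the store side getting most of the attention in this list.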