AMD "Steamroller/Excavator" -info, speculations and experience

**informal** · 03-07-2013, 02:31 PM

Flanker, thanks for posting the relevant news, mate. I'm still reading it; it looks interesting. Will comment later.

Edit:

Wow, BSN found a massive gold mine of info, some of which hasn't been seen before. What Flanker quoted above was unknown until now.

The document lists the following changes to improve instructions per clock (IPC):

Store-to-load forwarding optimization. <- big improvement (store handling sucked in BD/PD)
Dispatch and retire up to 2 stores per cycle. <- same as above
Improved memfile, from last 3 stores to last 8 stores, and allow tracking of dependent stack operations. <- complements the above
Load queue (LDQ) size increased to 48, from 44. <- solid improvement to the load subsystem
Store queue (STQ) size increased to 32, from 24. <- complements the memory store subsystem changes above
Increase dispatch bandwidth to 8 INT ops per cycle (4 to each core), from 4 INT ops per cycle (all 4 to just one core). 4 ops per cycle per core remains unchanged. <- massive improvement in MT workloads
Accelerate SYSCALL/SYSRET. <- no idea how much faster this makes SYSCALL/SYSRET, probably a noticeable improvement
Increased L2 BTB size from 5K to 10K and from 8 to 16 banks. <- solid improvement
Improved loop prediction. <- solid improvement (don't know how good, though)
Increase PFB from 8 to 16 entries; the 8 additional entries can be used either for prefetch or as a loop buffer. <- prefetch was already solid in BD/PD; making it better can't hurt
Increase snoop tag throughput. <- no clue
Change from 4 to 3 FP pipe stages. <- don't know what to think of this. It's listed as an improvement, so fewer stages must be good (a shorter pipeline usually means lower latency and better IPC).
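To put rough numbers on the dispatch change, here's a toy Python model. This is my own simplification, not something from the document: I'm assuming the BD/PD front end hands all 4 ops to one core per cycle and alternates round-robin when both cores of a module are busy, while Steamroller gives each core its own 4 ops per cycle. These are idealized peaks only; real code never sustains them.

```python
# Toy model of peak integer-op dispatch per core in one module.
# Assumption (not from the document): shared=True means one 4-wide dispatch
# slot per cycle, given round-robin to one active core (BD/PD style);
# shared=False means every active core gets its own 4 ops each cycle (SR style).

def peak_ops_per_core(cycles, width, shared, active_threads):
    """Return the total ops dispatched to each active core over `cycles`."""
    dispatched = [0] * active_threads
    for cycle in range(cycles):
        if shared:
            # The whole dispatch width goes to a single core this cycle.
            dispatched[cycle % active_threads] += width
        else:
            # Each core has its own dispatch width.
            for core in range(active_threads):
                dispatched[core] += width
    return dispatched


cycles = 100
old_mt = peak_ops_per_core(cycles, 4, shared=True, active_threads=2)
new_mt = peak_ops_per_core(cycles, 4, shared=False, active_threads=2)
old_st = peak_ops_per_core(cycles, 4, shared=True, active_threads=1)

print([ops / cycles for ops in old_mt])  # [2.0, 2.0]
print([ops / cycles for ops in new_mt])  # [4.0, 4.0]
print([ops / cycles for ops in old_st])  # [4.0]
```

With one thread, the old design already gets the full 4 ops per cycle, which is why single-thread peak dispatch stays the same. With both cores busy, each one drops to an average of 2 ops per cycle in this model, and that's the gap Steamroller closes.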
