I did some digging in the AMD 10h Programmer's Manual and Intel's Programmer Manual.
Barcelona shuffle latency (PSHUFHW): 2 cycles
Conroe shuffle latency (PSHUFHW): 2 cycles
Penryn shuffle latency: 1 cycle (from the super shuffle engine)
Barcelona 128-bit division latency (DIV*): 16-20 cycles
Yonah 128-bit division latency (DIVPD, DIVSD): 63 cycles (not a typo!)
Conroe 128-bit division latency (DIVPD, DIVSD): 18 cycles
Penryn 128-bit division latency (DIVPD, DIVSD): ~5 cycles (estimated from the Radix-16 divider)
Barcelona 128-bit square root latency (SQRT*): 21-27 cycles
Yonah 128-bit square root latency (SQRT*): 118 cycles (not a typo!)
Conroe 128-bit square root latency (SQRT*): 29 cycles
Penryn 128-bit square root latency (SQRT*): ~9 cycles (estimated from the Radix-16 divider)
Conclusion: The shuffle speed improvement on Penryn shouldn't make much of a difference, since 2 cycles of latency was already very low. Division and square root aren't that common in general code (though they are common in F@H), so performance in division/sqrt-heavy code should improve accordingly.
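To put rough numbers on that, here is a quick sketch that compares the Conroe and Penryn figures listed above. The table and the helper names (LATENCY, latency_ratio, cycles_saved) are just made up for this illustration, and the Penryn division/sqrt values are the estimates from this post, not measured numbers. Keep in mind that a latency ratio is not the same thing as an application speedup, since throughput and how often the instruction actually appears matter too.

```python
# Latencies in cycles, copied from the figures quoted in this post.
# The Penryn divide and sqrt values are estimates, not measurements.
LATENCY = {
    "shuffle": {"Conroe": 2, "Penryn": 1},
    "divide": {"Conroe": 18, "Penryn": 5},
    "sqrt": {"Conroe": 29, "Penryn": 9},
}


def latency_ratio(op, old="Conroe", new="Penryn"):
    """Return how many times lower the latency of `op` is on `new` than on `old`."""
    return LATENCY[op][old] / LATENCY[op][new]


def cycles_saved(op, old="Conroe", new="Penryn"):
    """Return the absolute number of cycles saved per dependent instruction."""
    return LATENCY[op][old] - LATENCY[op][new]


if __name__ == "__main__":
    for op in LATENCY:
        print(f"{op}: {latency_ratio(op):.1f}x lower latency, "
              f"{cycles_saved(op)} cycles saved")
```

The shuffle ratio looks good on paper, but it only saves a single cycle per dependent instruction, while the estimated division and square root figures would save many more cycles each time one sits on the critical path.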
Last edited by Shadowmage; 05-11-2007 at 10:40 PM.