Thread: Official AMD Barcelona Thread

  1. #11
    Xtreme Addict
    Join Date
    Aug 2004
    Location
    Austin, TX
    Posts
    1,346
    I did some digging in the AMD Family 10h programmer's manual and Intel's programmer's manual.

    Barcelona shuffle latency (PSHUFHW): 2 cycles
    Conroe shuffle latency (PSHUFHW): 2 cycles
    Penryn shuffle latency: 1 cycle (from the Super Shuffle Engine)

    Barcelona 128-bit division latency (DIV*): 16-20 cycles
    Yonah 128-bit division latency (DIVPD, DIVSD): 63 cycles (not a typo!)
    Conroe 128-bit division latency (DIVPD, DIVSD): 18 cycles
    Penryn 128-bit division latency (DIVPD, DIVSD): ~5 cycles (estimated from the Radix-16 divider)

    Barcelona 128-bit square root latency (SQRT*): 21-27 cycles
    Yonah 128-bit square root latency (SQRT*): 118 cycles (not a typo!)
    Conroe 128-bit square root latency (SQRT*): 29 cycles
    Penryn 128-bit square root latency (SQRT*): ~9 cycles (estimated from the Radix-16 divider)
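
To put the division and square root numbers side by side, here is a quick sketch that expresses each chip's latency relative to Conroe. The table layout and function names are made up for illustration. Taking the midpoint of Barcelona's ranges is an assumption on my part, and the Penryn values are the estimates quoted above, not measurements.

```python
# Latencies in cycles, copied from the figures quoted above.
# Each entry is a (low, high) range; single values are stored as (x, x).
# Penryn values are the estimates from the post, not measured numbers.
LATENCY = {
    "div": {
        "Barcelona": (16, 20),
        "Yonah": (63, 63),
        "Conroe": (18, 18),
        "Penryn": (5, 5),
    },
    "sqrt": {
        "Barcelona": (21, 27),
        "Yonah": (118, 118),
        "Conroe": (29, 29),
        "Penryn": (9, 9),
    },
}


def midpoint(cycle_range):
    """Collapse a (low, high) range to one number (an assumption, see above)."""
    low, high = cycle_range
    return (low + high) / 2


def relative_to_conroe(op):
    """Return each chip's latency for `op` as a multiple of Conroe's latency."""
    base = midpoint(LATENCY[op]["Conroe"])
    return {cpu: midpoint(rng) / base for cpu, rng in LATENCY[op].items()}


if __name__ == "__main__":
    for op in LATENCY:
        for cpu, ratio in relative_to_conroe(op).items():
            print(f"{op:4s} {cpu:10s} {ratio:.2f}x Conroe latency")
```

A ratio below 1.0 means lower latency than Conroe for that operation; a ratio above 1.0 means higher.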

    Conclusion: The shuffle speedup in Penryn shouldn't make too big of a difference, since 2 cycles of latency was already very low. Division and square root aren't that common in most code (though they are common in F@H), so performance will improve in proportion to how much a workload leans on them.
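
A rough way to see why the overall gain depends on how often these instructions show up is an Amdahl's-law style estimate. This is only a sketch: the function name and the example fractions are hypothetical, and it assumes the affected share of run time scales directly with instruction latency (that is, dependent back-to-back operations with no overlap), which real code will rarely match.

```python
def overall_speedup(fraction, old_latency, new_latency):
    """Amdahl-style estimate of whole-program speedup.

    fraction     -- share of run time spent waiting on the instruction (0..1),
                    a hypothetical input, not a measured value
    old_latency  -- latency in cycles before the change
    new_latency  -- latency in cycles after the change

    Assumes the affected share of run time shrinks in proportion to latency.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if old_latency <= 0 or new_latency <= 0:
        raise ValueError("latencies must be positive")
    local_speedup = old_latency / new_latency
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical time shares; latencies are the Conroe -> Penryn figures above.
    for share in (0.01, 0.10, 0.50):
        shuffle = overall_speedup(share, 2, 1)
        divide = overall_speedup(share, 18, 5)
        print(f"share={share:.2f}  shuffle: {shuffle:.3f}x  divide: {divide:.3f}x")
```

With a small time share, even a large latency drop barely moves the total, which is the point made in the conclusion above.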
    Last edited by Shadowmage; 05-11-2007 at 10:40 PM.
