I posted this over at GPUGrid ...
http://www.gpugrid.net/forum_thread....rap=true#17824

My first full-size result on 3.1 is slower ... very disappointing.
No changes to system setup: GTX480 on WinXP
Examples are from the same WU type: TONI_CAPBIND*

Old version: average runtime was 6550 seconds (very little variation between runs)
# Time per step (avg over 650000 steps): 10.065 ms
# Approximate elapsed time for entire WU: 6541.937 s

New version: the one result so far had a runtime of 7768 seconds.
# Time per step (avg over 650000 steps): 11.947 ms
# Approximate elapsed time for entire WU: 7765.391 s
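
To put a number on the difference, here is a quick sketch in Python that compares the two logs. The figures are copied from the output above; the variable and function names are just ones I made up for illustration, and the cross-check assumes elapsed time is simply steps times time per step.

```python
# Figures copied from the two task logs above.
STEPS = 650_000
old_ms_per_step = 10.065
new_ms_per_step = 11.947
old_elapsed_s = 6541.937
new_elapsed_s = 7765.391


def slowdown_pct(old, new):
    """Percentage increase of `new` relative to `old`."""
    return (new - old) / old * 100.0


# Slowdown measured two ways: per step and over the whole WU.
step_slowdown = slowdown_pct(old_ms_per_step, new_ms_per_step)
elapsed_slowdown = slowdown_pct(old_elapsed_s, new_elapsed_s)

# Cross-check (assumption): elapsed time ~= steps * time per step.
old_from_steps = STEPS * old_ms_per_step / 1000.0
new_from_steps = STEPS * new_ms_per_step / 1000.0

print(f"Per-step slowdown: {step_slowdown:.1f}%")
print(f"Elapsed slowdown:  {elapsed_slowdown:.1f}%")
print(f"Old elapsed from steps: {old_from_steps:.1f} s (log says {old_elapsed_s} s)")
print(f"New elapsed from steps: {new_from_steps:.1f} s (log says {new_elapsed_s} s)")
```

Both measures come out at about 18.7% slower, and the per-step times agree with the reported elapsed times to within a fraction of a second. Keep in mind this is one new-version result against an average of old ones.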