I'll do it tonight. I don't have access to my webserver right now.
Interesting, your 32M time (9.45s) is slower than the W5580s (9.30s)... even though your memory is probably faster. Seems like that person did some serious tweaking.
You might have to do the same to beat those numbers. But for something larger like 256M, 512M, or 1G, your clock and memory speed advantage should beat any tweak.
For something as small as 32M, there's a lot of thread-creation/destruction overhead. So you might want to disable HT, or use the Custom Compute mode to override the thread settings and use fewer threads. At these sizes, the program probably spends a significant amount of time creating and destroying threads... bleh... Then again, I never optimized the program for small computations.
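The trade-off above (small runs don't have enough work per thread to pay for creating and destroying the threads) can be sketched as a heuristic that caps the thread count by problem size. This is a hypothetical illustration only, not how the program actually picks its thread count. The helper name `pick_thread_count` and the 4M-digits-per-thread threshold are made up for the example.

```python
import os
from typing import Optional

# Hypothetical threshold (assumption, purely illustrative): a thread only pays
# for its creation/destruction cost if it gets at least this many digits of work.
MIN_DIGITS_PER_THREAD = 4_000_000


def pick_thread_count(digits: int, hw_threads: Optional[int] = None) -> int:
    """Hypothetical helper: how many worker threads to use for `digits` digits."""
    if hw_threads is None:
        hw_threads = os.cpu_count() or 1
    # Floor division so every thread gets at least MIN_DIGITS_PER_THREAD digits,
    # but always use at least one thread.
    useful = max(1, digits // MIN_DIGITS_PER_THREAD)
    # Never exceed the number of hardware threads (e.g. with HT enabled).
    return min(hw_threads, useful)


print(pick_thread_count(32_000_000, 16))     # → 8
print(pick_thread_count(1_000_000_000, 16))  # → 16
```

With 16 hardware threads, a 32M run would only use 8 under this made-up threshold, while a 1G run has enough work to use all 16.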
Another possible reason is that since your memory is faster, your timings are more relaxed. I found that Nehalems have more memory bandwidth than the program needs. So tighter timings and slower memory might be better.
@ El Greco
I think we have a winner here! The first person to show up with a non-power-of-two number of cores!
You didn't unlock the 4th, right?