Here's a Core 2 Quad at stock. One of the reasons for poor scaling is Core 2's cache. It has no shared L3, so the cores only get 2 MB of L2 each.
Those were done using processor affinity, right? Because there's no 3-thread mode, and 2-thread mode will use up to 4 threads.
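For anyone wanting to reproduce a 3-core run this way, here's a minimal sketch of how the affinity bitmask is built. The helper name `affinity_mask` is made up for illustration. On Windows, the resulting hex value is what `start /affinity` expects, and `program.exe` below is just a placeholder.

```python
def affinity_mask(cores):
    """Return the hex bitmask (no 0x prefix) that selects the given
    zero-based logical core indices. Hypothetical helper for illustration."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core  # bit N set means logical core N is allowed
    return format(mask, "X")


# Restrict a process to the first three cores of a quad-core:
print(affinity_mask(range(3)))  # prints 7
# Windows cmd usage (program.exe is a placeholder):
#   start /affinity 7 program.exe
```

This only limits where the threads may run. It doesn't change how many threads the program spawns, so a 4-thread run pinned to 3 cores is not quite the same thing as a true 3-thread mode.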
Also, I think a big reason is that the program was tuned for 3 MB of cache per thread, so massive cache spilling on a bandwidth-limited system will carry major penalties.
Here's Q9400 scaling... also very bad. I also don't know what was causing all of the variation in the benchmarks.
Here are some very old results with version 0.2.1 on my workstation:
Scaling seems to hit a wall at 7x. I'm almost certain it's the memory bandwidth.
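A rough way to picture a bandwidth wall is a toy model where speedup follows the thread count until the threads together saturate memory bandwidth, then goes flat. This is only a sketch under that assumption. The function name and the bandwidth figures are made up for illustration and are not measurements from this machine.

```python
def bandwidth_limited_speedup(threads, per_thread_gbs, total_gbs):
    """Toy model: linear speedup, capped where the combined demand of all
    threads reaches the total memory bandwidth. Illustrative only."""
    cap = total_gbs / per_thread_gbs
    return min(float(threads), cap)


# Hypothetical numbers chosen so the cap lands at 7x:
for n in (1, 2, 4, 8, 16):
    print(n, bandwidth_limited_speedup(n, per_thread_gbs=1.0, total_gbs=7.0))
```

Real scaling curves bend gradually instead of hitting a hard corner, but a plateau that doesn't move as more threads are added is the pattern this model predicts.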
I need to integrate a bulk-bench option that will generate the data for these graphs without having to do each benchmark by hand.
I already have an automated benchmark add-on (hence how I did all these runs), but it has no interface yet - all options are set in the source code and I have to recompile it every time I change a setting.
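Until something like that exists, an external wrapper script can do the looping. The following is a minimal sketch and not the add-on described above. `bulk_bench`, `write_csv`, and `make_command` are hypothetical names, and the caller has to supply the real command line, since no thread-count option of the program is assumed here.

```python
import csv
import subprocess
import sys
import time


def bulk_bench(make_command, thread_counts, runs=3):
    """Run the benchmark once per thread count and keep the best wall time.

    make_command(threads) must return the argument list to execute.
    It is supplied by the caller; nothing about the program's CLI is assumed.
    """
    results = []
    for threads in thread_counts:
        best = None
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(make_command(threads), check=True)
            elapsed = time.perf_counter() - start
            best = elapsed if best is None else min(best, elapsed)
        results.append((threads, best))
    return results


def write_csv(results, out=sys.stdout):
    """Write threads, seconds, and speedup relative to the first row."""
    base = results[0][1]
    writer = csv.writer(out)
    writer.writerow(["threads", "seconds", "speedup"])
    for threads, seconds in results:
        writer.writerow([threads, f"{seconds:.3f}", f"{base / seconds:.2f}"])
```

Taking the best of several runs is one way to dampen run-to-run variation. The speedup column assumes the first entry in `thread_counts` is the single-thread baseline.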
Main Machine:
AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate
Miscellaneous Workstations for Code-Testing:
Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)