spdycpu
08-16-2006, 06:28 PM
Hello everyone. I was wondering if you guys would like to help me gather some results for these two benchmarks. I have various Athlons here that I've tested, but I haven't had the opportunity to test much else. My only P4 box is missing a PSU and some RAM, and the DEC Alpha 21164-500 needs RAM as well. I am most interested in the chips I have yet to test: Conroes, Pentium 4s (Northwood and Prescott), Pentium-M, and the various VIA chips. Any results are welcome, of course. I'll continue to update this with the rest of my systems later on.
Shift64 download (32 & 64 bit versions): http://chess.homelinux.com/shift64.zip
Shift64 source code: http://chess.homelinux.com/shift64.c
I wrote this mainly because I wanted to see something that took advantage of 64 bit mode on the AMD64/EM64T chips. It should induce register pressure, and so benefit from the extra 8 GP registers in 64 bit mode. Not only that, the 64 bit shifts (which can also be done in 32 bit mode) should be much quicker, with fewer instructions needed to get the "job" done. Here is a good example:
32 bit mode, 64 bit shift:
mov edi, eax
mov ebx, ecx
shld ebx, edi, 3
add edi, edi
add edi, edi
add edi, edi
mov DWORD PTR _l$[esp+336], edi
mov DWORD PTR _l$[esp+340], ebx
64 bit mode, 64 bit shift:
mov rax, rdx
shl rax, 11
mov QWORD PTR l$[rsp+80], rax
So, as you can see, 64 bit mode cuts down the number of things the CPU has to do for a 64 bit left/right shift: the value fits in one register, so it doesn't have to be split into two halves and shifted piece by piece. Also, Shift64 is small enough to fit completely in the L1 cache, so bus/memory speed and L2 size/speed shouldn't matter at all. Here are some of the results I have so far for Shift64:
Athlon 64 (San Diego) 2.72GHz 64 bit mode: 3189.7927 million shifts per second
Celeron (Northwood) 2.7 @ 3.4GHz: 857.5870 million shifts per second
Athlon 64 (San Diego) 2.72GHz 32 bit mode: 694.2034 million shifts per second
Pentium 4 (Northwood) 2.4GHz: 621.7214 million shifts per second
Athlon XP (Barton) 2.5GHz: 615.0298 million shifts per second
Athlon XP (Tbred-B) 2.08GHz: 507.1958 million shifts per second
Athlon (Tbird) 1.20GHz: 292.9974 million shifts per second
Alpha 21164a 500MHz (64 bit): 215.0242 million shifts per second
Celeron (Mendocino) 552MHz: 157.6106 million shifts per second
6x86MX 300MHz(PR400?): 30.5344 million shifts per second
So far it looks like P4s aren't doing too badly in 32 bit mode with Shift64. It would be interesting to see if they get a speed boost in 64 bit mode similar to the Athlon 64's.
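For anyone who would rather read C than assembly, here is a rough sketch of what the two listings above amount to. This is not taken from shift64.c; the names u64_pair, shl64_split, shl64_native and join are made up for illustration. The split version keeps the value as two 32 bit halves, the way a 32 bit build has to, and the native version is the single shift a 64 bit build can use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 64 bit value held as two 32 bit halves,
   the way a 32 bit build has to keep it in registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Left shift of the split value by n bits, valid for n = 1..31 only.
   The high half picks up the bits shifted out of the low half (the job
   shld does in the 32 bit listing), then the low half is shifted alone. */
static u64_pair shl64_split(u64_pair v, unsigned n)
{
    u64_pair r;
    assert(n >= 1 && n <= 31);
    r.hi = (v.hi << n) | (v.lo >> (32 - n));
    r.lo = v.lo << n;
    return r;
}

/* The same shift when the whole value fits in one 64 bit register. */
static uint64_t shl64_native(uint64_t v, unsigned n)
{
    assert(n <= 63);
    return v << n;
}

/* Glue the two halves back together so the results can be compared. */
static uint64_t join(u64_pair v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}
```

Calling join(shl64_split(p, 3)) on the two halves of a value should give the same bits as shl64_native on the whole value with the same shift count. The split version just needs more instructions and more registers to get there.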
Grav download: http://chess.homelinux.com/grav.zip
Grav Source Code: http://chess.homelinux.com/grav.c
Grav tests raw FPU power. It consists of double precision instructions, 50% of which are multiplication, 16.667% addition, 16.667% subtraction, and 16.667% division. The calculations done are on two spherical objects about the same size and weight as a regulation baseball, placed 200 meters (center to center) apart from each other. The only force acting on the two "objects" is the force of gravity. The simulation calculates how long it would take for both objects to collide from the force of their own gravity.
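To give an idea of the kind of loop described above, here is a minimal sketch. It is not the code from grav.c. The constants, the fixed time step and the name time_to_collide are all assumptions picked for illustration. The loop body is arranged so that each pass does three multiplications, one addition, one subtraction and one division, which matches the instruction mix given above.

```c
#include <stdint.h>

/* Assumed values, not taken from grav.c. */
#define G_CONST 6.674e-11 /* gravitational constant, m^3 kg^-1 s^-2 */
#define MASS    0.145     /* kg, roughly a regulation baseball */
#define RADIUS  0.037     /* m, roughly a regulation baseball */

/* Returns the simulated time in seconds until the two surfaces touch.
   separation is the starting center to center distance in meters and
   dt is the time step in seconds. Both bodies have the same mass, so
   only one speed is tracked and the gap closes at twice that speed. */
static double time_to_collide(double separation, double dt)
{
    const double gm = G_CONST * MASS; /* hoisted out of the loop */
    const double dt2 = 2.0 * dt;      /* both bodies move toward each other */
    const double touch = 2.0 * RADIUS;
    double r = separation;
    double v = 0.0;                   /* speed of each body toward the other */
    uint64_t steps = 0;

    while (r > touch) {
        double a = gm / (r * r);      /* 1 multiply, 1 divide */
        v += a * dt;                  /* 1 multiply, 1 add */
        r -= v * dt2;                 /* 1 multiply, 1 subtract */
        ++steps;                      /* integer, not an FPU operation */
    }
    return (double)steps * dt;
}
```

With these assumed constants, time_to_collide(200.0, 1000.0) should come out somewhere around 7e8 simulated seconds (a bit over 22 years), which is in line with the closed form free fall time for two point masses. A smaller dt means more passes through the loop, and that is where a benchmark like this would spend its run time.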
Like Shift64, Grav shouldn't be influenced by L2 cache or system memory. Here are some results from it so far:
Athlon 64 (San Diego) 2.72GHz: 11.156 seconds
Athlon XP (Barton) 2.5GHz: 12.109 seconds
Athlon XP (Tbred-B) 2.08GHz: 14.821 seconds
Celeron (NW) 2.7@3.4GHz: 15.906 seconds
Pentium 4 (NW) 2.4GHz: 22.515 seconds
Athlon (Tbird) 1.20GHz: 25.446 seconds
Celeron (Mendocino) 552MHz: 71.140 seconds
Alpha 21164a 500MHz: 75.150 seconds
6x86MX 300MHz(PR400?): 130.670 seconds
After seeing some CPU brand bias comments in other threads, I'd like to comment on something. No matter how the scores go (relative to the two main processor manufacturers today), neither of my programs is "optimized" for any one particular brand (or type) of CPU. Visual Studio 2005 doesn't even offer CPU specific optimization flags, even though the binary it does produce is quite fast. Not only that, I don't think this code can even be optimized for any particular brand of CPU, because it is so simple. This is one of the reasons I included the source: I do not wish to hide anything. Everyone can see exactly what is being done.
Again, all results are welcome. Thanks to those who contribute their results. I'm looking forward to seeing how all the different core types match up.