
I need testers for the 2 small benchmarks I wrote.


spdycpu
08-16-2006, 06:28 PM
Hello everyone. I was wondering if you guys would like to help me gather some results for these two benchmarks. I have tested the various Athlons I have here, but I haven't had the opportunity to test much of anything else. My only P4 box is missing a PSU and some RAM, and the DEC Alpha 21164-500 needs RAM as well. I am most interested in the stuff I have yet to test, which would be Conroes, Pentium 4s (Northwood and Prescott), Pentium-Ms, and the various VIA chips. Any results are welcome, of course, and I'll continue to update this with the rest of my systems later on.

Shift64 download (32 & 64 bit versions): http://chess.homelinux.com/shift64.zip
Shift64 source code: http://chess.homelinux.com/shift64.c
I wrote this mainly because I wanted to see something that took advantage of 64 bit mode on the AMD64/EM64T chips. It should induce register pressure and so benefit from the extra 8 GP registers in 64 bit mode. Not only that, the 64 bit shifts (which can also be done in 32 bit mode) should be much quicker, with fewer instructions to get the "job" done. Here is a good example:

32 bit mode, 64 bit shift:
mov edi, eax
mov ebx, ecx
shld ebx, edi, 3
add edi, edi
add edi, edi
add edi, edi
mov DWORD PTR _l$[esp+336], edi
mov DWORD PTR _l$[esp+340], ebx


64 bit mode, 64 bit shift:
mov rax, rdx
shl rax, 11
mov QWORD PTR l$[rsp+80], rax

So, as you can see, 64 bit mode cuts down the number of things the CPU has to do during a left/right shift (larger register, doesn't have to swap stuff in and out). The two listings use different shift counts (3 and 11), but the instruction pattern is what matters. Also, shift64 is small enough to fit completely in the L1 cache, so bus/memory speed and L2 size/speed shouldn't matter at all. Here are some of the results I have so far for Shift64.

Athlon 64 (San Diego) 2.72GHz 64 bit mode: 3189.7927 million shifts per second
Celeron (Northwood) 2.7 @ 3.4GHz: 857.5870 million shifts per second
Athlon 64 (San Diego) 2.72GHz 32 bit mode: 694.2034 million shifts per second
Pentium 4 (Northwood) 2.4GHz: 621.7214 million shifts per second
Athlon XP (Barton) 2.5GHz: 615.0298 million shifts per second
Athlon XP (Tbred-B) 2.08GHz: 507.1958 million shifts per second
Athlon (Tbird) 1.20GHz: 292.9974 million shifts per second
Alpha 21164a 500MHz (64 bit): 215.0242 million shifts per second
Celeron (Mendocino) 552MHz: 157.6106 million shifts per second
6x86MX 300MHz(PR400?): 30.5344 million shifts per second

So far it looks like the P4s aren't doing too badly in 32 bit mode with Shift64. It would be interesting to see whether they get a speed boost in 64 bit mode similar to the Athlon 64's.

Grav download: http://chess.homelinux.com/grav.zip
Grav Source Code: http://chess.homelinux.com/grav.c

Grav tests raw FPU power. It consists of double precision instructions, 50% of which are multiplication, 16.667% addition, 16.667% subtraction, and 16.667% division. The calculations done are on two spherical objects about the same size and weight as a regulation baseball, placed 200 meters (center to center) apart from each other. The only force acting on the two "objects" is the force of gravity. The simulation calculates how long it would take for both objects to collide from the force of their own gravity.

Like Shift64, Grav shouldn't be influenced by L2 cache or system memory. Here are some results from it so far:

Athlon 64 (San Diego) 2.72GHz: 11.156 seconds
Athlon XP (Barton) 2.5GHz: 12.109 seconds
Athlon XP (Tbred-B) 2.08GHz: 14.821 seconds
Celeron (NW) 2.7@3.4GHz: 15.906 seconds
Pentium 4 (NW) 2.4GHz: 22.515 seconds
Athlon (Tbird) 1.20GHz: 25.446 seconds
Celeron (Mendocino) 552MHz: 71.140 seconds
Alpha 21164a 500MHz: 75.150 seconds
6x86MX 300MHz(PR400?): 130.670 seconds

After seeing some CPU brand bias comments in other threads, I'd like to comment on something. No matter how the scores go (relative to the two main processor manufacturers today), neither of my programs is "optimized" for any one particular brand (or type) of CPU. Visual Studio 2005 doesn't even offer CPU-specific optimization flags, even though the binary it does produce is quite fast. Not only that, I don't think this code can even be optimized for any particular brand of CPU, because it is rather simplistic. This is one of the reasons I included the source: I do not wish to hide anything. Everyone can see exactly what is being done.

Again, all results are welcome. Thanks to those that contribute their results, I'm looking forward to seeing how all the different core types match up.

Avman
08-16-2006, 06:35 PM
Grav - A64 X2 3800+ @ 3050mhz

10.062 Seconds

cadaveca
08-16-2006, 06:47 PM
Grav - Opteron 170 @ 2800mhz


Days until collision:8336.636944
Collision velocity: 0.000022716218313m/s (0.000011358109157m/s per object)
Runtime: 10.828000 seconds

s64 32-bit


Loop overhead: 1.44 seconds
Total time: 23.77 seconds
64-Bit Shift operation time: 22.33 seconds
716.5569 million shifts per second

spdycpu
08-26-2006, 01:21 AM
Ok, more results added. Woohoo @ Alpha 21164a-500MHz :)

Still needing Conroe 32/64 bit results as well as Prescott 32/64 bit results. It doesn't matter to me if your box is running at 1.8GHz or 5GHz, I'd just like to compare core types (the results should scale linearly with clock rate). I'm real eager to see how Conroe chips do in Grav, testing their "regular" FPU.

The lack of people submitting results here makes me wonder if anyone is worried about these being viruses or anything of that nature. If that is a concern, upload the files to http://virusscan.jotti.org and they'll scan them with *15* different virus scanners.

Ibaun
08-26-2006, 03:33 AM
Intel Pentium M (Banias) 1.5 ghz

s64-32bit:
Loop overhead: 2.82 seconds
Total time: 35.16 seconds
64-Bit Shift operation time: 32.34 seconds
494.8046 million shifts per second

Grav:
Days until collision:8336.636944
Collision velocity: 0.000022716218313m/s (0.000011358109157m/s per object)
Runtime: 26.407000 seconds

Will do the same tests on my E6600 Conroe when my memory is back (sometime next week) if nobody else has by then.

taemun
08-26-2006, 08:49 AM
Conroe E6600 @ stock (yeah yeah it's being oc'd soon :P) G.Skill 6400HZ @ 800/4-4-4-12

-32bit- shift64
Loop overhead: 1.70 seconds
Total time: 16.52 seconds
64-Bit Shift operation time: 14.81 seconds
1080.2052 million shifts per second

-2nd run of shift-
Loop overhead: 1.69 seconds
Total time: 16.47 seconds
64-Bit Shift operation time: 14.78 seconds
1082.3975 million shifts per second

-grav-
Runtime: 16.421000 seconds
(ofc the results are identical to Ibaun's...)

-2nd run grav-
Runtime: 16.343000 seconds

Will be OC'ing tomorrow, may post back then :)

Here are some more results, same system, running Windows XP x64 edition in VMWare 5.5:
-shift x32 in VM-
Loop overhead: 1.77 seconds
Total time: 17.06 seconds
64-Bit Shift operation time: 15.30 seconds
1045.8884 million shifts per second

-shift x64 in VM-
Loop overhead: 1.52 seconds
Total time: 7.30 seconds
64-Bit Shift operation time: 5.78 seconds
2767.2086 million shifts per second

-grav in VM-
Runtime: 16.984000 seconds

I'm quite astonished at how efficient VMWare is these days :D

t

Mad_Man
09-06-2006, 06:03 AM
1st rig in sig
cpu 2600MHz ram at 1040 5-5-5-8 cpu-z validated (http://valid.x86-secret.com/show_oc.php?id=113924)

shift 32bit - 640.7946 million
shift 64bit - 2976.7442 million
grav - runtime: 12.14sec

if you need some other info... don't hesitate to ask :)

SKiLL3D
09-06-2006, 09:58 AM
Rig@Sig - all stock

shift 32bit
Loop overhead: 1.80 seconds
Total time: 16.42 seconds
64-Bit Shift operation time: 14.63 seconds
1093.9423 million shifts per second

grav
Object stats (both are identical):
Diameter: 0.0737762739 meters
Weight: 0.1425400000kg
Initial distance between objects (ctc): 200.000 meters
Initial velocity: 0.000m/s
Calculating...

Days until collision:8336.636944
Collision velocity: 0.000022716218313m/s (0.000011358109157m/s per object)
Runtime: 16.421000 seconds

i hope this helps u out

iLL

spdycpu
09-07-2006, 06:01 AM
Great stuff so far guys, thanks. From the Grav results, it looks like the Core 2 Duo chips just have the Pentium-M FPU (or something very similar). The shift stuff is interesting; I'm wondering how the C2D will do in true 64 bit Windows. I believe it is about on par with an A64 in 64 bit mode. A friend of mine (guru programmer) might be working on something in the near future to test the instruction latency of all of the chips, to see how fast they do each particular instruction. That should be extremely useful in seeing where a chip's strong/weak points are.

Another thing for you C2D guys: have you ever tried or heard of distributed.net? I know tons of people used to run it back in the RC5-64 days, but it seems pretty much dead now with all of the BOINC stuff. If anyone wants to test it, try downloading this:
ftp://ftp.distributed.net/pub/dcti/v2.9012/dnetc497-win32-x86.zip

To run it, extract it somewhere. Run dnetc.exe, it'll pop up a config screen. Hit zero and then enter, it'll close. Rerun dnetc.exe and it'll download some RC5-72 packets to crunch on, ignore that. Right click on the window and do "benchmark->All Projects All cores". Since the Conroe chip didn't exist at the time of that client, it can't auto-detect your CPU type and choose the fastest one, so, to find out which is fastest you'll need to run the entire test set. :) You don't have to paste the entire thing out, just the core that yields the fastest result for RC5-72 and OGR-P2.

Here are my results in Windows 2k on an A64-2.72GHz:

[Sep 07 13:56:45 UTC] RC5-72: Benchmark for core #6 (GO 2-pipe)
0.00:00:17.32 [11,526,718 keys/sec]
[Sep 07 13:56:45 UTC] OGR-P2: using core #3 (GARSP 6.0-asm-rt1-mmx).
[Sep 07 13:57:04 UTC] OGR-P2: Benchmark for core #3 (GARSP 6.0-asm-rt1-mmx)
0.00:00:16.57 [33,399,337 nodes/sec]


In Fedora Core 4 x86_64 my OGR is just over 37 million nodes/sec; RC5-72 is the same, however.