
New Benchmark supporting FPU/SSE2/Multiple Cores



Kuemmel
06-15-2006, 10:25 AM
Hi guys,

With the help of lots of friends I developed a benchmark totally focused on brute CPU core force and core efficiency. Basically it just draws a bunch of fractals, which is maybe not so new, but the implementation may be different from others:

- As usual, the Mandelbrot algorithm does almost no memory access, so memory speed doesn't matter at all
- Due to pure assembler code the size is so small that it runs purely in the 1st level cache of the CPU
- There are two versions, one using SSE2 and one using the FPU, to show the difference between these two floating point units
- The use of DirectDraw reduces the overhead of drawing the fractal to 1 or 2 percent of the total time, so the influence of the graphics card is negligible
- Multi-threading is heavily used: 16 threads are running, so up to 16 real or virtual (hyper-threading) cores are used with almost perfect parallelism
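
For illustration, the points above can be sketched in a few lines of Python. This is not the benchmark's assembler code: the function names, the image window and the interleaved split of rows across the 16 threads are all assumptions made for the example. It only shows that the inner loop works on a handful of local values (no arrays, no memory traffic) and how the work could be divided.

```python
from concurrent.futures import ThreadPoolExecutor

N_THREADS = 16  # thread count mentioned in the post


def mandel_iterations(cx, cy, max_iter):
    """Escape-time loop for one pixel; it only touches local variables."""
    x = y = 0.0
    for i in range(max_iter):
        x2, y2 = x * x, y * y
        if x2 + y2 > 4.0:      # |z| > 2, the point has escaped
            return i
        y = 2.0 * x * y + cy   # imaginary part of z*z + c
        x = x2 - y2 + cx       # real part of z*z + c
    return max_iter


def band_iterations(t, width, height, max_iter):
    """Total iterations for the rows given to worker t (interleaved split, an assumption)."""
    total = 0
    for row in range(t, height, N_THREADS):
        cy = -1.25 + 2.5 * row / height      # hypothetical image window
        for col in range(width):
            cx = -2.0 + 2.5 * col / width
            total += mandel_iterations(cx, cy, max_iter)
    return total


def total_iterations(width=64, height=64, max_iter=100):
    """Sum the iteration counts of all 16 workers."""
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        futures = [pool.submit(band_iterations, t, width, height, max_iter)
                   for t in range(N_THREADS)]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(total_iterations())
```

Note that CPython threads will not actually run this loop in parallel, so the sketch says nothing about speed; dividing the total iteration count by the elapsed time is what would give an iterations-per-second style score.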

So tell me what you think about it and send some results if you like!

Source code is available, too. There are plenty of results on my page. Besides the normal result I also did some efficiency calculations per MHz per core, like some do for SuperPI. The upcoming Intel Merom, for example, is really impressive (thanks to coolaler) :)

File:
http://www.mikusite.de/x86/KMB_V0.53_MT.zip
Homepage:
http://www.mikusite.de
Results:
http://www.mikusite.de/pages/list.gif
Cheers,

Kümmel

Big SturL
06-15-2006, 10:37 AM
I got 163.846secs on the FPU-test. CS, VT, MSN and such were running as well. What surprises me is that I get exactly the same result on each run. I'll shut some progs and run it again.

Update:
Got only ten more points when I shut the above programs, got 173.381. On the SSE2 I got 317.860. Like the pretty colors though :)

NightCrawler™
06-15-2006, 10:44 AM
Fpu: 341.766
Sse2: 471.019

3oh6
06-15-2006, 11:24 AM
i'm at work but i VNCd into my workstation at home and ran it (no benching rigs are turned on at the moment):

Opteron 146 @ 2800Mhz w/various services running and some programs in the background:
FPU: 186.084
SSE2: 264.256

little programs to gauge increases are always welcome around here. i will do some more testing when i get home, but it sounds like a nice way to bench just the CPU. it appears that dual cores definitely perform better, and based on NightCrawler's results it looks to scale appropriately vs single core results, pretty much doubling me at similar frequencies.

RTB
06-15-2006, 01:12 PM
Venice E6
Mem at 2-3-2-5, 166 divider
mostly clean WinXP install

MHz    FPU       SSE2
2000   131.969   181.568
2100   138.025   189.786
2200   144.528   198.910
2300   151.177   208.117
2400   157.685   217.040
2500   164.257   225.980
2600   170.860   235.287
2700   177.634   244.143
2800   183.985   253.434
2900   190.573   262.405

avg    5.8604    8.0837
At the same clock scores varied about 0.2 per run.
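
As a rough cross-check of the scaling, here is a small Python sketch over the rows of the table (the helper names are made up for the example). It computes the score per MHz and the average gain per 100 MHz step. The ten clock settings give nine steps, so the sketch divides by nine, which comes out at roughly 6.5 (FPU) and 9.0 (SSE2); dividing the same overall span by ten gives the 5.8604 / 8.0837 shown in the avg row.

```python
# (MHz, FPU, SSE2) rows copied from the table in this post
rows = [
    (2000, 131.969, 181.568),
    (2100, 138.025, 189.786),
    (2200, 144.528, 198.910),
    (2300, 151.177, 208.117),
    (2400, 157.685, 217.040),
    (2500, 164.257, 225.980),
    (2600, 170.860, 235.287),
    (2700, 177.634, 244.143),
    (2800, 183.985, 253.434),
    (2900, 190.573, 262.405),
]


def avg_gain_per_step(table, col):
    """Average score gain between neighbouring clock settings."""
    gains = [b[col] - a[col] for a, b in zip(table, table[1:])]
    return sum(gains) / len(gains)


def score_per_mhz(table, col):
    """Score divided by clock for every row; near-constant values mean linear scaling."""
    return [r[col] / r[0] for r in table]


fpu_step = avg_gain_per_step(rows, 1)
sse2_step = avg_gain_per_step(rows, 2)
fpu_eff = score_per_mhz(rows, 1)

print(f"FPU gain per 100 MHz:  {fpu_step:.4f}")
print(f"SSE2 gain per 100 MHz: {sse2_step:.4f}")
print(f"FPU score per MHz: {min(fpu_eff):.5f} .. {max(fpu_eff):.5f}")
```

The score per MHz stays almost flat across the whole range, which fits the idea that the benchmark depends on core clock and not on memory speed.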

nn_step
06-15-2006, 01:19 PM
I request a version with 256 threads and, if it isn't too much work, to have it output the result to a .txt file

Charles Wirth
06-15-2006, 01:37 PM
383/841

Allendale @ 3Ghz

3oh6
06-15-2006, 02:26 PM
and if it isn't too much work to have it be output in a .txt file
yeah...a text output would be a very nice touch,

NickS
06-15-2006, 02:32 PM
FPU = 137.658

This CPU doesn't have SSE2 tho. Rig is Ownage-rly? Jr. in sig. :)

Nick

Kuemmel
06-15-2006, 03:35 PM
I request a version with 256 Threads and if it isn't too much work to have it be output in a .txt file
Hi! A .txt output would be fine, yep, though I still think it's not too much work to write down the result or take a screenshot. But I'll put it on my 'to do' list for sure!

But why 256 threads!? That many threads slow down the results on single-core CPUs...

Kuemmel
06-15-2006, 03:39 PM
I got 163.846secs on the FPU-test. CS, VT, MSN and such were running as well. What surprises me is that I get exactly the same result on each run. I'll shut some progs and run it again.

Update:
Got only ten more points when I shut the above programs, got 173.381. On the SSE2 I got 317.860. Like the pretty colors though :)

Yup, maybe I forgot to mention that you should not run any other programs while benchmarking, and maybe do the run 3 times and write down the best result. The thread priority is already set to maximum, but of course the multitasking nature of Windows still hands time to other running applications...using DOS would be nicer ;-) ...but I guess that time is over, and multi-threading on DOS isn't really supported that nicely, or at all ;-)

I made the benchmark loop 5 times in total, which makes the results really stable!

nn_step
06-15-2006, 03:51 PM
Hi ! A .txt output would be fine, yep, just I still think it's not too much work also to write down the result or do a screendump. But I put it on my 'to do' list for sure !

But why 256 threads !? So many threads are slowing down results on single core CPU's...
Because that is the most Threads supported by XS OS;)
and I am trying to find benchmarks that actually use the full potential of our operating system.

Kuemmel
06-16-2006, 12:39 AM
Because that is the most Threads supported by XS OS;)
and I am trying to find benchmarks that actually use the full potential of our operating system.
XS OS !? What's that ? Any links on this ?

The problem with the threads is that if you set up 256 threads, you should basically also have 256 virtual (hyper-threading) or real CPU cores; otherwise there is a small but visible speed penalty due to the extra thread administration overhead. I was thinking of detecting the number of cores and hyper-threading and setting up the number of threads accordingly. That may be tricky, but it would be the most efficient.
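
A minimal sketch of that idea in Python (not the benchmark's code; `pick_thread_count` and the cap of 16 are assumptions for the example). `os.cpu_count()` reports logical processors, so hyper-threaded cores are already included in the count:

```python
import os


def pick_thread_count(max_threads=16):
    """Use one thread per logical CPU reported by the OS, capped at max_threads."""
    logical = os.cpu_count() or 1   # cpu_count() may return None
    return max(1, min(logical, max_threads))


if __name__ == "__main__":
    print("threads to start:", pick_thread_count())
```

On a single-core machine without hyper-threading this would start just one thread and avoid the administration overhead of the extra ones.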

nn_step
06-16-2006, 12:47 AM
XS OS is an EXTREMELY custom Bake of 2003 Enterprise Server.
http://www.xtremesystems.org/forums/showthread.php?t=99710
Made by a small team, supporting 64 Physical Cores, 128 Logical cores, 256 Concurrent Threads, 128Gb of Ram, and a Laundry list of minor features, All while only having a 40Mb Ram footprint and taking up less than 500Mb of Hard drive space.

Kuemmel
06-16-2006, 12:53 AM
little programs to gauge increases are always welcome around here, i will do some more testing when i get home but it sounds like a nice way to bench just the CPU, it appears that dual cores definitely perform better but based on NightCrawlers results it looks to scale appropriately vs single core results by pretty much doubling me at similar frequencies.
Yep, that's true...what you can also see is that the core of the AMD CPUs hasn't really changed since the introduction of the Athlon: still the same FPU efficiency all the time...for Intel it's a more confusing story.

So I'll stay tuned for more results! Turn off as many of the other programs as possible while benchmarking!

somwhere_there
06-17-2006, 12:07 AM
SSE2 = 180
FPU = 131
I think this really confirms what Kuemmel stated about FPU efficiency: going from AXP to A64 wasn't a real change (if we compare an AXP @ 2.1 GHz with my A64 @ 2.0 GHz).

Kuemmel
06-17-2006, 04:48 AM
SSE2 = 180
FPU = 131
Think this really shows that AXP and AM64 wasnt a real change in what Kuemmel stated about FPU efficiency (if we compare AXP@2,1GHz Vs mine A64@2,0GHZ).
Yup, exactly. I don't know if the AM2 has different performance, but as far as I know it's only about memory bandwidth, so I don't expect it...anybody here with an AM2 system to prove it!?

The results of Conroe/Merom show a huge improvement for the SSE2 unit, but not for the FPU (compared to the Pentium-M)...so when the AMD K8L comes next year, the competition starts again :-)

wtfdc
06-22-2006, 07:30 PM
Welp,
I got : SSE2 : 1636.496
and FPU : 748.435

This on a Dual Woodcrest 3.0 Ghz

nn_step
06-23-2006, 03:17 AM
Welp,
I got : SSE2 : 1636.496
and FPU : 748.435

This on a Dual Woodcrest 3.0 Ghz
blame your FSB

NickS
06-23-2006, 08:41 AM
212.268 SSE2
154.842 FPU

Venice 3200+ @ 2.4GHz, 1:1 ram @ 2.5-3-3-6. Low clockspeed because of DFI instabilities at any higher.

Kuemmel
06-24-2006, 04:53 PM
Welp,
I got : SSE2 : 1636.496
and FPU : 748.435

This on a Dual Woodcrest 3.0 Ghz
Woooooow...new record...can't wait to see a quad core running next year!

Your system should already reach an SSE2 score of 2000.000 when clocked at around 3570 MHz...maybe overclocking to this level is even possible with air cooling...!?

Praxis1452
06-25-2006, 08:37 PM
212.268 SSE2
154.842 FPU

Venice 3200+ @ 2.4GHz, 1:1 ram @ 2.5-3-3-6. Low clockspeed because of DFI instabilities at any higher.
Hmm, P4 seems to do well in this test...
3GHz stock P4: SSE2 318.5 w/ background apps
FPU 174.425 w/ background apps.


I do have Ht though...

NickS
06-25-2006, 09:06 PM
Yeah. Further up in the thread there was mention of how the Athlon XP & Athlon 64 have almost identical FPU performance.

Nick

Praxis1452
06-26-2006, 02:37 PM
Does HT help a lot? jw

Kuemmel
06-27-2006, 08:11 AM
Does HT help alot? jw
Yup! Compare the results for an Intel Dual Xeon Prestonia.
The results recalculated for efficiency, i.e. Result/MHz/Core:
HT off: FPU 36.6 / SSE2 83.9
HT on: FPU 58.8 / SSE2 102.2
So with HT the FPU result goes up by about 60%; for SSE2 the effect (about 20%) is not so drastic...it shows very well how bad the P4s were before the introduction of HT...and HT is only used by multithreaded software...and the FPU efficiency of a P3, and of AMD in general, is still even higher than that of a P4/Xeon with HT. They just can't clock that high...
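
For anyone who wants to recalculate their own numbers, a small Python sketch of the Result/MHz/Core figure. The function name is made up, and the factor of 1000 is an assumption chosen so that the values land in the same range as the ones quoted here; the example input is the Opteron 146 @ 2800 MHz result posted earlier in the thread.

```python
def efficiency(result, mhz, cores, scale=1000.0):
    """Benchmark result per MHz per core; the scale factor is an assumption."""
    return result / mhz / cores * scale


# Opteron 146 @ 2800 MHz, single core, numbers from earlier in the thread
fpu_eff = efficiency(186.084, 2800, 1)
sse2_eff = efficiency(264.256, 2800, 1)

print(f"FPU efficiency:  {fpu_eff:.1f}")
print(f"SSE2 efficiency: {sse2_eff:.1f}")
```

For a dual-CPU or dual-core system, `cores` would be the number of physical cores, so that an HT-on and an HT-off run of the same machine stay comparable.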

Dualist
06-28-2006, 12:10 PM
On my Sony lappy I got...
FPU...219.144
SSE2...289.353

Dual Optie I got...
FPU...312.913
SSE2...427.963 :D

Both stock.

May test this on my OC'ed Xeon systems soon.

3oh6
06-28-2006, 09:02 PM
Yup, exactly, I don't know if the AM2 has different performance but as far as I know it's only about memory bandwidth, so I don't expect it...anybody here with a AM2 system to prove it !?
just ran 939 vs AM2:

939 146 (1MB) @ 2.7GHz DDR540 (2GB 3-3-3-8) = 177.533 FPU / 244.527 SSE2
AM2 3500+ (512KB) @ 2.7GHz DDR900 (2GB 4-4-3-5) = 178.219 FPU / 245.830 SSE2

of course i then had to drop the memory multipliers and run again, with exactly the same results, so yeah, it appears as if AM2 and 939 are the same here. the cache may have played a role, but the 512K on the AM2 scored higher...maybe a slight advantage for AM2.

Kuemmel
06-29-2006, 09:07 AM
...just updated my result list...
http://www.mikusite.de/pages/x86.htm
...waiting for more :)

@3oh6: Thanx for the AM2 comparison...quite small difference...

3oh6
06-29-2006, 09:53 AM
@3oh6: Thanx for the AM2 comparison...quite small difference...
yup, i ran the tests 5 times each and those are the averaged results. very little if any gain. would the cache difference have any influence?

Kuemmel
06-29-2006, 11:36 AM
yup, i ran the tests 5 times each and those are the averaged results. very little if any gain. would the cache difference have any influence?
Hm, cache size definitely not, as my code should fit completely in the 1st level cache...so unless the 1st level cache runs at a different speed, which I wouldn't expect, the cache has no influence either...

pete5990
07-04-2006, 01:17 PM
Celeron D 2.8GHz
FPU: 88.982
SSE2: 207.32

Celeron D 3.14GHz
FPU: 99.943
SSE2: 232.737

Celeron D 3.36GHz
FPU: 106.526
SSE2: 247.934

Celeron D 3.5GHz
FPU: 111.018
SSE2: 258.541

Celeron D 3.7GHz
FPU: 117.346
SSE2: 273.468

Will edit in the results as I go.

Kuemmel
07-14-2006, 11:11 PM
I put the results table at the start of the thread for more convenience!

sephiroth8748
07-20-2006, 09:54 PM
354.712fpu
488.957sse2

opty 170 @ 2.7ghz

Paladin
07-21-2006, 09:09 PM
AthlonXP 2400+ @ 13x166 (2166)
FPU: 142.022

AthlonXP 1700+ @ 11x200 (2205)
FPU: 124.939

m-P4 3.2 (HT on)
FPU: 186.772
SSE2: 344.692

P4-1.6 (Willamette) @ 14x115 (1610)
FPU: 58.128
SSE2: 139.773

2xXeon 2.4 (Prestonia) @ 16x200 (3215) (HT on)
FPU: 397.636
SSE2: 694.575

shogan191
07-23-2006, 06:33 AM
Conroe E6600@ 2.8 air

SSE2= 764
FPU = 342

Conroe E6600@ 3.8 air
SSE2= 960
FPU = 436

Using the (1067/4)=266.7x1.46x9=3.5

Kuemmel
07-24-2006, 01:59 PM
Conroe E6600@ 2.8 air
SSE2= 764
FPU = 342
Conroe E6600@ 3.8 air
SSE2= 960
FPU = 436
Thanks for your Conroe values!!! They are a little lower than the other results I have from Core 2 Duo CPUs, though. Did you run any other applications? Could you also post the full numbers...as the score is in millions of iterations, the digits after the dot are also quite important ;)

STEvil
07-24-2006, 09:51 PM
any chance adding an ALU and SSE3 portion to this? :D

edit - or SSE1?

Kuemmel
07-25-2006, 09:43 AM
any chance adding an ALU and SSE3 portion to this? :D
edit - or SSE1?

An SSE3 version is included as a compiler/assembler option in the source code. The gain is very small, though: the SSE3 extensions offer only little optimization potential for this kind of code, so the additional instructions don't help much.

An SSE1 version would be possible, but since SSE1 only handles single precision floating point values, it's not useful for the 'deep' fractals I use. Double precision is needed when you go that deep.
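
To see why, here is a small Python sketch. The zoom centre and the pixel spacing are hypothetical values, not taken from the benchmark. It rounds two neighbouring pixel coordinates to single precision and checks whether they can still be told apart.

```python
import struct


def to_float32(value):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


center = -0.75   # hypothetical zoom centre
step = 1e-10     # hypothetical distance between neighbouring pixels at a deep zoom

# In single precision both coordinates round to the same value ...
same_in_single = to_float32(center + step) == to_float32(center)
# ... while double precision still keeps them apart.
same_in_double = (center + step) == center

print("neighbouring pixels identical in single precision:", same_in_single)
print("neighbouring pixels identical in double precision:", same_in_double)
```

Once neighbouring pixels collapse onto the same coordinate, the picture turns into blocks, no matter how many iterations are spent on each pixel.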

An ALU version would need a lot of work. And as I'm not a mathematician, I wouldn't know what kind of fixed point integer math could match double precision FPU math. I think the benchmark results wouldn't be as comparable as they are now between FPU and SSE2, and I'd expect bad results for ALU math compared to the optimized FPU/SSE units...

Camride
07-26-2006, 07:47 AM
I will get some E6600 results for you tonight on my system at home to see how they compare to shogan191's. I will post up tomorrow with the results. I'll just be running it at stock and 3.8Ghz(highest stable OC right now until I get a better board). If you want any other speeds done just let me know.

Kuemmel
07-26-2006, 02:06 PM
If you want any other speeds done just let me know.
Thanks a lot! Just the highest speed you can get and the normal clock rate of the CPU, if you've got time. Also, please run the benchmark 3 times and note the best value with all digits...and of course no other application should be running alongside.

Camride
07-27-2006, 07:34 AM
Conroe E6600 ES @ 3.8Ghz

SSE2: 1056.297 it/s average over 3 runs(same each time actually)

FPU: 481.591 it/s average over 3 runs(same each time actually)

I haven't done stock speeds yet, I'll do it soon.

hausner
08-03-2006, 08:51 AM
i will try it with my new conroe e6600

Gorod
08-10-2006, 11:48 PM
Conroe E6600 @ 3.0

SSE2= 834.701
FPU = 380.041

http://img95.imageshack.us/img95/8456/834701hq2.th.gif (http://img95.imageshack.us/my.php?image=834701hq2.gif)http://img216.imageshack.us/img216/8661/380041yb8.th.gif (http://img216.imageshack.us/my.php?image=380041yb8.gif)

Paladin
08-13-2006, 06:57 PM
Core2Duo (Allendale) E6400 @ 3440 (8x430)
FPU: 436.215
SSE2: 946.333

DEVIL K-ce
08-14-2006, 11:45 AM
C2D E6400@2.5GHz

FPU : 321
SSE2 : 705

Tommy L
08-14-2006, 01:13 PM
celeron D 346 @ 4255

sse2 : 311,967
fpu : 134,696