A brand new PI benchmark

**YoupY** · 03-02-2008, 05:54 AM

In the past I've always favoured the pi benchmark. Over the past years, unfortunately, I haven't found the time (and space) I needed to really get going with it. Since last summer I've picked up my favorite pastime again and started to bench with my new dual core CPU and cascade. However, I didn't really like calculating pi as I did before, mainly because the calculations were only using 50% of the resources available while the other core was doing nothing at all.

After playing around a bit with S-Pi my cascade sprung a small leak, making it impossible to create a nice score. So it went out for maintenance. The leak is going to be fixed and an additional stage is going to be added, leaving me with some free time. Since then quad cores have been released, and 6 and 8 core CPUs are planned for later this year or early next year. Because I have some programming skills I decided to spend my time seeing if there is a way to create a truly multithreaded Pi benchmark application. Somewhere on the internet I found some nice algorithms, which I used in this app.

It took me some time, but finally I have something ready, though it is still in a beta phase. Here it is....

About the settings:

The algorithms are designed to optimize the calculations based on your system configuration. Therefore I've thrown in a few options that let you tweak the result of the calculation.

The calculation threads setting is the number of threads used for the actual calculation. Besides these threads, two more threads are created to control the interface and the calculation, but those aren't doing a lot during the calculation.
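For readers wondering what splitting a pi calculation over several calculation threads can look like, here is a minimal sketch. The thread does not say which algorithm M-Pi uses, so this assumes the Chudnovsky series with binary splitting; the function names (`split_range`, `merge`, `chudnovsky_pi`) and the use of Python are illustrative only. Python threads will not speed up big-integer maths because of the interpreter lock, so this only shows the structure: each worker handles a slice of the series and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt

C3_OVER_24 = 640320**3 // 24  # constant from the Chudnovsky series


def merge(left, right):
    """Combine the (P, Q, T) results of two adjacent term ranges."""
    p1, q1, t1 = left
    p2, q2, t2 = right
    return p1 * p2, q1 * q2, q2 * t1 + p1 * t2


def split_range(a, b):
    """Binary splitting over series terms a <= k < b; returns (P, Q, T)."""
    if b - a == 1:
        if a == 0:
            p = q = 1
        else:
            p = (6 * a - 5) * (2 * a - 1) * (6 * a - 1)
            q = a * a * a * C3_OVER_24
        t = p * (13591409 + 545140134 * a)
        return p, q, (-t if a & 1 else t)  # terms alternate in sign
    m = (a + b) // 2
    return merge(split_range(a, m), split_range(m, b))


def chudnovsky_pi(digits, threads=1):
    """Return pi * 10**digits as an integer (the last digits may be off)."""
    terms = digits // 14 + 1  # each term adds roughly 14 digits
    bounds = [terms * i // threads for i in range(threads + 1)]
    chunks = [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = list(pool.map(lambda ab: split_range(*ab), chunks))
    result = parts[0]
    for part in parts[1:]:  # merge the partial results left to right
        result = merge(result, part)
    _, q, t = result
    sqrt_c = isqrt(10005 * 10 ** (2 * digits))  # final square root step
    return (q * 426880 * sqrt_c) // t
```

Calling `chudnovsky_pi(50, threads=4)` gives the same integer as `threads=1`, because merging the partial results is exact integer arithmetic and does not depend on how the range was sliced.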

The cache options are meant to optimize L1 and L2 cache hits, increasing the performance of the calculation. Setting them too low or too high will have a negative effect on the benchmark result.

The block size options are fixed at 64M at the moment. Some work still has to be done in order to get these options working well. Like the cache options, the block size options influence which algorithms are chosen, only in a different part of the calculation.

Memory Threshold is the maximum amount of memory to be used for parts of the calculation. Going above this value means disk storage is going to be used for calculations. While this allows for calculating a huge number of digits of pi, you don't want it to happen during the not-so-huge pi calculations.

Not available yet in this beta:

Use of disk storage for the really large calculations, or systems with "only" 512MB of RAM
Calculations optimized with block sizes.
Validation of the calculated result. Is the number calculated really Pi?
And also very important: the generation of the validation code.

The validation code will only be implemented in the final release

Feedback

I hope you all like this new program, and any feedback is welcome. At the moment I am still working on it, so if you'd really like some extra feature let me know and I'll see what I can do. Also please report crashes, instabilities or unexpected results.

I am hoping to see some nice screenshots now

Have fun!

Beta 1.3 is available, see a few posts below for the changes...

**Krelin** · 03-02-2008, 06:01 AM

512M!

**dinos22** · 03-02-2008, 06:02 AM

will try it sometime

**Knight** · 03-02-2008, 08:24 AM

Seems to work just fine.

4 Threads

1 Thread

**Extera** · 03-02-2008, 09:07 AM

So finally, it's beta.
I will continue testing your benchmark soon, Youp.

**Kasparz** · 03-02-2008, 10:46 AM

Originally Posted by Krelin

512M!

So what? On fast quad it would take what? 30-35 minutes?

**Extera** · 03-12-2008, 03:23 PM

I've tried your bench on my York today, Youp.
I love it.

@Kasparz: I tried 512M @ 5300 MHz, but after 2 liters of LN2 it was at about 20%....

**JeffBea** · 03-13-2008, 09:51 AM

Nice! I've been waiting for something like this. Now my Q6600 @ 3.2 should beat up those E8400s @ 4.0+.

**AresOlSkool** · 03-13-2008, 11:10 PM

My SuperPi 1M score of 5.547 seconds.

**YoupY** · 03-14-2008, 01:09 PM

Thanx for the results so far guys. Great to see you like the program.

I am still very interested in crashes though, because these need to be gone before making a final release. At the moment I'm still working on version 1.2, which allows use of the other options and should be capable of running the 512M.

The version after that will probably contain some bug fixes and validation of the benchmark results.

**64dragon** · 03-14-2008, 06:47 PM

Extera, that's one hell of an OC.

Youp, I'm not getting consistent results between M-Pi and Super Pi. I ran M-Pi first, and when I went to run Super Pi it took me a few tries for it to finish. In Super Pi I got a "not exact in round" error. I'm not saying that a Super Pi error is related to your program, but I thought I'd mention it. OC'ed to 2.6 I can't get lower than 31 sec in Super Pi, but I got 31 in M-Pi at 2.2.

**[XC] NetburstXE** · 04-03-2008, 05:33 AM

@YoupY - Can you PM me the source code for M-PI? I'm an amateur programmer, and I would like to play around with it and try changing some things.

Thanks

**OC4/3** · 04-05-2008, 11:20 AM

My bench on E3110@4.05ghz or so:

**YoupY** · 04-06-2008, 02:39 PM

Version 1.2 is ready for beta testing.

New in this version:

Extended computation thread selection
Possibility to use disk in calculations
Additions to the calculation algorithms
Transparent Pi logo

What's coming next

Performance improvements in disk data storage
Improved error handling
Fewer memory leaks
Possibility to abort a run
Checksum validation of calculated number
anything YOU request

With the disk data mode introduced in this version and the additions to the algorithms, it should now be possible to calculate the huge numbers, where before you might have run out of memory. In order to get the disk storage to work I had to make changes to the original code, creating some overhead. Therefore this version might be a bit slower than its predecessor.

Memory consumption is the main reason data has to be stored on disk at some point. With more digits to calculate, more memory is required to store the data. The amount of data also increases with the number of threads because of the parallel nature of the application.

All the configuration settings should be working now. The bottom three options relate to the disk data storage. The memory threshold is the maximum number of bytes a number may occupy in memory before it is stored on disk. When calculating 4M with a threshold of 2M, at some point data will be stored on disk instead of in memory, which has a huge performance impact. The memory block size is related to the memory threshold. Based on this value a newly added algorithm may be used in multiplications, which improves performance when disk data storage is used but may have a negative effect on in-memory calculations.
The I/O block size is the number of bytes read from disk to memory at a time when performing calculations. It's also used for some internal copy operations.
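The relationship between the memory threshold and the I/O block size described above can be sketched roughly as follows. This is not M-Pi's code; `store_number` and `read_blocks` are hypothetical names, and the sketch simply assumes that a number larger than the threshold is written to a temporary file and later read back one I/O block at a time.

```python
import tempfile


def store_number(data: bytes, memory_threshold: int):
    """Keep data in memory if it fits under the threshold, else spill to disk."""
    if len(data) <= memory_threshold:
        return ("memory", data)
    spill = tempfile.TemporaryFile()  # removed automatically when closed
    spill.write(data)
    spill.seek(0)
    return ("disk", spill)


def read_blocks(stored, io_block_size: int):
    """Yield the stored number in chunks of at most io_block_size bytes."""
    kind, payload = stored
    if kind == "memory":
        for start in range(0, len(payload), io_block_size):
            yield payload[start:start + io_block_size]
    else:
        payload.seek(0)
        while True:
            block = payload.read(io_block_size)
            if not block:
                break
            yield block
```

With 4096 bytes of data, a threshold of 2048 sends the number to disk while a threshold of 8192 keeps it in memory; either way `read_blocks` hands back the same bytes, only the disk path is much slower in a real run.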

So here it is:

**YoupY** · 04-06-2008, 02:50 PM

@64dragon

I couldn't reproduce the situation you described. M-Pi and Super Pi do not share any resources; M-Pi uses its own codebase alone. The error in Super Pi could be explained by an unstable system. It may very well be that M-Pi also had calculation errors; improved error handling will come in the next release...

@NetburstXE

Because M-Pi is based on an open source library I will release the modifications to that source code together with the final release. I'm not sure exactly what you'd like to modify, but since I am a bit of a perfectionist I'd like to clean up the code before it's released to anyone. If you'd like to know about the techniques used, please PM me again and I'll be happy to share that information.

**Eppikk** · 04-13-2008, 09:23 AM

looking sweet man, nice job.

Btw, I always wanted to do this: can you make it able to calculate more digits than the SPi record, like 8 billion digits?

That would be a heck of a stability test, just leave it on for a day or two and w8

**YoupY** · 04-13-2008, 02:12 PM

Thanx,

I'll see what I can do about entering a number of digits of choice for the calculation. The program will accept anything thrown at it, with an upper limit of 9223372036854775807 digits; however, I'll have to fit it into the interface somewhere.

Today I added a 1G option, built the abort function and expanded some selection options. Currently validation works for most of the options, but I still have to generate hash codes for 256M and up, which is taking a lot of time.

Probably next weekend beta version 1.3 will be released...

**CrazyNutz** · 04-14-2008, 08:28 AM

Beta 1.2 is working nicely; however, may I suggest options for 6144 KB and 12288 KB in the L2 cache size menu? Great work.

**KTE** · 04-17-2008, 09:19 AM

Thanks for the bench

Great core and MEM usage though, gives much more reliable results across CPUs.
Last Sqrt only loads 3/4 of core max though and one problem occurs in respect of:- it's inconsistent with 0.2-1.1 second variations if repeated at the same settings

Just tried a quick run on Q6600 and 9850BE

**CrazyNutz** · 04-17-2008, 11:50 AM

Originally Posted by KTE

Thanks for the bench

Great core and MEM usage though, gives much more reliable results across CPUs.
Last Sqrt only loads 3/4 of core max though and one problem occurs in respect of:- it's inconsistent with 0.2-1.1 second variations if repeated at the same settings

Just tried a quick run on Q6600 and 9850BE

I have that inconsistent time problem with Super PI with the same settings, however never more than a whole second. It could be a background task robbing ticks from one or more cores at different times.

**YoupY** · 04-18-2008, 03:24 AM

Version 1.3

Changed in this version:

Result validation (up to 128M)
Possibility to abort a run.
Improvements to the error handling mechanisms
Added a 1GB calculation option (Why? Because I can, and the number is kinda magical...)
Added L2 Cache options at CrazyNutz's request
Added additional Memory threshold options in order to be able to perform a memory-only calculation (You'll need over 20 GB of memory for that...)
Added Memory block sizes as well for the same reason
Changed the output of the program (looks a bit like PiFast now...)

I have changed my mind about adding a possibility to let you select the number of digits to calculate. I would like to see the feature as well, but I would not be able to provide any kind of validation for such a calculation. Currently the validation is based on a hash code algorithm. If one digit isn't correct the computation has failed. Since it's a 64-bit code there's only a very small chance you'll get the right hash with a faulty computation.
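To illustrate how a 64-bit hash code can validate a run, here is a minimal sketch. The post does not name the hash algorithm M-Pi uses, so this assumes FNV-1a (64-bit) purely as a stand-in; `hash_digits` and `validate` are hypothetical names. The idea is that the author precomputes the hash of the known-correct digits for each fixed size, and the program compares the hash of its own output against it, which is also why arbitrary digit counts are hard to validate: there is no precomputed hash for them.

```python
FNV_OFFSET = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
FNV_PRIME = 0x100000001B3        # FNV-1a 64-bit prime
MASK_64 = 0xFFFFFFFFFFFFFFFF     # keep the running value within 64 bits


def hash_digits(digits: str) -> int:
    """Return a 64-bit FNV-1a hash of a string of decimal digits."""
    h = FNV_OFFSET
    for byte in digits.encode("ascii"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64
    return h


def validate(digits: str, expected_hash: int) -> bool:
    """A run passes only if its digits hash to the precomputed value."""
    return hash_digits(digits) == expected_hash


# Hash of known-good digits, computed once ahead of time.
reference_hash = hash_digits("31415926535897932384")
```

Here `validate("31415926535897932384", reference_hash)` passes, while changing the last digit to 5 fails. With a well-mixed 64-bit code, a faulty result would match by chance roughly once in 2^64 tries.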

Unfortunately I discovered a bug while generating hash codes for the large numbers. It seems computations from 128M up might run into a situation where the program crashes. I haven't been able to find out exactly what is causing this.

Plans for version 1.4 beta

Finding and fixing the large computations error
Improvements to the algorithms (avoid RTTI in several structures)
Reducing memory leaks if possible (these might be related to inconsistent times)
Calculate the hashes for 256M, 512M and 1 GB

KTE:

Can you easily reproduce this time difference? I am a bit worried about it because you mention differences of over 1 second. That is way too much for me to find acceptable. If you can, please let me know. The inverse SQRT using about 75% CPU is normal behavior; I cannot have it utilize more CPU because of the way the algorithm is put together.

**KTE** · 04-18-2008, 04:09 AM

CrazyNutz true, I get the same since its inception. With Spi, you have to try and optimize fully for the quickest time you can manage. Only then does it become reproducible within 0.0xs of measurement degree[for 1M], i.e. most P35 MBs with C2 can break under 14s 1M at 3600MHz but will you achieve that on every run? No way, even if you try and control as much as possible.

YoupY, your bench is far more consistent than Spi is, like wPrime, but I'm not sure why the variation exists. I'll test 1.3 just now and see if it differs - variables are as controlled as I can have them without switching all basic Windows services off.

**Spectrobozo** · 04-18-2008, 04:32 AM

**KTE** · 04-18-2008, 06:03 AM

New version has less variance for some reason.
I've run a quick test - 5 runs, one after another, all equal, no net/FW/AV/extra running, same everything - the variance is not much now, only slight ->

Run1:
5.422 - Compute T
1.000 - Inverse T
6.422 - Total T

Run2:
5.391 - Compute T
0.984 - Inverse T
6.375 - Total T

Run3:
5.359 - Compute T
1.000 - Inverse T
6.359 - Total T

Run4:
5.406 - Compute T
1.000 - Inverse T
6.406 - Total T

Run5:
5.343 - Compute T
1.000 - Inverse T
6.343 - Total T

Averages:
5.3842s - Compute T
0.9968s - Inverse T
6.381s - Total T

Max Variance:
±0.079s - Compute T
±0.016s - Inverse T
±0.079s - Total T

Looks like you need to empty your MEM/cache before running it for more accurate results - if I run it after using the PC for a long time, the deviation starts to increase by a lot.
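The averages and spread in this post can be reproduced with a few lines. This is just a sketch over the five runs listed above; the `summarize` helper is an illustrative name, and the "Max Variance" figures are taken to be the difference between the slowest and fastest run, which is what the posted numbers correspond to.

```python
from statistics import mean

# Times in seconds, copied from the five runs in the post.
runs = {
    "Compute T": [5.422, 5.391, 5.359, 5.406, 5.343],
    "Inverse T": [1.000, 0.984, 1.000, 1.000, 1.000],
    "Total T": [6.422, 6.375, 6.359, 6.406, 6.343],
}


def summarize(times):
    """Return (average, slowest minus fastest), rounded like the post."""
    return round(mean(times), 4), round(max(times) - min(times), 3)


summary = {name: summarize(times) for name, times in runs.items()}
for name, (average, spread) in summary.items():
    print(f"{name}: average {average}s, max - min {spread}s")
```

The results match the posted figures: averages of 5.3842s, 0.9968s and 6.381s, with spreads of 0.079s, 0.016s and 0.079s.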

**YoupY** · 05-28-2010, 11:50 AM

If anybody is interested in taking over the source code (VC++) and developing this into a final release, please let me know...
