
Thread: A brand new PI benchmark

  1. #1
    Xtreme Member
    Join Date
    Jul 2003
    Location
    The Netherlands
    Posts
    389

    A brand new PI benchmark

    In the past I've always favoured pi benchmarks. Unfortunately, over the past years I haven't found the time (and space) I needed to really get going with them. Since last summer I've picked up my favorite pastime again and started benching with my new dual core CPU and cascade. However, I didn't really like calculating pi the way I did before, mainly because the calculations used only 50% of the available resources while the other core did nothing at all.

    After playing around a bit with S-Pi my cascade sprang a small leak, making it impossible to set a nice score, so it went out for maintenance. The leak is going to be fixed and an additional stage is going to be added, leaving me with some free time. Since then quad cores have been released, and 6 and 8 core CPUs are planned for later this year or early next year. Because I have some programming skills I decided to spend my time seeing whether I could create a truly multithreaded Pi benchmark application. The app uses some nice algorithms I found on the internet.

    It took me some time, but finally I have something ready, though it is still in a beta phase. Here it is....





    About the settings:

    The algorithms are designed to optimize the calculation for your system configuration. I've therefore thrown in a few options that let you tweak the result.

    Calculation threads is the number of threads used for the actual calculation. On top of these, two more threads are created to control the interface and the calculation, but those don't do much during the calculation.
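
    As a rough illustration of splitting a pi calculation across a configurable number of calculation threads, here is a minimal Python sketch. It is not M-Pi's algorithm (the thread doesn't describe it); the Leibniz series and the names `leibniz_chunk` and `parallel_pi` are assumptions chosen only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from math import fsum, pi


def leibniz_chunk(start: int, stop: int) -> float:
    """Sum the Leibniz terms 4 * (-1)^k / (2k + 1) for k in [start, stop)."""
    return fsum(4.0 * (-1.0) ** k / (2 * k + 1) for k in range(start, stop))


def parallel_pi(terms: int, calc_threads: int) -> float:
    """Split `terms` series terms into one chunk per calculation thread.

    Hypothetical helper: assumes terms >= 1 and calc_threads >= 1.
    """
    step = -(-terms // calc_threads)  # ceiling division
    starts = list(range(0, terms, step))
    stops = [min(s + step, terms) for s in starts]
    with ThreadPoolExecutor(max_workers=calc_threads) as pool:
        # Each worker sums its own chunk; the partial sums are combined here.
        return fsum(pool.map(leibniz_chunk, starts, stops))


if __name__ == "__main__":
    estimate = parallel_pi(terms=200_000, calc_threads=4)
    print(f"{estimate:.6f} (error {abs(estimate - pi):.2e})")
```

    Note that in CPython the threads above share one interpreter lock, so this shows the work-splitting pattern rather than a real speed-up; a native program such as M-Pi can run its calculation threads on separate cores.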

    The cache options are meant to optimize L1 and L2 cache hits, increasing calculation performance. Setting them too low or too high will have a negative effect on the benchmark result.

    The block size options are fixed at 64M at the moment. Some work still has to be done in order to get these options working well. Like the cache options, the block size options influence which algorithms are used, only in a different part of the calculation.

    Memory Threshold is the maximum amount of memory to be used for parts of the calculation. Going above this value means disk storage is used instead. While this allows for calculating a huge number of digits of pi, you don't want it to happen during the not so huge pi calculations.

    Not available yet in this beta:


    • Use of disk storage for the really large calculations, or systems with "only" 512MB of RAM
    • Calculations optimized with block sizes.
    • Validation of the calculated result. Is the number calculated really Pi?
    • And also very important; the generation of the validation code.


    The validation code will only be implemented in the final release

    Feedback

    I hope you all like this new program; any feedback is welcome. I am still working on it, so if you would really like an extra feature, let me know and I'll see what I can do. Please also report crashes, instabilities or unexpected results.

    I am hoping to see some nice screenshots now

    Have fun!

    Beta 1.3 is available, see a few posts below for the changes...
    Attached Files
    Last edited by YoupY; 04-18-2008 at 03:26 AM. Reason: new version

  2. #2
    Registered User
    Join Date
    Aug 2007
    Location
    Latvia
    Posts
    87
    512M!
    I void warranties...

  3. #3
    Xtreme Legend
    Join Date
    Mar 2005
    Location
    Australia
    Posts
    17,242
    will try it sometime
    Team.AU
    Got tube?
    GIGABYTE Australia
    Need a GIGABYTE bios or support?



  4. #4
    Xtreme Addict
    Join Date
    Sep 2006
    Posts
    1,038
    Seems to work just fine.

    4 Threads


    1 Thread
    Intel i7 3770k
    ASUS GTX680
    ASUS Maximus V Gene
    Mushkin 8GB Blackline
    Crucial M4 256GB
    Hitachi Deskstar 2TB x2
    FSP 750W Gold
    Fractal Arc Mini

  5. #5
    Xtreme Enthusiast
    Join Date
    Feb 2006
    Location
    The Netherlands - Rotterdam
    Posts
    583
    So finally, it's in beta.
    I will continue testing your benchmark soon, Youp.

  6. #6
    Xtreme Addict
    Join Date
    May 2005
    Location
    [EU] Latvia, Jelgava
    Posts
    1,689
    Quote Originally Posted by Krelin View Post
    512M!
    So what? On a fast quad it would take what, 30-35 minutes?

  7. #7
    Xtreme Enthusiast
    Join Date
    Feb 2006
    Location
    The Netherlands - Rotterdam
    Posts
    583
    I tried your bench on my york today, Youp.
    I love it.



    @Kasparz: I tried 512M @ 5300 MHz, but after 2 liters of LN2 it was only at about 20%...

  8. #8
    Registered User
    Join Date
    Aug 2007
    Posts
    2
    Nice! I've been waiting for something like this. Now my Q6600 @ 3.2 should beat up those E8400s @ 4.0+.

  9. #9
    Registered User
    Join Date
    Feb 2008
    Location
    Nor Cal
    Posts
    38
    My SuperPi 1M score of 5.547 seconds.

    Couple of different rigs for many different purposes.

  10. #10
    Xtreme Member
    Join Date
    Jul 2003
    Location
    The Netherlands
    Posts
    389
    Thanks for the results so far, guys. Great to see you like the program.

    I am still very interested in crashes though, because these need to be gone before I make a final release. At the moment I'm still working on version 1.2, which enables the other options and should be capable of running the 512M.

    The version after that will probably contain some bug fixes and validation of the benchmark results.

  11. #11
    Xtreme Cruncher
    Join Date
    Sep 2007
    Location
    PA, USA
    Posts
    1,504
    Extera, that's one hell of an OC.

    Youp, I'm not getting consistent results between M-Pi and Super PI. I ran M-Pi first, and when I went to run Super PI it took me a few tries before it finished. In Super PI I got a "not exact in round" error. I'm not saying the Super PI error is related to your program, but I thought I'd mention it. OC'ed to 2.6 I can't get lower than 31 sec in Super PI, but I got 31 in M-Pi at 2.2.


    XS WCG Rules: #1: don't pull fart_plume's finger #2: Dave aka Movieman, don't give him your phone number if you like your hearing
    XS WCG Note: There are 2 sets of points, WCG and Boinc. WCG = 7x Boinc

    Project: Dark Matter (<- link) - Asus Maximus II Formula, Intel X3330 3.4ghz @1.32v under load, corsair ddr2 1066 8gigs, evga gtx260 core 216, pc p&c 750W, EK Supreme HF Nickel, iandh 175 res, Swiftech MCP355, Black Ice GTX G2 240, Lian Li v1200b

    silverstone tj07 build log


  12. #12
    Xtreme Cruncher
    Join Date
    Feb 2008
    Location
    C:\WINDOWS\system32\
    Posts
    1,451
    @YoupY - Can you PM me the source code for M-PI? I'm an amateur programmer, and I would like to play around with it and try changing some things.

    Thanks

  13. #13
    Xtreme Member
    Join Date
    Mar 2008
    Posts
    278
    My bench on E3110@4.05ghz or so:

  14. #14
    Xtreme Member
    Join Date
    Jul 2003
    Location
    The Netherlands
    Posts
    389
    Version 1.2 is ready for beta testing.

    New in this version:
    • Extended computation thread selection
    • Possibility to use disk in calculations
    • Additions to the calculation algorithms
    • Transparent Pi logo


    What's coming next
    • Performance improvements in disk data storage
    • Improved error handling
    • Fewer memory leaks
    • Possibility to abort a run
    • Checksum validation of calculated number
    • anything YOU request


    With the disk data mode introduced in this version and the additions to the algorithms, it should now be possible to calculate the huge numbers; before, you might have run out of memory. To get the disk storage to work I had to make changes to the original code, which created some overhead. This version might therefore be a bit slower than its predecessor.

    Memory consumption is the main reason data has to be stored on disk at some point. The more digits to calculate, the more memory is required to store the data. The amount of data also increases with the number of threads because of the parallel nature of the application.

    All the configuration settings should be working now. The bottom three options relate to the disk data storage. The memory threshold is the maximum number of bytes that can be kept in memory for a single number. When calculating 4M with a threshold of 2M, at some point data will be stored on disk instead of in memory, which has a huge performance impact. The memory block size is related to the memory threshold. Based on this value, a newly added algorithm may be used in multiplications; it improves performance when disk data storage is used, but it may have a negative effect on in-memory calculations.
    The I/O block size is the number of bytes read from disk to memory when performing calculations. It's also used for some internal copy operations.
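
    To make the interplay between the memory threshold and the I/O block size concrete, here is a minimal Python sketch under stated assumptions. The helpers `store_number` and `read_in_blocks` are hypothetical and are not taken from M-Pi; they only show the general pattern of spilling a value to disk once it exceeds a threshold and reading it back in fixed-size blocks.

```python
import os
import tempfile


def store_number(data: bytes, memory_threshold: int):
    """Keep `data` in memory if it fits under the threshold, else spill to disk.

    Returns ("memory", data) or ("disk", path). Hypothetical helper.
    """
    if len(data) <= memory_threshold:
        return ("memory", data)
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return ("disk", path)


def read_in_blocks(path: str, io_block_size: int):
    """Yield the file's contents in chunks of at most `io_block_size` bytes."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(io_block_size)
            if not block:
                break
            yield block


if __name__ == "__main__":
    threshold = 2 * 1024 * 1024      # 2M threshold
    number = bytes(4 * 1024 * 1024)  # stand-in for a 4M-byte intermediate value
    kind, where = store_number(number, threshold)
    print(kind)
    if kind == "disk":
        blocks = list(read_in_blocks(where, io_block_size=64 * 1024))
        print(len(blocks))
        os.remove(where)  # clean up the temporary file
```

    A larger I/O block size means fewer read calls per number, at the cost of a bigger buffer held in memory for each read.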

    So here it is:
    Attached Files
    Last edited by YoupY; 04-06-2008 at 02:53 PM.

  15. #15
    Xtreme Member
    Join Date
    Jul 2003
    Location
    The Netherlands
    Posts
    389
    @64dragon

    I couldn't reproduce the situation you described. M-Pi and Super PI don't share any resources; M-Pi uses only its own codebase. The error in Super PI could be explained by an unstable system. It may very well be that M-Pi also had calculation errors; improved error handling will come in the next release...


    @NetburstXE

    Because M-Pi is based on an open source library, I will release my modifications to that source code together with the final release. I'm not sure exactly what you'd like to modify, but since I am a bit of a perfectionist I'd like to clean up the code before it's released to anyone. If you'd like to know about the techniques used, please PM me again and I'll be happy to share that information.

  16. #16
    Registered User
    Join Date
    Apr 2007
    Posts
    90
    Looking sweet man, nice job.

    Btw, I always wanted to do this: can you make it able to calculate more digits than the SPi record, like 8 billion digits?

    That would be a heck of a stability test: just leave it on for a day or two and wait.

  17. #17
    Xtreme Member
    Join Date
    Jul 2003
    Location
    The Netherlands
    Posts
    389
    Thanks,

    I'll see what I can do about entering a number of digits of choice for the calculation. The program will accept anything thrown at it, with an upper limit of 9223372036854775807 digits; however, I'll have to fit it into the interface somewhere.
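
    For reference, that upper limit is the largest value a signed 64-bit integer can hold, 2^63 - 1. A quick check, in Python purely for illustration (the name `LIMIT` is just a label for the quoted number):

```python
# The quoted limit equals the maximum signed 64-bit integer, 2**63 - 1.
LIMIT = 9223372036854775807

assert LIMIT == 2**63 - 1
assert LIMIT.bit_length() == 63  # 63 value bits; the 64th bit is the sign

print(f"{LIMIT:,}")
```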

    Today I added a 1G option, built the abort function and expanded some selection options. Currently validation works for most of the options, but I still have to generate hash codes for 256M and up, which is taking a lot of time.

    Beta version 1.3 will probably be released next weekend...

  18. #18
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Earth
    Posts
    1,787
    Beta 1.2 is working nicely; however, may I suggest options for 6144 KB and 12288 KB in the L2 cache size menu? Great work.
    Sandy Bridge 2500k @ 4.5ghz 1.28v | MSI p67a-gd65 B3 Mobo | Samsung ddr3 8gb |
    Swiftech apogee drive II | Coolgate 120| GTX660ti w/heat killer gpu x| Seasonic x650 PSU

    QX9650 @ 4ghz | P5K-E/WIFI-AP Mobo | Hyperx ddr2 1066 4gb | EVGA GTX560ti 448 core FTW @ 900mhz | OCZ 700w Modular PSU |
    DD MC-TDX CPU block | DD Maze5 GPU block | Black Ice Xtreme II 240 Rad | Laing D5 Pump

  19. #19
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    Thanks for the bench
    Great core and MEM usage though, gives much more reliable results across CPUs.
    The last Sqrt only loads the cores to 3/4 of max though, and one problem is that results are inconsistent, with 0.2-1.1 second variations when repeated at the same settings.

    Just tried a quick run on Q6600 and 9850BE


  20. #20
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Earth
    Posts
    1,787
    Quote Originally Posted by KTE View Post
    Thanks for the bench
    Great core and MEM usage though, gives much more reliable results across CPUs.
    The last Sqrt only loads the cores to 3/4 of max though, and one problem is that results are inconsistent, with 0.2-1.1 second variations when repeated at the same settings.

    Just tried a quick run on Q6600 and 9850BE

    I have that inconsistent time problem with Super PI at the same settings, though never by more than a whole second.
    Could be a background task robbing ticks from one or more cores at different times.
    Sandy Bridge 2500k @ 4.5ghz 1.28v | MSI p67a-gd65 B3 Mobo | Samsung ddr3 8gb |
    Swiftech apogee drive II | Coolgate 120| GTX660ti w/heat killer gpu x| Seasonic x650 PSU

    QX9650 @ 4ghz | P5K-E/WIFI-AP Mobo | Hyperx ddr2 1066 4gb | EVGA GTX560ti 448 core FTW @ 900mhz | OCZ 700w Modular PSU |
    DD MC-TDX CPU block | DD Maze5 GPU block | Black Ice Xtreme II 240 Rad | Laing D5 Pump

  21. #21
    Xtreme Member
    Join Date
    Jul 2003
    Location
    The Netherlands
    Posts
    389
    Version 1.3

    Changed in this version:

    • Result validation (up to 128M)
    • Possibility to abort a run.
    • Improvements to the error handling mechanisms
    • Added a 1GB calculation option (Why? Because I can, and the number is kinda magical...)
    • Added L2 Cache options at CrazyNutz's request
    • Added additional Memory threshold options in order to be able to perform a memory-only calculation (You'll need over 20 GB of memory for that...)
    • Added Memory block sizes as well for the same reason
    • Changed the output of the program (looks a bit like PiFast now...)



    I have changed my mind about letting you select the number of digits to calculate. I would like to see the feature as well, but I would not be able to provide any kind of validation for such a calculation. Currently the validation is based on a hash code algorithm: if one digit isn't correct, the computation has failed. Since it's a 64-bit code, there's only a very small chance of getting the right hash from a faulty computation.
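
    The idea of validating a result against a precomputed 64-bit hash can be sketched as follows. The thread does not say which hash M-Pi uses, so 64-bit FNV-1a is used here purely as a stand-in, and the `validate` helper is hypothetical.

```python
FNV64_OFFSET = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # standard FNV 64-bit prime


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash (stand-in; M-Pi's real hash is not specified)."""
    value = FNV64_OFFSET
    for byte in data:
        value ^= byte
        value = (value * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return value


def validate(digits: str, expected_hash: int) -> bool:
    """Return True when the computed digits hash to the stored reference."""
    return fnv1a_64(digits.encode("ascii")) == expected_hash


if __name__ == "__main__":
    reference_digits = "3.14159265358979323846"
    stored_hash = fnv1a_64(reference_digits.encode("ascii"))

    print(validate("3.14159265358979323846", stored_hash))  # correct run
    print(validate("3.14159265358979323847", stored_hash))  # last digit wrong
```

    This also shows why a free choice of digit count is hard to validate: a reference hash has to be computed and stored in advance for every digit count the program offers. For a well-mixed 64-bit hash, a faulty result matches by accident with a probability of roughly 1 in 2^64, about 5.4e-20.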

    Unfortunately I discovered a bug while generating hash codes for the large numbers. It seems computations from 128M upward might run into a situation where the program crashes. I haven't been able to find out exactly what is causing this.

    Plans for version 1.4 beta

    • Finding and fixing the large computations error
    • Improvements to the algorithms (avoid RTTI in several structures)
    • Reducing memory leaks if possible (these might be related to inconsistent times)
    • Calculate the hashes for 256M, 512M and 1 GB




    KTE:

    Can you easily reproduce this time difference? I am a bit worried about it because you mention differences of over 1 second. That is way too much for me to find acceptable. If you can, please let me know. The inverse SQRT using about 75% CPU is normal behavior; I cannot have it utilize more CPU because of the way the algorithm is put together.
    Attached Files
    Last edited by YoupY; 04-18-2008 at 03:39 AM.

  22. #22
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    CrazyNutz, true, I've had the same since its inception. With SPi you have to optimize fully for the quickest time you can manage. Only then does it become reproducible to within 0.0x s [for 1M]; e.g. most P35 MBs with C2 can break under 14s 1M at 3600MHz, but will you achieve that on every run? No way, even if you try to control as much as possible.

    YoupY, your bench is far more consistent than SPi is, like wPrime, but I'm not sure why the variation exists. I'll test 1.3 just now and see if it differs. Variables are as controlled as I can have them without switching all basic Windows services off.

  23. #23
    Xtreme Member
    Join Date
    Jan 2004
    Posts
    393

  24. #24
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    New version has less variance for some reason.
    I ran a quick test: 5 runs, one after another, all equal, no net/FW/AV/extras running, same everything. The variance is not much now, only slight ->



    Run1:
    5.422 - Compute T
    1.000 - Inverse T
    6.422 - Total T

    Run2:
    5.391 - Compute T
    0.984 - Inverse T
    6.375 - Total T

    Run3:
    5.359 - Compute T
    1.000 - Inverse T
    6.359 - Total T

    Run4:
    5.406 - Compute T
    1.000 - Inverse T
    6.406 - Total T

    Run5:
    5.343 - Compute T
    1.000 - Inverse T
    6.343 - Total T

    Averages:
    5.3842s - Compute T
    0.9968s - Inverse T
    6.381s - Total T

    Max Variance:
    ±0.079s - Compute T
    ±0.016s - Inverse T
    ±0.079s - Total T
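
    The averages and "Max Variance" figures above can be reproduced from the five runs with a short script. In this sketch the variance figure is taken to be the spread between the slowest and fastest run (max minus min), which is an assumption that happens to match the posted numbers; the `summarize` helper is hypothetical.

```python
from statistics import mean

# Timings in seconds, copied from runs 1 to 5 above.
runs = {
    "Compute T": [5.422, 5.391, 5.359, 5.406, 5.343],
    "Inverse T": [1.000, 0.984, 1.000, 1.000, 1.000],
    "Total T": [6.422, 6.375, 6.359, 6.406, 6.343],
}


def summarize(times):
    """Return (average, max - min) for a list of timings in seconds."""
    return mean(times), max(times) - min(times)


if __name__ == "__main__":
    for label, times in runs.items():
        average, spread = summarize(times)
        print(f"{label}: avg {average:.4f}s, spread {spread:.3f}s")
```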

    Looks like you need to empty your MEM/cache before running it for more accurate results. If I run it after using the PC for a long time, the Std starts to increase by a lot.

  25. #25
    Xtreme Member
    Join Date
    Jul 2003
    Location
    The Netherlands
    Posts
    389
    If anybody is interested in taking over the source code (VC++) and developing this into a final release, please let me know...
