
Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #201
    Xtreme Addict
    Join Date
    Jun 2007
    Posts
    1,442
    Just found the spi part... again ~18-19C ambient, 1M and 32M.

    1M


    32M


    You said to post error messages, so here is one: I booted at 4.76, used SetFSB to go to 4.85, and ran spi 1M three times in a row, with the same error message each time.

  2. #202
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by rge
    just found the spi part...again ~18-19C ambient, 1M and 32M.

    1M

    32M

    You said to post error messages, here is one, booted at 4.76, used set fsb to go to 4.85, ran spi 1m 3x in a row, same error message each time.

    Right, it's been a while since I mentioned that the program doesn't like it when the FSB or bclk is messed with.
    It's a side-effect of the anti-cheat protection.

    Part of the protection is specifically targeted to help guard against the "time-slowing" cheats in this thread:

    http://www.xtremesystems.org/forums/...ad.php?t=46926

    Those cheats are obviously a highly guarded secret, but a while ago one of the mods was gracious enough to show me one so that I could "try" to counter it.


    So no, it's not a hardware error. If you boot directly at 4.85, it should be fine. I do realize this is more than a minor inconvenience, but I have no ideas for an alternative approach.
    And I AM curious as to how SuperPi XS Mod 1.5 is immune to the cheat.

    For the next version, I guess I should change the error message for this particular check to

    "Abnormal Frequency Measurement - Unable to Validate Benchmark
    Note that modifying your Bus Speed is known to cause this error.
    If you used SetFSB or a similar tool to get to this frequency, try rebooting at this speed."
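The thread does not say how this check works internally. As a rough illustration only, here is a minimal sketch assuming the program times a cycle counter against an independent OS timer over several short intervals and rejects the run when the implied frequencies disagree. The function name, the sample format and the 1% tolerance are all hypothetical; this is not y-cruncher's code.

```python
from statistics import median


def frequency_samples_consistent(samples, tolerance=0.01):
    """Hypothetical sanity check on (cycles_elapsed, seconds_elapsed) pairs.

    Each pair is one short interval measured with two independent clocks.
    If the bus speed is changed mid-session (or a timer is slowed down),
    the implied cycles-per-second drifts and the check fails.
    """
    if not samples:
        raise ValueError("need at least one sample")
    freqs = [cycles / seconds for cycles, seconds in samples]
    reference = median(freqs)
    # Every sample must lie within `tolerance` (relative) of the median.
    return all(abs(f - reference) / reference <= tolerance for f in freqs)


steady = [(4.0e9, 1.0), (2.0e9, 0.5), (4.0e9, 1.0)]
drifting = [(4.0e9, 1.0), (4.0e9, 1.0), (4.4e9, 1.0)]
print(frequency_samples_consistent(steady))    # True
print(frequency_samples_consistent(drifting))  # False
```

A check of this kind has no way to tell a time-slowing cheat from a legitimate SetFSB change, which fits the false positives described above.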


    EDIT: And nice records btw
    And a new world speed record for 1M on any computer (using a publicly available program*).

    *My latest build caps the % status printing to once per second, so less time is spent printing.... lol
    So I actually have a slightly faster time than that @ only 4.2 GHz (0.326654 secs)
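Capping status output at once per second is an ordinary throttling pattern. A minimal sketch, assuming a monotonic clock and a callback that receives the completed fraction; `ThrottledProgress` is a hypothetical name, and the injectable clock exists only so the behavior can be tested.

```python
import time


class ThrottledProgress:
    """Print a percentage at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, interval=1.0, clock=time.monotonic, out=print):
        self.interval = interval
        self.clock = clock
        self.out = out
        self._last = None  # time of the last printed update

    def update(self, fraction):
        now = self.clock()
        due = self._last is None or now - self._last >= self.interval
        # Always print the final 100% so the output ends cleanly.
        if due or fraction >= 1.0:
            self._last = now
            self.out(f"{fraction * 100:.1f}%")
            return True
        return False


progress = ThrottledProgress()
for i in range(1, 1001):
    pass  # one chunk of real work would go here
    progress.update(i / 1000)
```

The work loop calls `update` as often as it likes; only the throttle decides how often anything reaches the console.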
    Last edited by poke349; 09-07-2009 at 11:09 AM.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  3. #203
    Xtreme Member
    Join Date
    Aug 2008
    Location
    In a Digital World...
    Posts
    123
    Previous score
    250,000,000 digits:
    263.165 - v0.4.1 x64 SSE3 - El Greco - AMD Phenom II X3 720 @ 3.5 GHz
    New one with unlocked core @ default
    234.131 ...

    "friends don't let friends run RAID-0"

  4. #204
    Xtreme Member
    Join Date
    Aug 2008
    Location
    In a Digital World...
    Posts
    123
    and another one @ 100m
    96.384 - v0.4.1 x64 SSE3 - El Greco - AMD Phenom II X3 720 @ 3.5 GHz
    now 83.5

  5. #205
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Just a little heads up for v0.4.3

    Automatic version detection.



    As far as optimizations go...

    I mentioned earlier in the summer that I had been playing with the Intel Compiler. (since enough people had suggested that I make the switch)
    It took a bit of tweaking, but I finally got the Intel Compiler to produce a worthwhile speedup over Visual Studio for all SSE versions.

    So thanks for that suggestion everybody.

    I'll probably be complementing that with a couple of algorithmic improvements sometime in the next few weeks.

  6. #206
    Registered User
    Join Date
    Sep 2009
    Location
    Poland, Golub-Dobrzyń
    Posts
    4
    Version v0.4.2.
    25,000,000 digits after OC to 3.7 GHz


  7. #207
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by dranzi666
    Version v0.4.2.
    25,000,000 digits after OC to 3.7Ghz
    Nice, haven't seen too many dual cores lately...

    I suppose most of the Pi benchers are still running C2Ds, but were scared away by all the quad-core and multi-socket ownage in this thread.


    Aside from that:

    Now about those SetFSB and validation errors.

    I've noticed that some people were mistaking sanity errors for hardware errors.
    So to clear it up: Sanity Check Errors are NOT computation errors. They pop up when the program detects abnormal situations such as clock tampering or system-speed tampering...

    The (unwanted) side-effect is that SetFSB and similar tools will also trigger the error.
    Different versions of the program have different levels of sensitivity. (I've been re-tuning it between versions to find a good balance between being able to catch cheats vs. minimizing false-positives.)

    I've changed the error message in v0.4.3 to hopefully clear up some of the confusion... since few people will read this post after it gets buried.

    Boot up Frequency: 167 MHz bclk


    TurboV to: 168 MHz bclk



    *More changes to come... More optimizations...

    Not like anyone cares about x86 anymore (I don't either), but I've re-tuned the multi-threading settings in the x86 binaries and now they put a much better load on the cores.
    So x86 is a lot faster now. (almost as fast as x64 in some cases)

    If you haven't already noticed from the screenies, there's support for SSE4.1.
    (I know... I'm not being fair to AMD... But it was a little too tempting. Bulldozer will solve that. )

    I've also added checksums to batch benchmarks.
    Last edited by poke349; 09-23-2009 at 10:43 PM.

  8. #208
    Registered User
    Join Date
    Sep 2009
    Location
    Poland, Golub-Dobrzyń
    Posts
    4
    I will add more Pi benchmarks, but first I need to buy a new mobo... The BIOS on my EVGA nForce 650i Ultra board died after this OC.
    I will add new results soon.

    Sorry for my English.

  9. #209
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by dranzi666
    I will add more PI benchmarks but first I need to buy new MobO... Bios my evga nforce 650i ultra board died after this OC.
    I will add new results soon

    Sorry for my english.

    Sorry to hear that. Is that the second mobo that has fried in this thread?

    The program can be more stressful to the system than prime95, so I hope that everyone will keep that in mind when pushing the limit with this program.

    *And your English is fine. I didn't notice anything until you mentioned it.



    Now for some good news...

    As of Friday afternoon, I finished the last of the optimizations that I had planned for v0.4.3 and now I've locked it down for beta-testing...

    Spent much of last night and this morning running benchmarks on all the computers I had easy access to.
    And I've updated the first post with the results...


    Several things:

    First, I want to make it clear that I'm not being fair (nor am I trying to be fair) to AMD by supporting SSE4.1. I optimized for whatever machines I had access to.

    Second... The speedups in this version are rather astounding - especially for x86 and Core i7.

    Single-threaded x86 (no SSE) is now comparable to PiFast 4.3.
    Throw in SSE3 and it beats PiFast 4.3.
    The specially optimized binary for Core i7 is amazingly fast compared to v0.4.2 x64 SSE3...
    Just look at the results on the first post.
    This took me completely by surprise when I finally ran the numbers.

    So it's safe to say that all results obtained with v0.4.3 are NOT comparable to v0.4.2 and earlier...

    Third... Batch mode now has checksums that are "fully compatible" with the regular benchmark checksums. When v0.4.3 is released, feel free to enter results from either normal or batch modes.
    And for that matter, batch mode also gives you thread-control so you can reduce the # of threads for those smaller benchmarks where too many threads will hurt.
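The thread does not say what goes into the checksum. One way to make batch-mode and normal-mode checksums "fully compatible" is to have both modes hash the same canonical text record. A minimal sketch, assuming MD5 over a pipe-separated record of version, digit count, total time and thread count; the field choice and format are hypothetical.

```python
import hashlib


def result_checksum(version, digits, total_time_s, threads):
    """Hash a canonical, mode-independent result record (hypothetical fields)."""
    # Batch and normal modes must format every field identically; otherwise
    # the same result would produce two different checksums.
    record = f"{version}|{digits}|{total_time_s:.3f}|{threads}"
    return hashlib.md5(record.encode("ascii")).hexdigest()


# Illustrative inputs only; this does NOT reproduce y-cruncher's checksums.
normal = result_checksum("0.4.3", 500_000_000, 202.881, 8)
batch = result_checksum("0.4.3", 500_000_000, 202.881, 8)
print(normal == batch)  # True
```

An unkeyed hash like this only catches accidental edits, since anyone can recompute it; a checksum meant to stop deliberate tampering needs an ingredient users cannot reproduce.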

  10. #210
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    sweet. so the core i7 gets special treatment.

    now you just need to implement this through OpenCL.

  11. #211
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Chumbucket843
    sweet. so the core i7 gets special treatment.

    now you just need to implement this through openCL.

    Not just Core i7.

    There will be 5 binaries in this version.

    x86
    x86 SSE3
    x64 SSE3
    x64 SSE4.1 ~ Ushio (tuned for my LanBox)
    x64 SSE4.1 ~ Nagisa (tuned for my workstation)

    So two SSE4.1 versions tuned for Core i7 and Harpertown.

    x64 SSE3 has been re-tuned for a smaller cache than the previous versions.
    This will help out AMD chips and any non-12MB cache Core 2 Quads.

    Prior to v0.4.3, x64 and x64 SSE3 were both tuned for 12MB cache... (for my workstation)...
    Turns out that this was the culprit that was hurting virtually all other processors including all AMD - since nothing else had 3MB cache/thread... And not surprisingly, it hurt i7 the most.
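The cache-per-thread arithmetic behind this: a 12 MB Core 2 Quad (or a dual Harpertown with 12 MB and 4 cores per socket) has 3 MB per thread, while a Core i7 920 shares 8 MB of L3 among 8 threads (4 cores with Hyper-Threading), leaving 1 MB per thread. A minimal sketch, assuming the working-set block is the largest power of two that fills at most half of each thread's share; the half-fill policy and the function names are hypothetical, not y-cruncher's tuning rule.

```python
def cache_per_thread_mib(shared_cache_mib, threads):
    """Share of cache each thread gets if all threads compete for it."""
    return shared_cache_mib / threads


def block_bytes(cache_share_mib, fill=0.5):
    """Largest power-of-two block using at most `fill` of the share (hypothetical policy)."""
    budget = int(cache_share_mib * 1024 * 1024 * fill)
    if budget < 1:
        raise ValueError("cache share too small")
    return 1 << (budget.bit_length() - 1)


for name, cache, threads in [("Core 2 Quad 12MB", 12, 4), ("Core i7 920", 8, 8)]:
    share = cache_per_thread_mib(cache, threads)
    print(f"{name}: {share:.1f} MiB/thread -> block {block_bytes(share) // 1024} KiB")
```

Under this policy the 3 MB/thread tuning picks a 1024 KiB block, which already fills an i7 thread's entire 1 MiB share, consistent with i7 being hurt the most.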

    EDIT 1: The two SSE4.1 versions are fully compatible with each other and should theoretically run on Bulldozer as well. The only difference between them is the tuning.

    EDIT 2:
    The speedup via SSE4.1 is very small (a fraction of a %). So non-12MB Yorkfields will use x64 SSE3 instead of x64 SSE4.1 ~ Nagisa because of the more favorable tuning.
    And it is possible to override whatever the auto-selector chooses.
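The selection rules stated in this post and later in the thread (Core i7 gets Ushio, 12 MB Harpertown/Yorkfield gets Nagisa, smaller-cache Core 2 and AMD get x64 SSE3) fit in a small decision function. A minimal sketch; the boolean feature flags and the `cache_mib` parameter are hypothetical inputs (a real launcher would read them via CPUID), and this is not the launcher's actual code.

```python
def choose_binary(x64, sse3, sse41, is_core_i7, cache_mib):
    """Pick one of the five v0.4.3 binaries (hypothetical feature inputs)."""
    if x64 and sse41:
        if is_core_i7:
            return "x64 SSE4.1 ~ Ushio"   # tuned for Core i7
        if cache_mib >= 12:
            return "x64 SSE4.1 ~ Nagisa"  # tuned for 12 MB Harpertown
        # SSE4.1 gains only a fraction of a percent, so the smaller-cache
        # tuning of x64 SSE3 wins on e.g. non-12MB Yorkfields.
        return "x64 SSE3"
    if x64 and sse3:
        return "x64 SSE3"                 # e.g. AMD Phenom II
    if sse3:
        return "x86 SSE3"
    return "x86"


print(choose_binary(x64=True, sse3=True, sse41=False, is_core_i7=False, cache_mib=6))
```

Overriding the choice is just a matter of running a different binary directly.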
    Last edited by poke349; 09-26-2009 at 04:36 PM.

  12. #212
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Here are a few screenies of v0.4.3 on my three fastest machines:




    And here's something VERY interesting that I totally wasn't expecting to see...

    Temperatures!!!

    prime95 - small FFTs:
    69/63/67/64

    IntelBurnTest - 4096MB + 8 threads:
    68/62/66/63

    y-cruncher v0.4.3 - x64 SSE4.1 stress test-7GB
    70/65/69/65



    I don't recall v0.4.2 running hotter than prime95...
    Did those optimizations really make things a bit more toasty?

    Screenies here:


    Temps on Core 2 were evenly matched between prime95 and y-cruncher. LinPack still owns @ several degrees hotter.


    Anyways... I promise that v0.4.3 won't be beta-testing for 2 months like v0.4.1 was...
    Assuming no problems, I might release it in a week.

  13. #213
    Xtreme Member
    Join Date
    Dec 2008
    Posts
    177
    Is it possible to get speed-up from running this on a GPU? It seems that this algorithm lends itself well to being parallelized efficiently and if each computation is independent from one another then it will work well on a GPU. But it seems this program requires a large amount of memory and I don't know if the GPU can access CPU memory.

    Have you looked into this at all?

  14. #214
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Firestrider
    Is it possible to get speed-up from running this on a GPU? It seems that this algorithm lends itself well to being parallelized efficiently and if each computation is independent from one another then it will work well on a GPU. But it seems this program requires a large amount of memory and I don't know if the GPU can access CPU memory.

    Have you looked into this at all?

    Oh hey! If I'm not mistaken, you were one of the first to post a benchmark (but it was lost in the rollback).

    It seems that this algorithm lends itself well to being parallelized efficiently and if each computation is independent from one another then it will work well on a GPU
    This is partly true. Although the algorithm is far from perfectly parallelizable, it should (in theory) be good enough to scale well into hundreds of cores.

    But, there is one big problem:
    GPUs have very poor support for double-precision floating-point - which is what this program relies on. A GPU crunching DP-FP isn't much better than a CPU. There is no way to efficiently use single-precision FP instead.
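A back-of-the-envelope illustration of why single precision is so much worse for FFT-based big-number multiplication, assuming a plain floating-point FFT convolution whose outputs must round back to exact integers. This simplified bound ignores FFT round-off error and is not y-cruncher's actual error analysis. float32 carries 24 significant bits; float64 carries 53.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_bits_per_limb(mantissa_bits, fft_length):
    """Upper bound on limb size for an exact FFT product (ignores round-off).

    Each convolution output sums up to n products of two b-bit limbs, so it can
    reach about n * 2**(2*b). It must fit in the mantissa:
    2*b + log2(n) <= mantissa_bits.
    """
    return (mantissa_bits - math.ceil(math.log2(fft_length))) // 2


n = 2**20  # a million-point transform
print(max_bits_per_limb(24, n))      # 2  (single precision)
print(max_bits_per_limb(53, n))      # 16 (double precision)
print(to_float32(float(2**24 + 1)))  # 16777216.0: float32 loses integers above 2**24
```

At 2 bits per limb instead of 16, single precision needs about 8 times as many points for the same operand, and the longer transform shrinks the usable limb size even further.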

    But it seems this program requires a large amount of memory
    Very true.
    The larger the computation, the better it scales.
    When you have many cores, the minimum computation size needed to achieve decent multi-core scaling is MASSIVE - which means... it'll need a LOT of memory.

    Also, the speed of the program is nearing the point where the minimum computation size that will make a "sufficiently long" benchmark is more than the total memory in the average computer.
    (Right now, you can't do a ram-only Pi benchmark that lasts more than 10 minutes on an i7 machine with 6GB of ram.)
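The "Approx. Memory Needed" column from the v0.4.2 benchmark menu (posted by cheapseats below) makes this concrete. A minimal sketch that picks the largest standard size fitting in a given amount of free RAM; `largest_ram_only_benchmark` is hypothetical, and MB values are converted assuming 1 GB = 1024 MB.

```python
# (decimal digits, approx. memory needed in GB) from the v0.4.2 size menu.
BENCHMARK_SIZES_GB = [
    (25_000_000, 117 / 1024),
    (50_000_000, 253 / 1024),
    (100_000_000, 458 / 1024),
    (250_000_000, 1.19),
    (500_000_000, 2.39),
    (1_000_000_000, 4.79),
    (2_500_000_000, 11.5),
    (5_000_000_000, 23.0),
    (10_000_000_000, 46.0),
    (25_000_000_000, 116.0),
    (50_000_000_000, 250.0),
    (100_000_000_000, 467.0),
]


def largest_ram_only_benchmark(free_gb):
    """Largest standard size whose listed memory fits in `free_gb` (None if none fit)."""
    fitting = [digits for digits, mem_gb in BENCHMARK_SIZES_GB if mem_gb <= free_gb]
    return max(fitting) if fitting else None


print(largest_ram_only_benchmark(6.0))  # 1000000000
```

On a 6 GB machine the ceiling is 1 billion digits, since the next step needs 11.5 GB; cheapseats' 1,000,000,000-digit run on a 6 GB i7 920 at 4.29 GHz took 548.419 s, just over 9 minutes.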

    and I don't know if the GPU can access CPU memory
    It can, but it must go through the PCIe bus - a sure bottleneck.
    Bandwidth is already a problem on Core 2. A GPU would be faster computationally, but PCIe bandwidth will probably kill it.


    Have you looked into this at all?
    Pretty much the moment I got it to scale on my mom's Q6600. (back in December 2008...)
    But after a bit of research I decided that it isn't the time yet.

    The hardware isn't ready. (poor DP-FP, not enough memory, not enough bandwidth, no set programming standard)

    The algorithm isn't ready.
    After I got my Harpertown rig up and running with 8-cores and 64GB ram, I started to notice some weaknesses in the algorithm on large computations... Bad enough to prevent scaling into hundreds of threads.
    As of now, these problems have only been partially solved.

    And "I'm" not ready either...
    I have no experience with CUDA or OpenCL...
    Also, I'm still an undergrad... So wth do I know about parallel programming? lol

  15. #215
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    214
    Hi poke, this is as far as I've got with 1B and 1G digits.

    Vista HP64


    500,000,000 digits

    248.709 - v0.4.2 x64 SSE3 - cheapseats - Intel Core i7 920 @ 4.294 GHz (204.5x21) - 6 GB DDR3





    1,000,000,000 digits

    548.419 - v0.4.2 x64 SSE3 - cheapseats - Intel Core i7 920 @ 4.294 GHz (204.5x21) - 6 GB DDR3





    SuperPi-size 1G digits

    591.198 - v0.4.2 x64 SSE3 - cheapseats - Intel Core i7 920 @ 4.294 GHz (204.5x21) - 6 GB DDR3





    Code:
    y-cruncher v0.4.2 Build 7438            ( www.numberworld.org )
    Copyright 2008-2009 Alexander J. Yee   ( a-yee@northwestern.edu )
    
    Distribute Freely - Please Report any Bugs
    
    Version: x64 SSE3
    
    
      0         Benchmark Pi
      1         Validate a Pi Benchmark
      2         Batch Benchmark Pi  (run multiple benchmarks)
      3         Stress Test (beta)
    
      4         Custom Compute a Constant
                    -   Compute other constants (e, Golden Ratio, etc...)
                    -   Choose your own settings
    
      5         Digit Viewer        (view digits from .txt and .ycd files)
      6         Compare Digits      (compare digits from different runs)
    
      7         About
      8         A Word of Warning...
    
    Enter your choice:
    option = 0
    
    
    Benchmark Pi:
    
    Select a Benchmark Type:
    
      0     Single-Threaded
      1     Multi-Threaded
    
    option = 1
    
    
    Select a Benchmark Size:
    
    Option      Decimal Digits      Approx. Memory Needed
    
      1             25,000,000             117 MB
      2             50,000,000             253 MB
      3            100,000,000             458 MB
      4            250,000,000            1.19 GB
      5            500,000,000            2.39 GB
      6          1,000,000,000            4.79 GB
      7          2,500,000,000            11.5 GB
      8          5,000,000,000            23.0 GB
      9         10,000,000,000            46.0 GB
     10         25,000,000,000             116 GB
     11         50,000,000,000             250 GB
     12        100,000,000,000             467 GB
    
      0     I prefer SuperPi sizes... (1M, 2M, 4M...)
    
    
    option = 5
    
    Threads = 8
    Allocating and Reserving Memory...      2.39 GB
    Constructing FFT lookup tables...
    
    
    Compute: Pi
    
    Decimal Digits    :   500,000,000
    Hexadecimal Digits:   415,241,012
    
    Mode: Ram Only
    
    
    Begin Computation:
    
    Computing: Pi
    
    Algorithm: Chudnovsky Formula
    
    Summing Series:  35,256,838 terms
    Time:    183.983 seconds  ( 0.051 hours )
    InvSqrt...
    Time:    6.479 seconds  ( 0.002 hours )
    Final Multiply...
    Time:    3.589 seconds  ( 0.001 hours )
    
    Compute Pi Time: 194.061 seconds  ( 0.054 hours )
    
    Constructing Base Conversion Table:
    Time:    9.623 seconds  ( 0.003 hours )
    Base Converting (Primary Cutting Parameters):
    Time:    44.989 seconds  ( 0.012 hours )
    
    Writing Decimal Digits:   500,000,001  digits written
    
    
    Total Computation Time:  248.709 seconds  ( 0.069 hours )
    
    Total Time (including writing digits):  258.591 seconds  ( 0.072 hours )
    
    
    
    Benchmark Successful. The digits appear to be OK.
    
    Program Version:    0.4.2 Build 7438 (x64 SSE3)
    Processor(s):       Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    CPU Frequency:      4,089,823,801 Hz  (frequency may be inaccurate)
    Thread(s):          8
    Digits:             500,000,000
    Total Time:         248.709 seconds
    Checksum:           928615c626ff9b210085a5e9d5f7efbd
    
    
    
    Press any key to continue . . .
    
    
    --------------------------------------
    
    
    y-cruncher v0.4.2 Build 7438            ( www.numberworld.org )
    Copyright 2008-2009 Alexander J. Yee   ( a-yee@northwestern.edu )
    
    Distribute Freely - Please Report any Bugs
    
    Version: x64 SSE3
    
    
      0         Benchmark Pi
      1         Validate a Pi Benchmark
      2         Batch Benchmark Pi  (run multiple benchmarks)
      3         Stress Test (beta)
    
      4         Custom Compute a Constant
                    -   Compute other constants (e, Golden Ratio, etc...)
                    -   Choose your own settings
    
      5         Digit Viewer        (view digits from .txt and .ycd files)
      6         Compare Digits      (compare digits from different runs)
    
      7         About
      8         A Word of Warning...
    
    Enter your choice:
    option = 0
    
    
    Benchmark Pi:
    
    Select a Benchmark Type:
    
      0     Single-Threaded
      1     Multi-Threaded
    
    option = 1
    
    
    Select a Benchmark Size:
    
    Option      Decimal Digits      Approx. Memory Needed
    
      1             25,000,000             117 MB
      2             50,000,000             253 MB
      3            100,000,000             458 MB
      4            250,000,000            1.19 GB
      5            500,000,000            2.39 GB
      6          1,000,000,000            4.79 GB
      7          2,500,000,000            11.5 GB
      8          5,000,000,000            23.0 GB
      9         10,000,000,000            46.0 GB
     10         25,000,000,000             116 GB
     11         50,000,000,000             250 GB
     12        100,000,000,000             467 GB
    
      0     I prefer SuperPi sizes... (1M, 2M, 4M...)
    
    
    option = 6
    
    Threads = 8
    Allocating and Reserving Memory...      4.79 GB
    Constructing FFT lookup tables...
    
    
    Compute: Pi
    
    Decimal Digits    :   1,000,000,000
    Hexadecimal Digits:   830,482,024
    
    Mode: Ram Only
    
    
    Begin Computation:
    
    Computing: Pi
    
    Algorithm: Chudnovsky Formula
    
    Summing Series:  70,513,673 terms
    Time:    410.583 seconds  ( 0.114 hours )
    InvSqrt...
    Time:    13.291 seconds  ( 0.004 hours )
    Final Multiply...
    Time:    7.488 seconds  ( 0.002 hours )
    
    Compute Pi Time: 431.371 seconds  ( 0.120 hours )
    
    Constructing Base Conversion Table:
    Time:    19.610 seconds  ( 0.005 hours )
    Base Converting (Primary Cutting Parameters):
    Time:    97.374 seconds  ( 0.027 hours )
    
    Writing Decimal Digits:   1,000,000,001  digits written
    
    
    Total Computation Time:  548.419 seconds  ( 0.152 hours )
    
    Total Time (including writing digits):  568.351 seconds  ( 0.158 hours )
    
    
    
    Benchmark Successful. The digits appear to be OK.
    
    Program Version:    0.4.2 Build 7438 (x64 SSE3)
    Processor(s):       Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    CPU Frequency:      4,089,857,858 Hz  (frequency may be inaccurate)
    Thread(s):          8
    Digits:             1,000,000,000
    Total Time:         548.419 seconds
    Checksum:           de9afc254d422ce2181755d0eae3b9fd
    
    
    
    Press any key to continue . . .
    
    
    -----------------------------------------
    
    
    y-cruncher v0.4.2 Build 7438            ( www.numberworld.org )
    Copyright 2008-2009 Alexander J. Yee   ( a-yee@northwestern.edu )
    
    Distribute Freely - Please Report any Bugs
    
    Version: x64 SSE3
    
    
      0         Benchmark Pi
      1         Validate a Pi Benchmark
      2         Batch Benchmark Pi  (run multiple benchmarks)
      3         Stress Test (beta)
    
      4         Custom Compute a Constant
                    -   Compute other constants (e, Golden Ratio, etc...)
                    -   Choose your own settings
    
      5         Digit Viewer        (view digits from .txt and .ycd files)
      6         Compare Digits      (compare digits from different runs)
    
      7         About
      8         A Word of Warning...
    
    Enter your choice:
    option = 0
    
    
    Benchmark Pi:
    
    Select a Benchmark Type:
    
      0     Single-Threaded
      1     Multi-Threaded
    
    option = 1
    
    
    Select a Benchmark Size:
    
    Option      Decimal Digits      Approx. Memory Needed
    
      1             25,000,000             117 MB
      2             50,000,000             253 MB
      3            100,000,000             458 MB
      4            250,000,000            1.19 GB
      5            500,000,000            2.39 GB
      6          1,000,000,000            4.79 GB
      7          2,500,000,000            11.5 GB
      8          5,000,000,000            23.0 GB
      9         10,000,000,000            46.0 GB
     10         25,000,000,000             116 GB
     11         50,000,000,000             250 GB
     12        100,000,000,000             467 GB
    
      0     I prefer SuperPi sizes... (1M, 2M, 4M...)
    
    
    option = 0
    
    Option    Decimal Digits             Approx. Memory Needed
    
     20      1 M   -         1,048,576            10.8 MB
     21      2 M   -         2,097,152            13.8 MB
     22      4 M   -         4,194,304            22.5 MB
     23      8 M   -         8,388,608            44.1 MB
     24     16 M   -        16,777,216            84.3 MB
     25     32 M   -        33,554,432             160 MB
     26     64 M   -        67,108,864             332 MB
     27    128 M   -       134,217,728             631 MB
     28    256 M   -       268,435,456            1.29 GB
     29    512 M   -       536,870,912            2.46 GB
     30      1 G   -     1,073,741,824            5.21 GB
     31      2 G   -     2,147,483,648            9.83 GB
     32      4 G   -     4,294,967,296            20.9 GB
     33      8 G   -     8,589,934,592            39.3 GB
     34     16 G   -    17,179,869,184            84.1 GB
     35     32 G   -    34,359,738,368             157 GB
     36     64 G   -    68,719,476,736             338 GB
     37    128 G   -   137,438,953,472             629 GB
    
    
    option = 30
    
    Threads = 8
    Allocating and Reserving Memory...      5.21 GB
    Constructing FFT lookup tables...
    
    
    Compute: Pi
    
    Decimal Digits    :   1,073,741,824
    Hexadecimal Digits:   891,723,283
    
    Mode: Ram Only
    
    
    Begin Computation:
    
    Computing: Pi
    
    Algorithm: Chudnovsky Formula
    
    Summing Series:  75,713,479 terms
    Time:    444.407 seconds  ( 0.123 hours )
    InvSqrt...
    Time:    13.980 seconds  ( 0.004 hours )
    Final Multiply...
    Time:    7.747 seconds  ( 0.002 hours )
    
    Compute Pi Time: 466.143 seconds  ( 0.129 hours )
    
    Constructing Base Conversion Table:
    Time:    21.508 seconds  ( 0.006 hours )
    Base Converting (Primary Cutting Parameters):
    Time:    103.475 seconds  ( 0.029 hours )
    
    Writing Decimal Digits:   1,073,741,825  digits written
    
    
    Total Computation Time:  591.198 seconds  ( 0.164 hours )
    
    Total Time (including writing digits):  612.818 seconds  ( 0.170 hours )
    
    
    
    Benchmark Successful. The digits appear to be OK.
    
    Program Version:    0.4.2 Build 7438 (x64 SSE3)
    Processor(s):       Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    CPU Frequency:      4,089,825,417 Hz  (frequency may be inaccurate)
    Thread(s):          8
    Digits:             1,073,741,824
    Total Time:         591.198 seconds
    Checksum:           a86ee7dcafada01318a8a3ae166500f4
    
    
    
    Press any key to continue . . .
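Two of the numbers in these logs can be checked by hand. The hexadecimal digit count is the decimal count times log16(10) ≈ 0.8305, and each term of the Chudnovsky series contributes about log10(640320^3 / 1728) ≈ 14.18 decimal digits. A minimal sketch; rounding up is an assumption, and the logged term counts sit a few terms above this simple bound, presumably a small safety margin that the thread does not explain.

```python
import math

HEX_PER_DECIMAL = math.log(10) / math.log(16)          # ~0.830482
DIGITS_PER_TERM = math.log10(640320**3 // 1728)        # ~14.1816


def hex_digits(decimal_digits):
    """Hex digits carrying the same precision as `decimal_digits` decimal digits."""
    return math.ceil(decimal_digits * HEX_PER_DECIMAL)


def chudnovsky_terms_lower_bound(decimal_digits):
    """Minimum number of Chudnovsky terms for `decimal_digits` digits."""
    return math.ceil(decimal_digits / DIGITS_PER_TERM)


for d in (500_000_000, 1_000_000_000, 1_073_741_824):
    print(d, hex_digits(d), chudnovsky_terms_lower_bound(d))
```

The hex column matches the logs exactly (415,241,012 / 830,482,024 / 891,723,283).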

    Maximus V GENE [0086] :: 3770K L212B244 :: H70 :: EB3103A (PSC)

    CL|WCL|RTL performance (SB) : 32M scaling charts : PSC WCL > CL performance bug



  16. #216
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by cheapseats
    Hi poke this is as far as i've got with 1B and 1G digits.

    Nice results... Always nice to see large runs.


    ----------------------------------------------------------------

    Anyways... Seeing as how I let a few bugs get through into v0.4.1 (even after 2 months of beta-testing...), I'll just make v0.4.3 a public beta.

    Version 0.4.3 is Out!!!

    Have fun with it. No more boring white output...

    Feel free to override whatever the launcher selects. (Just run the binaries directly.)
    Depending on your computation size, the version that the launcher chooses isn't always the best.

    For example:

    On Core i7, 1M is fastest using x64 SSE4.1 ~ Nagisa. (The launcher chooses x64 SSE4.1 ~ Ushio)

    On dual-Harpertown (and probably 12M Yorkfield as well), 25m is fastest using x64 SSE4.1 ~ Ushio. (The launcher chooses x64 SSE4.1 ~ Nagisa.)


    The binaries are tuned using LARGE computations. So the launcher should be able to nail the best binary for computations larger than 250m.
    But for small computations, anything goes. Take your pick from 5 binaries.
    Last edited by poke349; 09-29-2009 at 07:48 PM.

  17. #217
    Registered User
    Join Date
    Sep 2009
    Location
    Poland, Golub-Dobrzyń
    Posts
    4

    today!

    Today is a nice day! I have an EVGA nForce 780i SLI Premium and I'm going to OC my C2D to 4 GHz xD!
    But this time I won't destroy the mobo! xD

    Nothing special today... I'm scared of destroying the new mobo xD.

    Version v0.4.2.
    50,000,000 digits after OC to 3.7 GHz




    This time I didn't take any risks xD
    Last edited by dranzi666; 10-02-2009 at 05:24 AM. Reason: image link dead

  18. #218

  19. #219
    Registered User
    Join Date
    Sep 2009
    Location
    Poland, Golub-Dobrzyń
    Posts
    4
    I found the magic voltage xD, and this time the system is stable after the OC, so I could run more benchmarks.

    New results:

    &&


    Have fun ! xD











    xD
    Last edited by dranzi666; 10-02-2009 at 09:07 AM. Reason: update new program version benchmark xD
    CPU: Intel Core 2 Duo E6750 2.66 GHz 4 MB L2 (65nm)
    MoBo #1: EVGA nForce 650i Ultra (dead)
    MoBo #2: EVGA nForce 780i SLI Premium
    GPU: GeForce 8600 GTS 512 MB
    RAM: 2 x DDR2 1 GB PDP Patriot 1000 MHz CL5
    HDD: Seagate 320 GB SATA II Raid Edition
    PSU: Amacrox Warrior 500W
    CPU Cooling: Pentagram HP-90
    MOUSE: Logitech G5

  20. #220
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    214
    Tried to get the CPU details right this time, poke.


    Vista HP64


    500,000,000 digits

    202.881 - v0.4.3 x64 SSE4.1 - cheapseats - Intel Core i7 920 @ 4.09 GHz (4.294 GHz Turbo Boost) - 6 GB DDR3




    1,000,000,000 digits

    449.062 - v0.4.3 x64 SSE4.1 - cheapseats - Intel Core i7 920 @ 4.09 GHz (4.294 GHz Turbo Boost) - 6 GB DDR3




    Code:
    500Million
    
    Benchmark Successful. The digits appear to be OK.
    
    Version:        0.4.3 Build 7681 (x64 SSE4.1 ~ Ushio)
    Processor(s):   Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    CPU Frequency:  4,089,829,337 Hz  (frequency may be inaccurate)
    Thread(s):      8
    Digits:         500,000,000
    Total Time:     202.881 seconds
    Checksum:       46117a3f76fa532b12fe3c237edb8ef8
    
    
    -------------------------
    
    1Billion
    
    Benchmark Successful. The digits appear to be OK.
    
    Version:        0.4.3 Build 7681 (x64 SSE4.1 ~ Ushio)
    Processor(s):   Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    CPU Frequency:  4,089,825,762 Hz  (frequency may be inaccurate)
    Thread(s):      8
    Digits:         1,000,000,000
    Total Time:     449.062 seconds
    Checksum:       afa902f4e32e19a40fe2f257f0783327

    ----------------------------

    These are just for comparison:


    1,000,000,000 digits

    BCLK 201 x 21 / uncore 16x vs. 20x / memory x4 vs. x5


    x16 + x4 (465.566s)




    x20 + x4 (458.798s)




    x20 + x5 (457.367s)


    Maximus V GENE [0086] :: 3770K L212B244 :: H70 :: EB3103A (PSC)

    CL|WCL|RTL performance (SB) : 32M scaling charts : PSC WCL > CL performance bug



  21. #221
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Nice results.

    Interesting... so the program really loves uncore, but it doesn't respond much to memory speed. (For i7 at least, I'd expect Core 2 to be much more sensitive to memory speed.)

    I guess that's a good start as far as figuring out what tweaks the program responds to.
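    To put a number on "loves uncore, but not memory speed", here is a minimal sketch that turns the three 1,000,000,000-digit comparison times posted above into percentage gains. Both changes are the same 25% multiplier bump (uncore 16x to 20x, memory x4 to x5), so the percentages are directly comparable. The helper name `pct_faster` is just for illustration.

    ```python
    # 1,000,000,000-digit times from the comparison runs above (seconds).
    runs = {
        ("uncore x16", "mem x4"): 465.566,
        ("uncore x20", "mem x4"): 458.798,
        ("uncore x20", "mem x5"): 457.367,
    }

    def pct_faster(before, after):
        """Percent reduction in run time going from `before` to `after`."""
        return (before - after) / before * 100.0

    # Each comparison changes exactly one setting by +25%.
    uncore_gain = pct_faster(runs[("uncore x16", "mem x4")], runs[("uncore x20", "mem x4")])
    memory_gain = pct_faster(runs[("uncore x20", "mem x4")], runs[("uncore x20", "mem x5")])

    print(f"uncore 16x -> 20x: {uncore_gain:.2f}% faster")
    print(f"memory x4 -> x5:   {memory_gain:.2f}% faster")
    ```

    With these numbers the uncore bump buys roughly 1.45% and the memory bump roughly 0.31%, which is consistent with the observation above, at least for these three single runs on one i7 system.
    
    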
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  22. #222
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264
    Some Core i5 and Phenom II results on the latest version...

    Phenom II X2 550, unlocked to quad-core, Noctua air cooling.

    3.71 GHz





    Core i5 750, stock cooling (could not complete 250m, too unstable, sorry). Turbo mode is OFF, so the reported frequency is accurate.



    New version is pretty quick!

    One thing I've noticed is a high level of inconsistency between runs. The first run is typically slower than the 2nd and 3rd. Then, occasionally, if you keep re-running, a slower pass will appear again.

    So the above are the 2nd-run results, as they're typically as much as 0.5 s quicker than the first run and ever so slightly quicker than the 3rd or subsequent runs.
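    Picking "the 2nd run" by hand works, but the same idea can be made systematic: drop the warm-up run(s), then report the best and the median of the rest so occasional slow passes are visible instead of hidden. This is a minimal sketch; `summarize_runs` and the example times are hypothetical, not taken from anyone's results in this thread.

    ```python
    import statistics

    def summarize_runs(times, warmup=1):
        """Summarize repeated benchmark times (seconds).

        The first `warmup` runs are dropped, since they tend to be slower
        while the OS is still setting up memory/paging for the process.
        """
        if len(times) <= warmup:
            raise ValueError("need more runs than warm-up runs")
        kept = times[warmup:]
        return {
            "best": min(kept),                  # what you'd submit
            "median": statistics.median(kept),  # typical run after warm-up
            "spread": max(kept) - min(kept),    # how inconsistent the runs were
        }

    # Hypothetical times for four consecutive runs (seconds).
    print(summarize_runs([12.9, 12.4, 12.5, 12.45]))
    ```

    Reporting the median and spread alongside the best time makes it easier to tell a genuine tweak apart from ordinary run-to-run noise.
    
    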

  23. #223
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by mAJORD View Post
    New version is pretty quick!
    Yeah, new compiler + a ton of optimizations...
    There are still plenty of places that can be improved, but I'm gonna call it quits until after grad-school apps are done.


    Quote Originally Posted by mAJORD View Post
    One thing I've noticed is a high level of inconsistency between runs. The first run is typically slower than the 2nd and 3rd. Then, occasionally, if you keep re-running, a slower pass will appear again.
    I've definitely noticed it myself as well.
    The first runs are probably slower because the OS hasn't fully prepared the buffers and the memory/paging setup for it. (And maybe it hasn't cached the entire binary yet.)

    The inconsistencies should've been there since the first version. It's inherent because of the way the program creates and destroys threads.
    It's very inconsistent and sensitive to background programs.

    There are some thread-management/scheduling settings in Windows that can be tweaked. I haven't played with them yet, but it might be possible to get better consistency and some speedup by tweaking them.
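    Since the inconsistency is attributed to threads being created and destroyed during the run, one common alternative is to keep a persistent pool of worker threads and hand it work. The Python sketch below only contrasts the two structures; it is not how the program is implemented, `work` is a hypothetical stand-in workload, and because of CPython's GIL it shows the creation/teardown overhead and timing jitter, not a parallel speedup.

    ```python
    import threading
    import time
    from concurrent.futures import ThreadPoolExecutor

    def work(chunk):
        # Hypothetical stand-in for one piece of a larger computation.
        return sum(i * i for i in chunk)

    def spawn_per_task(chunks):
        """Create, start and join a fresh OS thread for every chunk."""
        results = [0] * len(chunks)

        def run(idx):
            results[idx] = work(chunks[idx])

        threads = [threading.Thread(target=run, args=(i,)) for i in range(len(chunks))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results

    def reuse_pool(pool, chunks):
        """Submit chunks to a long-lived pool; no threads are created here."""
        return list(pool.map(work, chunks))

    if __name__ == "__main__":
        chunks = [range(i * 1000, (i + 1) * 1000) for i in range(64)]
        with ThreadPoolExecutor(max_workers=8) as pool:
            for name, fn in (("spawn", lambda: spawn_per_task(chunks)),
                             ("pool", lambda: reuse_pool(pool, chunks))):
                times = []
                for _ in range(5):
                    t0 = time.perf_counter()
                    fn()
                    times.append(time.perf_counter() - t0)
                print(name, "min=%.4f max=%.4f" % (min(times), max(times)))
    ```

    Both functions return the same results; the difference is only in how many threads the OS has to create and schedule per run, which is one of the things background programs and scheduler settings can interfere with.
    
    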
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  24. #224
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    And as soon as I release the new version...

    The guy in Japan has some new numbers to show.
    So we know who's been doing some website camping lately.

    So now the question is: Does anyone have a pair of Gulftown samples?

    We need someone to trip him up a bit.


    2 x Intel Xeon W5590 @ 3.33 GHz (3.46 GHz Turbo Boost)
    72 GB (18 x 4 GB) DDR3 RAM

    25m - 6.360
    50m - 11.885
    100m - 25.096
    250m - 68.309
    500m - 146.704
    1b - 321.974
    2.5b - 901.162
    5b - 1,968.124
    10b - 4,480.503
    25b - 14,431.975 (4 hours, 28 minutes - swap mode)
    50b - 112,256.531 (31 hours, 11 minutes - swap mode + pagefile thrashing)

    1M - 0.299
    2M - 0.600
    4M - 1.072
    8M - 2.199
    16M - 4.038
    32M - 7.718
    64M - 16.219
    128M - 34.016
    256M - 72.441
    512M - 156.233
    1G - 343.595
    2G - 756.990
    4G - 1,676.540
    8G - 3,916.248
    16G - 8,628.893 (2 hours, 24 minutes - ram only + a bit of pagefile thrashing)
    32G - 25,978.250 (7 hours, 13 minutes - swap mode)
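    One way to read the decimal-digit series above is to compute a local scaling exponent k (time grows like digits^k) between consecutive sizes. A purely linear algorithm would give k = 1; fast big-number arithmetic is generally expected to grow somewhat faster than linearly. This minimal sketch uses only the in-RAM runs (25m to 10b) and leaves out the swap-mode 25b and 50b runs, since disk I/O changes the picture.

    ```python
    import math

    # Decimal-digit series for 2 x Xeon W5590 from the post (digits, seconds).
    series = [
        (25e6, 6.360), (50e6, 11.885), (100e6, 25.096), (250e6, 68.309),
        (500e6, 146.704), (1e9, 321.974), (2.5e9, 901.162),
        (5e9, 1968.124), (10e9, 4480.503),
    ]

    def scaling_exponents(points):
        """Local exponent k in t ~ n**k between consecutive (n, t) pairs."""
        out = []
        for (n1, t1), (n2, t2) in zip(points, points[1:]):
            k = math.log(t2 / t1) / math.log(n2 / n1)
            out.append((n1, n2, k))
        return out

    for n1, n2, k in scaling_exponents(series):
        print(f"{n1:>14,.0f} -> {n2:>14,.0f} digits: k = {k:.2f}")
    ```

    On these numbers k sits a little above 1 and creeps upward with size, while the smallest step comes out below 1, possibly because fixed overheads dominate such short runs.
    
    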



    And the big one... With significant pagefile thrashing.

    Last edited by poke349; 10-09-2009 at 07:08 AM. Reason: typo
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  25. #225
    Xtreme Member
    Join Date
    Jan 2009
    Posts
    169
    Quote Originally Posted by poke349 View Post
    And as soon as I release the new version...

    The guy in Japan has some new numbers to show.

    We need someone to trip him up a bit.

    2 x Intel Xeon W5590 @ 3.33 GHz (3.46 GHz Turbo Boost)
    72 GB (18 x 4 GB) DDR3 RAM
    The guy in Japan? There can be only one.**

    Shigeru Kondo aka 先生 Pi ("Sensei Pi") / 戦艦 Pi ("Battleship Pi")

    **Ok, we all know there's another..

    Daisuke Takahashi aka 天皇 Pi ("Emperor Pi")

    XmX
