Page 11 of 33 (results 251 to 275 of 815)

Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #251
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Kurumi View Post
    Ran the batch benches with the new 0.4.4. Here are my results:

    25000000 8.074 7573850b0bc23f8e1bee78aaa4354166
    50000000 17.922 864abf38181e874d028bf4272c443e26
    100000000 37.940 48a784fc35471ec408603601ec00df76
    250000000 107.450 647c8e7c845bbb4c1b732a2d71c57923
    500000000 234.795 1174e0882e3932251d65f55e6d8c44dc
    1000000000 519.458 677f930fa3dcd53514e3e5c8dee7ae59
    2500000000 2433.400 8f48e6e52fdfca981d82a0005220b7b7


    Guess there's no page-thrashing this time.

    Cool

    btw, you need the rest of the stuff at the top of the file to complete the batch validation. But I'll accept these anyway.

    (Not like I've been validating any benchmarks anyway...)


    EDIT:
    What frequency is this at? (Before and after turbo boost)
    And I'm assuming Core i7 920 and 12GB of ram?
    Last edited by poke349; 01-01-2010 at 11:18 AM.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  2. #252
    Xtreme Mentor
    Join Date
    Sep 2007
    Location
    Ohio
    Posts
    2,977
    My Q6600 at a mild 3.6GHz: (96.5633% Multi-Core efficiency.)
    Last edited by Talonman; 01-01-2010 at 12:11 PM.
    Asus Maximus SE X38 / Lapped Q6600 G0 @ 3.8GHz (L726B397 stock VID=1.224) / 7 Ultimate x64 /EVGA GTX 295 C=650 S=1512 M=1188 (Graphics)/ EVGA GTX 280 C=756 S=1512 M=1296 (PhysX)/ G.SKILL 8GB (4 x 2GB) SDRAM DDR2 1000 (PC2 8000) / Gateway FPD2485W (1920 x 1200 res) / Toughpower 1,000-Watt modular PSU / SilverStone TJ-09 BW / (2) 150 GB Raptor's RAID-0 / (1) Western Digital Caviar 750 GB / LG GGC-H20L (CD, DVD, HD-DVD, and BlueRay Drive) / WaterKegIII Xtreme / D-TEK FuZion CPU, EVGA Hydro Copper 16 GPU, and EK NB S-MAX Acetal Waterblocks / Enzotech Forged Copper CNB-S1L (South Bridge heat sink)

  3. #253
    Xtreme Member
    Join Date
    May 2007
    Location
    Sweden
    Posts
    127
    Quote Originally Posted by poke349 View Post
    Hot application!!!
    But why do the K10 CPUs run with the SSE3 instruction set and not the supported SSE4? The Intel CPUs run with their supported SSE4.1. Thanks for the program; an answer would be appreciated. Happy New Year from Sweden, too.
    Ivy Bridge 3770K @ ????MHz
    6c Intel Xeon X7460 24MB cache 16GB RAM 22TB HDD fileserver
    Dual Intel Xeon E5620 workstation
    SB 2600K @ 5016MHz 1.37v HT on AIR primestable
    AMD Athlon X3 425 @ B25 4GHz+ AIR
    AMD Athlon X2 6400+ @ 3811MHz AIR
    AMD Athlon X2 3600+ @ 3200MHz AIR
    AMD Athlon XP 1700+ @ 2714MHz AIR
    Thermalright Ultra-120 Extreme
    Corsair 8GB XMS3 2000MHz
    ATI Radeon HD5850 @ 1000MHz+/1200MHz+
    Windows 7 Enterprise x64
    Corsair HX750W

  4. #254
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by 2good4you View Post
    But why do the K10 CPUs run with the SSE3 instruction set and not the supported SSE4? The Intel CPUs run with their supported SSE4.1. Thanks for the program; an answer would be appreciated. Happy New Year from Sweden, too.
    SSE4.1 and SSE4a are different instruction sets.

    Intel has SSE4.1.
    AMD has SSE4a.

    In my opinion, there's nothing in the SSE4a instruction set that is useful for this program.
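
    To make the distinction concrete, here is a minimal Python sketch (an illustration only, not anything taken from y-cruncher). It assumes Linux-style CPU flag names as they appear in /proc/cpuinfo, where SSE3 is reported as "pni", SSE4.1 as "sse4_1" and SSE4a as "sse4a". The function name sse_features and the two abbreviated flag strings are made up for the example and are not dumps from real machines.

```python
def sse_features(flags_line: str) -> dict[str, bool]:
    """Report which of the SSE families discussed here appear in a flags line."""
    flags = set(flags_line.split())
    return {
        "SSE3": "pni" in flags,       # Linux lists SSE3 under the name "pni"
        "SSE4.1": "sse4_1" in flags,  # Intel's SSE4.1
        "SSE4a": "sse4a" in flags,    # AMD's SSE4a, a separate extension
    }


# Hypothetical, abbreviated flag lists for illustration only.
intel_like = "fpu sse sse2 pni ssse3 sse4_1 sse4_2 lm"
amd_k10_like = "fpu sse sse2 pni sse4a lm"

print("Intel-like:", sse_features(intel_like))
print("K10-like:  ", sse_features(amd_k10_like))
```

    Under these assumptions the K10-like list reports SSE3 and SSE4a but not SSE4.1, which is why an SSE4.1 build is not an option there.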

    EDIT:
    And even if there were, I don't have access to a K10 machine with enough RAM to properly test it. (Since an SSE4a version won't run on my Xeon workstation.)

    EDIT 2:
    The AMD optimized (Kasumi) binary was also tested on my Xeon workstation for correctness. (along with all the other x64 binaries)
    For correctness testing, it doesn't matter that I'm using an Intel machine to test an AMD binary.
    Only the performance tuning had to be done on an AMD machine - which was a Phenom II X3 unlocked to 4 cores.

    And yes, Happy New Year!!!
    Last edited by poke349; 01-03-2010 at 02:14 AM. Reason: rephrasing

  5. #255
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Just tried a run on a Core i7-720QM laptop...

    My recommendation... not a very good idea...

    Look at these temps: [screenshot]

    4-core load can do +1 turbo... No chance under these conditions...

  6. #256
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    so apparently you have been beat, not in speed though.
    http://bellard.org/pi/pi2700e9/announce.html

  7. #257
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Chumbucket843 View Post
    so apparently you have been beat, not in speed though.
    http://bellard.org/pi/pi2700e9/announce.html
    I was never beat. Since I never had the Pi record.

    jk...

    *btw, I knew long before you posted this.

    This is actually a lot more impressive than the previous supercomputer record...
    Especially the fact that he did it using only 6 GB of ram.

    He made no mention of how fast it is for ram-only computations.
    Given that disk IO is a huge factor, there's no way to guesstimate how fast it is from the data he gives.

    I also can't comment on how fast his program is at disk-bound sizes, since I haven't finished writing my own "Advanced Swap" mode yet.
    Though it sure is a hella lot faster than QuickPi 4.5.


    His pdf about the algorithms he used is pretty interesting...
    There's a number of them I've never even heard of... not a surprise given that he's a world-renowned mathematician. (unlike me...)



    So we can't say anything about speed until we can benchmark his program.

    Judging by his paper, I can tell he has taken optimization very far... a lot more than me.
    So I wouldn't be at all surprised if it is faster than y-cruncher.

  8. #258
    Registered User
    Join Date
    Jul 2008
    Location
    Frankfurt, Germany
    Posts
    5
    Here are my new results on a Dell Precision T7400 with 2x Intel Xeon X5450 Harpertown (Revision E0): [screenshots]

  9. #259
    Registered User
    Join Date
    Mar 2006
    Posts
    3
    i7-920 @ 4.0ghz, no turbo boost
    6gb OCZ Gold "low voltage" memory @ 1600mhz
    Asus P6X58D Premium

  10. #260
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by StevensDE View Post
    Here are my new results with a Dell Precision T7400 with 2x Intel Xeon X5450 Harpertown (Revision E0)
    Nice! A new machine?

    Quote Originally Posted by botld92z View Post
    i7-920 @ 4.0ghz, no turbo boost
    6gb OCZ Gold "low voltage" memory @ 1600mhz
    Asus P6X58D Premium
    Looks like you take the top spot for single-socket.

  11. #261
    Registered User
    Join Date
    Mar 2006
    Posts
    3
    Sweet. :-) I'll play more when I get home.

  12. #262
    Registered User
    Join Date
    Jul 2008
    Location
    Frankfurt, Germany
    Posts
    5
    Quote Originally Posted by poke349 View Post
    Nice! A new machine?
    Yes, my old Precision T7400 with 2x E5420 broke down, and Dell replaced it with a brand-new Precision T7400 with 2x X5450.

  13. #263
    Xtreme Addict
    Join Date
    Jun 2007
    Posts
    1,442
    4.83 GHz with 5C ambient, water cooled, with the new version (6 GB RAM at 7-7-7-21, turbo off):

    [screenshots: spi1M, spi32M, 25K, 50K]
    Last edited by rge; 01-08-2010 at 05:50 PM.

  14. #264
    Registered User
    Join Date
    Mar 2006
    Posts
    3
    ^ nice!

  15. #265
    Xtreme Addict
    Join Date
    Jun 2007
    Posts
    1,442
    Quote Originally Posted by botld92z View Post
    ^ nice!
    Thanks... only I have to stick to the smaller benchmark sizes, so the dual-socket machines don't completely hammer me even at their stock speed.

  16. #266
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    @RGE
    Looks like I've got some updating to do on my website...

    Since you clearly demolished the single-socket rankings for those sizes...

  17. #267
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Updated the records on my site.
    http://www.numberworld.org/y-cruncher/#FastestTimes


    I just ran some benchmarks on my laptop...
    And I have to say that the turbo boost is aggressive enough to "utterly destroy" the multi-core scalability of the program. (In a good way.)

    Under 4-core hyper-threaded intensive load, turbo won't activate at all.
    +1 (1.73 GHz) is the max specification, but I only see it when it's under a less intensive load (by running less intensive stuff, or by disabling HT).

    1.6 GHz under 8-thread load:



    Under single-threaded load (locked to a single-core via processor affinity), it gets all the way to +8 turbo (2.66 GHz).
    +9 (2.80 GHz) is the max specification, but again, I only see it when it's under a load that isn't as cpu-intensive... (such as compiling code)
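
    The "+N" turbo steps can be read as multiplier bins on top of the base multiplier. A small sketch, assuming the usual Nehalem-generation base clock of 133.33 MHz and a base multiplier of 12 for this 1.6 GHz part (neither number is stated in the post, and turbo_ghz is a hypothetical helper):

```python
BCLK_MHZ = 400 / 3      # assumed base clock, about 133.33 MHz
BASE_MULTIPLIER = 12    # assumed: 12 x 133.33 MHz = 1.6 GHz


def turbo_ghz(bins: int, base_multiplier: int = BASE_MULTIPLIER) -> float:
    """Core clock in GHz when running `bins` turbo steps above the base multiplier."""
    return (base_multiplier + bins) * BCLK_MHZ / 1000


for bins in (0, 1, 8, 9):
    print(f"+{bins}: {turbo_ghz(bins):.2f} GHz")
```

    With these assumptions the +1, +8 and +9 bins come out at about 1.73, 2.67 and 2.80 GHz, which matches the figures quoted here up to rounding (2.66 vs 2.67).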

    2.66 GHz under 1-thread load:



    And here are the results of single-thread vs. multi-threaded:


    Only 2.64x faster with multi-threading. (whereas the desktop i7s will get more than 4x at this size)
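
    A back-of-the-envelope way to separate the turbo effect from the threading itself is to normalize the speedup by clock speed. This is only a sketch: it assumes run time scales inversely with core clock (ignoring memory bandwidth and other limits) and that the clocks stayed at the 2.66 GHz and 1.6 GHz values observed above. The helper name clock_normalized_speedup is made up for the example.

```python
def clock_normalized_speedup(measured_speedup: float,
                             single_thread_ghz: float,
                             multi_thread_ghz: float) -> float:
    """Speedup the threaded run would show if both runs used the same clock."""
    return measured_speedup * single_thread_ghz / multi_thread_ghz


measured = 2.64   # multi-threaded vs. single-threaded, from the post
normalized = clock_normalized_speedup(measured, 2.66, 1.60)

print(f"measured speedup:         {measured:.2f}x")
print(f"clock-normalized speedup: {normalized:.2f}x")
print(f"measured speedup per physical core (4 cores): {measured / 4:.2f}")
```

    Under those assumptions the normalized figure is roughly 4.4x, which is in line with the "more than 4x" seen on desktop i7s. In other words, the threading still scales; the single-threaded run is simply clocked much higher.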


    Assuming turbo boost gets more and more aggressive in the future, we'll be seeing more of this.


    *I was originally gonna make a specially optimized binary for this laptop (x64 SSE4.1 ~ Akari), but seeing as how temps will reach almost 90C under y-cruncher load, I don't think I'm gonna do it.
    (Depending on the amount of ram, it usually takes about 12-48 hours of 100% cpu across all cores to tune for any particular machine... )

    It's too nice of a laptop to destroy at such an early age. (I love this thing... Thanks, Uncle!!! Gonna get an SSD for it and bump the HD to the second HD slot.)
    Last edited by poke349; 01-09-2010 at 09:35 PM. Reason: grammar fix

  18. #268
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    What kind of tool are you using for the development?

  19. #269
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Calmatory View Post
    What kind of tool are you using for the development?
    That's kind of a broad question. So I'm not sure what you're asking for.


    Software:

    I use Visual Studio and its compiler for the vast majority of the coding.
    But all tuning is done using the Intel Compiler. The final SSE binaries are all compiled using the Intel Compiler, while the x86 binary is compiled using Visual Studio.
    I have yet to learn how to use VTune and other profilers.


    Hardware:

    I have full access to about 8 or 9 different computers on which to run and test things. (And a lot more if you include all my friends who let me toy with their machines.)
    In addition to those, I also have access to a lot of the computers in the EECS department at my school here.

    EDIT: But for the most part, there really is only one that matters. And that is obviously my 64GB workstation - which is the only thing I have that can run the larger tests.
    Last edited by poke349; 01-09-2010 at 07:54 PM.

  20. #270
    Registered User
    Join Date
    Jan 2007
    Location
    Fredericksburg, VA - Podunk
    Posts
    43
    Quote Originally Posted by poke349 View Post
    Cool

    btw, you need the rest of the stuff at the top of the file to complete the batch validation. But I'll accept these anyway.

    (Not like I've been validating any benchmarks anyway...)


    EDIT:
    What frequency is this at? (Before and after turbo boost)
    And I'm assuming Core i7 920 and 12GB of ram?
    Same specs as before. :P
    Intel Core i7 970 | Corsair H50 + 2x SFF21E | Asus P6X58D-E | 24GB Patriot ViperII Series 7 DDR3-1600 | Seasonic X650W | 4x Samsung PM800 128GB SSD | XFX HD5870 XXX | Lian-Li PC-A05N | M-Audio Firewire 410 | BenQ FP241W

  21. #271
    Registered User
    Join Date
    Sep 2007
    Location
    italy
    Posts
    85
    Last edited by bonis62; 01-15-2010 at 10:04 AM.

  22. #272
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Is this a mis-post? Since it doesn't seem relevant to this thread. Or any sort of digit-crunching for that matter...

  23. #273
    Registered User
    Join Date
    Sep 2007
    Location
    italy
    Posts
    85
    Quote Originally Posted by poke349 View Post
    Is this a mis-post? Since it doesn't seem relevant to this thread. Or any sort of digit-crunching for that matter...

    PrimeCores is written in assembler (85M in 11 secs at 3.2 GHz) and does not use advanced mathematical routines, so it is fully compatible with older processors.
    Using advanced mathematics is more of a bottleneck than a help. The better approach is to write custom advanced math routines that run on all processors.
    For benchmarks, compatibility is crucial.
    At least I think so.
    Last edited by bonis62; 01-16-2010 at 12:31 AM.

  24. #274
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by bonis62 View Post
    PrimeCores is written in assembler (85M in 11 secs at 3.2 GHz) and does not use advanced mathematical routines, so it is fully compatible with older processors.
    Using advanced mathematics is more of a bottleneck than a help. The better approach is to write custom advanced math routines that run on all processors.
    For benchmarks, compatibility is crucial.
    At least I think so.
    uh...

    PrimeCores finds prime numbers. It has nothing to do with Pi.

    y-cruncher is fully compatible with older processors. It just won't be as fast.

    In my opinion, the true benchmark is one that adapts to the processor that it's running on by utilizing all special features it has. (Otherwise, why have those special features in the first place?)
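
    As a rough illustration of "adapts to the processor, but still runs on older ones", here is a minimal Python sketch of picking the most specialized build whose requirements the CPU meets, with a baseline that always matches. It is not how y-cruncher actually selects its binaries; the variant table, the feature labels and pick_variant are all hypothetical.

```python
# Ordered from most to least specialized; the last entry has no requirements,
# so there is always a match. Labels are illustrative, not real file names.
VARIANTS = [
    ("x64 SSE4.1", {"x64", "sse3", "sse4.1"}),
    ("x64 SSE3", {"x64", "sse3"}),
    ("x86", set()),
]


def pick_variant(cpu_features: set[str]) -> str:
    """Return the first variant whose required features are all present."""
    for name, required in VARIANTS:
        if required <= cpu_features:   # subset test
            return name
    raise RuntimeError("unreachable: the baseline has no requirements")


print(pick_variant({"x64", "sse3", "sse4.1"}))  # CPU with SSE4.1
print(pick_variant({"x64", "sse3", "sse4a"}))   # K10-style CPU: SSE4a, no SSE4.1
print(pick_variant(set()))                      # older CPU, baseline only
```

    Every variant is meant to compute the same digits; only the speed differs, which is the sense in which the older processors stay compatible.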

    Also, y-cruncher wasn't originally written for benchmarking.
    It was done for a completely different purpose and later converted into a benchmark.

  25. #275
    Registered User
    Join Date
    Sep 2007
    Location
    italy
    Posts
    85
    Quote Originally Posted by poke349 View Post
    uh...

    PrimeCores finds prime numbers. It has nothing to do with Pi.

    y-cruncher is fully compatible with older processors. It just won't be as fast.

    In my opinion, the true benchmark is one that adapts to the processor that it's running on by utilizing all special features it has. (Otherwise, why have those special features in the first place?)

    Also, y-cruncher wasn't originally written for benchmarking.
    It was done for a completely different purpose and later converted into a benchmark.
    If the code performs different operations on different CPUs,
    you don't have a true benchmark; you have a simple test.

    Pi uses the floating-point unit; Prime uses the integer unit.

    I don't want to turn this into an argument or a speed race.
    However, I am convinced that a benchmark should run the same (identical) code on many CPUs to be a true benchmark.
