Results 51 to 75 of 815

Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #51
    Registered User
    Join Date
    Nov 2005
    Location
    Plano, TX
    Posts
    82
    When do you think the new version will be released? I'm very eager to try it out on some older systems that have very little ram. I haven't disassembled it yet, but I'm wondering if the non-SSE3 x86 version can run on a 486 or older Socket 7 class processors without MMX, FCMOV/CMOV instructions or any SSE at all.
    Core i7 920 @ 4.4GHz, EVGA Classified E760, 3x1GB OCZ Platinum DDR3-1600 @ 1680 7-8-7-24, SLI eVGA 8800GT @ 756/1890/2200, Heatkiller 3.0 CU waterblock, WD Caviar Black 1TB, Hitachi E7K500 500gb, Seasonic S12 SS-650HT psu

  2. #52
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    It'll be out by the weekend. Could be as early as tomorrow... Depends on how bad my problem set that's due Friday is...

    Looking at the disassembly of the x86 version...

    Code:
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 15.00.30729.01 
    
    	TITLE	d:\y-cruncher\Source Files\Main.cpp
    	.686P
    	.XMM
    	include listing.inc
    	.model	flat
    .XMM is a bad sign... BUT... I find absolutely no SSE/xmm, etc... whatsoever in all 5MB of assembly...

    But I did enable the x86 versions to allow more than 2GB memory usage - which might be a problem on older systems if they don't recognize that flag...

    I should probably disable that flag, because even though the OS will let the program use more than 2GB of ram, it won't let you allocate a contiguous block of memory > 2GB of ram - which is what y-cruncher needs to do... So it's kinda useless...

    One thing I have noticed is that all x64 compilations use SSE for all floating point - regardless of what option I set it to. That is probably because all x64 processors have SSE2 or better. But VS isn't smart enough to actually try to vectorize any of the floating point... It just uses SSE to utilize the extra registers...
    Last edited by poke349; 04-29-2009 at 10:37 PM. Reason: typo
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  3. #53
    Registered User
    Join Date
    Nov 2005
    Location
    Plano, TX
    Posts
    82
    Quote: Originally posted by poke349
    One thing I have noticed is that all x64 compilations use SSE for all floating point - regardless of what option I set it to. That is probably because all x64 processors have SSE2 or better. But VS isn't smart enough to actually try to vectorize any of the floating point... It just uses SSE to utilize the extra registers...
    I tried the Intel compiler but the vectorization it performs is still quite inferior to hand coded asm almost all of the time. I gave up on using it for that so unfortunately any SIMD instruction set usage that involves vectorization I end up doing by hand. It is a pain but the rewards are usually worth it.

    I'm curious, did you do most or all of the SSE3 code yourself? Also have you seen anything useful in future instruction sets (SSE4.1/4.2 and SSE5A) that may help increase the speed of the PI calculations?

  4. #54
    Registered User
    Join Date
    Dec 2008
    Posts
    7
    Nice.

    But I'm a bit "worried" about my result, 'cause I OC-ed my CPU to 3.5GHz and still have a weaker result than a Phenom 9550 which is not OC-ed. Is it possible that this program makes that much use of all 4 cores? I guess I'm surprised because SuperPi is way different from this (still, that doesn't mean it's better).

    Here are results:

    1. y-cruncher 0.3.1.6891 Alpha (x86 SSE3).exe



    2. y-cruncher 0.3.1.6891 Alpha (x86).exe



    Both tests are with 25,000,000 as you can see on the screens.

    Hint for everyone who is new to this type of program: try to close as many unused background programs as you can. It gained me about 10s (with SuperPi it's more like milliseconds...)

    My rig:

    E6550 @ 3.5GHz
    Gigabyte P35-DS3L @ 500MHz FSB
    Kingmax 2x2GB PC8500 @ 1000MHz 5-6-6-20
    Windows XP SP2 32bit

    ...and the rest doesn't matter.

  5. #55
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote: Originally posted by Partybreaker
    Nice.

    But I'm a bit "worried" about my result, 'cause I OC-ed my CPU to 3.5GHz and still have a weaker result than a Phenom 9550 which is not OC-ed. Is it possible that this program makes that much use of all 4 cores? I guess I'm surprised because SuperPi is way different from this (still, that doesn't mean it's better).

    Yes, the program will detect 4 cores and use all of them. Also, the other major reason is that you're running the x86. The x64 versions are about 40% faster than the x86 versions.

    I mixed all the rankings together in that first post. Most of them are done with the x64 SSE binaries. But the occasional x86 (with or without SSE3) binary would explain some of the "unusually slow" scores.


    Quote:
    I'm curious, did you do most or all of the SSE3 code yourself? Also have you seen anything useful in future instruction sets (SSE4.1/4.2 and SSE5A) that may help increase the speed of the PI calculations?
    Yes, all vectorized SSE code was done by hand using intrinsics. No inline assembly was used though. SSE 4.1 has one or two interesting instructions that might be useful. SSE5A's packed MAC instructions could be useful... But I'm most interested in Intel's 256-bit AVX instructions...

    There's nothing in SSSE3, SSE4.2 or SSE4a that's useful.
    Last edited by poke349; 04-29-2009 at 03:18 PM. Reason: typo

  6. #56
    Registered User
    Join Date
    Sep 2004
    Posts
    81
    This is my run of y-cruncher. Thought it was OK for that test.

    http://smg.photobucket.com/albums/v5...=ycruncher.jpg
    __________________
    My Athlon Powerhouse
    (AXOA3000KV4D AGCA0308VPBW) =xp3000@240x10 so far /1024mb ocz value winbond ddr400/AN7 stock no Mods yet /sparkle 6600 128mb/tt480w purepower

    2xFX-74's 3.0ghz 4gig patriot ddr2 800 5-5-5-12 2xgtx260+'s 896mb 216sp 65nm sli asus L1N64-SLI-WS 2x500gb samsung 16mb cache sata 300 raid 0 1x 160gig sys win xp 64bit pioneer 1xblueray burner and 1x liteon dvd burner coolermaster stacker se 830

  7. #57
    Registered User
    Join Date
    Oct 2006
    Posts
    14
    Here's some of the smaller ones from the dual X5560's to see how they stack up against the Skulltrail

    50 Million



    100 Million



    250 Million



    Looks like it does OK

  8. #58
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Wow... as had been said, "There's a monster within these...".

    Couldn't quite tip the skulltrail at 25m?

    The larger the computation, the more memory intensive it gets, and the better it scales to multithreading - both of which suit the i7s well.

  9. #59
    Registered User
    Join Date
    Oct 2006
    Posts
    14
    Yes, the Skulltrail pips it by a few tenths at 25 million -



    Interesting

  10. #60
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Version 0.3.2 is out!!!!

    http://www.numberworld.org/y-cruncher/

    Main Improvements:

    - Decreased Memory usage: All i7 systems should be able to go one size larger with this new version.
    - Single-threaded benchmark mode: Now you can see how well y-cruncher scales with multi-threading on your system!
    - Stronger anti-cheat protection: The program should be more resistant to cheating.*


    *This version uses some rather aggressive methods to detect cheating and I don't have access to enough overclocked computers to fully test it before releasing it.
    Let me know if it gives any false positives.

    Also, I have yet to test if this version is still vulnerable to "slow-motion" cheats. It may or it may not. I don't have access to a suitable computer to try these cheats so I don't know yet. Other than that, let me know if there are other ways to cheat the program. (There's one method I can think of that "might" work... But I haven't tried it because it's risky and I don't want to inadvertently kill my system right before midterms...)
    Last edited by poke349; 04-30-2009 at 12:30 PM.

  11. #61
    Xtreme Enthusiast
    Join Date
    Dec 2003
    Location
    UK
    Posts
    567
    Pretty cool benchmarking program. Here's my result with 1.7 billion digits (8GB of RAM usage) on an E7400 at 3.6GHz (9x400)


  12. #62
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    uh... pushing so close to 8GB of ram. Don't you need space for the OS?

  13. #63
    Registered User
    Join Date
    Nov 2005
    Location
    Plano, TX
    Posts
    82
    Quote: Originally posted by wizzard67
    Yes, the Skulltrail pips it by a few tenths at 25 million -



    Interesting
    In a previous post you mentioned you had turbo mode enabled, what is your real clock speed? For example the PI program says my i7 at 4.3 is 4.1GHz due to turbo mode. Also, what are your memory timings?

  14. #64
    Registered User
    Join Date
    Oct 2006
    Posts
    14
    Quote: Originally posted by spdycpu
    In a previous post you mentioned you had turbo mode enabled, what is your real clock speed? For example the PI program says my i7 at 4.3 is 4.1GHz due to turbo mode. Also, what are your memory timings?
    AFAIK, turbo is enabled by default on this system but it will only kick in if I'm using fewer cores.

    Default speed of these CPUs is 2.8GHz and the turbo will only bump that up when the processing load allows. In the case of this benchmark, turbo mode will not activate as all cores are in use. I have an Intel lab here somewhere which I can run to show the turbo in action if you like.

    Memory is DDR3 1066 with timings of -

    CAS Latency - 7
    RAS# to CAS# Delay - 7
    RAS# Precharge - 7
    Cycle Time - 20
    Command Rate - 1T

    HTH
    Last edited by wizzard67; 05-04-2009 at 02:33 AM. Reason: Added memory info

  15. #65
    Registered User
    Join Date
    Nov 2005
    Location
    Plano, TX
    Posts
    82
    Quote: Originally posted by wizzard67
    AFAIK, turbo is enabled by default on this system but it will only kick in if I'm using fewer cores.

    Default speed of these CPUs is 2.8GHz and the turbo will only bump that up when the processing load allows. In the case of this benchmark, turbo mode will not activate as all cores are in use. I have an Intel lab here somewhere which I can run to show the turbo in action if you like.

    Memory is DDR3 1066 with timings of -

    CAS Latency - 7
    RAS# to CAS# Delay - 7
    RAS# Precharge - 7
    Cycle Time - 20
    Command Rate - 1T

    HTH
    Do you know if the Gainestown turbo mode functions any differently than the standard i7? Also what are you using to verify you're dropping out of turbo mode with all cores running? The one I found that seems to be accurate is Everest 5 under Computer->Overclock. When running 4.1GHz with 4.3GHz turbo it'll stay at 4.3 during the entire PI run (HT enabled, 8 threads total). The only time I can see it dropping out of turbo is when I get something going for extended periods of time like Prime95.

    Another interesting thing may be if you run the PI test with turbo mode off and compare on/off results (if you have time). I'd love to see the turbo stuff in Intel Lab, thanks.

  16. #66
    Registered User
    Join Date
    Oct 2006
    Posts
    14
    Quote: Originally posted by spdycpu
    Do you know if the Gainestown turbo mode functions any differently than the standard i7? Also what are you using to verify you're dropping out of turbo mode with all cores running? The one I found that seems to be accurate is Everest 5 under Computer->Overclock. When running 4.1GHz with 4.3GHz turbo it'll stay at 4.3 during the entire PI run (HT enabled, 8 threads total). The only time I can see it dropping out of turbo is when I get something going for extended periods of time like Prime95.

    Another interesting thing may be if you run the PI test with turbo mode off and compare on/off results (if you have time). I'd love to see the turbo stuff in Intel Lab, thanks.
    Hi,

    Looking back through my notes from the Intel Bootcamp, the program we used there was called 'FreakOut' which I can't seem to find anywhere. I'll give Everest a go instead.

    I'll try and find some time to run with Turbo disabled as well to see if there's a difference.

    Cheers

  17. #67
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Finally done with the last of my midterms.

    Turbo has 2 levels. It will up your multiplier by 1 or 2.

    Upping by 1 seems to always be the case when it's under load. Even under multi-threaded load.

    Upping by 2 only seems to happen when you have 1 thread that is locked to a single core.

    My suitemate and I were just playing around with this on his 920. Running the program in multi-threaded mode always jacks it from 20 to 21.
    Under single-threaded mode (without locking it to a single core), it still only goes up to 21.
    If we lock it to one core (via processor affinity), then it will go up to 22 for short bursts.

  18. #68
    Registered User
    Join Date
    Oct 2006
    Posts
    14
    Hmm, it would appear that the Nehalem-EP has a different level of turbo to the desktop i7 9xx.

    From the documents that came with the ES CPUs I got (X5560 @ 2.8GHz), the Turbo specs are as follows.

    Turbo active cores          4c     3c     2c     1c
    Max Turbo Frequency (GHz)   3.06   3.06   3.20   3.20
    Max Turbo Bin Upside        2      2      3      3

    Now, these are different for the other CPUs in the range.

    5502 has no Turbo
    5520 and 5530 have a Turbo Bin Upside of 1 (4c and 3c) and 2 (2c and 1c)
    5550/60/70 have a Turbo Bin Upside of 2 and 3
    5580 has only a Turbo Bin Upside of 1, on a single core
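
    The frequencies in that table follow from the base clock, since each turbo bin is one multiplier step. Here is a rough Python sketch, assuming the stock Nehalem base clock of about 133.33 MHz (my assumption, not something taken from the Intel documents quoted above). The function and constant names are made up.

```python
BCLK_MHZ = 400 / 3  # assumed stock Nehalem base clock, ~133.33 MHz


def turbo_mhz(base_multiplier: int, bins: int) -> float:
    """Clock in MHz after raising the multiplier by `bins` turbo steps."""
    return (base_multiplier + bins) * BCLK_MHZ


# X5560: 2.8 GHz nominal corresponds to a 21x multiplier at this base clock.
X5560_MULT = 21
for cores, bins in (("4c", 2), ("3c", 2), ("2c", 3), ("1c", 3)):
    ghz = turbo_mhz(X5560_MULT, bins) / 1000
    print(f"{cores}: +{bins} bins -> {ghz:.2f} GHz")
```

    Two bins work out to roughly 3.07 GHz and three bins to 3.20 GHz, which lines up with the 3.06 / 3.20 in the table (the table appears to truncate rather than round). The same formula covers the i7 920 case from post #67: a 20x multiplier plus one bin is 2.80 GHz, plus two bins is about 2.93 GHz.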

    I'll try some more tests in the coming week to see what effects this has on the benchmark.

    Cheers

  19. #69
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264
    some win 7 x64 results:

    Phenom II X4 955 @ 3.8GHz
    NB @ 2.4GHz
    DDR3-1333

    will probably follow up with better results at some stage with a higher NB and RAM clock.. better cooling on the way

    25M: 13.795s



    50M: 29.782s



    100M: 63.857s


    250M: 177.889s


  20. #70
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote: Originally posted by wizzard67
    5502 has no Turbo
    5520 and 5530 have a Turbo Bin Upside of 1 (4c and 3c) and 2 (2c and 1c)
    5550/60/70 have a Turbo Bin Upside of 2 and 3
    5580 has only a Turbo Bin Upside of 1, on a single core
    hmm... So if I'm reading that right, the 5580 only gives you 133 MHz more than the 5570 since the 5580 doesn't turbo past 3.2 GHz yet the 5570 will turbo up to 3.07 GHz.

  21. #71
    Xtreme Member
    Join Date
    Mar 2005
    Location
    Trinidad and Tobago
    Posts
    400
    rig1 results
    Rig 1 Asus ROG Strix B550-F WiFi, R7 5800x, 64GB Vengeance LPX 4*16GB, Zotac GTX 1070Ti, X-Fi Titanium, Enermax Revolution D.F. 850w, SSDs 768GB, HDD 3TB, CM 912HAF, NH D-15 Black.cr
    Rig 2 Asus Maximus VI Hero, i7 4770K@4000, 32GB Ballistix VLP 4*8GB, Gigabyte GTX 970, eVGA G3 850, SSD 512GB, HDD 2TB, TT Element-T, NH D-14.

  22. #72
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Great benchmarks! Good to see some more AMDs.

    Are there any more i7s with 6GB of ram? So far, there aren't many 1b results...

  23. #73
    Xtreme Member
    Join Date
    Mar 2005
    Location
    Trinidad and Tobago
    Posts
    400
    going to change cpu cooler soon
    then i'll be able to use 8gb and do some more benchies

  24. #74
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Just to note:

    I've started on some optimizations. Which means that:
    Future versions will NOT be speed consistent with the current releases.

    Furthermore, future versions will not be consistent with each other.
    I know this will wreak havoc on benchmarks, but that isn't the point of this program. Don't worry though, I'm keeping track of the version #'s from now on.


    Here is a screenshot from my latest build:



    7036.54 seconds (down from 7360.56) is about a 4.6% improvement over the previous version at this size. This is enough to make benchmarks not comparable between the current and future versions.
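
    A quick sanity check on that percentage, since "improvement" can be measured two ways. This is a minimal Python sketch using only the two timings quoted above; the variable names are mine.

```python
old_s = 7360.56  # previous version, seconds
new_s = 7036.54  # latest build, seconds

# Reduction in run time, relative to the old time.
time_saved_pct = (old_s - new_s) / old_s * 100
# Increase in throughput (old time divided by new time).
speedup_pct = (old_s / new_s - 1) * 100

print(f"time saved: {time_saved_pct:.1f}%")
print(f"speedup:    {speedup_pct:.1f}%")
```

    The 4.6% figure matches the second form (old time divided by new time). Measured as a reduction in run time, the same change is about 4.4%.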

    For this reason (among others), I'm not going to release a new version for a while. (barring any major bugs that need immediate patching)

    Version 0.3.3 will not be released. The next release will probably be v0.4 and it won't be for a while since I have my priorities elsewhere for the time being.


    This "quick and dirty" optimization was done for a completely silly reason:

    My suitemates (including Serotoninn) and I have been trying for quite a while to tweak this computer to go under 2 hours (7200 seconds) for 10b...
    But after like 2 months, we couldn't do any better than 7280 seconds (and no it won't overclock)...

    So we gave up... and I finally decided to do this optimization - just to end our misery..
    Last edited by poke349; 05-19-2009 at 09:01 AM. Reason: grammar fix

  25. #75
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Here's a small preview into version 0.4.1...

    Since it seems that every single Pi benchmark out there has support for SuperPi sizes, (1M, 2M, 4M, etc...) I figured I should add them too...

    But with a slightly higher limit...




    And before anyone starts to drool over some belief that I now have 128GB of ram... NO, I don't. I still "only" have my 64GB. So no, I will NOT be able to test the 25b and 16G sizes... Though I'm 95% sure they work. Anyone with a monster server?

    *I'll be adding a 32G size as soon as I obtain a checksum for it...
    Nobody's gonna have the ram for it... But I like going overboard.
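
    For a sense of scale, here is a small Python sketch that converts the size labels into digit counts. It assumes the binary sizes follow the SuperPi convention where 1M means 2^20 digits (so 16G would be 16 x 2^30), and that the lowercase labels like 25m and 25b are decimal millions and billions. Both mappings are assumptions for illustration, and the helper name is made up.

```python
BINARY_SUFFIX = {"K": 2**10, "M": 2**20, "G": 2**30}  # SuperPi-style sizes (assumed)
DECIMAL_SUFFIX = {"m": 10**6, "b": 10**9}             # 25m, 1b, 10b, ... (assumed)


def digits(label: str) -> int:
    """Convert a size label like '1M', '16G' or '25b' to a digit count."""
    number, suffix = int(label[:-1]), label[-1]
    if suffix in BINARY_SUFFIX:
        return number * BINARY_SUFFIX[suffix]
    return number * DECIMAL_SUFFIX[suffix]


for label in ("1M", "16G", "25b", "32G"):
    print(f"{label:>4}: {digits(label):,} digits")
```

    Under those assumptions 16G is about 17.2 billion digits and 32G about 34.4 billion, so 25b sits between the two.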
    Last edited by poke349; 06-10-2009 at 08:39 AM. Reason: typo fix
