Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #126
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Chumbucket843 View Post
    i turned off the cores in msconfig so no background tasks would be placed on other cores. i noticed there was no 3 thread mode b/c when i had 3 threads it said i had 4. i will have a speed up graph soon but i am a newb at openoffice.
    Yes, the program rounds the thread count up to the next power of two if it isn't one already.

    The reason for limiting it to powers of two is ease of implementation and efficiency of code.

    There's a crap-load of binary divide-and-conquering in virtually all the algorithms that are used. Stuff like that just doesn't work well with non-power-of-two thread counts...

    I've also found that the penalty of running extra threads is relatively small. Assuming that most computers now (and in the future) will have either a power-of-two # of cores or a "clean multiple" of one, this restriction is worth the ease of implementation.
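
    A minimal sketch of the rounding described above, in Python rather than whatever the program itself uses. The function names `next_power_of_two` and `halving_levels` are made up for illustration and are not taken from the program:

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    # (n - 1).bit_length() is the number of bits needed to store n - 1,
    # so shifting 1 left by that amount lands on the next power of two.
    return 1 << (n - 1).bit_length()


def halving_levels(threads: int) -> int:
    """Number of binary splits needed to reach one piece of work per thread."""
    return next_power_of_two(threads).bit_length() - 1


if __name__ == "__main__":
    for requested in (1, 2, 3, 4, 6, 12):
        print(requested, "threads requested ->", next_power_of_two(requested))
```

    With a power-of-two count, every binary split hands exactly half the work to each side, which is presumably why the restriction keeps the divide-and-conquer code simple.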
    Last edited by poke349; 07-27-2009 at 11:17 AM. Reason: re-worded
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  2. #127
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Location
    Florida
    Posts
    562
    Quote Originally Posted by poke349 View Post
    Anyways...
    Anyone got a Core 2 @ 3.2 GHz? My workstation is, but it's down and it's not coming back online for a few more weeks.

    Here's a Yorkfield at 3.2 GHz, even though the program reports 3.8. I guess it calculates MHz using only the stock multiplier.

    Q9650

    2600k

  3. #128
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Hoss331 View Post
    Heres a Yorkfield at 3.2 even though it says 3.8 in the program, I guess it calculates mhz using only the stock multi.

    Could you do that run single-threaded? We were trying to compare the two in single-threaded mode (no bandwidth bottleneck) to see which (Core 2 or K10) has faster arithmetic for this program.

    And I like how you set the FSB and mult to match my workstation.


    It seems like on Core 2 it uses the stock multiplier. On i7, it uses the actual maximum multiplier (as set in BIOS) before Turbo Boost - but it never uses more than the stock multiplier.



    Anyhow... I've some insane results coming in from someone in Japan with a very well tuned Dual Xeon W5580 rig with 72GB of ram... Benchmark sizes going all the way up to 32G with the help of Swap Mode...

    I'll post those later. But they all lack verification checksums.

  4. #129
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Location
    Florida
    Posts
    562
    here you go, affinity set to 1 core


  5. #130
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Hoss331 View Post
    here you go, affinity set to 1 core

    Awesome



    So here's the summary for Core 2 vs. Phenom II (for y-cruncher), times in seconds:


    Single-threaded (arithmetic speed test):

    Phenom II @ 3.2 GHz - 55.1057
    Core 2 Quad (12MB cache) @ 3.2 GHz - 49.5219


    Multi-threaded (arithmetic + bandwidth test):

    Phenom II @ 3.2 GHz - 15.455
    Core 2 Quad (12MB cache) @ 3.2 GHz - 13.9467


    If we did these runs on a Q6600 @ 3.2 GHz, that would also settle the issue of cache size.

    The two Q6600s that are already on the list are from v0.3.2.
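
    As a quick check on the summary above, here is a small sketch that works out the ratios, assuming the four figures are run times in seconds (lower is better). The variable names are made up for illustration:

```python
# Run times copied from the summary above; assumed to be seconds.
single = {"Phenom II": 55.1057, "Core 2 Quad": 49.5219}
multi = {"Phenom II": 15.455, "Core 2 Quad": 13.9467}


def ratio(slower: float, faster: float) -> float:
    """How many times longer the slower run took."""
    return slower / faster


# How far ahead the Core 2 Quad is in each mode (roughly 11% in both).
core2_lead_single = ratio(single["Phenom II"], single["Core 2 Quad"])
core2_lead_multi = ratio(multi["Phenom II"], multi["Core 2 Quad"])

# Speedup of the multi-threaded run over the single-threaded run, per CPU.
scaling = {cpu: single[cpu] / multi[cpu] for cpu in single}

if __name__ == "__main__":
    print(f"Core 2 lead, single-threaded: {core2_lead_single:.3f}x")
    print(f"Core 2 lead, multi-threaded:  {core2_lead_multi:.3f}x")
    for cpu, speedup in scaling.items():
        print(f"{cpu}: {speedup:.2f}x from multi-threading")
```

    The lead is about the same in both modes, and both chips gain a bit over 3.5x from multi-threading, so neither looks noticeably more bandwidth-starved than the other at this size.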

  6. #131
    Registered User
    Join Date
    Feb 2008
    Posts
    17
    I'm running a P4 @ 3.15 GHz

    25M --> 152.961 s
    50M --> 344.768 s
    100M --> 780.277 s



  7. #132
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Location
    Florida
    Posts
    562
    Quote Originally Posted by poke349 View Post
    If we did these runs on Q6600 @ 3.2 GHz, that'll also settle issue of cache size.

    The two Q6600s that are already on the list are from v0.3.2.

    I'd also like to see an i7 do a single-core, single-thread run with Turbo off.

  8. #133
    Registered User
    Join Date
    Nov 2005
    Location
    Plano, TX
    Posts
    82
    Quote Originally Posted by Hoss331 View Post
    Id also like to see an I7 do a single core single thread run, turbo off.
    I can do a 3.2GHz run on my i7 when I get home. It is 10:33am CST now, I should be able to get it run by 5:30pm.


    Poke349: I finally got a new waterblock, the Heatkiller 3.0 CU. I can't believe the thing, 4.4GHz is 100% stable (Linx w/ 8 threads for 24 hours). That block with regular water is better than my old Apogee with ice water, no kidding. 65C full load at 4.2GHz, 1.3v. 75C full load at 4.4GHz, 1.38v. I'll give ice water a shot at some point, I really want to get a 4.6GHz run done. It would be nice if you could include some batch benchmarking.

    For example, you could set 3 loops and then specify a range from X to Y. This way I could run 3 loops of each, save the fastest time, and test sizes of 1m, 2m, 4m, 8m, 16m, etc. digits as well as the 25, 50, 100, etc. Also, outputting the fastest result to a file would be nice, so as not to need to copy/paste so much text. What do you think?
    Core i7 920 @ 4.4GHz, EVGA Classified E760, 3x1GB OCZ Platinum DDR3-1600 @ 1680 7-8-7-24, SLI eVGA 8800GT @ 756/1890/2200, Heatkiller 3.0 CU waterblock, WD Caviar Black 1TB, Hitachi E7K500 500gb, Seasonic S12 SS-650HT psu

  9. #134
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Hoss331 View Post
    Id also like to see an I7 do a single core single thread run, turbo off.
    I can do them, but my rig is tied up for a few more days. Looks like spdy beat me to it.

    Quote Originally Posted by spdycpu View Post
    I can do a 3.2GHz run on my i7 when I get home. It is 10:33am CST now, I should be able to get it run by 5:30pm.


    Poke349: I finally got a new waterblock, the Heatkiller 3.0 CU. I can't believe the thing, 4.4GHz is 100% stable (Linx w/ 8 threads for 24 hours). That block with regular water is better than my old Apogee with ice water, no kidding. 65C full load at 4.2GHz, 1.3v. 75C full load at 4.4GHz, 1.38v. I'll give ice water a shot at some point, I really want to get a 4.6GHz run done. It would be nice if you could include some batch benchmarking.

    For example you could set 3 loops then specify a range from X to Y. This way I could run 3 loops of each & save the fastest time and test times of 1m, 2m, 4m, 8m, 16m, etc digits as well as the 25, 50, 100, etc. Also outputting the fastest result to a file would be nice as not to need to copy/paste so much text. What do you think?

    I completely agree with you. I just need to find the time to polish up my bulk compute add-on and release it.

    3 runs of each - Good idea. I'll probably set that as a default with an option to override it. And I'll add a size-limit to looped runs - say 10 min. Otherwise those massive single-threaded 10 and 12b runs on my workstation will take days.

    I can have it output the benchmarks to a separate text file.


    Something like 3 categories:

    Standard Sizes: 25m, 100m, 250m, etc... all validated - print the best times (with their validation) into a text file.

    SuperPi Sizes: 1M, 2M, 4M, etc... all validated, same as above

    Multi-core Scaling: 1m, 1.2m, 1.5m, 2m, 2.5m, etc*...
    - Manually select threading mode
    - No validation
    *These are the sizes I used to generate those fancy multi-core scaling graphs.
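
    The "3 runs of each, keep the best, write to a file" idea can be sketched roughly as follows. This is only an illustration in Python, not the bulk compute add-on itself. The names `best_of`, `run_batch` and `format_results`, the tab-separated output, and the choice to stop looping once a single run exceeds the limit are all assumptions:

```python
import time


def best_of(run, loops=3, time_limit_s=600.0):
    """Call run() up to `loops` times and return the fastest wall time in seconds.

    If a single run takes longer than time_limit_s (10 minutes by default),
    the remaining loops are skipped so huge sizes do not run three times.
    """
    best = None
    for _ in range(loops):
        start = time.perf_counter()
        run()
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
        if elapsed > time_limit_s:
            break
    return best


def run_batch(sizes, make_run, loops=3, time_limit_s=600.0):
    """Benchmark every size; make_run(size) must return a zero-argument callable."""
    return {size: best_of(make_run(size), loops, time_limit_s) for size in sizes}


def format_results(results):
    """One 'size<TAB>seconds' line per benchmark, ready to write to a text file."""
    return "".join(f"{size}\t{secs:.3f}\n" for size, secs in results.items())


if __name__ == "__main__":
    # Stand-in workload; a real harness would launch the benchmark here.
    def make_run(size):
        return lambda: sum(range(size))

    print(format_results(run_batch([10_000, 100_000], make_run)), end="")
```

    Writing the output of `format_results` to a file would cover the "separate text file" request without any copy/pasting from the console.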



    I'd love to see a multi-core scaling graph from a pair of Gainestowns... But I honestly doubt anyone will be patient enough to sit through single-threaded runs of 1b+. For me, I just let it run while I'm at work, run overnight...

    I also need a way to enforce processor affinity. I can't manually force it because I wouldn't know which cores are real and which are virtual from HT.



    As for that... Time for some insaneness....



    Results from Japan: http://ja0hxv.calico.jp/pai/pietc.html
    Google translate it if you can't read Japanese. (I can't either...)

    2 x Intel Xeon W5580 Gainestown @ 3.2 GHz
    72 GB (18 x 4 GB) DDR3
    Windows Server 2008

    All times in seconds:

    25m - 6.92
    50m - 13.31
    100m - 28.14
    250m - 76.34
    500m - 166.07
    1b - 365.20
    2.5b - 1,025.05
    5b - 2,307.18
    10b - 4,961 (1 hour, 22 min, 41 secs)
    25b - 19,415 (5 hours, 23 min, 35 secs) - Done using Swap Mode*

    1M - 0.37
    2M - 0.67
    4M - 1.21
    8M - 2.31
    16M - 4.47
    32M - 8.75
    64M - 18.02
    128M - 38.18
    256M - 82.63
    512M - 185.41
    1G - 398.09
    2G - 868.54
    4G - 1,928.29
    8G - 4,235 (1 hour, 10 min, 35 secs)
    16G - 11,892 (3 hours, 18 min, 12 secs) - Done using Swap Mode*
    32G - 31,061 (8 hours, 37 min, 41 secs) - Done using Swap Mode*


    One thing I have to say... This guy is NUTS...
    He gets new workstations like this about once every half a year.

    The last few he had are:

    2 x Intel Xeon X5470
    128 GB (16 x 8 GB) DDR2 FB-DIMM

    2 x Intel Xeon X5460
    64 GB (16 x 4 GB) DDR2 FB-DIMM


    Not only that... He ACTUALLY ran this program for 8+ hours just for a benchmark. That's a pretty good stress test...
    I've done longer runs than that (200+ hours), but that's because they were either tests, or were for size records. Not benchmarks...



    *Swap Mode requires less memory but is significantly slower.
    There's no validation for it, and it's available under the Custom Compute option.



    Lastly... Dave, if you're here, you've got some SERIOUS competition.
    This guy knows how to tune these things... enough to make his W5580s faster than your W5590s.
    Last edited by poke349; 08-14-2009 at 05:00 PM. Reason: typo

  10. #135
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    i was just on google trends and japan is the #1 country to search core i7.

  11. #136
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    deleted

  12. #137
    Registered User
    Join Date
    Nov 2005
    Location
    Plano, TX
    Posts
    82
    Quote Originally Posted by poke349 View Post
    Awesome



    So the summary for Core 2 vs. Phenom II. (for y-cruncher)


    Single-threaded (arithmetic speed test):

    Phenom II @ 3.2 GHz - 55.1057
    Core 2 Quad (12MB cache) @ 3.2 GHz - 49.5219


    Multi-threaded (arithmetic + bandwidth test):

    Phenom II @ 3.2 GHz - 15.455
    Core 2 Quad (12MB cache) @ 3.2 GHz - 13.9467


    If we did these runs on Q6600 @ 3.2 GHz, that'll also settle issue of cache size.

    The two Q6600s that are already on the list are from v0.3.2.
    i7 @ 3.2GHz, 3.6GHz Uncore, Memory @ 1600 7-7-6-16.

    Single:
    Code:
    Benchmark Successful. The digits appear to be OK.
    
    Program Version:    0.4.1 Build 7408 (x64 SSE3)
    Processor(s):       Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    CPU Frequency:      3,192,005,951 Hz  (frequency may be inaccurate)
    Thread(s):          1
    Digits:             25,000,000
    Total Time:         44.5555 seconds
    Checksum:           506bd9db81dfe73a07ae66fb5da8af7e
    Multi (with HT):
    Code:
    Benchmark Successful. The digits appear to be OK.
    
    Program Version:    0.4.1 Build 7408 (x64 SSE3)
    Processor(s):       Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    CPU Frequency:      3,192,005,119 Hz  (frequency may be inaccurate)
    Thread(s):          8
    Digits:             25,000,000
    Total Time:         11.4947 seconds
    Checksum:           d2ec2f25569fffbd04422301296a783b

  13. #138
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Chumbucket843 View Post
    i was just on google trends and japan is the #1 country to search core i7.
    lolz...
    They always have the newest gadgets for just about everything except for maybe processors... since both Intel and AMD are US-based...

    Quote Originally Posted by spdycpu View Post
    i7 @ 3.2GHz, 3.6GHz Uncore, Memory @ 1600 7-7-6-16.
    Nice... At some point, I'm gonna need to make a database on my site. But I don't have the time for all that... argh...


    And you're hitting 4.6 on plain water? That's just insane... Because 5GHz is already LN2 territory. Those benches will be interesting and hard to beat.


    Today, I got a nice look at a 96-core (16 x Dunnington) machine with 512 GB of RAM at a fair... Too bad it was too busy for me to try any benches...

  14. #139
    I am Xtreme Ket's Avatar
    Join Date
    Apr 2004
    Location
    United Kingdom
    Posts
    6,822
    Not bad considering I'm running SETI atm as well. The reported CPU frequency is wrong; it's 3.6GHz. I'll be back with some proper results when my PC9200 turns up.

    Last edited by Ket; 07-30-2009 at 06:10 AM.

  15. #140
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    I got some new fans.

    A pair of these:
    http://www.newegg.com/Product/Produc...82E16835213009
    (I didn't get them from newegg though.)


    Speed controlled, they are just as quiet as my old ones with slightly more airflow. I run it at this speed normally...

    At full power, I can't hear myself talk...


    Now I can safely hit 4.2GHz on air - with more room to spare.

    This was a stress test more than a benchmark. I intentionally left RealTemp and CPUz on to monitor it.



    The temps peaked at 84C. They hit 90C when I benched 4GHz with my old fans.

  16. #141
    Xtreme X.I.P. Particle's Avatar
    Join Date
    Apr 2008
    Location
    Kansas
    Posts
    3,219
    Greetings, poke. I like your benchmark program--it's quite nice. Have you by chance experienced an issue where it doesn't seem to hit all cores very effectively? I've got a 12-core machine where it seems to stay in the 40-60% CPU range for smaller benchmarks (under 32M) and 60-80% for larger ones. It never actually "pegs" so to speak.
    Particle's First Rule of Online Technical Discussion:
    As a thread about any computer related subject has its length approach infinity, the likelihood and inevitability of a poorly constructed AMD vs. Intel fight also exponentially increases.

    Rule 1A:
    Likewise, the frequency of a car pseudoanalogy to explain a technical concept increases with thread length. This will make many people chuckle, as computer people are rarely knowledgeable about vehicular mechanics.

    Rule 2:
    When confronted with a post that is contrary to what a poster likes, believes, or most often wants to be correct, the poster will pick out only minor details that are largely irrelevant in an attempt to shut out the conflicting idea. The core of the post will be left alone since it isn't easy to contradict what the person is actually saying.

    Rule 2A:
    When a poster cannot properly refute a post they do not like (as described above), the poster will most likely invent fictitious counter-points and/or begin to attack the other's credibility in feeble ways that are dramatic but irrelevant. Do not underestimate this tactic, as in the online world this will sway many observers. Do not forget: Correctness is decided only by what is said last, the most loudly, or with greatest repetition.

    Rule 3:
    When it comes to computer news, 70% of Internet rumors are outright fabricated, 20% are inaccurate enough to simply be discarded, and about 10% are based in reality. Grains of salt--become familiar with them.

    Remember: When debating online, everyone else is ALWAYS wrong if they do not agree with you!

    Random Tip o' the Whatever
    You just can't win. If your product offers feature A instead of B, people will moan how A is stupid and it didn't offer B. If your product offers B instead of A, they'll likewise complain and rant about how anyone's retarded cousin could figure out A is what the market wants.

  17. #142
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Particle View Post
    Greetings, poke. I like your benchmark program--it's quite nice. Have you by chance experienced an issue where it doesn't seem to hit all cores very effectively? I've got a 12-core machine where it seems to stay in the 40-60% CPU range for smaller benchmarks (under 32M) and 60-80% for larger ones. It never actually "pegs" so to speak.
    Yes, it's a fundamental issue with this type of task. Hence why it's taken a while...

    Pi, by its very nature, doesn't parallelize as well as wPrime or any other "artificially made" task.

    Why your cores aren't kept busy 100% of the time can be due to several reasons:
    • Load imbalance. Most types of scientific computing like this don't split evenly (or at least it's not easy to do so). So some threads will finish before others. When this happens, the threads that are done need to wait for the others.
    • Not every part of the computation is parallelized. Fast operations like additions and subtractions are limited by memory bandwidth, so they will not benefit from multi-threading.
    • Thread creation and destruction have a lot of overhead. When the working size for a particular operation is small enough, the overhead of thread creation becomes greater than the benefit of threading. At this point, the program doesn't parallelize it, hence less than 100% CPU.
    • Refresh rate of Task Manager. Task Manager and other monitors average CPU usage over a period of time. If the computation is small, there won't be any period of sustained 100% CPU long enough to average 100%.

    The larger the computation, the smaller the effect of these inefficiencies, and the higher the cpu usage.
    With 12 cores, you're probably gonna need to go above 1 billion digits to get cpu usage averaging > 90%.
    You WILL need to go up to several billion digits to achieve sustained 100% cpu that can last a few minutes. Most people don't have that kind of ram so it isn't suitable as a stress-test unless you run multiple instances.
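
    The "not every part is parallelized" point is essentially Amdahl's law, and a rough sketch shows how quickly a small serial portion drags down average CPU usage on 12 cores. The parallel fractions below are made-up illustrative values, not measurements of this program, and the function names are hypothetical:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def average_utilization(parallel_fraction: float, cores: int) -> float:
    """Average fraction of all cores kept busy over the whole run.

    The serial part keeps 1 core busy and the parallel part keeps all of them
    busy, which works out to speedup / cores.
    """
    return amdahl_speedup(parallel_fraction, cores) / cores


if __name__ == "__main__":
    cores = 12
    for p in (0.90, 0.95, 0.99, 0.999):  # illustrative values only
        print(f"parallel fraction {p:.3f}: "
              f"speedup {amdahl_speedup(p, cores):.2f}x, "
              f"average CPU {average_utilization(p, cores):.0%}")
```

    This model ignores load imbalance and thread-creation overhead, so real usage would sit below these figures. Larger computations push the parallel fraction closer to 1, which is consistent with usage rising as the digit count goes up.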


    CPU usage can be improved if I allow multi-threading to increase memory usage... But that gets prohibitive after a while. This type of computing is already enough of a memory hog as it is. So I prefer the ability to hit larger sizes.

    So in some sense, computing Pi is a benchmark that "more closely" resembles real-life scientific computing.


    EDIT:
    So yes. Have fun with it. Tell all those Pi fanatics... they'll need to move away from those C2D's to stay competitive. jk
    Last edited by poke349; 08-04-2009 at 11:02 AM.

  18. #143
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Are these guys with 16-thread Xeon systems not hitting 100% CPU usage either?
    Core i7 2600K@4.6Ghz| 16GB G.Skill@2133Mhz 9-11-10-28-38 1.65v| ASUS P8Z77-V PRO | Corsair 750i PSU | ASUS GTX 980 OC | Xonar DSX | Samsung 840 Pro 128GB |A bunch of HDDs and terabytes | Oculus Rift w/ touch | ASUS 24" 144Hz G-sync monitor

    Quote Originally Posted by phelan1777 View Post
    Hail fellow warrior albeit a surat Mercenary. I Hail to you from the Clans, Ghost Bear that is (Yes freebirth we still do and shall always view mercenaries with great disdain!) I have long been an honorable warrior of the mighty Warden Clan Ghost Bear the honorable Bekker surname. I salute your tenacity to show your freebirth sibkin their ignorance!

  19. #144
    Xtreme X.I.P. Particle's Avatar
    Join Date
    Apr 2008
    Location
    Kansas
    Posts
    3,219
    Is it possible to specify a manual thread count? Inspired by HT people, I'd like to try a run at 24 threads to see if things stay busier.

  20. #145
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Mechromancer View Post
    Are these guys with 16-thread Xeon systems not hitting 100% CPU usage either?
    I definitely wouldn't expect them to... But CPU usage as a whole can be fairly deceiving. Even though it doesn't "look" efficient, you're still getting a massive speedup.

    Here's the 8-core @ 1b screenie on my website:
    I consider this a "good" graph. Mostly @ 100% but with dips every 10 - 20 seconds...

    In most cases it won't be as efficient as this.


    The only time where I've gotten near sustained 100% cpu is during one of the world size-records I set back in April:

    Same computer: 8 cores @ 31 billion digits of a different constant

    This kind of efficiency... is only achievable if you have either a REALLY SLOW computer, or a completely stupid amount of RAM...


    Quote Originally Posted by Particle View Post
    Is it possible to specify a manual thread count? Inspired by HT people, I'd like to try a run at 24 threads to see if things stay busier.
    Good thinking.

    The program actually already does that. When you run N threads, it will usually run 2N and occasionally 4N threads.

    In any case, if you have a non-power-of-2 number of cores, it rounds up. So on your rig, the program is running in 16-core mode, which uses anywhere from 16 to 64 threads. (There's an option in Task Manager that shows how many threads a process is using.)


    You can manually set your settings in the "Custom Compute a Constant" option. But there's no validation.


    Just remember that higher % cpu usage doesn't always mean faster time.

  21. #146
    Xtreme X.I.P. Particle's Avatar
    Join Date
    Apr 2008
    Location
    Kansas
    Posts
    3,219
    Oh wow...I see you're creating and destroying threads during the computation cycle itself. In my own programming I've found it to be a good idea to create x number of threads and then use them all to process pieces of work dispatched from a synchronous controller. That may or may not be practical or applicable to your particular algorithm of course--I won't pretend to be familiar with your project. In any case, that does make sense now. I saw as few as 8 threads and as many as 60-some.

  22. #147
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Particle View Post
    Oh wow...I see you're creating and destroying threads during the computation cycle itself. In my own programming I've found it to be a good idea to create x number of threads and then use them all to process pieces of work dispatched from a synchronous controller. That may or may not be practical or applicable to your particular algorithm of course--I won't pretend to be familiar with your project. In any case, that does make sense now. I saw as few as 8 threads and as many as 60-some.
    I've thought about using thread pools, but I decided against it for a few reasons:

    1. I couldn't figure out how to use that API.
    2. The program was written for extremely large computations (for breaking size-records). And on large computations, threading overhead is negligible.
    3. My intuition told me that a synchronous work-dispatcher might have problems scaling into "many" cores... by many, I mean tens or hundreds...
      (Specifically, it would take linear time to dispatch N-loads of work for N cores, whereas recursive thread-creation would take only log(N) provided that the memory allocator was efficient.)
    4. Lastly, I was just plain lazy...
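
    The linear-vs-log(N) argument in point 3 can be illustrated with a toy sketch. This is Python, not the program's own code, and the names `linear_dispatch` and `recursive_spawn` are hypothetical. Python threads won't show a real speedup here because of the interpreter lock, so the point is only the shape of the spawning:

```python
import threading


def linear_dispatch(work, n):
    """A single controller starts n workers one after another."""
    threads = [threading.Thread(target=work, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()  # n sequential starts, all issued by the controller
    for t in threads:
        t.join()
    return n  # length of the controller's start sequence


def recursive_spawn(work, lo, hi, depth=0, depths=None):
    """Run work(i) for i in [lo, hi) by halving the range at every level.

    Each call hands the upper half to a new thread and keeps the lower half,
    so the longest chain of sequential spawns is about log2(hi - lo).
    If `depths` is given, depths[i] records the level at which item i ran.
    """
    if hi <= lo:
        return
    if hi - lo == 1:
        if depths is not None:
            depths[lo] = depth
        work(lo)
        return
    mid = (lo + hi) // 2
    other = threading.Thread(
        target=recursive_spawn, args=(work, mid, hi, depth + 1, depths)
    )
    other.start()
    recursive_spawn(work, lo, mid, depth + 1, depths)
    other.join()


if __name__ == "__main__":
    n = 16
    results = [None] * n
    depths = [None] * n

    def square(i):
        results[i] = i * i

    recursive_spawn(square, 0, n, 0, depths)
    print("deepest spawn chain:", max(depths), "levels for", n, "pieces")
    print("linear dispatcher starts:", linear_dispatch(square, n))
```

    A thread pool with per-worker queues and work stealing could narrow that gap, but that is a different design from the single synchronous controller being discussed.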

  23. #148
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    If only there were Linux binaries...

  24. #149
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    If I had more experience with Linux, then there would be a binary for it...

    "Eventually", I'll have Linux binaries... whenever I get the time...
    The entire program has been written to be easily ported to Linux... So I don't expect it to be too hard to do so when the time comes.

    Have you tried running it under Wine?

  25. #150
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    if you compiled for linux and the cell, i could run this thing on my ps3. actually, the cell has been beaten in flops by x86 by now though, and it's a PITA to work with.
