
Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #751
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Sorry guys. I've been busy for a while and kinda neglected both the program and this thread.

    I'll try to get the lists back up to date in the next week or so.

    Quote Originally Posted by st0ned View Post
    Any chance of a Windows 8 Update ? I mean to support AVX on windows 8 :P
    The issue with it not detecting AVX support in Windows 8 is a known problem. I've had multiple reports of this.
    Currently, I don't have a Windows 8 machine to test the fix for this. So I'm not gonna bother with it in the meantime.

    For now, the work-around is to simply go into the "Binaries" folder and run the "x64 - AVX ~ Hina" binary manually. The "y-cruncher.exe" binary in the main folder is just a launcher that detects the environment and tries to pick the best binary to run. You are free to override what it chooses.
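A minimal sketch of how a launcher like the one described above might choose a binary. The feature flags, the `CodePath` enum, and the priority order are all assumptions made for illustration, not y-cruncher's actual logic. One detail is general x86 behaviour rather than anything specific to this program: AVX is only usable when the OS also saves the 256-bit YMM registers, so a detector has to check both the CPU flag and the OS state.

```cpp
// Hypothetical feature flags. A real launcher would fill these in from
// CPUID and XGETBV; here they are plain inputs so the logic can be tested.
struct CpuFeatures {
    bool x64;           // running as a 64-bit process
    bool sse3;
    bool sse41;
    bool avx;           // CPU reports AVX
    bool os_ymm_state;  // OS saves/restores the 256-bit YMM registers
};

// Labels follow the binaries named in this thread. The two SSE4.1 binaries
// (Nagisa, Ushio) are collapsed into one entry because telling them apart
// needs microarchitecture detection, which this sketch does not attempt.
enum class CodePath { X86, X86_SSE3, X64_SSE3_Kasumi, X64_SSE41, X64_AVX_Hina };

// Assumed priority: widest usable instruction set first.
constexpr CodePath pick_code_path(CpuFeatures f) {
    if (!f.x64) {
        return f.sse3 ? CodePath::X86_SSE3 : CodePath::X86;
    }
    if (f.avx && f.os_ymm_state) {
        return CodePath::X64_AVX_Hina;   // both CPU and OS must agree
    }
    if (f.sse41) {
        return CodePath::X64_SSE41;
    }
    return CodePath::X64_SSE3_Kasumi;    // treated as the lowest x64 option here
}
```

With these assumptions, a CPU that reports AVX while the OS-state flag reads false falls back to the SSE4.1 path, which is the kind of situation where picking the binary by hand helps.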

    Quote Originally Posted by Utroz View Post
    Nice times and multi-core efficiency, CRFX. I am curious why you (CRFX) ran the x64 SSE3 Kasumi code path as opposed to the x64 AVX Hina code path. Maybe they can release an FMA code path that would be even faster. (I'm leaning towards FMA3 because it will be supported by future Intel Haswell and current AMD Piledriver cores, as opposed to FMA4, which is AMD Bulldozer and Piledriver only AFAIK. But if it is not too hard to make both, it would be cool to compare them on Piledriver and see which is faster, FMA3 or FMA4.)
    Quote Originally Posted by CRFX View Post
    I found the AVX Hina version way slower on both my Bulldozer and Piledriver chips. The Kasumi version is the fastest of all the executables included, for me at least.
    I noticed Y-cruncher 6.1 will include FMA4, so that should speed things up a bit.
    This is also a known "problem". On Bulldozer and family, the FPU can sustain either 2 x 128-bit instructions or 1 x 256-bit instruction per cycle. In other words, there is no throughput benefit to using 256-bit AVX. Furthermore, there are hardware "optimizations*" that only apply to 128-bit instructions.

    Combine that with the extra overhead of packing/unpacking 256-bit SIMD and it results in a significant net slowdown.

    Currently, my AVX, FMA, and XOP codepaths are all 256-bit. I'm somewhat torn on whether I make 128-bit codepaths just for Bulldozer and family. Or whether I should just leave it and hope AMD will eventually bring 256-bit up to par in the future.

    As for FMA3 vs. FMA4: I plan to set all the FMA codepaths to use FMA3 and all the XOP codepaths to use FMA4. That said, I currently don't have the hardware to properly test either one of these. Whether or not v0.6.1 will have them will depend on whether I finish it before or after I get my hands on the needed hardware.

    *For those familiar with low-level details, I'm specifically talking about the register move renaming. Bulldozer has it for 128-bit SIMD, but not for 256-bit.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  2. #752
    Registered User
    Join Date
    Jan 2008
    Posts
    37
    The sad reality of all AMD users, everyone optimizes for Intel.

  3. #753
    Xtreme Enthusiast
    Join Date
    Sep 2007
    Location
    Coimbra - Portugal
    Posts
    699
    Hi I don't know if you happen to drop by or if you did get my email, anyhow I'm happy to see you around.

    You answered almost every one of the questions I posed in the email, except for the part where I asked whether you intend to start releasing beta versions to the public or to particular beta testers. If I can do something to help with the code for Windows 8, I have it running on two machines and might be of use.

    best regards !

  4. #754
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by CRFX View Post
    The sad reality of all AMD users, everyone optimizes for Intel.
    TBH, I wrote the FMA and XOP code-paths long before Bulldozer was released. Both would be available in 256-bit, including both FMA3 and FMA4. I had no idea that 256-bit would actually be worse than 128-bit.

    When people started leaking benchmarks on the ES Bulldozers, I was extremely surprised that 256-bit AVX was worse than 128-bit SSE3. But the code had already been written (and tested via emulation). I figured AMD will eventually have native 256-bit execution units in the future. So I don't exactly want to waste any effort "backwards optimizing".

    We'll see. It's been a long time since I've looked at the relevant code. So I can't say exactly how hard it would be to branch the 256-bit XOP code-path and modify it to 128-bit.

    That said, making and maintaining a lot of codepaths is a lot of work. So I am trying to strike a balance between optimizing for everything, and maintainability.

    Right now, the code-paths I have for v0.6.1 are:
    • x86 (I will most likely be disabling this and removing support for it completely.)
    • x86 SSE3
    • x64 SSE3 ~ Kasumi (AMD K10)
    • x64 SSE4.1 ~ Nagisa (Intel Core 2)
    • x64 SSE4.1 ~ Ushio (Intel Nehalem)
    • x64 AVX ~ Hina (Intel Sandy Bridge)
    • x64 XOP ~ ??? (AMD Bulldozer)
    • x64 AVX2 ~ ??? (Intel Haswell)


    *AVX2 implies FMA3. These have 256-bit code-paths and are together under the same code-path.
    *XOP implies FMA4. These are also 256-bit code-paths and also together under the same codepath.
    *The AVX2 code-path doesn't actually use any AVX2 yet (just FMA3). But I'm labeling it AVX2 so that I can add them later without needing to "upgrade" it from FMA3.

    In any case, it's a lot of code-paths that I'm maintaining. x86 will most likely be dropped for technical reasons. I would have dropped x64 SSE4.1 ~ Nagisa (Core 2) a long time ago if it weren't for my X5482 machine.

    As I had mentioned, the FMA3/AVX2 and FMA4/XOP code-paths will come when:
    1. v0.6.1 is ready.
    2. I get my hands on the needed hardware.


    v0.6.1 (with everything up to AVX) will be released when it's ready regardless of whether I get the hardware for AVX2 and XOP.


    Quote Originally Posted by st0ned View Post
    Hi I don't know if you happen to drop by or if you did get my email, anyhow I'm happy to see you around.

    You answered almost every one of the questions I posed in the email, except for the part where I asked whether you intend to start releasing beta versions to the public or to particular beta testers. If I can do something to help with the code for Windows 8, I have it running on two machines and might be of use.

    best regards !
    Your email was mainly what prompted me to check on this thread.

    Yes, I'll be doing a public beta once all the important features are in. Right now, v0.6.1 is barely even functional. It has just enough functionality to test the core math-library.

    Once swap mode and validation are done, I'll be releasing a public beta. Only then will I start searching for a Haswell and Bulldozer machine to test/tune their code-paths.

  5. #755
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Just a quick update. I've updated the charts on my website. But I haven't updated them here on XS. I'll get to that later.

    Let me know if I missed your submission. It's been a long time since I've updated the charts (since I've been busy). So I easily could have missed someone.

    In any case, there's someone floating around with a dual-sandy, 256 GB, and a bunch of SSDs...

  6. #756
    Registered User
    Join Date
    Aug 2012
    Posts
    70
    Thanks Alex for updating your lists.

    As I am the one "floating" around :-), I thought there might be some interest in the xtremesystems community to share a few words about the 2 systems I built for a personal study project in the last 4 months, which I used here.

    Traditionally, and especially in the last few years, PCs grew their computational capabilities faster than any other part of the overall system. Main memory bandwidth grew slower than MIPS and MFLOP/s. I/O latency and bandwidth advanced significantly slower than sheer compute power, etc. So, from an I/O perspective, systems became more and more "unbalanced" for data-intensive workloads. Yet the amount of data to be processed grew at least as fast as CPU capabilities. Another imbalance developed in hard disks: disk capacity grew faster than sustainable bandwidth, and even faster than latency fell.

    When I started the project to build an I/O-balanced workstation, I took advantage of a few developments from recent months that are, from my POV, significant and that in combination allow much improved data handling. I'd like to list just 4 components which contribute to the improvements:
    • The new I/O architecture of latest generation Sandy Bridge CPU's, allowing a massive increase in I/O capabilities
    • The latest generation of low-cost SAS/SATA hostbus adapters, which are not impeding the performance of operating parallel SSDs
    • The performance characteristics of individual SSDs are well known. But there is much less experience with parallel configurations in PCs with an I/O speed of over 20 GB/sec
    • The final availability of Windows Server 2012 with a much improved I/O and networking subsystem


    This is not the place to go deeper into I/O, but I used the PCs to run Alex Yee's excellent ycrunch application as a background task. (All in-memory runs were on an otherwise idle machine.)
    Not only is ycrunch well optimized on the computational front, but its I/O subsection is able to hit peak transfer rates of 12 GB/sec and more.

    I am currently writing a paper (mostly on weekends) to describe the systems and the performance characteristics of their HW and SW environment in more detail. For now, here are a few words about the 2 systems, which are in a kind of constant reconfiguration state:

    1) The single socket PC
    CPU: i7-3960K
    MB: Asus P9-X78 WS
    Mem: 8 x 8 GB Kingston DDR3-1600
    Disk controller: 4 x LSI 9207-8i (each with 8 x 6GBit/s SAS/SATA ports)
    Data SSDs: 32 x Samsung 830 (128GB)
    OS: SanDisk SSD 240 GB

    2) The dual socket PC
    CPU: 2 x E5-2687W
    MB: Asus Z9PE-D16 (4 x GBit LAN ports)
    Mem: 16 x 16 GB Kingston ECC DDR3-1600
    Disk controller: 6 x LSI 9207-8i (ea. 8x SATA/SAS ports)
    Data SSD: 48 x Samsung 830 (128GB)
    OS: SanDisk SSD 480GB

    Disk controllers and data SSDs are shared between the 2 PCs, depending on requirements.


    Some comments on the numbers and observations during the runs:
    1. "Small" sizes of Pi (below 100m) achieve better performance when HT is disabled
    2. Overall efficiency expressed as % of peak is more challenging on NUMA machines (vs. single socket machines with one physical memory space)
    3. The 1 trillion pi run generated close to 500 TB of data transfer (avg 725 MB/sec over the total run time and > 12 GB/sec peak)
    4. The Sandy Bridge architecture is an excellent platform for high I/O apps (either dedicated I/O application, or as part of a combined compute/IO application like ycrunch)
    5. The new generation of low cost SAS HBA controllers offer much better scaling than previous generation controllers
    6. As said, the machines are in a constant flux of configurations. The runs were done with I/O system configurations ranging from 0 to 48 SSDs
    7. Long running applications like ycrunch with algorithmic error detection show the value of ECC in RAM
    8. To keep the CPUs safe with this computationally demanding application, I ran them below 60 degrees Celsius.


    I've tried to aggregate the data in the list below as accurately as possible, please let me know of any potential error.

    With that, thanks to Alex for his great application, and to all community members, enjoy the fascinating world of computing,
    Andy


    PS:
    In the spirit of transparency, since I mentioned a product of my employer: in my day job, I currently work as Regional Technology Officer in Microsoft's field organisation in Western Europe.




    Depending on the state of the application, the 16 physical cores (plus HT) were quite busy


    During I/O-intensive times, the CPU graphs look different


    One snapshot while the application was writing faster than 12 GB/sec
    Last edited by Andreas; 11-19-2012 at 11:13 AM.

  7. #757
    Xtreme Enthusiast
    Join Date
    Sep 2007
    Location
    Coimbra - Portugal
    Posts
    699
    Beast system there! Would you care to post a benchmark of that 48x Samsung array? I just wanted to see the scaling and global performance out of curiosity.

  8. #758
    Registered User
    Join Date
    Aug 2012
    Posts
    70
    Quote Originally Posted by st0ned View Post
    Beast system there! Would you care to post a benchmark of that 48x Samsung array? I just wanted to see the scaling and global performance out of curiosity.
    The peak transfer rates are (measured with IOMeter):
    Read: 20 GB/sec (out of 25 GB/sec theoretical max). CPU load: 2%
    Write: 15 GB/sec (out of 15 GB/sec max)

    IOPS (I/O operations per second):
    2.2 million I/Os with 4 KB sector size = 8.6 GB/sec transfer rate with random access
    This level of performance is primarily limited by the CPUs, not I/O.
    Next barrier would be the QPI interconnect between the 2 CPUs, then the 6 SATA controllers, then the 48 Samsung drives and lastly the PCIe subsystem.
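The random-access figure above is just operations per second times block size. A small sketch of that arithmetic follows; the function names are made up for illustration, and a 4 KB sector is assumed to mean 4096 bytes. The result lands in the same range as the 8.6 GB/sec quoted above. The exact value depends on how the IOPS figure was rounded and on whether GB is read as decimal or binary.

```cpp
// Throughput in bytes per second for a given I/O rate and block size.
constexpr double bytes_per_second(double iops, double block_bytes) {
    return iops * block_bytes;
}

// Decimal gigabytes (10^9 bytes).
constexpr double to_gb(double bytes) {
    return bytes / 1e9;
}

// Binary gibibytes (2^30 bytes).
constexpr double to_gib(double bytes) {
    return bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For 2.2 million operations at 4096 bytes each, this gives roughly 9.0 GB/sec in decimal units and roughly 8.4 GiB/sec in binary units.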


    I'll provide more background info and configuration information in the paper.

    kind regards,
    Andy

  9. #759
    Xtreme Enthusiast
    Join Date
    Sep 2007
    Location
    Coimbra - Portugal
    Posts
    699
    Thanks for your time, Andreas. I'm looking forward to reading your paper!

    As I thought, that array's performance is incredible. Moreover, achieving 20 GB/sec out of the theoretical 25 is good scaling if we consider the number of SSDs involved. Do you have any overheating problems with your RAID controllers?
    Another thing that I find amusing is that your SSDs should already have surpassed, or be very close to surpassing, your RAM's raw read/write speed, although I recognize that RAM access times should be 1/10th of those of the SSDs.


    regards,

    Miguel

  10. #760
    Registered User
    Join Date
    Aug 2012
    Posts
    70
    Miguel,
    there were quite a lot of interesting things I could learn via this project, like saturation levels, component selection, etc ...
    To avoid a further off-topic deviation in this thread on Pi, I'll open up a new one so we have more space to discuss and other people potentially interested in this topic of high performance I/O can join as well.

    regards,
    Andy

  11. #761
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Got a new laptop:



    This thing actually beats my i7 920 @ 3.5 GHz... by a fairly large margin too...


    That aside, I'm starting to realize that what's left of v0.6.1 is turning out to be messier than I thought. So I plan on doing an early release of v0.6.1 without the majority of the intended features, but enough to do ram-only benchmarking.

    I don't have a time-frame for this yet. Benchmark Mode and Validation still need to be done, but they aren't hard.

    I've updated the y-cruncher homepage with some more details on the final progress of v0.6.1. The fact that it's complete enough to do ram-only Pi benchmarks means that it's already complete enough for the majority of casual benchmarkers. So I won't make them wait any longer.

    The high-end swap-mode capability (with the VST algorithm) is still a work-in-progress. So those waiting for that will have to wait longer if they want to play with anything more than just square roots.

  12. #762
    Cyras
    Guest
    My System:

    Intel Xeon E5-2687W @ 3,6GHz
    Computation Time: 41.916 seconds
    Total Time: 45.870 seconds

    CPU Utilization: 1366.01 %
    Multi-core Efficiency: 85.37 %
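The two percentages in results like this appear to be related through the logical core count: efficiency looks like utilization divided by the number of logical cores. That relation is inferred from the numbers posted in this thread, not taken from y-cruncher documentation, and the function name below is made up for illustration.

```cpp
// Assumed relation: efficiency (%) = CPU utilization (%) / logical cores.
constexpr double multicore_efficiency(double cpu_utilization_percent,
                                      int logical_cores) {
    return cpu_utilization_percent / logical_cores;
}
```

For the E5-2687W result (8 cores, 16 threads with HT), 1366.01 / 16 is about 85.376, in line with the reported 85.37 %. The same relation holds for the 4-thread i5-2500K result later in the thread: 336.282 / 4 is about 84.070.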


  13. #763
    Registered User
    Join Date
    Jan 2008
    Posts
    37
    Seems legit. :p


    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2011 Alexander J. Yee    ( a-yee@u.northwestern.edu )
    
    
    User:                  "Username.txt" Not found.
    
    
    Processor(s):          AMD FX(tm)-8350 Eight-Core Processor 
    Logical Cores:         8
    Physical Memory:       34,277,138,432 bytes  ( 32.0 GB )
    CPU Frequency:         4,966,812,719 Hz
    
    Program Version:       0.5.5 Build 9180 (fix 2) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        250,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        8 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        1.26 GB
    
    Start Date:            Thu Jan 31 19:52:58 2013
    End Date:              Thu Jan 31 19:54:21 2013
    
    Computation Time:      18,446,744,073,709,551,612.000 seconds
    Total Time:            3.348 seconds
    
    CPU Utilization:           4294953646.4294967261 %
    Multi-core Efficiency:     4294965590.4294967267 %
    
    Last Digits:
    3673748634 2742427296 0219667627 3141599893 4569474921  :  249,999,950
    9958866734 1705167068 8515785208 0067520395 3452027780  :  250,000,000
    
    Timer Sanity Check:        Failed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   1d89c9bd846a006b012285b80aef0b399978278f8a7958e7151bddf94081ce3c

  14. #764
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by CRFX View Post
    Seems legit. :p


    LOL! What did you do?

    At least the timer sanity check caught it. So it isn't completely broken.

  15. #765
    Xtreme Addict
    Join Date
    Feb 2007
    Posts
    1,674
    Is y-cruncher cpu utilization supposed to drop between iterations/switching tests for the stress tester?

    I just tested, it appears coredamage < linpack < y-cruncher < prime95 small fft < prime95 large fft < prime95 blend in terms of temperature. Not really what I expected.

    Note, prime95 v27 has an avx x64 binary now.
    Last edited by Boogerlad; 02-20-2013 at 05:35 PM.

  16. #766
    Registered User
    Join Date
    Jan 2008
    Posts
    37
    I use y-cruncher to test how stable my system is when I'm overclocking. When it starts displaying errors then I know it's not stable.
    If I change the cpu multiplier or memory timings while y-cruncher is running, it does some weird things.

  17. #767
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Boogerlad View Post
    Is y-cruncher cpu utilization supposed to drop between iterations/switching tests for the stress tester?

    I just tested, it appears coredamage < linpack < y-cruncher < prime95 small fft < prime95 large fft < prime95 blend in terms of temperature. Not really what I expected.

    Note, prime95 v27 has an avx x64 binary now.
    Yes it should drop between tests. But only for a split-second - most of the time it isn't even noticeable.
    That's because it kills off the old threads for the old task and recreates them for the next test.

    If prime95 has AVX now, then that would easily explain why it runs hotter. It's hand-assembly optimized - not something I'd like to attempt myself.

    In the y-cruncher v0.6.1 stress-test:
    • BKT - Doesn't run hot at all.
    • FFT - Doesn't run hot at all either.
    • HNT - The temperatures here will fluctuate between cold and hot. This is because the algorithm has "phases". If the test is large enough such that each phase takes more than a few seconds, you will notice these fluctuations.
    • VST - This one is the killer. And on all my machines it runs much hotter than the "hot" phase of the HNT test.

    So if you want to test just for heat, just run the 4th test (VST) and disable all the others.


    Quote Originally Posted by CRFX View Post
    I use y-cruncher to test how stable my system is when I'm overclocking. When it starts displaying errors then I know it's not stable.
    If I change the cpu multiplier or memory timings while y-cruncher is running, it does some weird things.
    It probably messed up the CPU's internal clock.

    The numbers clearly show that something made the clocks go backwards. And of course that leads to all sorts of integer-overflow crap when casting negative numbers into unsigned integers...
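A minimal sketch of the wrap-around described above. The function names, tick units, and variable widths are assumptions for illustration, not y-cruncher's actual timer code. What is certain is the C++ conversion rule: casting a negative signed value to an unsigned type wraps modulo 2^N.

```cpp
#include <cstdint>

// Elapsed ticks with no sanity check. If the clock went backwards,
// end - start is negative and the cast wraps modulo 2^64.
constexpr std::uint64_t elapsed_unchecked(std::int64_t start, std::int64_t end) {
    return static_cast<std::uint64_t>(end - start);
}

// The same effect at 32 bits.
constexpr std::uint32_t wrap32(std::int32_t value) {
    return static_cast<std::uint32_t>(value);
}

// A simple guard in the spirit of a timer sanity check:
// time must not run backwards.
constexpr bool timer_is_sane(std::int64_t start, std::int64_t end) {
    return end >= start;
}
```

A clock that runs backwards by 4 units gives 18,446,744,073,709,551,612 (2^64 - 4), which is the Computation Time in the posted validation file. The percentage fields there sit just below 2^32 in the same way; for example, -35 wraps to 4294967261 at 32 bits.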

  18. #768
    Xtreme Addict
    Join Date
    Feb 2007
    Posts
    1,674
    AFAIK, Prime95 only tests the fpu, ram, and memory controller under "blend". This cannot guarantee stability though, because the integer units are untouched. Thus hotter != more stressful? It would still be necessary to run the BKT test right?

  19. #769
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Boogerlad View Post
    AFAIK, Prime95 only tests the fpu, ram, and memory controller under "blend". This cannot guarantee stability though, because the integer units are untouched. Thus hotter != more stressful? It would still be necessary to run the BKT test right?
    • BKT is pure integer work. No memory, almost entirely L1 cache.
    • HNT is a mix of everything from integer, floating-point, cache, memory...


    I can't say how well each test stresses the integer units. (there's nothing to compare against)
    But if you want to stress the integer units, then run both tests overnight.

    The old stress-tester in v0.5.5 and earlier has a bit of everything (BKT, FFT, HNT). But the HNT component is dominant. (Note that VST is new to v0.6.1.)


    Going back to floating-point, if anyone here is a programmer and has experience compiling code, you might want to check out my answer to this Stack Overflow question. It has a fairly strong heat generator that you might want to play with.

    I can't say whether it'd make a good floating-point stress-test since it doesn't check to see if the calculation is correct. I wrote it a year ago to measure heat, not to stress-test.
    Last edited by poke349; 03-02-2013 at 05:40 PM.

  20. #770
    Xtreme Addict
    Join Date
    Feb 2007
    Posts
    1,674
    Nice! I frequent Stack Overflow too! Off topic, but do you prefer while loops over for loops? I was reading over your code there and you used exclusively whiles.

  21. #771
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Boogerlad View Post
    Nice! I frequent Stack Overflow too! Off topic, but do you prefer while loops over for loops? I was reading over your code there and you used exclusively whiles.
    It's an old habit I've had since I first started programming - which got reinforced by the fact that C89/90 doesn't allow declarations inside the for-loop statement.

    I'm slowly trying to break that habit. Most of the new code now will properly use for-loops when it makes sense to. But this was only a very recent effort. And 99% of y-cruncher source code is done using while and do-while loops. (do-while loops are used mostly in performance critical code where that extra loop test actually matters)
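To make the three styles concrete, here is a small sketch of the same summation written each way. These are illustrative functions, not code from y-cruncher. The do-while version skips the test before the first iteration, so it is only valid when the caller guarantees at least one element. Modern compilers often perform this transformation themselves, so whether the hand-written form helps depends on the compiler and the code.

```cpp
#include <cstddef>

// for-loop: counter declared in the loop statement (not allowed in C89/90).
constexpr long sum_for(const int* a, std::size_t n) {
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}

// while-loop: counter declared up front, C89 style.
constexpr long sum_while(const int* a, std::size_t n) {
    long sum = 0;
    std::size_t i = 0;
    while (i < n) {
        sum += a[i];
        ++i;
    }
    return sum;
}

// do-while: no test before the first iteration.
// Precondition: n > 0, otherwise a[0] is read out of bounds.
constexpr long sum_do_while(const int* a, std::size_t n) {
    long sum = 0;
    std::size_t i = 0;
    do {
        sum += a[i];
        ++i;
    } while (i < n);
    return sum;
}
```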

  22. #772
    Registered User
    Join Date
    Aug 2008
    Posts
    12
    Validation Version: 1.2

    Program: y-cruncher - Gamma to the eXtReMe!!! ( www.numberworld.org )
    Copyright 2008-2013 Alexander J. Yee ( a-yee@u.northwestern.edu )


    User: None Specified - You can edit this in "Username.txt".


    Processor(s): Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
    Logical Cores: 4
    Physical Memory: 3,101,306,880 bytes ( 2.88 GB )
    CPU Frequency: 3,300,042,752 Hz

    Program Version: 0.6.1 Build 9282 Pre-Alpha (x64 SSE4.1 - Windows ~ Ushio)
    Constant: Pi
    Algorithm: Chudnovsky
    Decimal Digits: 25,000,000
    Hexadecimal Digits: Disabled
    Threading Mode: 4 threads
    Computation Mode: Ram Only
    Working Memory: 120 MB

    Start Date: Fri Mar 08 15:08:49 2013
    End Date: Fri Mar 08 15:08:53 2013

    Computation Time: 4.458 seconds
    Total Time: 4.809 seconds

    CPU Utilization: 336.282 %
    Multi-core Efficiency: 84.070 %

    Last Digits:
    3803750790 9491563108 2381689226 7224175329 0045253446 : 24,999,950
    0786411592 4597806944 2455112852 2554677483 6191884322 : 25,000,000

    Timer Sanity Check: Passed
    Frequency Sanity Check: Disabled in this version of y-cruncher
    ECC Recovered Errors: 0

    Event Log:
    Fri Mar 08 15:08:49 2013 0.000 Reserving Working Memory
    Fri Mar 08 15:08:49 2013 0.020 Constructing Twiddle Tables
    Fri Mar 08 15:08:49 2013 0.022 Allocating I/O Buffers
    Fri Mar 08 15:08:49 2013 0.022 Begin Computation
    Fri Mar 08 15:08:49 2013 0.022 Series: ( 22 ) 0.000%
    Fri Mar 08 15:08:49 2013 0.027 Series: ( 21 ) 0.051%
    Fri Mar 08 15:08:49 2013 0.027 Series: ( 20 ) 0.071%
    Fri Mar 08 15:08:49 2013 0.028 Series: ( 19 ) 0.101%
    Fri Mar 08 15:08:49 2013 0.029 Series: ( 18 ) 0.142%
    Fri Mar 08 15:08:49 2013 0.030 Series: ( 17 ) 0.200%
    Fri Mar 08 15:08:49 2013 0.032 Series: ( 16 ) 0.282%
    Fri Mar 08 15:08:49 2013 0.035 Series: ( 15 ) 0.398%
    Fri Mar 08 15:08:49 2013 0.038 Series: ( 14 ) 0.560%
    Fri Mar 08 15:08:49 2013 0.043 Series: ( 13 ) 0.789%
    Fri Mar 08 15:08:49 2013 0.050 Series: ( 12 ) 1.111%
    Fri Mar 08 15:08:49 2013 0.061 Series: ( 11 ) 1.564%
    Fri Mar 08 15:08:49 2013 0.077 Series: ( 10 ) 2.203%
    Fri Mar 08 15:08:49 2013 0.129 Series: ( 9 ) 3.103%
    Fri Mar 08 15:08:49 2013 0.162 Series: ( 8 ) 4.370%
    Fri Mar 08 15:08:49 2013 0.213 Series: ( 7 ) 6.156%
    Fri Mar 08 15:08:49 2013 0.287 Series: ( 6 ) 8.673%
    Fri Mar 08 15:08:49 2013 0.405 Series: ( 5 ) 12.225%
    Fri Mar 08 15:08:49 2013 0.567 Series: ( 4 ) 17.240%
    Fri Mar 08 15:08:49 2013 0.808 Series: ( 3 ) 24.332%
    Fri Mar 08 15:08:50 2013 1.139 Series: ( 2 ) 34.386%
    Fri Mar 08 15:08:50 2013 1.648 Series: ( 1 ) 48.698%
    Fri Mar 08 15:08:51 2013 2.397 Series: ( 0 ) 69.250%
    Fri Mar 08 15:08:52 2013 3.546 Finishing Series
    Fri Mar 08 15:08:52 2013 3.556 Division
    Fri Mar 08 15:08:52 2013 3.798 InvSqrt
    Fri Mar 08 15:08:53 2013 3.961 Final Multiply
    Fri Mar 08 15:08:53 2013 4.068 Base Converting
    Fri Mar 08 15:08:53 2013 4.481 Writing Decimal Digits
    Fri Mar 08 15:08:53 2013 4.642 Verifying Base Conversion
    Fri Mar 08 15:08:53 2013 4.809 End Computation

    ----

    Checksum: 86e52d9bf387eef30cf7a56196db70f25f94be35dfad1e160ec560c08185df17
    I5 2500k
    ASRock Z77 OC Formula
    Thermaltake Kandalf
    Swiftech 320 xp
    Koolance pmp500
    Apogee GTZ CPU Block.
    Kingston HyperX 2133 at 9-11-9-27 1t
    Crossfire 6970's (soon to be on water)
    1m Superpi 7.595s

  23. #773
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Sent you some pics..
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  24. #774
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Version v0.6.2 is out!

    Now with the long overdue swap modes...

    Ridiculous times on a ridiculous computer: 100 billion digits in under 7 hours.



    Specs:

    • 2 x Intel Xeon E5-2690W Sandy Bridge-EP
    • 128 GB ram
    • 16 x 3 TB swap HDs

    Credit: Shigeru Kondo

  25. #775
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    to a true gentleman..
