
Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #776
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    A bump after a long time...

    Here's a screenshot from a binary tuned for AMD Bulldozer using FMA4 and XOP instructions.
    AMD FX-8350 @ 4.0 GHz (stock) with 16 GB @ 1333 MHz:



    It's not close to done yet as the large size algorithms still need to be re-tuned.
    I plan to release this binary in v0.6.4. But it may come earlier (in v0.6.3) if the stuff that's supposed to be in v0.6.3 drags on too long.

    If all goes well (which it never does), v0.6.3 is ETA: late December. v0.6.4 in January.

    In the meantime, if you have a Bulldozer machine, I highly recommend running the "x64 SSE3 ~ Kasumi" binary instead of what the program auto-selects (which is "x64 AVX ~ Hina"). I've found Bulldozer's 256-bit AVX performance to be pretty crappy.
    The author of Prime95 explains why in this link: http://www.mersenneforum.org/showthread.php?t=17618
    As such, the FMA4/XOP binary will use 128-bit AVX, FMA4, and XOP instructions.

    If the FMA4/XOP binary doesn't make it into v0.6.3, I'll have the version-selector choose "x64 SSE3 ~ Kasumi" instead of "x64 AVX ~ Hina" for AMD Bulldozer-line processors.
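The selection rule described in this post could be sketched roughly as below. This is an illustration only, not y-cruncher's actual selector: the function name `select_binary`, its parameters, and the use of CPUID family 0x15 to recognize the Bulldozer line are assumptions. Only the two binary names come from the post.

```python
# Hypothetical sketch of the version-selector rule described in the post.
# Only the two binary names come from the thread; everything else is assumed.

AMD_BULLDOZER_FAMILY = 0x15  # CPUID display family of the Bulldozer line (family 15h)


def select_binary(vendor: str, family: int, has_avx: bool) -> str:
    """Pick a binary name from basic CPU identification data.

    Simplification: only the two binaries named in the post are considered.
    """
    if not has_avx:
        return "x64 SSE3 ~ Kasumi"
    if vendor == "AuthenticAMD" and family == AMD_BULLDOZER_FAMILY:
        # Bulldozer reports AVX support, but its 256-bit AVX is slow,
        # so prefer the SSE3 binary there.
        return "x64 SSE3 ~ Kasumi"
    return "x64 AVX ~ Hina"


print(select_binary("AuthenticAMD", 0x15, True))
print(select_binary("GenuineIntel", 0x06, True))
```

The point of the sketch is that "supports AVX" and "runs AVX well" are separate questions, so the selector needs a per-microarchitecture override on top of the feature check.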

    Other news: I burned my Sandy Bridge machine last week.
    A careless short-circuit took out the motherboard and possibly the CPU as well. So I will no longer be able to do performance tuning for the "x64 AVX ~ Hina" binary. The binary will remain (for a while), but all the tuning parameters can no longer be updated.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  2. #777
    Xtreme Cruncher
    Join Date
    Jun 2005
    Location
    Northern VA
    Posts
    1,285
    Hey Poke, nice to see you're still working on this. I've got a question for ya: what do you think of a C6100 8xL5520@2.6 with 96 GB of RAM? I've got one inbound for a late Dec delivery and wanted to have some fun with it before I dedicate it to 24/7 BOINC.
    Its not overkill if it works.


  3. #778
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by skycrane View Post
    Hey Poke, nice to see you're still working on this. I've got a question for ya: what do you think of a C6100 8xL5520@2.6 with 96 GB of RAM? I've got one inbound for a late Dec delivery and wanted to have some fun with it before I dedicate it to 24/7 BOINC.
    Wow... Is that an 8-socket I'm seeing?

  4. #779
    Xtreme Cruncher
    Join Date
    Jun 2005
    Location
    Northern VA
    Posts
    1,285
    It's actually 4 dual-socket nodes that all fit in a 2U rack.

  5. #780
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by skycrane View Post
    It's actually 4 dual-socket nodes that all fit in a 2U rack.
    That will be interesting. Especially since the NUMA effect will be extreme.

    There's one NUMA friendly algorithm in the program. But it's activated only above 50 billion digits since it is slow. So if you're willing to toy around a bit, I can send you a version with the threshold dropped to say 1 billion to see if it does any better than what's available on my website right now.

    At some point in the future, I intend to make this threshold adjustable by the user, but it's not that easy... Right now it needs to be hard-coded into the program and recompiled.
    Last edited by poke349; 12-10-2013 at 05:38 PM.

  6. #781
    Xtreme Cruncher
    Join Date
    Jun 2005
    Location
    Northern VA
    Posts
    1,285
    Yeah, that sounds good, it would be fun to try it out. I'm a bit clueless about how all this stuff works, but could we set up a remote login so you can do your magic with the programming? Just as long as you don't mess up my BOINC work units lol

    Also, I was wondering about the LAN connections. I was hoping to get 3 of them and InfiniBand them all together as a nice lil cluster, but it looks like each node on all 3 racks will need an add-in card, which might be a bit cost prohibitive... lol. Do you think that a dual Gbit connection to a switch will be fast enough to feed the info between all of the nodes?
    Last edited by skycrane; 12-15-2013 at 03:53 PM.

  7. #782
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by skycrane View Post
    Yeah, that sounds good, it would be fun to try it out. I'm a bit clueless about how all this stuff works, but could we set up a remote login so you can do your magic with the programming? Just as long as you don't mess up my BOINC work units lol

    Also, I was wondering about the LAN connections. I was hoping to get 3 of them and InfiniBand them all together as a nice lil cluster, but it looks like each node on all 3 racks will need an add-in card, which might be a bit cost prohibitive... lol. Do you think that a dual Gbit connection to a switch will be fast enough to feed the info between all of the nodes?
    It would probably depend on how fast the InfiniBand is. For a system of this caliber you're gonna need at least 20 GB/s of sustained bandwidth to have any hope of being able to use it efficiently as shared memory. I'm also unsure of how the high latency is going to play out. Perhaps HyperThreading will be able to cover up most of those delays. I don't know though.

    Lemme know when it's ready so I can send you a binary with the high-end algorithm threshold dropped to 1 billion (or even lower). If the performance scaling turns out to be okay on two motherboards, then you can try going higher. That NUMA-friendly algorithm is NUMA friendly because it's heavily optimized to simply not use memory until it's absolutely needed. But it isn't actually "aware" of the NUMA. By comparison, most of the algorithms thrash memory all over the place.

  8. #783
    Xtreme Cruncher
    Join Date
    Jun 2005
    Location
    Northern VA
    Posts
    1,285
    Damn, it needs to be that fast??? So I would need what, a dual 10 Gbit card in each node, then connect all 8 wires to the switch?
    Did you get my PM? I haven't gotten any offers on anything, maybe they are all too expensive?? But it's yours until I can sell it. Then I'll let you do some remote login work for your classes, and to see what magic you can work with the NUMA programming when you have access to this for testing purposes.. hehe

    What I really would love doing with it is some HPC work using BOINC to really rack up the stats.... Do you think this would work better, or a blade center? I've got all the software I need for it, I just need to know what sort of hardware I've got to get. Would you be able to do a lil research and tell me which would be better for what I have in mind, the C6100 or this: http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26


  9. #784
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by skycrane View Post
    Damn, it needs to be that fast??? So I would need what, a dual 10 Gbit card in each node, then connect all 8 wires to the switch?
    Did you get my PM? I haven't gotten any offers on anything, maybe they are all too expensive?? But it's yours until I can sell it. Then I'll let you do some remote login work for your classes, and to see what magic you can work with the NUMA programming when you have access to this for testing purposes.. hehe

    What I really would love doing with it is some HPC work using BOINC to really rack up the stats.... Do you think this would work better, or a blade center? I've got all the software I need for it, I just need to know what sort of hardware I've got to get. Would you be able to do a lil research and tell me which would be better for what I have in mind, the C6100 or this: http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26
    Yeah it kinda does. At least enough to match the internal bandwidth or the socket <-> socket connection. That's the problem when you try to use distributed memory like shared memory. Latencies can be hidden pretty well with HT and good cache locality, but not bandwidth.

    FWIW, we had 2 GB/s of disk bandwidth alone when we did the 10 trillion digit computation of Pi. It was severely limiting, even though the program is specifically optimized for using disk.
    There is somewhat of a fundamental problem though: The FFT algorithm requires very high Bisection Bandwidth to run efficiently.
    Of course this doesn't exist - even on the best connected super-computers. So the efficiency is extremely poor on them. (even with specialized distributed implementations)
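To put the networking options from this exchange next to the 20 GB/s figure quoted earlier in the thread, here is a rough back-of-the-envelope sketch. It assumes theoretical peak link rates with no protocol overhead; the helper name `link_bandwidth_gb_per_s` is made up for illustration.

```python
# Rough link-bandwidth arithmetic for the interconnect options discussed above.
REQUIRED_GB_PER_S = 20.0  # sustained bandwidth figure quoted earlier in the thread


def link_bandwidth_gb_per_s(ports: int, gbit_per_port: float) -> float:
    """Theoretical peak in gigabytes/s (8 bits per byte, no protocol overhead)."""
    return ports * gbit_per_port / 8.0


for name, ports, gbit in [("dual 1 Gbit", 2, 1.0), ("dual 10 Gbit", 2, 10.0)]:
    peak = link_bandwidth_gb_per_s(ports, gbit)
    shortfall = REQUIRED_GB_PER_S / peak
    print(f"{name}: {peak:.2f} GB/s peak, about {shortfall:.0f}x below {REQUIRED_GB_PER_S:.0f} GB/s")
```

Even before latency is considered, both Ethernet options fall well short of the quoted target on raw bandwidth alone.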

    That's not to say I can't find a way to do any better. But I have a full-time job now and I don't have as much time as I used to.

    But it's yours until I can sell it.
    I would feel pretty bad taking another machine from you. I also kind of broke the promise of putting the quad Opteron on WCG. I had it running for a few months, then I realized that I had no way to monitor the health of the machine (with summer approaching). So I took it off and used it only for things that needed the NUMA, to preserve its operational life. So that's how it is right now. It's off most of the time, but every once in a while, I'll boot it up to run some scalability testing.

    What I really would love doing with it is some HPC work using BOINC to really rack up the stats.... Do you think this would work better, or a blade center? I've got all the software I need for it, I just need to know what sort of hardware I've got to get. Would you be able to do a lil research and tell me which would be better for what I have in mind, the C6100 or this: http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26
    That looks really cheap for a Sandy Bridge blade. (if I'm reading it right) I would imagine that simple high-end desktops (OCed) would be the cheapest and most power-efficient approach for truly distributed tasks that require little communication. The main reason why you would go with multi-socket boards is to get fast bandwidth between the two chips. But I guess that's not the case here.

  10. #785
    Xtreme Addict
    Join Date
    Sep 2010
    Location
    US, MI
    Posts
    1,680
    Hi, I recently lost my array and ran chkdsk midway through recovering it, screwing up a dozen or so programs in the process.
    One of them was my copy of y-cruncher.

    None of the newer versions work on my config.
    I know the exact zip name of the one I need too, but it's not online anymore.
    y-cruncher v0.5.4.9148 (fix 1).zip

    The newer ones just crash with a "program has stopped working" error... (I don't know exactly which version this started with. I ran 2 other versions when I first got my R4BE board a few weeks ago and they both errored out on startup.)
    And yes, I installed both the x86 and x64 packs of VC2010.

    I don't have any games installed right now other than some PS2 games. I wanted to test 4.3 GHz on my CPU; I think it's stable on stock volts with the PLL overvoltage setting enabled.
    Plan was to run y-cruncher in the background for 8+ hrs while I watch FMA Brotherhood. Seems I'd better just reset my PC back to 4.2 GHz and wait it out for now lol.


    Update:
    If I use y-cruncher.exe from the older version I had, and the files in the binaries folder from the new version (this is what I was missing), it runs.
    It will not run with the newer y-cruncher.exe file though.

    However, as much as I'd like it to be all good, it's not quite what I want anymore as a stress-testing program.
    It stresses the CPU a bit too much; I can't game with it running in the background.
    Heck, I can't do anything while it's running, it lags my mouse so much...
    My mouse lags on the Intel setup when the CPU is stressed past 90% or so. I noticed this when I first got the board but didn't understand what was going on; it was only the other day, when messing with AviSynth 2.6 MT, that I knew for sure what caused the mouse lag.
    (Anything past say 90% usage causes it to be slower than say 80%. Again, I noticed this in AviSynth. Yeah, I can't use the last 10% of my CPU without everything slowing down to crap.)
    10 threads works out fine, but still...

    I prefer the algos used by the older version I had.
    Otherwise I just don't have any use for this program anymore, sorry.
    You don't happen to have an old copy of "y-cruncher v0.5.4.9148 (fix 1).zip" lying around, do ya?
    I found that version useful...
    No offense intended.

    Hmm, if I set it to 7 GB, 10 threads, and disable all the tests except VST, it might be of some use to me for CPU.
    FFT might be useful for memory. I don't know what HNT is though.
    Wish it supported a command tail though, oh well.
    Hmm :\.
    Last edited by NEOAethyr; 12-31-2013 at 12:10 PM.

  11. #786
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by NEOAethyr View Post
    Hi, I recently lost my array and ran chkdsk midway through recovering it, screwing up a dozen or so programs in the process.
    One of them was my copy of y-cruncher.

    None of the newer versions work on my config.
    I know the exact zip name of the one I need too, but it's not online anymore.
    y-cruncher v0.5.4.9148 (fix 1).zip
    You can get the older versions here: http://www.numberworld.org/y-cruncher/versions.html

    The newer ones just crash with a "program has stopped working" error... (I don't know exactly which version this started with. I ran 2 other versions when I first got my R4BE board a few weeks ago and they both errored out on startup.)
    And yes, I installed both the x86 and x64 packs of VC2010.
    That should not happen. And I haven't received any other reports of this issue. Do you have a screenshot of it or something? It's hard to say what's wrong since I've never seen it before.

    I don't have any games installed right now other than some PS2 games. I wanted to test 4.3 GHz on my CPU; I think it's stable on stock volts with the PLL overvoltage setting enabled.
    Plan was to run y-cruncher in the background for 8+ hrs while I watch FMA Brotherhood. Seems I'd better just reset my PC back to 4.2 GHz and wait it out for now lol.


    Update:
    If I use y-cruncher.exe from the older version I had, and the files in the binaries folder from the new version (this is what I was missing), it runs.
    It will not run with the newer y-cruncher.exe file though.
    That's interesting. How are you running it? Double-click? Command line?

    However, as much as I'd like it to be all good, it's not quite what I want anymore as a stress-testing program.
    It stresses the CPU a bit too much; I can't game with it running in the background.
    Heck, I can't do anything while it's running, it lags my mouse so much...
    My mouse lags on the Intel setup when the CPU is stressed past 90% or so. I noticed this when I first got the board but didn't understand what was going on; it was only the other day, when messing with AviSynth 2.6 MT, that I knew for sure what caused the mouse lag.
    (Anything past say 90% usage causes it to be slower than say 80%. Again, I noticed this in AviSynth. Yeah, I can't use the last 10% of my CPU without everything slowing down to crap.)
    10 threads works out fine, but still...

    I prefer the algos used by the older version I had.
    Otherwise I just don't have any use for this program anymore, sorry.
    You don't happen to have an old copy of "y-cruncher v0.5.4.9148 (fix 1).zip" lying around, do ya?
    I found that version useful...
    No offense intended.

    Hmm, if I set it to 7 GB, 10 threads, and disable all the tests except VST, it might be of some use to me for CPU.
    FFT might be useful for memory. I don't know what HNT is though.
    Wish it supported a command tail though, oh well.
    Hmm :\.
    No offense taken. It's not uncommon for stress-tests to be "too much" for a computer. (especially laptops)
    Last edited by poke349; 01-02-2014 at 12:25 AM.

  12. #787
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Some updates on v0.6.4...
    • Back in November when I "plugged in" my pre-written FMA4 instruction macros, the performance gain on AMD Piledriver was actually negative. Some 10 - 20% slower. This was because of the 256-bit AVX that AMD Bulldozer and Piledriver can't handle well.
    • So I spent some time re-working the FMA4 and XOP code to use 128-bit instead. This got to a 2% improvement over the SSE3 binary. Pathetic, but at least it's positive.
    • After rewriting my auto-tuner and running it on my FX-8350, I got the improvement up to 5%.
    • With more on-and-off tweaking, I've gotten it up to 7 - 8%.

    I'm going to leave it at that. I've run out of things to tweak and I'd rather move on to AVX2 for Haswell. 7% is actually quite a large improvement for a new instruction set that doesn't double the vector width.

    Here are the benchmarks for v0.6.4 on a stock FX8350 with 16 GB @ 1333 MHz. (For some reason, this machine resisted all attempts to overclock it. As soon as I take it off of "Auto" settings, the memory goes unstable. And I haven't had the time to really mess with it.)

    Digits   x86       x86 SSE3   x64 SSE3   x64 SSE4.1   x64 AVX    x64 XOP
    25m      27.018    13.704     6.544      7.37         9.128      7.207
    50m      45.746    24.635     13.734     14.771       18.678     13.908
    100m     87.906    47.453     28.336     29.467       37.52      27.797
    250m     224.24    117.473    76.576     78.533       103.326    71.436
    500m     -         -          166.859    170.067      225.879    153.344
    1b       -         -          376.746    382.436      503.946    338.529
    2.5b     -         -          1085.634   1132.52      1396.444   1009.923

    Notes:
    • SSE4.1 is slower than SSE3 because the SSE4.1 binary is specialized for Intel Nehalem. The SSE3 binary is specialized for AMD K10 which Bulldozer/Piledriver seems to like better.
    • AVX is slower than SSE3/4.1 because Bulldozer/Piledriver can't efficiently handle 256-bit AVX instructions.
    • The XOP binary doesn't actually get any faster until you pass a certain size. Without going into details, it was what the auto-tuner chose. (for a valid reason) So I stuck with it.
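As a quick cross-check of the notes against the table, the relative gain of the XOP binary over the SSE3 binary can be computed per size. This sketch assumes the four values in the 500m, 1b and 2.5b rows belong to the four x64 columns, and it defines the gain as (SSE3 time / XOP time - 1), which may not be exactly how the 7 - 8% figure above was measured. The helper name `speedup_percent` is made up for illustration.

```python
# Times copied from the x64 SSE3 and x64 XOP columns of the table above.
sse3 = {"25m": 6.544, "50m": 13.734, "100m": 28.336, "250m": 76.576,
        "500m": 166.859, "1b": 376.746, "2.5b": 1085.634}
xop = {"25m": 7.207, "50m": 13.908, "100m": 27.797, "250m": 71.436,
       "500m": 153.344, "1b": 338.529, "2.5b": 1009.923}


def speedup_percent(baseline: float, candidate: float) -> float:
    """Positive when the candidate finishes faster than the baseline."""
    return (baseline / candidate - 1.0) * 100.0


gains = {size: speedup_percent(sse3[size], xop[size]) for size in sse3}
for size, gain in gains.items():
    print(f"{size:>5}: {gain:+.1f}%")
```

With these numbers the XOP binary is behind at 25m and 50m and ahead from 100m upward, which is consistent with the note that it only gets faster past a certain size.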

    I plan on releasing v0.6.4 before Pi day - provided that I don't find any serious bugs by then.
    Last edited by poke349; 02-22-2014 at 01:10 AM.

  13. #788
    Xtreme Member
    Join Date
    Jun 2008
    Posts
    160
    My 25 billion benchmark:


    Code:
    Constant :  Pi
    Algorithm:  Chudnovsky Formula
    
    Decimal Digits    :   25,000,000,000
    Hexadecimal Digits:   Disabled
    
    Threads:    32
    Mode   :    Ram Only
    
    Start Time: Wed Mar 12 18:44:01 2014
    
    Reserving Working Memory...          117 GB
    Constructing Twiddle Tables...      4.38 MB
    Allocating I/O Buffers...           0 bytes
    
    Begin Computation:
    
    Summing Series...  1,762,841,738 terms
    Time:    7730.888 seconds  ( 2.147 hours )
    Division...
    Time:    240.249 seconds  ( 0.067 hours )
    InvSqrt...
    Time:    156.062 seconds  ( 0.043 hours )
    Final Multiply...
    Time:    104.593 seconds  ( 0.029 hours )
    
    Pi:  8231.793 seconds  ( 2.287 hours )
    
    Base Converting:
    Time:    329.628 seconds  ( 0.092 hours )
    
    Writing Decimal Digits:   25,000,000,000  digits written
    
    Verifying Base Conversion...
    Time:    154.667 seconds  ( 0.043 hours )
    
    Start Time: Wed Mar 12 18:44:01 2014
    End Time:   Wed Mar 12 21:11:31 2014
    
    Total Computation Time:             8561.420 seconds  ( 2.378 hours )
    Total Time (with output + verify):  8850.835 seconds  ( 2.459 hours )
    
    CPU Utilization:        1658.35 %
    Multi-core Efficiency:  51.8234 %
    
    Last Digits:  Pi
    2448547079 5329693979 7145627081 9204187454 9483487803  :  24,999,999,950
    1309759846 5364560010 7388984278 8403481193 9913806533  :  25,000,000,000
    
    Version:          0.6.3 Build 9416b (fix 1) (x64 AVX - Linux ~ Hina)
    Processor(s):     Genuine Intel(R) CPU @ 2.60GHz
    Logical Cores:    32
    Physical Memory:  203,221,774,336 (  189 GB )
    CPU Frequency:    2,600,380,032 Hz  (frequency may be inaccurate)
    
    Result File: Validation - Pi - 25,000,000,000.txt
    
    Benchmark Successful. The digits appear to be OK.
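The summary figures in this log can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the multi-core efficiency is simply CPU utilization divided by the number of logical cores, and that the total computation time is the sum of the listed phases. Both assumptions are inferred from the numbers, not from y-cruncher documentation.

```python
# Figures copied from the benchmark log above.
cpu_utilization_percent = 1658.35
logical_cores = 32

# Assumption: efficiency = utilization / logical cores.
efficiency = cpu_utilization_percent / logical_cores
print(f"Multi-core efficiency: {efficiency:.2f} %")

# Phase times in seconds, also from the log.
phases = {"series": 7730.888, "division": 240.249, "invsqrt": 156.062,
          "final multiply": 104.593, "base conversion": 329.628}
total = sum(phases.values())
print(f"Sum of phases: {total:.3f} s")
print(f"Series share of total: {phases['series'] / total:.1%}")
```

Under those assumptions the results line up with the reported 51.8234 % efficiency and 8561.420 s total, and the series summation accounts for roughly nine tenths of the run.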
    And I started this up:

    I don't know if I will actually let it run though:

    Code:
    Current Settings: (select option # to change setting)
    
      1     Constant:    Pi
      2     Algorithm:   Chudnovsky Formula
    
      3     Decimal Digits:        13,300,000,000,000
      4     Hexadecimal Digits:    11,045,410,915,501
    
      5     Multi-Threading:   32 threads
      6     Write Digits To:   /data/pi
      7     Compress Output:   Yes - Compress digits and split them into multiple
                               files with  100,000,000,000  digits per file.
    
      8     Computation Mode:  Swap Mode
    
      9     View Swap Configuration
     10     Change Swap Configuration
     11     Run I/O Benchmark
    
     12     Min I/O Size:      32.0 MB  per smallest unit. ( 32.0 MB global )
    
     13     Memory Needed:      179 GB  ( Minimum =  156 MB )
            Disk Needed:       70.5 TB  +  10.1 TB for output
    
      0     Start Computation!
    
    option: 0
    
    
    
    Constant :  Pi
    Algorithm:  Chudnovsky Formula
    
    Decimal Digits    :   13,300,000,000,000
    Hexadecimal Digits:   11,045,410,915,501
    
    Threads:    32
    Mode   :    Swap Mode
    
    Start Time: Wed Mar 12 21:30:19 2014
    
    Reserving Working Memory...          179 GB
    Constructing Twiddle Tables...      82.2 MB
    Allocating I/O Buffers...           64.0 MB
    
    Begin Computation:
    
    Summing Series...  937,831,802,335 terms
    Summing: 0%  ( 32 )  -> ( 4,454,943,452 )
    Curious to see how badly performance suffers once it starts having to hit the disk on Linux. I don't know why CPU usage would have been a problem with O_DIRECT, since dd writing even at 1 GB/s uses <20% of a core with direct I/O.
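The "Summing Series" term counts in the two logs above can be estimated from the digit counts. Each term of the Chudnovsky series contributes about log10(640320^3 / 1728) ≈ 14.18 decimal digits. The sketch below is a rough estimate only; the function name `estimated_terms` is made up, and the program's own counts come out slightly higher than this estimate.

```python
import math

# Each Chudnovsky term contributes log10(640320**3 / 1728) = log10(53360**3)
# decimal digits, roughly 14.18.
DIGITS_PER_TERM = math.log10(53360 ** 3)


def estimated_terms(decimal_digits: int) -> int:
    """Rough estimate of the series terms needed for the given digit count."""
    return math.ceil(decimal_digits / DIGITS_PER_TERM)


# (digits requested, term count reported in the logs above)
for digits, reported in [(25_000_000_000, 1_762_841_738),
                         (13_300_000_000_000, 937_831_802_335)]:
    print(f"{digits:,} digits: estimated {estimated_terms(digits):,}, reported {reported:,}")
```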
    Supermicro SC846 Case
    Supermicro X9DR3-LN4F+
    Dual Intel Xeon E5 4650L (8 core, 2.6Ghz, 3.1 Ghz Turbo)
    EVGA Geforce gtx 670
    192GB DDR3 PC-1333 ECC Memory
    ARC-1280ML raid controller
    24x2TB Hitachi SATA (raid6)
    ARC-1880x raid controller
    30x3TB Hitachi SATA (raid6)
    - External in two SC933 Case
    Work/Home:

  14. #789
    Xtreme Member
    Join Date
    Jun 2008
    Posts
    160
    Stopped and ran the I/O performance analysis thingy:

    Code:
    I/O Performance Analysis:
    
    Note that this may take a while depending on your hardware configuration.
    
    Working Memory:      179 GB
    Swap-file Size:      358 GB
    Min I/O Size:       32.0 MB
    Computation Threads:    32
    
    Sequential Write:          852 MB/s
    Sequential Read:          1.74 GB/s
    Threshold Strided Write:   489 MB/s
    Threshold Strided Read:    498 MB/s
    
    Overlapped VST-I/O Ratio: 0.5933
    
    Notes:
    
      - The overall I/O speed is unable to keep up with the CPU(s).
        The I/O throughput is 1.68549x slower than the CPU throughput.
        Large computations will be significantly slowed down by disk access.
        I/O bandwidth can be increased in a number of ways:
          - Add more drives in parallel. This is the obvious way.
            Many machines have 4 or more drives just to run this program!
          - Defragment the drives.
          - Use empty drives. Empty and freshly formatted drives perform best.
    
      - Your threshold non-sequential I/O bandwidth is very high.
        This may cause sub-optimal algorithm selection for large computations.
        The optimal ratio between sequential/non-sequential I/O is about 3 to 1.
        It is recommended to decrease the "Min I/O Size" setting and re-run
        this benchmark.
    
      - Your write bandwidth is significantly lower than your read bandwidth.
        It is recommended to examine your storage configuration if you are
        expecting balanced read/write speeds.
    
    Press ENTER to continue . . .
    These values don't seem bad?
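The warnings in this output can be related back to the measured numbers with a little arithmetic. This is a sketch under assumptions: 1.74 GB/s is taken as 1740 MB/s, and the "1.68549x slower" figure is assumed to be the reciprocal of the Overlapped VST-I/O Ratio, which the numbers suggest but the output does not state.

```python
# Figures copied from the I/O Performance Analysis output above (MB/s).
seq_write, seq_read = 852.0, 1740.0  # assumes 1.74 GB/s == 1740 MB/s
strided_write, strided_read = 489.0, 498.0
vst_io_ratio = 0.5933

# Assumption: the reported slowdown is the reciprocal of the VST-I/O ratio.
slowdown = 1.0 / vst_io_ratio

# Sequential vs. strided ratios; the program's note suggests about 3 to 1.
read_ratio = seq_read / strided_read
write_ratio = seq_write / strided_write

print(f"I/O is {slowdown:.3f}x slower than the CPU")
print(f"sequential:strided read  = {read_ratio:.2f}:1")
print(f"sequential:strided write = {write_ratio:.2f}:1")
```

Seen this way, the write side is well below the suggested 3 to 1 ratio, which lines up with both the "threshold non-sequential bandwidth is very high" note and the unbalanced read/write note.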

  15. #790
    Xtreme Addict
    Join Date
    Sep 2010
    Location
    US, MI
    Posts
    1,680
    Quote Originally Posted by poke349 View Post
    You can get the older versions here: http://www.numberworld.org/y-cruncher/versions.html



    That should not happen. And I haven't received any other reports of this issue. Do you have a screenshot of it or something? It's hard to say what's wrong since I've never seen it before.



    That's interesting. How are you running it? Double-click? Command line?



    No offense taken. It's not uncommon for stress-tests to be "too much" for a computer. (especially laptops)
    Sorry, I kinda forgot about ya. I didn't realize I still had a screenshot on my drive for ya that I never posted.


    Sorry it's not much..

    Edit:
    If I were eventually able to collect enough screenshots of my 4930K failing on the edge of stability, usually around 4.5 hrs..., would you be able to make it run the same "old" test repeatedly so the error can be found faster?
    The newer versions don't seem to find the error any faster than the older version that runs perfectly (I can launch the newer ones with the old launcher thing).

    IBT and LinX don't detect the error at all. I tried 8 hrs worth and nothing.

    Anyways I got one screenshot. I've had it fail twice so far; I think it was the same test both times, the first time at 4.5 hrs.
    Last edited by NEOAethyr; 03-16-2014 at 11:54 AM.

  16. #791
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Yea, after all this time I still own all the records from 25 mil to 5 billion!
    Figured by now someone would have stepped in and booted me out!
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  17. #792
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Sandon View Post
    Stopped and ran the I/O performance analysis thingy:

    These values don't seem bad?
    I answered most of this in the email reply. But yes, it's an amazing system.

    Quote Originally Posted by NEOAethyr View Post
    Sorry, I kinda forgot about ya. I didn't realize I still had a screenshot on my drive for ya that I never posted.


    Sorry it's not much..
    In the first case with the illegal instruction, it appears that you don't have proper operating system support to use AVX instructions. But y-cruncher is mistakenly detecting that it does.
    The proper behavior of the program is to give you a red warning that your OS doesn't support AVX, then fall back to the SSE4.1 version.

    • In v0.6.1 - v0.6.4, the AVX binaries use the Microsoft compiler which does no run-time checking for instruction set compatibility. Since my own check is clearly buggy, it proceeded to crash on an AVX instruction.
    • In v0.5.4 - v0.5.5, the AVX binaries use the Intel Compiler. The Intel compiler does its own compatibility checks and it detects that you don't have proper operating system support. So it refuses to run the AVX binary.

    Question: What OS are you running anyway? And service pack? I'd like to know so I can fix the AVX detection.
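For reference, the commonly documented way to confirm that AVX is usable is to check three things: the CPU's AVX flag, the OSXSAVE flag, and the XCR0 register bits that show the OS saves YMM state. The sketch below only illustrates that decision logic on raw register values. It is not y-cruncher's detection code, the function name `avx_usable` is made up, and actually reading CPUID and XGETBV requires native code that is not shown here.

```python
# Bit positions from the x86 CPUID leaf 1 / XCR0 definitions.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_AVX = 1 << 28      # CPU implements AVX
XCR0_SSE_STATE = 1 << 1       # OS saves XMM registers
XCR0_AVX_STATE = 1 << 2       # OS saves the upper halves of YMM registers


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """True only if both the CPU and the OS support AVX.

    cpuid1_ecx: ECX as returned by CPUID leaf 1.
    xcr0: result of XGETBV with ECX=0; only meaningful when OSXSAVE is set.
    """
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed


# CPU has AVX, but the OS has not enabled YMM state: AVX must not be used.
print(avx_usable(CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX, 0b011))
# CPU has AVX and the OS saves both XMM and YMM state.
print(avx_usable(CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX, 0b111))
```

A check that looks only at the CPU's AVX flag would pass on a machine whose OS does not save YMM state, and the first AVX instruction would then fault with an illegal-instruction error like the one described above.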

    Edit:
    If I were eventually able to collect enough screenshots of my 4930K failing on the edge of stability, usually around 4.5 hrs..., would you be able to make it run the same "old" test repeatedly so the error can be found faster?
    The newer versions don't seem to find the error any faster than the older version that runs perfectly (I can launch the newer ones with the old launcher thing).

    IBT and LinX don't detect the error at all. I tried 8 hrs worth and nothing.

    Anyways I got one screenshot. I've had it fail twice so far; I think it was the same test both times, the first time at 4.5 hrs.
    I don't develop older versions of y-cruncher. For that matter, I don't even fix bugs in the latest version unless they are serious. (since I usually have even newer builds*)
    So if you plan on sticking with v0.5.5, what you have is it. In v0.6.x, the component stress-tester is fully customizable.

    *Hint: My latest developer build has a fully working AVX2 binary...


    Quote Originally Posted by Movieman View Post
    Yea, after all this time I still own all the records from 25 mil to 5 billion!
    Figured by now someone would have stepped in and booted me out!
    Shigeru Kondo sent me some benchmarks a while back. I just haven't updated the charts yet, so I don't remember whether they were faster than yours.

  18. #793
    Xtreme Addict
    Join Date
    Sep 2010
    Location
    US, MI
    Posts
    1,680
    My os is win7 x64 sp1.
    It's just tweaked to heck.

  19. #794
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by poke349 View Post
    Shigeru Kondo sent me some benchmarks a while back. I just haven't updated the charts yet, so I don't remember whether they were faster than yours.
    Well, I might have some backups lying around here somewhere!
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us, get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  20. #795
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by NEOAethyr View Post
    My os is win7 x64 sp1.
    It's just tweaked to heck.
    Win7 SP1 supports AVX. So my program is properly detecting it.
    But according to this: http://superuser.com/questions/24421...on-my-computer
    It looks like AVX can be enabled and disabled in the OS. I haven't tried it, but it's possible your AVX somehow got disabled. (not sure why anyone/anything would want to do that)

    Either way, it seems that having a capable OS isn't sufficient. I also need to check that it is enabled.
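For reference, the usual check has three parts: CPUID leaf 1 ECX bit 28 (the CPU implements AVX), ECX bit 27 (OSXSAVE: the OS has turned on XSAVE, so XGETBV is usable), and XCR0 bits 1 and 2 (the OS saves XMM and YMM state). Below is a rough Python sketch of just that decision logic. Python cannot execute CPUID or XGETBV itself, so the raw register values are passed in as plain integers; the function name is hypothetical and this is not y-cruncher's actual detection code.

```python
# CPUID leaf 1, ECX register feature bits.
CPUID_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID_ECX_AVX = 1 << 28      # CPU implements AVX

# XCR0 state-component bits, as returned by XGETBV with ECX = 0.
XCR0_SSE_STATE = 1 << 1  # OS saves/restores XMM registers
XCR0_AVX_STATE = 1 << 2  # OS saves/restores the upper halves of YMM registers


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """Return True only if the CPU has AVX and the OS has enabled YMM state."""
    if not cpuid1_ecx & CPUID_ECX_AVX:
        return False
    if not cpuid1_ecx & CPUID_ECX_OSXSAVE:
        return False  # XGETBV is not available, so xcr0 is meaningless
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed


capable_cpu = CPUID_ECX_AVX | CPUID_ECX_OSXSAVE
print(avx_usable(capable_cpu, 0b111))    # OS enabled x87, SSE and AVX state
print(avx_usable(capable_cpu, 0b011))    # OS enabled x87 and SSE state only
print(avx_usable(CPUID_ECX_AVX, 0b111))  # OSXSAVE clear: xcr0 must be ignored
```

Checking only the CPU bit, or only the OS version, is the failure mode discussed here: both can say "AVX" while the OS has not actually enabled the register state.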

    Quote Originally Posted by Movieman View Post
    Well I might have some backups laying around here somewhere!
    I'll try to update the charts later this week so you can see what kind of competition you have. Although at the moment Shigeru's having some HD troubles...

  21. #796
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by poke349 View Post
    I'll try to update the charts later this week so you can see what kind of competition you have. Although at the moment Shigeru's having some HD troubles...
    Will the app support the new 15 core IB xeons?

  22. #797
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Movieman View Post
    Will the app support the new 15 core IB xeons?
    No problem. The app will allow up to 256 threads. But that's an arbitrary limit that I can increase at any time.

  23. #798
    Xtreme Addict
    Join Date
    Sep 2010
    Location
    US, MI
    Posts
    1,680
    I didn't even know you could disable AVX...
    Though I notice AIDA64 is saying it's disabled as well, but honestly I think it's just a screw-up; it's probably working anyway.
    There are a lot of odd things here and there that don't work on my systems, past and present:
    CPU load, all sorts of perf counters and so on.
    Anyway, I'm gonna try this bcdedit mod to see if I can force AVX to enable or whatever.

    Oh, and the reason I asked for support on an older version: what I meant was that the tests in the new versions don't detect CPU errors any faster than the old ones.
    I'm not even sure the new versions detect the errors at all.

    It's not that I can't get my CPU stable; it's just that when it's so close to the edge of stability, finding a program that gives off an error is pretty hard.
    I know for a fact it's not 100% stable, yet the only app that tells me so is an older version of y-cruncher, and it takes 1.5 to 4.5 hours to do it. That's no fun.
    That's why I was asking: if I could pool together enough screenshots of the error in the older version, with the exact tests it fails on, would it be possible for those tests to be re-included in the new versions as a custom test?
    Even in the older versions I can't pick those tests to run outright; it just doesn't seem to do that.

    But then again, if you don't wanna mess with it, I'll just keep using the older version and wait it out for however many hours it takes to tell me about a CPU vcore error lol.
    The program is great for finding memory errors rather quickly, but not so great for finding CPU errors.
    Still, at least it can find them over a long period of time; LinX and IBT couldn't find the errors at all, and I tested both of those overnight a little while back.
    They're great for finding CPU errors quickly when the CPU is far from stable, but when the voltage is only ±0.005 V or so off, it's nearly impossible.


    Anyway, all those errors in the screenshot I posted from the different major versions stem from my OS being a bit too tweaked out.
    I need to go back and re-check all the tweaks, for example why VC-2008 won't install after tweaking (newer versions work fine...), all sorts of things...
    The reason I posted is that the program should run regardless. I mean, the older version does lol.

    Anyway, I'm off to play with bcdedit.
    Though I doubt AIDA64 will be able to tell if it's working either way.


    Update:
    OK, well, forcing AVX to enable doesn't work.
    AVX just isn't working on my setup lol, even though I've got the OS, the CPU, etc.
    I apparently gutted AVX along with float-16 and so on without realizing it; I wouldn't have known with my older AMD CPU at the time.
    I didn't think there was another, say, 80% perf boost just waiting for me lol.
    I've got a fresh OS install on the side that I haven't finished setting up, no tweaks.
    I planned on fixing up my tweaks for x64 Win7 but just haven't gotten around to it, other than installing Windows and calling it quits for the time being, until now anyway.
    Sigh, I don't even wanna use Windows lol, but I don't have the free space right now for Linux...

    I should try re-stressing my CPU with AVX enabled a little later on. I thought I was using it, but apparently not...
    LinX went from 90 GFLOPS to 160 GFLOPS, so apparently not lol.
    Last edited by NEOAethyr; 03-18-2014 at 12:19 AM.

  24. #799
    Xtreme Member
    Join Date
    Mar 2005
    Location
    Trinidad and Tobago
    Posts
    400
    re-ran with new hardware



    Rig 1 Asus ROG Strix B550-F WiFi, R7 5800x, 64GB Vengeance LPX 4*16GB, Zotac GTX 1070Ti, X-Fi Titanium, Enermax Revolution D.F. 850w, SSDs 768GB, HDD 3TB, CM 912HAF, NH D-15 Black.cr
    Rig 2 Asus Maximus VI Hero, i7 4770K@4000, 32GB Ballistix VLP 4*8GB, Gigabyte GTX 970, eVGA G3 850, SSD 512GB, HDD 2TB, TT Element-T, NH D-14.

  25. #800
    Xtreme Member
    Join Date
    Jun 2008
    Posts
    160
    Performance test is looking better with a new RAID controller. Hopefully it will speed up my 13.3 trillion calculation by quite a bit:

    Code:
    Sequential Write:         1.59 GB/s
    Sequential Read:          1.77 GB/s
    Threshold Strided Write:   864 MB/s
    Threshold Strided Read:    881 MB/s
    
    Overlapped VST-I/O Ratio: 0.779955
    
    Notes:
    
      - The overall I/O speed is unable to keep up with the CPU(s).
        The I/O throughput is 1.28213x slower than the CPU throughput.
        Large computations will be significantly slowed down by disk access.
        I/O bandwidth can be increased in a number of ways:
          - Add more drives in parallel. This is the obvious way.
            Many machines have 4 or more drives just to run this program!
          - Defragment the drives.
          - Use empty drives. Empty and freshly formatted drives perform best.
    
      - Your threshold non-sequential I/O bandwidth is very high.
        This may cause sub-optimal algorithm selection for large computations.
        The optimal ratio between sequential/non-sequential I/O is about 3 to 1.
        It is recommended to decrease the "Min I/O Size" setting and re-run
        this benchmark.
    I lol'd a bit: during the sequential read test it was > 2 GB/s for a while, and I saw what I think is an easter egg =)
    Code:
    Sequential Read:          2.01 GB/s  WTF?!?!
    The WTF?!?! was in blue.
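A quick sanity check of the numbers in the benchmark output above, as a rough Python sketch. Two things here are assumptions rather than documented behavior: that GB/s and MB/s differ by a factor of 1000 (use 1024 if the benchmark reports binary units), and that the "1.28213x slower" figure is simply the reciprocal of the Overlapped VST-I/O Ratio, which is what the arithmetic suggests.

```python
MB_PER_GB = 1000  # assumption: decimal units; set to 1024 for binary units


def seq_to_strided_ratio(sequential_gb_s: float, strided_mb_s: float) -> float:
    """Ratio of sequential bandwidth to threshold strided bandwidth."""
    return sequential_gb_s * MB_PER_GB / strided_mb_s


# Figures copied from the benchmark output in this post.
write_ratio = seq_to_strided_ratio(1.59, 864)
read_ratio = seq_to_strided_ratio(1.77, 881)

# Reciprocal of the reported "Overlapped VST-I/O Ratio".
io_slowdown = 1 / 0.779955

print(f"sequential/strided write: {write_ratio:.2f} to 1")
print(f"sequential/strided read:  {read_ratio:.2f} to 1")
print(f"I/O slowdown vs CPU:      {io_slowdown:.5f}x")
```

Both sequential-to-strided ratios come out at roughly 2 to 1 under either unit convention, well under the 3 to 1 the benchmark calls optimal, which is consistent with the "threshold non-sequential I/O bandwidth is very high" note.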
    Supermicro SC846 Case
    Supermicro X9DR3-LN4F+
    Dual Intel Xeon E5 4650L (8 core, 2.6Ghz, 3.1 Ghz Turbo)
    EVGA Geforce gtx 670
    192GB DDR3 PC-1333 ECC Memory
    ARC-1280ML raid controller
    24x2TB Hitachi SATA (raid6)
    ARC-1880x raid controller
    30x3TB Hitachi SATA (raid6)
    - External in two SC933 Case
    Work/Home:
