
Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #601
    Xtreme Member
    Join Date: Apr 2006
    Location: Ontario
    Posts: 349
    Quote: Originally posted by poke349
    Thanks! Last time I was here was 13 years ago. So it's almost completely new to me.



    Ordering food. If you ask me, that's pretty important. It's hard to get around without food.

    My Cantonese is native, but not fluent. (It's my first language, but having been in the states all my life, English is obviously my strongest language.)

    EDIT:
    I wanna learn enough Japanese to go to Japan to meet Shigeru Kondo. From what I've seen/heard, few people in Japan outside of the tourist areas speak English.
    Right now, all the Japanese I know is from Anime - which I know is not enough (nor appropriate) to use on the streets.


    My mom is kinda afraid of the whole middle east... But I also think it's a little over the top.
    The last time I was in Hong Kong was, I think, 1994/95. My mom keeps talking about wanting to take me there. *shrug* Who knows. Personally, I have little to no desire to go.

    Cantonese is technically my native tongue as well, but I grew up in Canada, so, like yourself, English is definitely my strongest language (to the point that quite a number of people think I was born here).

    I ran one of the bigger runs, but it came back with 3 errors that were fixed with ECC. So, I don't know. I could try to run them again and see what happens.
    flow man:
    du/dt + u dot del u = - del P / rho + v vector_Laplacian u
    {\partial\mathbf{u}\over\partial t}+\mathbf{u}\cdot\nabla\mathbf{u} = -{\nabla P\over\rho} + \nu\nabla^2\mathbf{u}

  2. #602
    Xtreme Enthusiast
    Join Date: Mar 2009
    Location: Bay Area, California
    Posts: 705
    Quote: Originally posted by alpha754293
    The last time I was in Hong Kong was, I think, 1994/95. My mom keeps talking about wanting to take me there. *shrug* Who knows. Personally, I have little to no desire to go.

    Cantonese is technically my native tongue as well, but I grew up in Canada, so, like yourself, English is definitely my strongest language (to the point that quite a number of people think I was born here).
    Most people have no idea I can speak anything else... until I get a call from my mom or something...
    Then I make ears bleed with a horrific mix of English and Canto...

    I ran one of the bigger runs, but it came back with 3 errors that were fixed with ECC. So, I don't know. I could try to run them again and see what happens.
    Interesting, I rarely see cases where a run will still complete after multiple corrected errors. 1 is common. 2 - I've heard of but not seen myself. 3 - first time.
    It usually BSODs or hits an uncorrectable error with that many errors.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  3. #603
    Xtreme Member
    Join Date: Apr 2006
    Location: Ontario
    Posts: 349
    Quote: Originally posted by poke349
    Most people have no idea I can speak anything else... until I get a call from my mom or something...
    Then I make ears bleed with a horrific mix of English and Canto...



    Interesting, I rarely see cases where a run will still complete after multiple corrected errors. 1 is common. 2 - I've heard of but not seen myself. 3 - first time.
    It usually BSODs or hits an uncorrectable error with that many errors.
    Same thing. When asked if I can speak Chinese, my official answer is always "no". But if you hang around me long enough, you will find that that's NOT entirely true. I can, but at a very minimalistic level. Like...kindergarten level. And even then, most of the time, I would respond in English anyways. (It takes me a while to translate it since I think in English). Talking to my grandparents...OMG....GREAT FUN. LOL.

    Yayyyyy Chinglish!!! But it's not that surprising for people from Hong Kong though.

    I'm re-running it now. Yea, I got a nice little message saying that it ran successfully, but it had 3 errors that it needed to fix itself. I don't have that logfile anymore (because I'm rerunning it), so hopefully I'll have the new results soon. (Maybe like...45 minutes to an hour or so). I want to see if it is going to do it again.

    *edit*
    25B passed on the second go-around, without error.

    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,048,722 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        25,000,000,000
    Hexadecimal Digits:    20,762,050,594
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        112 GB
    
    Start Date:            Mon Jan 03 08:09:10 2011
    End Date:              Mon Jan 03 12:34:19 2011
    
    Computation Time:      14,069.180 seconds
    Total Time:            15,908.900 seconds
    
    CPU Utilization:           3972.97 %
    Multi-core Efficiency:     82.77 %
    
    Last Digits:
    2448547079 5329693979 7145627081 9204187454 9483487803  :  24,999,999,950
    1309759846 5364560010 7388984278 8403481193 9913806533  :  25,000,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   1c42b250a604b3070246bf06edf0d17bebd53bd6a1781fdd11ed4c47abd19822
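    A side note on the two digit counts in the log above: a decimal digit carries log(10)/log(16), roughly 0.8305, hex digits, so the hexadecimal count follows from the decimal one. Here is a minimal Python sketch of that conversion. The function name is made up for illustration, and rounding up is an assumption that happens to reproduce the logged figure; it is not taken from y-cruncher itself.

```python
import math


def hex_digits_for(decimal_digits: int) -> int:
    """Hex digits needed to cover the same precision as `decimal_digits`.

    Each decimal digit is worth log(10)/log(16) hex digits. Rounding up is
    an assumption made here, not something confirmed by the program.
    """
    return math.ceil(decimal_digits * math.log(10) / math.log(16))


# 25,000,000,000 decimal digits, as in the 25B run above.
print(hex_digits_for(25_000_000_000))
```

    For the 25B run this gives 20,762,050,594, the same value as the "Hexadecimal Digits" line in the log.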
    Last edited by alpha754293; 01-03-2011 at 10:04 AM.

  4. #604
    Xtreme Enthusiast
    Join Date: Mar 2009
    Location: Bay Area, California
    Posts: 705
    Alright, I'm back in the states - jetlagged like hell. I'll get these updated in a bit.

  5. #605
    Xtreme Enthusiast
    Join Date: Mar 2009
    Location: Bay Area, California
    Posts: 705
    Charts updated.

    Just making sure before I place my order. The following are compatible, right?

    ASUS P8P67 PRO LGA 1155 Intel P67 SATA 6Gb/s USB 3.0 Intel Motherboard
    http://www.newegg.com/Product/Produc...82E16813131682

    Intel Core i7-2600K Sandy Bridge 3.4GHz (3.8GHz Turbo Boost) LGA 1155 95W Quad-Core Desktop Processor BX80623I72600K
    http://www.newegg.com/Product/Produc...82E16819115070

    G.SKILL Ripjaws Series 16GB (4 x 4GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10666) Desktop Memory Model F3-10666CL9Q-16GBRL
    http://www.newegg.com/Product/Produc...82E16820231312

    CORSAIR CWCH50-1 High Performance CPU Cooler
    http://www.newegg.com/Product/Produc...82E16835181010


    I'll be throwing in my spare GTS 250 and my spare 1000W Corsair PS. (It's probably total overkill, so I might get a lower-end PS and keep the 1000W spare for open-air testing of all my other junk.)

    Nothing spectacular. Just enough to OC and play with AVX. I won't be doing any "serious" builds for a while.
    Other than that, I'll be using it mainly as a gigabit/esata/usb3 fileserver.
    Last edited by poke349; 01-09-2011 at 03:45 PM.

  6. #606
    Xtreme Cruncher
    Join Date: Jun 2005
    Location: Northern VA
    Posts: 1,285
    Hey Poke, it's almost Christmas-in-January time. What I need to know is: what do you need from me?

    I'm working on making a Beowulf cluster out of my Tyans with a 10 Gbit InfiniBand interconnect.

    So I'm going to send you my Arima quad-socket if that's OK... It's got 2 8220 dual-core Optis in it; I can get 2 more if you need all 4. I don't think I have any RAM for it. I'll have to find a vid card for it; they are a bit finicky on the vid.
    Do you have an extra PSU? 500 W should be more than enough. The WCG guys are running it with 4 8347 quad-cores, so 4 duals should pull less than that.

    Last thing: do you need an OS on it? I've got Win Server 03 or Ubuntu if you want it.

    Let me know what you need for it and I'll get it together for ya.
    It's not overkill if it works.


  7. #607
    Xtreme Enthusiast
    Join Date: Mar 2009
    Location: Bay Area, California
    Posts: 705
    Quote: Originally posted by skycrane
    Hey Poke, it's almost Christmas-in-January time. What I need to know is: what do you need from me?

    I'm working on making a Beowulf cluster out of my Tyans with a 10 Gbit InfiniBand interconnect.

    So I'm going to send you my Arima quad-socket if that's OK... It's got 2 8220 dual-core Optis in it; I can get 2 more if you need all 4. I don't think I have any RAM for it. I'll have to find a vid card for it; they are a bit finicky on the vid.
    Do you have an extra PSU? 500 W should be more than enough. The WCG guys are running it with 4 8347 quad-cores, so 4 duals should pull less than that.

    Last thing: do you need an OS on it? I've got Win Server 03 or Ubuntu if you want it.

    Let me know what you need for it and I'll get it together for ya.
    Awesome. I think all I need is the mobo and any 4 identical CPUs. I'll take care of everything else. (I shouldn't be asking for more anyway.)

    The extra stuff I have lying around:
    CM 932 HAF - Currently empty. My Harpertown rig used to be in it before I moved it back to California. So it's furniture right now. The quad is going in this case.
    GTX 9800+ - My Harpertown rig's old card. If this is compatible, I'll throw it in the quad.
    GTS 250 - This is gonna go in my SB rig.
    Corsair 1000W - Unused, currently serving as my parts tester.

    I'll be stopping at Fry's on my way back to Urbana to pick up anything else I might need. They pretty much have everything that isn't server-oriented.
    I'll be getting a small case and a weaker PSU for my SB rig. (I won't be running SLI/Xfire, nor will I be clocking above 5 GHz... so the 1000W Corsair is overkill.) I can easily pick up another PSU for the quad.

    I'm not sure if Fry's has Socket F heatsinks. And I bet they don't have DDR2 ECC RAM. I'll probably grab those off the egg or from any old servers that someone might be throwing out.

    So I'm guessing I'll need 8 sticks to get it running at full speed? (4 sockets x dual-channel)
    I'm probably gonna need at least 16GB of RAM to be able to do any serious NUMA/MPI programming on it. (Man, I wish they were compatible with my Harpertown rig! That thing is loaded.)
    But if you happen to have any extra lying around that you no longer want...
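
    The stick count above is just sockets times memory channels per socket, assuming one DIMM per channel. A tiny Python sketch of that arithmetic, with hypothetical function names; the 4 sockets, dual-channel and 16GB figures come from the post, the one-DIMM-per-channel layout is an assumption:

```python
def dimms_for_all_channels(sockets: int, channels_per_socket: int) -> int:
    """Number of sticks needed to put one DIMM on every memory channel."""
    return sockets * channels_per_socket


def stick_size_gb(total_gb: float, dimm_count: int) -> float:
    """Per-stick capacity needed to reach `total_gb` with `dimm_count` sticks."""
    return total_gb / dimm_count


sticks = dimms_for_all_channels(sockets=4, channels_per_socket=2)
print(sticks)                     # 8 sticks
print(stick_size_gb(16, sticks))  # 2.0 GB per stick for 16GB total
```

    So 16GB across all channels would mean eight 2GB sticks under those assumptions.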

    I'm also good with the OS. Is your Win Server already "activated" or "locked" to the rig?

  8. #608
    Xtreme Enthusiast
    Join Date: Mar 2009
    Location: Bay Area, California
    Posts: 705
    Back on campus + two long days of unpacking/organizing/building/debugging...

    I spent most of today wrestling with my SB rig... Sometimes it posts 16GB, sometimes only 8GB... Everything is at stock.

    I'm guessing it probably has something to do with the auto-tuning settings...
    Disabling the auto-tuner and manually setting everything to stock did improve the reliability a bit.
    It's still too early... probably gonna need to wait for a BIOS update.


    My Harpertown rig with a new mobo...



    The SB rig...



    Anyways... Whenever it does manage to POST and boot with 16GB, it's rock-stable. So I'm sure it's just a BIOS issue and not a faulty mobo.

    Everything at stock for now... I'll OC it later once I'm done fiddling with the stupid memory issue...





    Running 10b right now...

    EDIT: Done. Too tired to stay up, so I let it run overnight.



    Anyways... It'll probably take me a few weeks to make the AVX code-paths. I've been gone for too long and I have a lot of school/research-work to catch up on.
    Last edited by poke349; 01-17-2011 at 07:06 AM.

  9. #609
    Xtreme Member
    Join Date: Apr 2006
    Location: Ontario
    Posts: 349
    poke349
    I just noticed this, but on the charts, you have the quad AMD Opteron 6174 listed as a single AMD Opteron 6174 and only 64 GB. It's 4x AMD Opteron 6174 (2.2 GHz, 12-core) and it's 128 GB of DDR3-1333 RAM.

    Just a minor correction. Thanks.

  10. #610
    Xtreme Enthusiast
    Join Date: Mar 2009
    Location: Bay Area, California
    Posts: 705
    Quote: Originally posted by alpha754293
    poke349
    I just noticed this, but on the charts, you have the quad AMD Opteron 6174 listed as a single AMD Opteron 6174 and only 64 GB. It's 4x AMD Opteron 6174 (2.2 GHz, 12-core) and it's 128 GB of DDR3-1333 RAM.

    Just a minor correction. Thanks.
    I fixed the 4x. The memory is correct. You didn't run 128GB when you ran those benches - except for the 25b.

  11. #611
    Xtreme Member
    Join Date: Apr 2006
    Location: Ontario
    Posts: 349
    Quote: Originally posted by poke349
    I fixed the 4x. The memory is correct. You didn't run 128GB when you ran those benches - except for the 25b.
    Oh. My bad. haha...maybe that's when we pulled half the RAM out thinking that it was tripping the PSU's overcurrent protection.

    Hmmm, I'll re-run it with the full 128 GB then.

    Sorry about that.

    *edit*

    Well, here are some of the results. I had to stop it at the 2.5 B mark, because I needed the system to do some FEAs for me.

    *edit*
    I re-ran them all again, up to 10 B.

    *edit*
    Huhhh....the new runs are slower than the old ones. I think that I'm going to reboot the system to clear the system caches and stuff from doing FEAs, and re-run it yet again. Stay tuned for the results.

    *edit*
    Verrrry interesting. With 128 GB of RAM, some of the runs were faster, some were slower. Hmmm....

    25M:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,046,076 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        25,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        221 MB
    
    Start Date:            Fri Jan 21 08:34:39 2011
    End Date:              Fri Jan 21 08:34:50 2011
    
    Computation Time:      8.882 seconds
    Total Time:            10.203 seconds
    
    CPU Utilization:           1321.75 %
    Multi-core Efficiency:     27.53 %
    
    Last Digits:
    3803750790 9491563108 2381689226 7224175329 0045253446  :  24,999,950
    0786411592 4597806944 2455112852 2554677483 6191884322  :  25,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   264dd02129d2100b1b350442f1ee309d6611473c187724dd18cfeeca25a0d5c4
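    The "Multi-core Efficiency" line looks like it is simply CPU utilization divided by the number of logical cores (48 here, even though the run uses 64 threads). That relationship is inferred from the numbers in these logs, not from any y-cruncher documentation, so treat the following Python as a sketch; the helper name is hypothetical.

```python
def multicore_efficiency(cpu_utilization_pct: float, logical_cores: int) -> float:
    """Utilization spread over all logical cores, in percent."""
    return cpu_utilization_pct / logical_cores


# (CPU Utilization %, logged Multi-core Efficiency %) copied from the logs.
runs = {
    "25M": (1321.75, 27.53),
    "25B": (3972.97, 82.77),
}

for label, (utilization, logged) in runs.items():
    computed = multicore_efficiency(utilization, logical_cores=48)
    # The computed value agrees with the logged one to within 0.01.
    print(f"{label}: computed {computed:.3f} %, logged {logged:.2f} %")
```

    The small 25M run only keeps about 13 of the 48 cores busy on average, which is why its efficiency is so much lower than the 25B run's.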
    50M:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,040,226 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        50,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        331 MB
    
    Start Date:            Fri Jan 21 08:34:50 2011
    End Date:              Fri Jan 21 08:35:11 2011
    
    Computation Time:      18.560 seconds
    Total Time:            20.704 seconds
    
    CPU Utilization:           1820.55 %
    Multi-core Efficiency:     37.92 %
    
    Last Digits:
    4127897300 0153683630 8346732220 0943329365 1632962502  :  49,999,950
    5130045796 0464561703 2424263071 4554183801 7945652654  :  50,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   77fcb30338665cc71c510a5f0fb1b312a29ef6287b997a661cdb4dc02987f9e3
    100M:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,034,201 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        100,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        550 MB
    
    Start Date:            Fri Jan 21 08:35:11 2011
    End Date:              Fri Jan 21 08:35:56 2011
    
    Computation Time:      40.932 seconds
    Total Time:            45.280 seconds
    
    CPU Utilization:           2307.99 %
    Multi-core Efficiency:     48.08 %
    
    Last Digits:
    9948682556 3967530560 3352869667 7734610718 4471868529  :  99,999,950
    7572203175 2074898161 1683139375 1497058112 0187751592  :  100,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   46e074e75637fe340a42b6669709bb924ca66d8fbd9931f81fcb11ee14ae6317
    250M:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,048,178 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        250,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        1.27 GB
    
    Start Date:            Fri Jan 21 08:35:56 2011
    End Date:              Fri Jan 21 08:38:12 2011
    
    Computation Time:      125.443 seconds
    Total Time:            135.587 seconds
    
    CPU Utilization:           3197.15 %
    Multi-core Efficiency:     66.60 %
    
    Last Digits:
    3673748634 2742427296 0219667627 3141599893 4569474921  :  249,999,950
    9958866734 1705167068 8515785208 0067520395 3452027780  :  250,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   c3626149b4dfe67701c72a7159fad2ec70103148572c5a300a52546dba3c7208
    500M:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,048,908 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        500,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        2.43 GB
    
    Start Date:            Fri Jan 21 08:38:12 2011
    End Date:              Fri Jan 21 08:43:21 2011
    
    Computation Time:      288.644 seconds
    Total Time:            308.968 seconds
    
    CPU Utilization:           3576.81 %
    Multi-core Efficiency:     74.51 %
    
    Last Digits:
    3896531789 0364496761 5664275325 5483742003 7847987772  :  499,999,950
    5002477883 0364214864 5906800532 7052368734 3293261427  :  500,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   938fb14688423d861d98a3aa2d4c1f23f16e42f72cb3432f1d08b242b18886e6
    1B:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,048,604 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        1,000,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        4.77 GB
    
    Start Date:            Fri Jan 21 08:43:21 2011
    End Date:              Fri Jan 21 08:53:36 2011
    
    Computation Time:      574.231 seconds
    Total Time:            614.573 seconds
    
    CPU Utilization:           3735.20 %
    Multi-core Efficiency:     77.81 %
    
    Last Digits:
    6434543524 2766553567 4357021939 6394581990 5483278746  :  999,999,950
    7139868209 3196353628 2046127557 1517139511 5275045519  :  1,000,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   1562561e2e61dfae00333138f191f0f9343a4cc65b0509ab071b7675a666f600
    2.5B:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,043,100 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        2,500,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        11.3 GB
    
    Start Date:            Fri Jan 21 08:53:36 2011
    End Date:              Fri Jan 21 09:23:18 2011
    
    Computation Time:      1,660.550 seconds
    Total Time:            1,781.429 seconds
    
    CPU Utilization:           3981.26 %
    Multi-core Efficiency:     82.94 %
    
    Last Digits:
    0917027898 3554136437 7123165188 3528593128 0032489094  :  2,499,999,950
    9228502005 4677489552 2459688725 5242233502 7255998083  :  2,500,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   19a547b8f537e6f018fc2f9ffc6168a17c49b8f7942a5e05171764aece3756f7
    5B:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,049,740 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        5,000,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        22.5 GB
    
    Start Date:            Fri Jan 21 09:23:19 2011
    End Date:              Fri Jan 21 10:10:49 2011
    
    Computation Time:      2,626.111 seconds
    Total Time:            2,850.030 seconds
    
    CPU Utilization:           4059.12 %
    Multi-core Efficiency:     84.56 %
    
    Last Digits:
    4384678622 1397184596 0181195416 0748430457 5386741865  :  4,999,999,950
    0914971996 1298184401 9216126684 9425834935 5440797257  :  5,000,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   e4ff0dea9b2eef8e04cc430fcc56aabfc1b94e0edfeeaf9607a530b4c8d6175e
    10B:
    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          AMD Opteron(tm) Processor 6174
    Logical Cores:         48
    Physical Memory:       136,765,112,320 bytes  ( 128 GB )
    CPU Frequency:         2,200,595,152 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE3 - Windows ~ Kasumi)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        10,000,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        64 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        45.1 GB
    
    Start Date:            Fri Jan 21 10:10:51 2011
    End Date:              Fri Jan 21 11:43:42 2011
    
    Computation Time:      5,040.317 seconds
    Total Time:            5,571.566 seconds
    
    CPU Utilization:           4003.85 %
    Multi-core Efficiency:     83.41 %
    
    Last Digits:
    9763261541 1423749758 2083180752 2573977719 9605119144  :  9,999,999,950
    9403994581 8580686529 2375008092 3106244131 4758821220  :  10,000,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   0a154ee18dfa32711d292b9203e1adb220604626a7ef440218e5d76bb7d72c1b
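    A quick way to cross-check a log like this by hand is to compare the Start/End timestamps against the reported Total Time. A minimal Python sketch, assuming the date format shown in the logs and an English locale for the day and month names; the helper name is hypothetical.

```python
from datetime import datetime

# Format of the "Start Date" / "End Date" lines, e.g. "Fri Jan 21 10:10:51 2011".
LOG_DATE_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start: str, end: str) -> float:
    """Wall-clock seconds between two timestamps in the log's date format."""
    start_time = datetime.strptime(start, LOG_DATE_FORMAT)
    end_time = datetime.strptime(end, LOG_DATE_FORMAT)
    return (end_time - start_time).total_seconds()


# Values copied from the 10B run above.
wall = elapsed_seconds("Fri Jan 21 10:10:51 2011", "Fri Jan 21 11:43:42 2011")
reported_total = 5571.566

# The timestamps only have one-second resolution, so allow a second of slack.
print(wall, abs(wall - reported_total) < 1.0)
```

    For the 10B run the timestamps are 5,571 seconds apart, which lines up with the reported Total Time of 5,571.566 seconds.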
    Last edited by alpha754293; 01-21-2011 at 08:46 AM.

  12. #612
    Xtreme Enthusiast
    Join Date: Mar 2009
    Location: Bay Area, California
    Posts: 705
    Nice! I'll update these over the weekend. I also have a bunch of 5 GHz SB results from Shigeru Kondo which I'll add then as well.

    One of my larger runs.


    VCore was 1.344 under load. The CPUz value is wrong since the rig wasn't under load the moment I took the screenie.
    4.6 GHz is 3 hours prime stable at the same voltage.

    I was able to get 4.9 GHz benchable and 5.0 GHz screenie'able.
    Unlike my original i7 rig, this one isn't heat-limited but rather voltage-limited. I don't want to feed more than 1.45 V into the chip...


    Definitely not the best chip...

  13. #613
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Charts Updated.

    I'll hold off on my own results until I get the AVX version running...

  14. #614
    Xtreme Enthusiast
    Join Date
    Sep 2007
    Location
    Coimbra - Portugal
    Posts
    699
    Just saw this today, and I like this kind of project very much. So if I can do something more to help, tell me

    Also, hoping for AVX support ASAP

    5.0 GHz, 2600K with 8 GB DDR3 at 800 MHz, 9-9-9-24


    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
    Logical Cores:         8
    Physical Memory:       8,569,073,664 bytes  ( 8.00 GB )
    CPU Frequency:         3,400,047,072 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE4.1 - Windows ~ Ushio)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        1,000,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        8 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        4.75 GB
    
    Start Date:            Mon Jan 24 12:03:05 2011
    End Date:              Mon Jan 24 12:08:50 2011
    
    Computation Time:      324.820 seconds
    Total Time:            345.375 seconds
    
    CPU Utilization:           739.36 %
    Multi-core Efficiency:     92.42 %
    
    Last Digits:
    6434543524 2766553567 4357021939 6394581990 5483278746  :  999,999,950
    7139868209 3196353628 2046127557 1517139511 5275045519  :  1,000,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   45d8a0622c232d741a0507c864da8a3438b163d682cfc0f10acf75fa26d47f09
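    A quick sanity check on the two CPU figures in the log above. The printed values are consistent with "Multi-core Efficiency" being "CPU Utilization" divided by the number of logical cores. That relation is inferred from the numbers in this thread, not taken from y-cruncher's documentation, and the function name is made up for illustration.

```python
def multicore_efficiency(cpu_utilization_pct, logical_cores):
    """Utilization as a share of the maximum possible (100 % per logical core).

    Assumed relation, inferred from the validation logs in this thread.
    """
    return cpu_utilization_pct / logical_cores


# Figures from the 2600K log above: 739.36 % utilization on 8 logical cores.
print(f"{multicore_efficiency(739.36, 8):.2f} %")
```

    For the 2600K run this reproduces the 92.42 % shown in the log.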

  15. #615
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by st0ned View Post
    Just saw this today, and I like this kind of project very much. So if I can do something more to help, tell me

    Also, hoping for AVX support ASAP

    5.0 GHz, 2600K with 8 GB DDR3 at 800 MHz, 9-9-9-24

    Nice! I'll update it later.


    So there's one very annoying thing about AVX now... And that would be the LACK OF LAYMAN DOCUMENTATION on the new instructions...

    I'm having to use trial and error to figure out exactly wtf instructions like "VPERM2F128" (C-intrinsic: _mm256_permute2f128_pd) do...


    Not funny actually...

    The stuff is too new... I'm used to reading everything off of MSDN or forums... but I'm basically working blind right now. lol
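    For readers hitting the same wall with VPERM2F128: below is a small pure-Python model of what the instruction does, following the immediate-byte encoding in Intel's instruction reference. It is only a sketch for reasoning about the control byte. The function name permute2f128_pd and the list-of-four-doubles representation are illustrative and have nothing to do with y-cruncher's code.

```python
def permute2f128_pd(a, b, imm8):
    """Model of _mm256_permute2f128_pd on lists of four doubles.

    Index 0 is the lowest double of the 256-bit register, so a[0:2] is the
    low 128-bit lane and a[2:4] is the high 128-bit lane.
    """
    lanes = [a[0:2], a[2:4], b[0:2], b[2:4]]  # a.low, a.high, b.low, b.high

    def pick(control):
        if control & 0x8:                  # bit 3 of the nibble zeroes the lane
            return [0.0, 0.0]
        return list(lanes[control & 0x3])  # bits 1:0 choose the source lane

    low = pick(imm8 & 0xF)           # imm8[3:0] controls result bits 127:0
    high = pick((imm8 >> 4) & 0xF)   # imm8[7:4] controls result bits 255:128
    return low + high


a = [0.0, 1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0, 7.0]
print(permute2f128_pd(a, b, 0x20))  # low lane of a, then low lane of b
print(permute2f128_pd(a, b, 0x31))  # high lane of a, then high lane of b
print(permute2f128_pd(a, a, 0x01))  # swap the two 128-bit halves of a
```

    The point to take away is that the instruction moves whole 128-bit lanes. It never shuffles the two doubles inside a lane.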



    EDIT: Oh goodie... Visual Studio 2010 is miscompiling some of the AVX instructions...
    That's gonna be a bit of a problem... The Intel Compiler is fine, but it's a lot slower at compiling than Visual Studio...
    Last edited by poke349; 01-27-2011 at 10:40 AM. Reason: typo

  16. #616
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Thought I'd give everyone a little update on the AVX version:


    So I've ported all the easily vectorizable floating-point code to AVX, but the results are actually pretty disappointing.

    The AVX throughput isn't double that of SSE. This is in contrast to what I initially thought (since a number of Intel employees on XS mentioned that Sandy Bridge would have 2 x 256-bit execution units - same as the 128-bit units on the Core 2's and the Nehalems). My guess is that Intel didn't put too much effort into the AVX unit yet since there will probably be very few early adopters. It will be the same situation when Bulldozer arrives: it will not have a full 256-bit module for every core.

    Furthermore, (and I've stated this in the past):
    Starting from v0.5.2, y-cruncher is no longer a floating-point bound application.
    Indeed the application uses a lot of floating-point, but thanks to technologies like SSE and AVX (as well as better skills on my part), this floating-point has been reduced to almost nothing.
    AVX does not support 256-bit integer vector operations - and actually, none of y-cruncher's integer code is vectorized... yet

    Do not expect the 2x performance that you'd see from other applications that are 100% floating-point.


    So when I release v0.5.5, you'll be seeing the effect of AVX on an application that is not very floating-point heavy.
    With AVX, y-cruncher is almost entirely integer and memory bound.
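    The argument above is Amdahl's law in practice: doubling the speed of the floating-point part only helps in proportion to how much of the runtime that part takes. A minimal sketch follows. The fractions are made-up illustrations, not measured y-cruncher numbers.

```python
def overall_speedup(vector_fraction, vector_speedup):
    """Amdahl's law: only `vector_fraction` of the runtime gets faster."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)


# Hypothetical shares of runtime spent in vectorizable floating-point code,
# each assumed to run 2x faster with AVX than with SSE.
for fraction in (1.0, 0.5, 0.1):
    gain = overall_speedup(fraction, 2.0)
    print(f"{fraction:.0%} floating-point -> {gain:.2f}x overall")
```

    With these example fractions the overall gain drops from 2.00x to 1.33x to about 1.05x, which is why a program that is mostly integer and memory bound sees little benefit from wider floating-point vectors.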

    (screenshot)



    One more thing:

    The Intel Compiler v11.1 does not support compiling AVX programs for non-Intel processors. To top it off, Visual Studio 2010 miscompiles AVX.

    So I will not be able to provide an AVX binary for Bulldozer until I get a newer version of either compiler that fixes their problem. (Or if I can find a work-around for the bug in Visual Studio 2010...)
    Last edited by poke349; 01-28-2011 at 04:16 PM.

  17. #617
    Xtreme Enthusiast
    Join Date
    Sep 2007
    Location
    Coimbra - Portugal
    Posts
    699
    5.2 GHz, memory was really relaxed (hard to run high frequency and low timings at this clock speed)

    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
    Logical Cores:         8
    Physical Memory:       8,571,334,656 bytes  ( 8.00 GB )
    CPU Frequency:         3,400,051,711 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE4.1 - Windows ~ Ushio)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        1,000,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        8 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        4.75 GB
    
    Start Date:            Sat Jan 29 16:49:52 2011
    End Date:              Sat Jan 29 16:55:27 2011
    
    Computation Time:      314.985 seconds
    Total Time:            335.500 seconds
    
    CPU Utilization:           734.85 %
    Multi-core Efficiency:     91.85 %
    
    Last Digits:
    6434543524 2766553567 4357021939 6394581990 5483278746  :  999,999,950
    7139868209 3196353628 2046127557 1517139511 5275045519  :  1,000,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   f09b28c057e7ca6f0007d33626109d18dd6f6853199d49db012de9b2630e690f
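    To put the relaxed memory settings in perspective, the two 1-billion-digit runs by the same poster can be compared directly. The sketch below takes the stated 5.0 GHz and 5.2 GHz clocks at face value (the logs themselves only report the stock frequency) and uses the two computation times from the validation files. The function name is illustrative.

```python
def scaling(clock_old_ghz, clock_new_ghz, time_old_s, time_new_s):
    """Return (relative clock gain, relative speed gain) between two runs."""
    clock_gain = clock_new_ghz / clock_old_ghz - 1.0
    speed_gain = time_old_s / time_new_s - 1.0
    return clock_gain, speed_gain


# 5.0 GHz run: 324.820 s; 5.2 GHz run: 314.985 s (computation time only).
clock_gain, speed_gain = scaling(5.0, 5.2, 324.820, 314.985)
print(f"clock: +{clock_gain:.1%}, speed: +{speed_gain:.1%}")
```

    A 4 % higher clock gave roughly a 3 % faster run, which would fit a benchmark that is partly limited by memory rather than by core clock alone.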

  18. #618
    Registered User
    Join Date
    Aug 2005
    Posts
    67
    New Lappy:

    Stock speed of 2 GHz, turbos to 2.5 GHz under load.

    Code:
    Validation Version:    1.1
    
    Program:               y-cruncher - Gamma to the eXtReMe!!!     ( www.numberworld.org )
                           Copyright 2008-2010 Alexander J. Yee    ( a-yee@northwestern.edu )
    
    
    User:                  None Specified - You can edit this in "Username.txt".
    
    
    Processor(s):          Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
    Logical Cores:         8
    Physical Memory:       8,569,573,376 bytes  ( 8.00 GB )
    CPU Frequency:         1,995,525,983 Hz
    
    Program Version:       0.5.4 Build 9148 (fix 1) (x64 SSE4.1 - Windows ~ Ushio)
    Constant:              Pi
    Algorithm:             Chudnovsky Formula
    Decimal Digits:        500,000,000
    Hexadecimal Digits:    Disabled
    Threading Mode:        8 threads
    Computation Mode:      Ram Only
    Swap Disks:            0
    Working Memory:        2.42 GB
    
    Start Date:            Sun Jan 30 21:20:20 2011
    End Date:              Sun Jan 30 21:25:25 2011
    
    Computation Time:      287.077 seconds
    Total Time:            305.364 seconds
    
    CPU Utilization:           758.11 %
    Multi-core Efficiency:     94.76 %
    
    Last Digits:
    3896531789 0364496761 5664275325 5483742003 7847987772  :  499,999,950
    5002477883 0364214864 5906800532 7052368734 3293261427  :  500,000,000
    
    Timer Sanity Check:        Passed
    Frequency Sanity Check:    Passed
    ECC Recovered Errors:      0
    Checkpoint From:           None
    
    ----
    
    Checksum:   788c9880a228b40d0b054777b6990edebc1a3fd62bb58e5e0ea563308f91d071
    Try my multi-threaded prime benchmark!
    If you like it and want to see more - bitcoin me!!
    1MrPonziaM4QT2S7SdPEKQH88BGa4LRHJU
    1HaxXoRZhMLxMJwJ52VfAqanSuLuh8CCki
    1ZomGoxrBqyVdBvHwPLEERsGGQAtc3jHp
    1L33thAxKo1GqRWRYP5ZCK4EjTMUTHFsc8

  19. #619
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by st0ned View Post
    5.2 GHz, memory was really relaxed (hard to run high frequency and low timings at this clock speed)
    Nice! I'll update these sometime tomorrow or Monday.

    Quote Originally Posted by Alpha View Post
    New Lappy:

    Stock speed of 2 GHz, turbos to 2.5 GHz under load.
    Sandy Bridge laptops already?!?!?!?!
    Dang... now I want one.



    EDIT:

    Holy $#!+... 4.7 GHz LinX AVX stable for more than 2 hours... (with and without HT)
    I drop the multiplier to 46x keeping all other settings the same...

    And it fails y-cruncher v0.5.5 AVX about 8 hours into a 100b Advanced Swap run @ 4.6 GHz...

    I've measured it, v0.5.5 with AVX doesn't run any hotter than v0.5.4 with SSE4.1... But I guess temperatures don't mean that much anyway.

    p.s. v0.5.5 will be out in a few days if I don't find any major bugs.



    EDIT 2:
    Charts updated.
    v0.5.5 appears to be about 1% faster than v0.5.4 even without AVX. Somewhat surprising since I didn't really do any optimizations. I guess I'll have to credit that to the newer version of the Intel Compiler.

    I'll have SB benchmarks in a few days. I hope I can get 4.8 GHz stable enough to run 2.5b.
    But I've been hearing from my beta-testers that v0.5.5 is indeed more intensive than v0.5.4 because of the AVX. I can barely hold 4.9GHz stable for more than a few min. under LinX (no AVX). I'll see about that.
    Last edited by poke349; 01-31-2011 at 10:06 PM.

  20. #620
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Alright. I think it's time to let this thing loose:

    Version v0.5.5 is released!!! Now with support for AVX instructions!

    Download: http://www.numberworld.org/y-cruncher/


    Note that the AVX version will NOT run on Bulldozer.

    This is because the Intel Compiler v11.1 cannot be configured to compile AVX for non-Intel processors. Visual Studio 2010 can compile AVX, but it miscompiles it. (I have yet to find a work-around.)
    So currently, I have no way to build an AVX binary for Windows that will run on Bulldozer.

    I'm hoping Visual Studio 2010 SP1 will fix this issue, or an update to the Intel Compiler will let me compile for non-Intel processors. Bulldozer isn't out yet, so I still have time.


    Here's a few of my benchmarks:

    4.9 GHz with AVX. SSE4.1 would run at 5.0 GHz, but not with AVX... Could only get 4.9 GHz benchable.
    Shigeru Kondo is reporting similar results. AVX seems to knock about 1 or 2 multipliers off of any v0.5.4 stable overclock.

    (screenshot)




    And a 100 billion digit run at 4.6 GHz with AVX - 36.486 hours. I had to put a TON of volts into it to get it to pass.
    My first try was with 1.39v which was > 2 hours LinX (with AVX) stable @ 4.7 GHz. But it BSOD'ed 8 hours in at only 4.6 GHz.

    I raised the vcore to 1.41v and it held:

    (screenshot)






    EDIT:
    Surprisingly, I didn't need to fix anything to get it to compile for Linux. None of the new code from the past few months broke the Linux version in any way.

    And woah... It looks to be about a percent faster than the Windows version. Though I doubt that will still be the case for the swap modes.



    Gimme a few days to test the Linux builds. Even though all the code compiles and seems to run fine, I still need to test a few things.
    Last edited by poke349; 02-01-2011 at 09:13 PM.

  21. #621
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349
    Might I suggest telling those who are looking to run with AVX that their hardware has to support AVX, along with the OS? (In case people try to run it and notice that very little has changed.)

    *edit*
    P.S. There are other compilers out there that should compile your program without the Intel Compiler and/or Visual Studio.

  22. #622
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    In Windows:
    If the hardware supports AVX, but the OS doesn't, it will tell you outright before continuing. (similar message if you were to run a 32-bit OS on 64-bit hardware)
    Similarly, if you were to run this version on Bulldozer, it will tell you that it can't use AVX yet.

    In Linux:
    I haven't made a dispatcher for it yet, so the user has to pick from one of the 4 binaries. I assume Linux users are more technically inclined. So if they run the AVX version and it crashes, it might be a good hint that the hardware doesn't support it. (Linux has had support for AVX for quite a while now, so I shouldn't have to worry about that.)

    If the hardware doesn't support AVX in the first place, then it won't tell you anything about not being able to use AVX.
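    The Windows behaviour described above boils down to a small decision table. The sketch below is only an illustration of that logic: the binary names, the function name, and the choice of SSE4.1 as the fallback are assumptions, and the real dispatcher (y-cruncher.exe) is not shown anywhere in this thread.

```python
def select_binary(cpu_has_avx, os_has_avx, cpu_is_intel=True):
    """Return (binary_name, warning) for a hypothetical Windows dispatcher.

    `warning` is None when there is nothing to tell the user.
    Binary names are placeholders, not the real file names.
    """
    if not cpu_has_avx:
        # Hardware without AVX: say nothing about AVX at all.
        return "x64-SSE4.1", None
    if not os_has_avx:
        # AVX-capable CPU on an OS that cannot save the wider registers.
        return "x64-SSE4.1", "CPU supports AVX, but the OS does not."
    if not cpu_is_intel:
        # The current AVX build only runs on Intel processors.
        return "x64-SSE4.1", "This build cannot use AVX on this CPU yet."
    return "x64-AVX", None


print(select_binary(cpu_has_avx=True, os_has_avx=False))
print(select_binary(cpu_has_avx=True, os_has_avx=True))
```

    Keeping the checks in one function makes the order explicit: the OS warning only makes sense once the hardware is known to support AVX.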


    As for the compilers, GCC is fine. So the Linux version of the AVX binary will run on Bulldozer.
    For Windows, I'll wait until closer to the release date of Bulldozer. There's no point in spending time on getting/learning/trying another compiler if the problem will go away by itself.
    If neither the Intel Compiler nor Visual Studio has fixed their problem by Bulldozer's release, then I'll start looking for another compiler - MinGW is first on my list.



    EDIT:
    Here's what it looks like if you try running v0.5.5 on SB without OS support for AVX.

    I actually have 3 OS's installed on this machine at the moment.
    Windows 7: My default, has all my games and stuff installed.
    Windows 7 SP1 Beta: Sole purpose -> AVX programming
    Ubuntu 10.10: Linux programming.

    The Windows 7 SP1 Beta install is just temporary. I put it on an extra laptop drive I had sitting around. I'm gonna get rid of it when MS officially releases SP1.

    (screenshot)
    Last edited by poke349; 02-03-2011 at 01:18 PM.

  23. #623
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349
    In all my years, one of the things that I have learned is never assume people know stuff.

    You'd be amazed at how stupid people can really be sometimes.

    The last thing that you'd want to hear is somebody trying to run AVX without an AVX-supported OS or hardware. *rolls eyes*

    I just suggested the other compilers that might be more "friendly" to you only because I had to look up what AVX is on wiki and it lists OS/compilers that support it.

    Some Linux distros might still be using an older kernel (especially true for enterprise Linux distributions), so updating or recompiling the kernel may be required for AVX support.
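    On Linux, a rough way to check before picking the AVX binary is to look for the "avx" entry in the flags line of /proc/cpuinfo. The sketch below does just that. The helper names are made up, and the result should be treated as a hint rather than a guarantee, since it only reports what the kernel chooses to list.

```python
def cpuinfo_has_flag(cpuinfo_text, flag):
    """Return True if `flag` is listed on the first 'flags' line of the text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return flag in value.split()
    return False


def linux_reports_avx(path="/proc/cpuinfo"):
    """Check the running system; returns False if the file cannot be read."""
    try:
        with open(path) as f:
            return cpuinfo_has_flag(f.read(), "avx")
    except OSError:
        return False


# Shortened, made-up cpuinfo excerpt for demonstration.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 avx\n"
print(cpuinfo_has_flag(sample, "avx"))
```

    Splitting the flags line into whole words avoids false matches on longer flag names that merely contain "avx".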

    I don't do anything that needs AVX, and as far as I know, none of the commercially available software that I use at work supports AVX, so I don't really have to worry about that for now.

    If anything, I'd be moving over to GPU computing before I'd be moving to AVX computing.

  24. #624
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Yeah, I know what you mean. One of the main reasons why I added the dispatcher (y-cruncher.exe) was because I was seeing waaaaay too many people run the 32-bit binaries when they could have run the 64-bit ones. (This was prior to v0.4.3.)

    So the dispatcher got rid of that issue. But there isn't much I can do to stop someone from going into the "Binaries" folder and manually running something that's incompatible. I can add checks, but I don't feel like duplicating the CPU-detection code into the main program itself.


    For the Linux versions, you are entirely correct. The problem is gonna remain until I make a dispatcher for the Linux release.

    As for GPU computing... That's a different can of worms. I suppose anything that scales perfectly from SSE to AVX will probably run well on the Fermi GPUs. (provided it isn't bound by some other resource)
    Later on, I'll probably get myself a Fermi card to experiment with. But I don't have my hopes up since the program is already at the point of diminishing return for vectorization with the current set of algorithms.

    From what I've seen, anything that's bound by memory doesn't run that much better on GPUs (if at all). GPU <-> main memory is slower than CPU <-> main memory. I'd like to see a GPU with 64GB of memory... Then there's no need to go to main memory.

  25. #625
    Xtreme Cruncher
    Join Date
    Jun 2005
    Location
    Northern VA
    Posts
    1,285
    hey poke, looks like your package is scheduled to be there on Wed. Hope you enjoy it...

    You will see when you get it that the power demands are a bit big. You need 5 power connectors for it to run: 1 24-pin ATX, 1 4-pin Molex, 2 8-pin power, and 1 4-pin power as well. The easiest way I found to hook it up was, for one of the 8-pin connectors, to use a 4-to-8-pin adapter; I've sent one with it. It's been running at 100% load for the last 14 months like that with no problems.

    Now, for the last 4-pin, use a PCI-E connector and hook it up backwards. If you need a better explanation, call me and I'll talk you through it. What you do is take the PCI-E connector and, instead of hooking it up normally, where the latch on it hooks onto the board, flip it around 180 degrees so the latch is on the other side. Let the first pair of connectors hang over the side of the mobo and give it a little pressure; it will fit in nicely. To make sure it's right, look at the 2 8-pins: the PCI-E connector should have its +12 V on the same side as their yellow +12 V wires.
    Its not overkill if it works.

