Results 526 to 550 of 815

Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #526
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote (originally posted by Havis):
    a little pic, that helps a lot
    Thx

    btw, why not just use tmpfs (it's almost like a RAM disk), put a 50 GB file there, and test your raw I/O from there?
    I think there is no need to run the computational part and create the same 50 GB Pi file over and over again...

    tmpfs hint:
    add one line to your /etc/fstab
    something like:
    tmpfs /testdir tmpfs size=50G,mode=0775,uid=YOURUSERNAME,gid=YOURGROUP 0 0

    edit YOURUSERNAME and YOURGROUP to match yours
    (or just set mode to 0777 and make that directory writeable for everyone)
    be sure not to delete the other lines in fstab

    and then run "mount -a"
    Ram disks are only good for correctness testing.
    It doesn't help much when I'm actually trying to measure the performance of something on actual hard drives.

    Also, I do in fact test the I/O parts by themselves to narrow down the possibilities.
    But there are some parts that involve doing I/O in parallel with computation - for those I need to see how the computation threads will interfere with the I/O threads.
    In Windows, I have to set the I/O threads to a higher priority than the computation threads to keep the I/O threads from being starved by the computation threads.
    In Linux, I'm still trying to figure out what's going on... though I won't be able to do much anyway since OpenMP lacks priority control.

    btw, do you use ncurses for colored output?
    Nah... It seemed easy enough when I found that you can do it by printing "\033[01;31m".
    So all I needed was to add a linux version to each of my color changing functions and all was good. Too easy...
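For anyone who wants to try the same trick, here is a minimal sketch of a color helper built around the escape sequence quoted above. The names `colorize`, `RED_BOLD` and `RESET` are made up for illustration and are not taken from y-cruncher; the only part that comes from the post is the `"\033[01;31m"` sequence itself. The reset sequence `"\033[0m"` is the standard way to switch attributes back off.

```python
# Hypothetical helper names; only the red/bold sequence comes from the post.
RED_BOLD = "\033[01;31m"   # bold + red foreground
RESET = "\033[0m"          # turn all attributes back off


def colorize(text, code=RED_BOLD, enabled=True):
    """Wrap text in an ANSI color sequence, or return it unchanged."""
    if not enabled:
        return text
    return f"{code}{text}{RESET}"


if __name__ == "__main__":
    print(colorize("Computation Time: 5.062 seconds"))
```

Passing `enabled=False` is one way to keep the output clean when it is redirected to a file or to a terminal that does not understand the escape codes.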

    with ntfs I recommend ntfs-3g, it's a userspace implementation and works very well (last time I used it was about a year ago and it was flawless)
    in Ubuntu, search with aptitude for ntfs3g or ntfs-3g (not sure what it's called there...)
    I tested ext4 today. (Formatted all 8 drives to ext4 for this.)
    Yes, Linux does not like NTFS at all. 30 - 60% faster I/O speeds on ext4 than NTFS. But for some reason it still doesn't like the raw I/Os - which, in contrast, worked really well on Windows...

    PS:
    You are not OCing, you are not Xtreme
    EDIT: ok, just noticed you are OCing your i7, good boy
    Hey!!! I guarantee you that 95% of OCers who are called "Xtreme" do not have 18.5 TB of disk and 64 GB of ram in one machine... And still be able to close the case...
    Not just that... this baby has had that 64 GB of ram since January 2009.

    Also, this board won't OC... I tried SetFSB... doesn't seem to work at all on this board.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  2. #527
    Registered User
    Join Date
    Dec 2008
    Posts
    67
    18 disks, that's a lot of p.rn

    btw, you might look at I/O schedulers that you are using for your drives.
    if you want CFQ(this is default) just:
    echo "cfq" > /sys/block/sdX/queue/scheduler (where sdX is your drive...)
    other schedulers are deadline(which I am using) and noop.

    other things you might want to look at are these:
    /proc/sys/vm/dirty_expire_centisecs
    /proc/sys/vm/dirty_ratio
    /proc/sys/vm/dirty_background_ratio
    /proc/sys/vm/dirty_bytes
    /proc/sys/vm/dirty_background_bytes

    just google them around
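To check which scheduler a drive is currently using, you can read the same sysfs file back. On kernels of this era it holds a line such as `noop deadline [cfq]`, with the active scheduler in square brackets. The sketch below parses that format; the function names `parse_scheduler` and `read_scheduler` are invented for this example, and `sda` is just a placeholder device.

```python
def parse_scheduler(contents):
    """Return (active, available) from text like 'noop deadline [cfq]'."""
    active = None
    available = []
    for name in contents.split():
        # The active scheduler is the one wrapped in square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device):
    """Read the scheduler file for a block device such as 'sda'."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_scheduler(f.read())


if __name__ == "__main__":
    print(parse_scheduler("noop deadline [cfq]\n"))
```

Writing a new value still needs root, exactly as with the `echo` command above.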
    Core2 Q9550 | P5Q Deluxe | 4x 2GB Corsair Dominator 1066MHz | TT Frio | AMD Radeon HD6970 | 4x 1TB Samsung Spinpoint F1


  3. #528
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote (originally posted by Havis):
    18 disks, that's a lot of p.rn

    btw, you might look at I/O schedulers that you are using for your drives.
    if you want CFQ(this is default) just:
    echo "cfq" > /sys/block/sdX/queue/scheduler (where sdX is your drive...)
    other schedulers are deadline(which I am using) and noop.

    other things you might want to look at are these:
    /proc/sys/vm/dirty_expire_centisecs
    /proc/sys/vm/dirty_ratio
    /proc/sys/vm/dirty_background_ratio
    /proc/sys/vm/dirty_bytes
    /proc/sys/vm/dirty_background_bytes

    just google them around
    That sounds more like a tuning thing that's independent of the program... So I'll leave it to the users.
    Whenever I get the time to do the pthread implementation... then I'll try to force some priorities. But that's later.

    Also... it's only 10 HDs...
    64GB SSD + 1.5 TB + 1.0 TB + 8 x 2 TB


    So far, it looks like the un-tuned Linux version isn't too far behind the fully tuned Windows version now.
    After all, this is SSE3 (default) vs. SSE4.1 ~ Nagisa.
    Aside from that, the I/O does look to be a bit slower in Linux - possibly due to CPU starvation or improper buffering (since I'm not using raw I/Os in Linux).

    The Windows version is compiled using:
    icl "x64 SSE4.1 - Windows ~ Nagisa.cpp" /O3 /Qipo /Qprec-div- /fp:fast /Qms2 /Qvc9 /MP /FAs /arch:SSE4.1 advapi32.lib

    The Linux version is compiled using:
    g++ 'x64 SSE3 - Linux.cpp' -msse3 -fopenmp -O2 -ffast-math

    Same machine:
    Windows - NTFS for all 8 drives
    Linux - ext4 for all 8 drives

    Windows: [screenshot]


    Linux: [screenshot]



    I'll have results on my i7 rig tomorrow - including a picture of how I managed to cram 5 HDs into a micro-atx case.
    Last edited by poke349; 08-28-2010 at 10:10 AM.

  4. #529
    Registered User
    Join Date
    Dec 2008
    Posts
    67
    How much continuous I/O does your program create? (KB/sec, ... or better to say IO operations/sec).
    Are the computational threads starved by the I/O?


    And yes, the things in /proc are kernel tuning, but good tuning can gain you a lot, so why bother with direct I/O if you can fine-tune the kernel for async I/O?
    Things like /proc/sys/vm/dirty_background_ratio have insane default settings:
    10%, which is a lot (roughly 800 MB) on my 8 GB machine, not to mention your 64 GB workstation.
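The arithmetic behind that figure is easy to reproduce. This is only a rough upper-bound estimate: the kernel applies the ratio to the memory it considers dirtyable (free plus reclaimable pages), not to total RAM, so real thresholds come out somewhat lower. The function name `dirty_background_threshold` is made up for this sketch.

```python
GIB = 2**30
MIB = 2**20


def dirty_background_threshold(ram_bytes, ratio_percent=10):
    """Rough upper bound on dirty data held before background writeback.

    Assumes the ratio is applied to all of RAM, which overestimates what
    the kernel actually uses (it only counts dirtyable memory).
    """
    return ram_bytes * ratio_percent // 100


if __name__ == "__main__":
    for ram_gib in (8, 64):
        threshold = dirty_background_threshold(ram_gib * GIB)
        print(f"{ram_gib} GiB RAM -> about {threshold / MIB:.0f} MiB of dirty pages")
```

With the default of 10%, the 8 GiB machine comes out at roughly 800 MiB and the 64 GiB workstation at roughly 6.4 GiB.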
    Last edited by Havis; 08-28-2010 at 01:25 PM.


  5. #530
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    It's on and off I/O. So it isn't a constant rate. Sometimes there will be periods with no I/O, and sometimes it will be entirely I/O.
    Whenever possible they'll be done in parallel.

    But typically, the I/O rate is either zero or maxed out. Rarely are there in-betweens.

    On larger computations there will be large periods of time where I/O is done in parallel with computation.
    For Windows at least, the I/O threads get starved by the computation threads. So I set the I/O threads to high priority to keep them going.

    Basically, a computation thread can occupy an entire core. But an I/O thread uses very little CPU.
    So a computation thread can block an I/O thread, but not vice versa. That's why I set I/O threads to high priority in Windows.


    EDIT:

    I just released the latest Linux version. It doesn't have to be just me with all the fancy colors...
    Last edited by poke349; 08-28-2010 at 02:55 PM.

  6. #531
    Registered User
    Join Date
    Dec 2008
    Posts
    67
    Maybe you can do some kind of heuristic readahead to read the data in advance from HDDs before they are needed for computation...


  7. #532
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote (originally posted by Havis):
    Maybe you can do some kind of heuristic readahead to read the data in advance from HDDs before they are needed for computation...
    No need for heuristics. The program knows exactly what will be accessed. So it already does prefetching.
    In the places where I didn't make the program prefetch - it's usually because the program needs all the memory it can get to do the current operation.
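The general pattern of deterministic prefetching (start reading the next block while the current one is being processed) can be sketched as below. This is not y-cruncher's code, just a minimal illustration using a single background I/O thread; `process_with_prefetch`, `read_block` and `compute` are hypothetical names. Note that it keeps two blocks in memory at once, which is exactly the memory cost mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def process_with_prefetch(read_block, compute, block_ids):
    """Process blocks in order, reading block i+1 while computing on block i."""
    results = []
    ids = iter(block_ids)
    first = next(ids, None)
    if first is None:
        return results
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(read_block, first)
        for next_id in ids:
            data = pending.result()                        # wait for current block
            pending = io_pool.submit(read_block, next_id)  # start the next read
            results.append(compute(data))                  # overlaps with that read
        results.append(compute(pending.result()))          # last block
    return results


if __name__ == "__main__":
    fake_read = lambda i: list(range(i, i + 3))   # stand-in for a disk read
    print(process_with_prefetch(fake_read, sum, [0, 10, 20]))
```

Because the access order is known up front, no heuristic is needed: the list of block IDs is the whole "prediction".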

  8. #533
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Hehe, waiting for the pics.

    Just need some enthusiastic people to start fiddling with huge-threadcount computations on Linux with tuned kernels, and Windows should be pretty much left behind.

  9. #534
    Registered User
    Join Date
    Dec 2008
    Posts
    67
    Linux benchies are coming

    25mil, 50mil and 100mil:



    250mil,500mil and 1000mil:


    Multi-core Efficiency scaling with larger sizes is pretty great:
    87.892% -> 92.6235% -> 94.2262% -> 96.9757% -> 97.6477% -> 98.0798%

    All benchies were run on Gentoo Linux with 2.6.34-r7 kernel
    (whole system compiled with -march=core2 -msse4.1 -O2 -pipe)
    y-cruncher version: 0.5.4.9157 (fix 1) (x64 SSE3 - Linux)
    Core 2 Quad Q9550@3.4GHz
    Last edited by Havis; 09-07-2010 at 02:34 AM.


  10. #535
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Here's the picture:



    The upper-most drive bay is a DVD burner.
    The next one down is the 1.5TB boot drive with racks to mount in a 5.25in slot.
    The rest of the bays hold the 4 x 1 TB.
    The 2TB drive (Anime Backup) that you'll see in the screenie below is an esata external.

    I honestly haven't seen too many rigs that are more crowded than this.


    And here's another comparison between Windows and Linux.
    NTFS in Windows
    ext4 in Linux

    Now I'm really starting to feel the full benefit of keeping all my swap/test drives completely separate from my files.
    The original reason was to keep other files from fragmenting the swap drives and affecting the speed of the runs. And the other reason was just in case I ended up killing a swap drive (from all this torture - and it HAS happened), I wouldn't lose any data.






    I think I'm gonna need to switch to pthreads and then tune/build the ~Nagisa and ~Ushio binaries before Linux will start to beat Windows.
    As for the swap computations... The fact that I can't get raw I/O to be efficient seems to be a problem.
    There's one more thing I haven't tried yet, but I'll do it later.

  11. #536
    Registered User
    Join Date
    Dec 2008
    Posts
    67
    +1 for the northbridge fan


  12. #537
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote (originally posted by Havis):
    +1 for the northbridge fan
    10C cooler on the NB...
    That thing idled at like 63 - 65C prior to adding that fan.

    My laptop's i7 idles in the low 60s... wtf...
    My mom's laptop has the same i7 and it idles in the 50s... But hers is a metal case whereas mine is plastic...


    Speaking of which... the FB-DIMMs in my workstation:
    They're rated for 105C, but will throttle the refresh rate when they get to 85C.

    With no cooling:
    - Idle: 70 - 80C
    - Load: 120 - 140C

    With my deafeningly loud fans:
    - Idle: 40 - 50C
    - Load: 50 - 75C

  13. #538
    Registered User
    Join Date
    Dec 2008
    Posts
    67
    120-140C @full load ? thats


  14. #539
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote (originally posted by Havis):
    120-140C @full load ? thats
    Yep... scared the $#!+ out of me the first time I saw the temps...


    Anyways...

    The first post of this thread (with all the benchmarks) is starting to get too long. So I've moved all but the top 10 people in each category to an external page.

  15. #540
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Thought this might be interesting:

    The # of requests/page views I've had on my site since May:

    Code:
    Week beginning	Number of requests	Number of page requests
    1.	May 2, 2010	5,232	1,148
    2.	May 9, 2010	53,263	9,801
    3.	May 16, 2010	24,916	4,382
    4.	May 23, 2010	9,679	1,966
    5.	May 30, 2010	9,133	1,898
    6.	June 6, 2010	9,221	1,690
    7.	June 13, 2010	9,128	1,787
    8.	June 20, 2010	11,464	1,834
    9.	June 27, 2010	8,341	2,131
    10.	July 4, 2010	9,726	1,860
    11.	July 11, 2010	9,300	1,921
    12.	July 18, 2010	8,335	1,774
    13.	July 25, 2010	7,847	1,817
    14.	August 1, 2010	564,488	53,322
    15.	August 8, 2010	309,789	31,227
    16.	August 15, 2010	72,357	8,748
    17.	August 22, 2010	92,783	10,198
    18.	August 29, 2010	63,690	8,262
    The spike during the week of May 9th and May 16th was from a front page reddit blog about my 500 billion digits of e run.
    And the spike during the week of August 1st, is obviously from the 5 trillion digits of Pi.


    Now that things have settled down, it looks like I'm averaging about 8x more traffic than before.
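The "about 8x" figure can be checked against the request counts in the table. Which weeks count as "before" and "settled" is an assumption made here, not something stated in the post: weeks 4 to 13 (after the May spike) are taken as the baseline and weeks 16 to 18 as the settled level. The function name `traffic_ratio` is made up for the example.

```python
from statistics import mean

# Weekly request counts copied from the table above.
# Assumption: weeks 4-13 are the quiet baseline, weeks 16-18 the settled level.
BASELINE = [9679, 9133, 9221, 9128, 11464, 8341, 9726, 9300, 8335, 7847]
SETTLED = [72357, 92783, 63690]


def traffic_ratio(baseline, settled):
    """Ratio of average weekly requests after vs. before the announcement."""
    return mean(settled) / mean(baseline)


if __name__ == "__main__":
    print(f"{traffic_ratio(BASELINE, SETTLED):.1f}x")
```

With that choice of weeks the ratio comes out at roughly 8.3, which matches the estimate.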

    Although I haven't been keeping track carefully, it looks like y-cruncher is nearing 20,000 downloads since v0.1.0. (First release was in January 2009.)
    Nearly half of those downloads were during the two weeks following the Pi announcement.

  16. #541
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Hm, interesting. What kind of plans do you have for the next few releases to come? ..in case you're willing to share some info.

  17. #542
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote (originally posted by Calmatory):
    Hm, interesting. What kind of plans do you have for the next few releases to come? ..in case you're willing to share some info.
    Well... AVX and FMA will be the only things you'll see in the near future. I have other things in mind, but it's uncertain how they will turn out.
    I'm in grad school now as a research assistant so my spare time has gone from through the roof to zero.


    For large computations, about 50% of the runtime is currently vectorized using SSE3 - SSE4.1.
    Since v0.5.2, all SSE code has been ready to be extended to AVX and beyond. All I need now is the hardware.

    Windows will only get AVX because of the lack of compiler support for FMA. The Intel compiler is obviously not gonna support FMA until Haswell comes out.
    On the other hand, I can get both AVX and FMA for Linux using GCC.

    The only issue now is if I can get the hardware.
    There's no way I'm gonna get both Sandy Bridge and Bulldozer hardware - that's hard to justify, beyond my budget, and out of the question (unless I get a donation or something...)
    But there's a high chance that I might get Sandy Bridge just because it will be coming out first. If that's the case, then I can't do FMA...

    I'll probably be skipping Haswell. My pocket can't handle a new rig every generation... We'll see...
    And at some point, I'm gonna need a new ram-monster to be able to maintain the program at those godly sizes...
    Depending on the urgency of a new ram-monster, I might push AVX and FMA support back until Haswell (or whatever AMD has to offer by then) to build a new dual-socket machine with 256 GB of ram.
    (The point here is to save some money by completely skipping Sandy Bridge and Bulldozer...)


    Some of the other things I have in mind are:

    1. A factorization speedup that will make the series 20 - 30% faster. This involves simplifying fractions to make them smaller and faster to work on.
    2. Migrate the program to a new number representation that does not need carrying. The purpose is to make the program much more vectorizable.
    3. And lastly, an MPI version for NUMA and distributed systems. y-cruncher won't be able to run well on those quad-sockets, Beowulf clusters, and supercomputers until this is done.


    All of these are very significant and won't be done in the near future.
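To give a feel for why item 1 helps, here is a toy illustration of how removing common factors keeps the numbers in a series sum small. It is not the factorization technique planned for y-cruncher, and it uses the harmonic series rather than a Pi series purely because it is short. All function names are invented for the sketch.

```python
from fractions import Fraction
from math import gcd


def add_unreduced(a, b):
    """Add two fractions given as (numerator, denominator) without simplifying."""
    (p1, q1), (p2, q2) = a, b
    return p1 * q2 + p2 * q1, q1 * q2


def simplify_fraction(frac):
    """Divide out the greatest common divisor of numerator and denominator."""
    p, q = frac
    g = gcd(p, q)
    return p // g, q // g


def harmonic_sum(n, simplify):
    """Sum 1/1 + 1/2 + ... + 1/n, optionally simplifying after every step."""
    total = (0, 1)
    for k in range(1, n + 1):
        total = add_unreduced(total, (1, k))
        if simplify:
            total = simplify_fraction(total)
    return total


if __name__ == "__main__":
    raw = harmonic_sum(50, simplify=False)
    small = harmonic_sum(50, simplify=True)
    print("unreduced denominator bits:", raw[1].bit_length())
    print("reduced denominator bits:  ", small[1].bit_length())
```

Both versions represent the same value, but the simplified one carries far fewer bits, so every later multiplication on it is cheaper.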

  18. #543
    Registered User
    Join Date
    Dec 2008
    Posts
    67
    Can you make some CPU specific binaries for linux?

    compilation options for gcc:
    -mtune=generic (you probably are using this for your linux version)
    -march=core2 (for us Core2/i7 positive ppls )
    and maybe something for AMD ppls too ;-)
    -march=k8-sse3 (dunno if non sse4 is supported ehm)
    -march=amdfam10

    I am just curious about the generic vs CPU-specific binary speedup. (I saw somewhere that AMD CPUs do much better with optimized binaries...)


  19. #544
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote (originally posted by Havis):
    Can you make some CPU specific binaries for linux?

    compilation options for gcc:
    -mtune=generic (you probably are using this for your linux version)
    -march=core2 (for us Core2/i7 positive ppls )
    and maybe something for AMD ppls too ;-)
    -march=k8-sse3 (dunno if non sse4 is supported ehm)
    -march=amdfam10

    I am just curious about the generic vs CPU-specific binary speedup. (I saw somewhere that AMD CPUs do much better with optimized binaries...)
    Yeah. I'm just taking my time.
    I can't do anything for the next week or so since both my test machines are tied down right now.

    Once they're free again, I'll work on the specialized binaries.
    I'm probably just gonna be mirroring the Windows versions and compiling them using the options you suggested. I'm not gonna have the time to actually go through and tweak each critical-loop for each vendor/architecture and GCC.
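A small script could generate one compile command per target from the existing Linux command line. The sketch below only builds and prints the commands; it does not run the compiler. The base command and the `-mtune`/`-march` flags are the ones quoted in this thread, while the output file names (`y-cruncher-<target>`) and the function name `build_commands` are made up for illustration.

```python
import shlex

# Base command as posted earlier in the thread.
BASE = ["g++", "x64 SSE3 - Linux.cpp", "-msse3", "-fopenmp", "-O2", "-ffast-math"]

# Target flags suggested above; the dictionary keys are just labels.
TARGETS = {
    "generic": "-mtune=generic",
    "core2": "-march=core2",
    "k8-sse3": "-march=k8-sse3",
    "amdfam10": "-march=amdfam10",
}


def build_commands(base=BASE, targets=TARGETS):
    """Return {label: argv list} with one compile command per target."""
    commands = {}
    for label, flag in targets.items():
        # Output names are hypothetical, chosen only to keep binaries apart.
        commands[label] = list(base) + [flag, "-o", f"y-cruncher-{label}"]
    return commands


if __name__ == "__main__":
    for argv in build_commands().values():
        print(shlex.join(argv))
```

Keeping the commands as argument lists avoids quoting problems with the spaces in the source file name.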

  20. #545
    Registered User
    Join Date
    Dec 2009
    Posts
    63
    Here are my results for a moderately overclocked AMD 1090T (I opted for cool and quiet on this machine instead of pushing it to the max):

    25,000,000 digits:
    6.543 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz


    50,000,000 digits:
    13.849 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz


    100,000,000 digits:
    29.577 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz


    250,000,000 digits:
    83.152 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz


    500,000,000 digits:
    178.526 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz

    Pics:




    Last edited by sam3; 09-07-2010 at 04:21 AM.
    Asus Crosshair IV | AMD 1090t @ 4.0ghz | 2x2gb G-Skill Trident 1800mhz cl7 |XFX AMD 6970 2gb | 128gb Crucial m4 | Corsiar AX850
    Silverstone SST-FT02B-WRI Fortress | Yamaha A-S500 | Monitor Audio BX2 | ASUS Xonar Essence STX

  21. #546
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote (originally posted by sam3):
    Here are my results for a moderately overclocked AMD 1090T (I opted for cool and quiet on this machine instead of pushing it to the max):

    25,000,000 digits:
    6.543 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz


    50,000,000 digits:
    13.849 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz


    100,000,000 digits:
    29.577 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz


    250,000,000 digits:
    83.152 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz


    500,000,000 digits:
    178.526 - v0.5.4.9148 (fix 1) x64 SSE4.1 - Sam3 - AMD Phenom II X6 1090T @ 4.06 GHz - 4 GB DDR3 @ 1800 MHz
    Updated
    Cool and quiet at 4 GHz is pretty impressive. I can't get my i7 rig to do that past 3.5 GHz...



    btw. Although I don't plan on making any major improvements to the program for a while, I do feel the need to overhaul the entire digit viewer/writer.

    So after the 5 trillion digits of Pi computation, Shigeru Kondo sent me 3 hard drives containing all the digits.
    That was actually a few weeks ago, but I didn't get the time until now to try and "process" all those digits.

    By "process", I wanted to:
    1. Split them up into 5000 files of 1 billion digits each.
    2. Re-compress them.
    3. And launch the mother of all torrents* - seeded using my university's connection.

    *I don't expect anyone to complete the download - but I do want people to be able to get small portions of the digits.


    Then I realized how horrifically slow the digit viewer is at processing digits...
    It decompresses at like 10 million digits/sec per thread on my workstation... unacceptable... lol
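A quick back-of-the-envelope calculation shows why that rate hurts at this scale. The numbers are the ones from this post (5 trillion digits, 1 billion digits per file, about 10 million digits/sec per thread); the assumption that throughput scales linearly with thread count is mine and is optimistic. Function names are invented for the sketch.

```python
def file_count(total_digits, digits_per_file):
    """Number of files needed, rounding up for a partial last file."""
    full, remainder = divmod(total_digits, digits_per_file)
    return full + (1 if remainder else 0)


def processing_seconds(total_digits, digits_per_second, threads=1):
    """Time to process all digits, assuming perfect scaling across threads."""
    return total_digits / (digits_per_second * threads)


if __name__ == "__main__":
    total = 5 * 10**12
    print("files:", file_count(total, 10**9))
    seconds = processing_seconds(total, 10**7)
    print(f"single thread: {seconds:.0f} s, about {seconds / 86400:.1f} days")
```

On one thread that is 500,000 seconds, or close to six days, for a single pass over the digits.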

    So for v0.5.5, I'm gonna completely rewrite the entire digit viewer as well as all the digit-processing code.
    This time I'm gonna keep the code separate from y-cruncher so that I can completely open source the digit viewer.

    I noticed that on the many-core machines, the program spends a disproportionate amount of time formatting the digits and writing them to disk - all of which is single-threaded.
    Guess I'm gonna try to fix that.
    Last edited by poke349; 09-08-2010 at 12:03 PM. Reason: typo fix

  22. #547
    Registered User
    Join Date
    Dec 2009
    Posts
    63
    Quote (originally posted by poke349):
    Updated
    Cool and quiet at 4 GHz is pretty impressive. I can't get my i7 rig to do that past 3.5 GHz...
    I can get it up to 4.2 GHz on air, but the fans are unbearable. So I keep it at 4 GHz with the fans at half speed.
    And I just realised I made a stupid mistake. I left Windows 7 in the "balanced" power mode when I ran those benchmarks.
    Switching it to High performance knocks a few seconds off the larger benchmarks.

    Good luck on overhauling the entire digit viewer/writer.

  23. #548
    Xtreme Member
    Join Date
    Sep 2007
    Location
    Alberta, Canada
    Posts
    360
    Nice, I just got a 980x to play with, so of course I thought of your wonderful program to test on it.

    Here's what I got with the CPU at 4484 MHz, memory at 2000 MHz, CAS 10


    Program Version: 0.5.4 Build 9148 (fix 1) (x64 SSE4.1 - Windows ~ Ushio)
    Constant: Pi
    Algorithm: Chudnovsky Formula
    Decimal Digits: 25,000,000
    Hexadecimal Digits: Disabled
    Threading Mode: 16 threads
    Computation Mode: Ram Only
    Swap Disks: 0
    Working Memory: 210 MB

    Start Date: Sat Sep 25 07:31:00 2010
    End Date: Sat Sep 25 07:31:06 2010

    Computation Time: 5.062 seconds
    Total Time: 5.792 seconds

    CPU Utilization: 685.03 %
    Multi-core Efficiency: 57.08 %

    Last Digits:
    3803750790 9491563108 2381689226 7224175329 0045253446 : 24,999,950
    0786411592 4597806944 2455112852 2554677483 6191884322 : 25,000,000

    Timer Sanity Check: Passed
    Frequency Sanity Check: Passed
    ECC Recovered Errors: 0
    Checkpoint From: None

    ----

    Checksum: 0ac68f3f7265c384039d897e4e6431f2484390eab9bfe2e518fed3bded620349



    Constant: Pi
    Algorithm: Chudnovsky Formula
    Decimal Digits: 50,000,000
    Hexadecimal Digits: Disabled
    Threading Mode: 16 threads
    Computation Mode: Ram Only
    Swap Disks: 0
    Working Memory: 318 MB

    Start Date: Sat Sep 25 07:31:12 2010
    End Date: Sat Sep 25 07:31:24 2010

    Computation Time: 10.355 seconds
    Total Time: 11.534 seconds

    CPU Utilization: 880.55 %
    Multi-core Efficiency: 73.37 %

    Last Digits:
    4127897300 0153683630 8346732220 0943329365 1632962502 : 49,999,950
    5130045796 0464561703 2424263071 4554183801 7945652654 : 50,000,000

    Timer Sanity Check: Passed
    Frequency Sanity Check: Passed
    ECC Recovered Errors: 0
    Checkpoint From: None

    ----

    Checksum: 7dee76017e520349093b5e14a2cb2eaec2e02646a0262bec097bf4223e7b8acb



    Constant: Pi
    Algorithm: Chudnovsky Formula
    Decimal Digits: 100,000,000
    Hexadecimal Digits: Disabled
    Threading Mode: 16 threads
    Computation Mode: Ram Only
    Swap Disks: 0
    Working Memory: 537 MB

    Start Date: Sat Sep 25 07:31:24 2010
    End Date: Sat Sep 25 07:31:49 2010

    Computation Time: 22.506 seconds
    Total Time: 24.671 seconds

    CPU Utilization: 953.32 %
    Multi-core Efficiency: 79.44 %

    Last Digits:
    9948682556 3967530560 3352869667 7734610718 4471868529 : 99,999,950
    7572203175 2074898161 1683139375 1497058112 0187751592 : 100,000,000

    Timer Sanity Check: Passed
    Frequency Sanity Check: Passed
    ECC Recovered Errors: 0
    Checkpoint From: None

    ----

    Checksum: fbfbfeb88fc65df8151e2d99c5db6b8dc6ac4c465397917700101527a3655170

    Constant: Pi
    Algorithm: Chudnovsky Formula
    Decimal Digits: 250,000,000
    Hexadecimal Digits: Disabled
    Threading Mode: 16 threads
    Computation Mode: Ram Only
    Swap Disks: 0
    Working Memory: 1.26 GB

    Start Date: Sat Sep 25 07:31:49 2010
    End Date: Sat Sep 25 07:32:56 2010

    Computation Time: 62.434 seconds
    Total Time: 67.782 seconds

    CPU Utilization: 1009.15 %
    Multi-core Efficiency: 84.09 %

    Last Digits:
    3673748634 2742427296 0219667627 3141599893 4569474921 : 249,999,950
    9958866734 1705167068 8515785208 0067520395 3452027780 : 250,000,000

    Timer Sanity Check: Passed
    Frequency Sanity Check: Passed
    ECC Recovered Errors: 0
    Checkpoint From: None

    ----

    Checksum: 293cf4e9be9a511a0e519f7ead5dbfbfe8e12ea2a0e1a6920f61ed1a439ab62f



    Constant: Pi
    Algorithm: Chudnovsky Formula
    Decimal Digits: 500,000,000
    Hexadecimal Digits: Disabled
    Threading Mode: 16 threads
    Computation Mode: Ram Only
    Swap Disks: 0
    Working Memory: 2.42 GB

    Start Date: Sat Sep 25 07:32:57 2010
    End Date: Sat Sep 25 07:35:23 2010

    Computation Time: 135.983 seconds
    Total Time: 146.231 seconds

    CPU Utilization: 1038.72 %
    Multi-core Efficiency: 86.56 %

    Last Digits:
    3896531789 0364496761 5664275325 5483742003 7847987772 : 499,999,950
    5002477883 0364214864 5906800532 7052368734 3293261427 : 500,000,000

    Timer Sanity Check: Passed
    Frequency Sanity Check: Passed
    ECC Recovered Errors: 0
    Checkpoint From: None

    ----

    Checksum: 702a7e6122715e3d1d3f665b8fdf683667699a8e813bf8ce1128f193b3b43c98

    Constant: Pi
    Algorithm: Chudnovsky Formula
    Decimal Digits: 1,000,000,000
    Hexadecimal Digits: Disabled
    Threading Mode: 16 threads
    Computation Mode: Ram Only
    Swap Disks: 0
    Working Memory: 4.76 GB

    Start Date: Sat Sep 25 07:35:23 2010
    End Date: Sat Sep 25 07:40:47 2010

    Computation Time: 302.847 seconds
    Total Time: 323.667 seconds

    CPU Utilization: 1068.55 %
    Multi-core Efficiency: 89.04 %

    Last Digits:
    6434543524 2766553567 4357021939 6394581990 5483278746 : 999,999,950
    7139868209 3196353628 2046127557 1517139511 5275045519 : 1,000,000,000

    Timer Sanity Check: Passed
    Frequency Sanity Check: Passed
    ECC Recovered Errors: 0
    Checkpoint From: None

    ----

    Checksum: b5f8b9ff001aea1b1e49adb2a1acb8ade174290bc26d1e530bc4f64f9c4b3334
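A note on reading the two percentages in these logs: the "Multi-core Efficiency" values are consistent with the CPU utilization divided by 12, the number of logical cores on a 980X (6 cores with Hyper-Threading), cut off at two decimals. That relationship is inferred from the numbers above and is not confirmed by the program's documentation; the function name `multicore_efficiency` is made up here.

```python
from decimal import Decimal, ROUND_DOWN


def multicore_efficiency(cpu_utilization_percent, logical_cores):
    """Utilization divided by logical core count, truncated to 2 decimals.

    Takes the utilization as a string so the decimal value is exact.
    """
    value = Decimal(cpu_utilization_percent) / logical_cores
    return value.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


if __name__ == "__main__":
    # CPU utilization figures copied from the six runs above.
    for utilization in ("685.03", "880.55", "953.32", "1009.15", "1038.72", "1068.55"):
        print(utilization, "->", multicore_efficiency(utilization, 12))
```

So the 25M run keeps the equivalent of about 6.9 logical cores busy, while the 1B run keeps about 10.7 busy, which is why efficiency climbs with size.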


    The rig... (well almost a rig)

    Last edited by tet5uo; 09-25-2010 at 05:00 AM.
    EVGA z68 FTW
    i7 2600k @ 4.8
    8gb DDR3 1600
    3x GTX 580 3gb HydroCopper2
    Silverstone Strider 1500W
    Areca 1880i w/ 6x intel x25m
    On water

  24. #549
    Xtreme Member
    Join Date
    Sep 2007
    Location
    Alberta, Canada
    Posts
    360
    Heh, let's hope that wasn't the last bench of that rig already... Not gonna be using this equipment for a couple days now.

    The piece of crap SLI fitting that you see in this pic...



    ... decided to spray water all over the place as soon as the pump started getting the loop full... sorry, no pics of that, I was too busy scrambling like a maniac to try and stop it

    I'm 99% sure nothing's fried, as I had no power going to anything but my pump and it's distilled water, and no dust anywhere since it's all new stuff.... just that 1% will drive me crazy for a couple days till I feel safe enough that things are totally dry and I can find out for sure if everything survived.

    Also, I have no other way of connecting these damn cards in SLI, the stubby fittings I have still end up being a couple mm too long... Might just grind them down a mm and connect the cards with those and a stub of tubing.

    And I was kinda worried about those connectors being reliable when I saw them, shoulda gone with my instinct.



    TLDR... **** my stupidity..

  25. #550
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Holy $#!+...

    I think those boxes alone are enough to make me drool...

    You've got some nice benchmarks there. List updated!

    As for the spills... yeah, that sucks. Kinda reminds me of how I destroyed a $400 mobo with an extra standoff...
    I'm pretty sure if you let it sit in a dry environment long enough it should be fine. Though you might wanna check on the thermal paste between some of the components. I don't know anything about this area, but the paste might be water-soluble to some extent.
