
Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #501
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    I know from past experience that O3 does a crazy amount of inlining. So I didn't want it to blow up on me the first time given the codesize of yc.

    I'll play with it later.

    EDIT:
    Found the source of why the validation files were all screwed up.
    Linux uses 4 bytes for wchar_t...

    Which makes it inherently incompatible with Windows... Not sure what I'm gonna do.
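A rough illustration of the mismatch described above: wchar_t is typically 2 bytes with Windows compilers and 4 bytes with GCC on Linux, so raw wide strings written to a file differ between the two. One common workaround is to serialize text in an explicit encoding. This is only a sketch of that idea, not necessarily what y-cruncher ended up doing, and the function names are made up for illustration.

```python
import ctypes

# Size of the platform's wchar_t: typically 2 on Windows, 4 on Linux.
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)

def encode_portable(text):
    """Serialize text as UTF-16LE so the bytes are identical on every OS."""
    return text.encode("utf-16-le")

def decode_portable(data):
    """Inverse of encode_portable()."""
    return data.decode("utf-16-le")

sample = "Validation - Pi"
blob = encode_portable(sample)
print("native wchar_t size:", WCHAR_SIZE)
print("portable bytes per character:", len(blob) // len(sample))
```

The point is that the file layout is fixed by the encoding chosen, not by whatever width the compiler gives wchar_t.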
    Last edited by poke349; 08-14-2010 at 04:16 PM.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  2. #502
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Here's the first semi-biased comparison between Windows and Linux using the x64 - SSE3 binary.

    It's semi-biased because the program hasn't been tuned for GCC or Linux yet.

    The validation file is the only thing left to do. So hopefully it won't be too long.

  3. #503
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Seems that it all boils down to CPU multicore utilization, I wonder why though. Could be OpenMP overhead but I doubt it.

  4. #504
    Xtreme Enthusiast
    Join Date
    Mar 2006
    Posts
    703


    Asus RIVE Bios 2003
    3930k 4.5 ghz @1.29v
    G-SKILLS 32gig ddr-1600 ripjaws Z
    Enermax Evo Galaxy 1250W
    2x EVGA GTX 480 Superclocked SLI @ 900/1800/2000
    X-Fi Fatal1ty Titanium PCI-E
    4 x crucial Realssd C300 256 Raid 0
    Areca 1880i
    Seagate 1TB
    CM HAF 932
    On water:
    HK 3.0
    2x MCP655
    FESER X360
    Blackice GTX 480
    DD-GTX 480 VGA blocks
    DD Reservoir
    Windows 7 64bit

    Dell 3008WFP 30"

    Help Save Lives Join World Community Grid!


  5. #505
    Xtreme Enthusiast
    Join Date
    Mar 2006
    Posts
    703
    Are my scores ok? Not sure if I'm running the app correctly.




  6. #506
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Wow!!! You definitely ran it right!
    It can definitely be tuned better, but that's still hard to beat without going multi-socket!


    About the linux port: I just finished fixing a huge (and very annoying) issue with scanf() not working for string inputs... bleh

    So I'll get to the validation file later...

    Other than that, it's mostly working now.
    Benchmark obviously works.
    Stress test works.
    Haven't tested batch mode yet - but I see no reason for it to not work.
    All 3 computation modes seem to work - though I haven't tested the swap modes with multiple HDs yet.
    (I need either my i7 rig or my 64GB workstation to test multiple HDs. But I'm currently moving back to Illinois, so they're both offline right now... )

    And I still need to test the compressed output and the digit viewers...

  7. #507
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Just something I stumbled upon which you might find interesting once you start the optimizing work: http://ompf.org/forum/viewtopic.php?f=11&t=896

  8. #508
    Xtreme Enthusiast
    Join Date
    Mar 2006
    Posts
    703
    Here is another result for 250k & 32m & 256m


    Last edited by hlonipha; 08-16-2010 at 06:12 PM.


  9. #509
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Nice results! Updated the list.


    Okay... So due to random reasons, I was told that I absolutely had to get a Linux version released before the end of this week.
    Since my graduate dorm hasn't opened yet, I'm stuck in a hotel right now without access to all my stuff that's in storage (which includes my two monitors and my i7 rig).
    And because I needed to get this done, I had to go out and grab a new monitor so that I could get my workstation back online... (I only brought the rig with me, no monitor... )

    So I probably have the most ridiculous computer in the entire hotel right now...
    And with that, I spent this whole night working on the Linux port... I think it's working well enough to be released. Enjoy!

    http://www.numberworld.org/y-cruncher/#Download

    I guess at this point, it's gonna be the users who will help me find bugs.


    Here's a tiny Advanced Swap Mode test using all 8 HDs...


    And here's the 10b all-in-ram test that I do way too often...


    btw, the clock in that Linux boot is completely messed up. It always resets to 7 - 9 hours ahead or behind the correct time every time I boot into Windows and then boot into Linux...
    Last edited by poke349; 08-16-2010 at 09:49 PM.

  10. #510
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Oh. Linux version out, awesome! Now I just need an x64 Linux box and I can finally test yc myself.

    Oh, and CPU multicore efficiency went up? Or is it just due to a longer run?

    The clock issue is because of the difference between how Linux and Windows handle the hardware clock.

    You have most probably set the Linux clock settings to UTC, which means that Linux treats the BIOS clock as UTC and applies your timezone offset (GMT-8?) to get the local time. Windows doesn't do this; it just uses whatever time the BIOS gives it as local time. I haven't used Ubuntu for a long while so I can't quite help with fixing it, apart from suggesting you take a look at the System menu.
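To make the dual-boot clock drift concrete: if one OS reads the hardware clock as UTC and the other reads it as local time, the same stored value shows up shifted by the zone offset. The sketch below models that with a hypothetical helper; the -7 hour offset is an assumed value for illustration only.

```python
from datetime import datetime, timedelta

def displayed_time(rtc_value, rtc_is_utc, utc_offset_hours):
    """Local time an OS shows for a given hardware-clock reading.

    rtc_value is the naive datetime stored in the BIOS clock.
    """
    if rtc_is_utc:
        # UTC convention: the clock holds UTC, so apply the zone offset.
        return rtc_value + timedelta(hours=utc_offset_hours)
    # Local-time convention: the clock already holds local time.
    return rtc_value

rtc = datetime(2010, 8, 16, 21, 49)
as_windows = displayed_time(rtc, rtc_is_utc=False, utc_offset_hours=-7)
as_linux = displayed_time(rtc, rtc_is_utc=True, utc_offset_hours=-7)
print("difference between the two readings:", as_windows - as_linux)
```

Each OS then writes its own idea of the time back to the hardware clock, which is why the error flips direction depending on which one booted last.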
    Last edited by Calmatory; 08-17-2010 at 02:54 AM.

  11. #511
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    It's probably just the sheer size of the run.
    The CPU usage approaches 100% as the size goes up when it's all done in ram.

  12. #512
    Xtreme Enthusiast
    Join Date
    Mar 2006
    Posts
    703
    One more crack at it with a little tweaking.



    Last edited by hlonipha; 08-18-2010 at 08:46 PM.


  13. #513
    Registered User
    Join Date
    Dec 2008
    Posts
    67


    Multi-core efficiency: 97.4699 %
    I would say that's pretty good for a Core 2 Quad

    Gentoo amd64
    2.6.34-gentoo-r1 kernel
    Core 2 Quad Q9550 @ 3.4GHz, 8GB 1066MHz DDR2

    btw, nice program, do you plan to release its sources?
    Last edited by Havis; 09-07-2010 at 02:32 AM.
    Core2 Q9550 | P5Q Deluxe | 4x 2GB Corsair Dominator 1066MHz | TT Frio | AMD Radeon HD6970 | 4x 1TB Samsung Spinpoint F1


  14. #514
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Nice results! And the first Linux benchie.
    Getting kinda late here, so I'll update later... I need to start color-coding the entries for Windows and Linux.

    97% does seem ridiculously high for "any" quad-core machine... I guess that's where Linux starts to show its advantages over Windows.

    Once I've settled down and finished moving in to my dorm, I'll compile the other binaries for Linux to see what improvements they'll have (if any). Then I'll start playing with the GCC compiler options.
    And when that's done, I'll do a full batch benchmark run from 25m to 10b on my workstation for Windows and Linux to see how they compare.


    p.s. Looks like Linux DOES have colored console output... Need to do that whenever I get the time. At least get it to start "looking like" the Windows versions.
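For reference, colored console output on Linux terminals is usually done with ANSI escape sequences. A minimal sketch follows; the color table and helper name are illustrative and not taken from y-cruncher.

```python
import sys

RESET = "\033[0m"
COLORS = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
    "cyan": "\033[36m",
}

def colorize(text, color, enabled=None):
    """Wrap text in an ANSI color sequence when writing to a terminal."""
    if enabled is None:
        # Skip the escape codes when output is redirected to a file.
        enabled = sys.stdout.isatty()
    if not enabled:
        return text
    return COLORS[color] + text + RESET

print(colorize("Benchmark Successful. The digits appear to be OK.", "green"))
```

Checking isatty() keeps the escape codes out of redirected logs and result files.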

  15. #515
    Xtreme Enthusiast
    Join Date
    Mar 2006
    Posts
    703




  16. #516
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Updated.

    @ hlonipha
    I just noticed that your new 500m is a lot faster.
    Just curious, what did you tweak? 139 -> 126 is huge.

  17. #517
    Registered User
    Join Date
    Oct 2007
    Posts
    65
    Many thanks for this, love your work

    A question: when it says the "digits appear to be ok", is that actually making sure the CPU has not lost its marbles due to an OC? I like how Prime95 will clearly error out if the CPU loses the plot.

  18. #518
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Nullack View Post
    Many thanks for this, love your work

    A question: when it says the "digits appear to be ok", is that actually making sure the CPU has not lost its marbles due to an OC? I like how Prime95 will clearly error out if the CPU loses the plot.
    It just means that the digits are correct. So yes it's to make sure that the computation is correct and that there are no errors.

    If the digits are wrong, it will tell you. It means the hardware errored. (it could also be a bug in the program, but that's not likely since everyone runs them all the time.)
    Last edited by poke349; 08-23-2010 at 09:01 AM.

  19. #519
    Diablo 3! Who's Excited?
    Join Date
    May 2005
    Location
    Boulder, Colorado
    Posts
    9,412
    Just checked in on this. Glad you got the Linux version working


    Constant : Pi
    Algorithm: Chudnovsky Formula

    Decimal Digits : 500,000,000
    Hexadecimal Digits: Disabled

    Threads: 16
    Mode : Ram Only

    Start Time: Tue Aug 24 12:45:37 2010


    Allocating and Reserving Memory... 2.42 GB
    Constructing FFT lookup tables...


    Begin Computation:

    Summing Series: 35,256,838 terms
    Time: 164.065 seconds ( 0.046 hours )
    InvSqrt...
    Time: 7.181 seconds ( 0.002 hours )
    Final Multiply...
    Time: 3.799 seconds ( 0.001 hours )

    Pi: 175.045 seconds ( 0.049 hours )


    Constructing Base Conversion Table:
    Time: 0.963 seconds ( 0.000 hours )
    Base Converting:
    Time: 22.879 seconds ( 0.006 hours )

    Writing Decimal Digits: 500,000,001 digits written

    Verifying Base Conversion...
    Time: 5.742 seconds ( 0.002 hours )


    End Time: Tue Aug 24 12:49:09 2010

    Total Computation Time: 198.945 seconds ( 0.055 hours )
    Total Time (with output + verify): 212.407 seconds ( 0.059 hours )

    CPU Utilization: 1190.53 %
    Multi-core Efficiency: 74.4084 %

    Last Digits: Pi
    3896531789 0364496761 5664275325 5483742003 7847987772 : 499,999,950
    5002477883 0364214864 5906800532 7052368734 3293261427 : 500,000,000

    Version: 0.5.4 Build 9150 (fix 1) (x64 SSE3 - Linux)
    Processor(s): Unable to Detect
    Logical Cores: 16
    Physical Memory: Unable to Detect
    CPU Frequency: Unable to Detect

    Benchmark Successful. The digits appear to be OK.

    Result File: Validation - Pi - 500,000,000.txt



    Dual L5520 w/ 72GB of ram running Ubuntu 10.04 x64. Now I just need to find a way to cluster these boxes, 10GbE backbone would make it possible. Time to research Beowulf clusters
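The numbers in that log can be cross-checked by hand. The sketch below assumes the "Multi-core Efficiency" figure is simply CPU utilization divided by the number of logical cores; that relationship is inferred from the printed values and is not confirmed anywhere in the thread.

```python
def multicore_efficiency(cpu_utilization_pct, logical_cores):
    """Assumed definition: utilization spread evenly over all logical cores."""
    return cpu_utilization_pct / logical_cores

# Phase timings copied from the 500m log above (seconds).
phases = {"Summing Series": 164.065, "InvSqrt": 7.181, "Final Multiply": 3.799}
pi_time = sum(phases.values())
eff = multicore_efficiency(1190.53, 16)

print(f"Pi time from phases: {pi_time:.3f} s")
print(f"Efficiency: {eff:.2f} %")
```

The three phases add up to the reported Pi time, and the efficiency agrees with the reported 74.4084 % to within the rounding of the displayed utilization.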

  20. #520
    Diablo 3! Who's Excited?
    Join Date
    May 2005
    Location
    Boulder, Colorado
    Posts
    9,412
    10 Billion digits in 4699 seconds. Ran out of disk space, only had a 10GB partition for the main drive

    Constant : Pi
    Algorithm: Chudnovsky Formula

    Decimal Digits : 10,000,000,000
    Hexadecimal Digits: Disabled

    Threads: 16
    Mode : Ram Only

    Start Time: Tue Aug 24 13:01:04 2010


    Allocating and Reserving Memory... 45.0 GB
    Constructing FFT lookup tables...


    Begin Computation:

    Summing Series: 705,136,696 terms
    Time: 4511.288 seconds ( 1.253 hours )
    InvSqrt...
    Time: 124.614 seconds ( 0.035 hours )
    Final Multiply...
    Time: 64.026 seconds ( 0.018 hours )

    Pi: 4699.929 seconds ( 1.306 hours )


    Constructing Base Conversion Table:
    Time: 16.307 seconds ( 0.005 hours )
    Base Converting:
    Time: 586.732 seconds ( 0.163 hours )

    Writing Decimal Digits: 4,232,205,921 digits written
    Error Writing File.
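A quick way to avoid this kind of failure is to check free space before starting a long run. The sketch assumes a plain-text digit file at roughly one byte per decimal digit plus the leading "3."; the actual output format may add some formatting overhead.

```python
import shutil

def bytes_needed(decimal_digits):
    """Rough size of a plain-text digit file: one byte per digit plus '3.'."""
    return decimal_digits + 2

def has_room(path, decimal_digits):
    """True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= bytes_needed(decimal_digits)

need = bytes_needed(10_000_000_000)
print(f"10b digits need about {need / 2**30:.2f} GiB")
print("enough room in current directory:", has_room(".", 10_000_000_000))
```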

  21. #521
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    lol... Only 10GB.

    Benchmark mode always writes to the current working directory at the time the program is launched. So you can cd to whatever path you want before you launch y-cruncher.

    Custom Compute will let you choose the path.


    Speaking of large tests... I'm running a 50b run in Linux on my workstation... I need to do a large run that isn't bufferable just to see the impact of using non-raw I/Os...



    Once my new 2TB HD gets here I'll be able to clear out the 4 x 1TB in my i7 machine enough to start doing Linux tests on that as well...

    Then I'll be able to build the Ushio and Nagisa binaries for Linux.
    But I can't build the Kasumi binary for AMD K10 anymore because I no longer have access to the machine that I used to make it...

  22. #522
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    A little update on the Linux version:

    So I got all the minor features like CPU and frequency detection to work.
    But... I had to use a bit of assembly to access the cpuid and rdtsc instructions to do this... So I guess the program isn't 100% C/C++ anymore...


    As far as optimizations go:

    I tried O3... It doubled the size of the binary and made it about 1% slower...
    So that's a no-go... Though I was half expecting it since I knew it would inline things that I wouldn't want it to.

    So I tried raw I/Os... The raw I/O interface in Linux is the same as in Windows with the same alignment restrictions (all addresses and sizes have to be a multiple of the cluster size of 512 or 4096 bytes).
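The alignment restriction boils down to rounding every request out to sector boundaries. The sketch below shows only that arithmetic, with hypothetical helper names; on Linux, unbuffered access is presumably done with the O_DIRECT open flag, which also requires the buffer address itself to be aligned (not covered here).

```python
def align_down(value, alignment):
    """Largest multiple of alignment that is <= value."""
    return value - (value % alignment)

def align_up(value, alignment):
    """Smallest multiple of alignment that is >= value."""
    return align_down(value + alignment - 1, alignment)

def aligned_window(offset, length, sector=4096):
    """Smallest sector-aligned (offset, length) covering the requested range."""
    start = align_down(offset, sector)
    end = align_up(offset + length, sector)
    return start, end - start

# A 100-byte read at offset 5000 has to become a full 4096-byte sector read.
print(aligned_window(5000, 100))
print(aligned_window(4096, 4096, sector=512))
```

The caller reads the widened window and then copies out just the bytes it actually wanted.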

    That seemed easy enough, so I basically copy/pasted the WinAPI implementation into the Linux implementation and replaced all the I/O calls with the Linux versions...
    Initial tests on 50b tests showed that it ended up being 20 - 50% slower than the non-raw I/Os... (On Windows, raw I/Os were about 10 - 20% faster than non-raw... )

    On the other hand, raw I/Os seem to be fastest on smaller test sizes...
    I'm still toying with this right now... I'm not sure if it's because Linux does something very inefficient with NTFS drives in raw I/O mode... (the 8 x 2TB swap drives are still NTFS formatted since I use them in Windows)
    Whatever the case, this is taking an annoyingly long time since these tests take half a day or more...


    My laptop: Intel Core i7 720QM @ 1.6 GHz (stock)


    My i7 rig: Intel Core i7 920 @ 3.33 GHz (3.5 GHz turbo)


    p.s.
    Is it just me or does WUBI eat a small percentage of the ram?
    My laptop is reporting in with 7.8 GB.
    My i7 rig is reporting 11.7.
    But my workstation (which has a native Linux install) reports the full 64GB.

  23. #523
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    It should be possible to get the CPU info without any assembly. At least by reading /proc/cpuinfo and parsing the data from there. It should include everything you'd be interested in.
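A minimal sketch of that approach, parsing /proc/cpuinfo-style text into one dictionary per logical processor. The sample text is abbreviated and its values are made up; field names such as "model name" and "cpu MHz" follow the x86 layout and differ on other architectures.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    processors = []
    # Entries for each logical processor are separated by a blank line.
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            processors.append(entry)
    return processors

# Abbreviated sample in the x86 layout; the values are made up.
SAMPLE = """processor\t: 0
model name\t: Example CPU @ 3.40GHz
cpu MHz\t\t: 3400.000

processor\t: 1
model name\t: Example CPU @ 3.40GHz
cpu MHz\t\t: 3400.000
"""

cpus = parse_cpuinfo(SAMPLE)
print(len(cpus), "logical cores,", cpus[0]["model name"], "at", cpus[0]["cpu MHz"], "MHz")
```

On a real Linux box the input would be `open("/proc/cpuinfo").read()`. Note that "cpu MHz" reports the current frequency, which can change with power management, so it is not always the nominal clock speed.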

    I would guess the NTFS support isn't the greatest on Linux; it used to be very experimental a few years ago. I would suggest formatting one of the drives with the ext4 filesystem. It seems to be the fastest all-around filesystem currently, but since there are lots of candidates (xfs, ext2, ext3, reiserfs etc.) it could be that there is a more suitable filesystem for the work yc does.

    Using -O2 and cherry-picking the best optimizations (if any) from those that -O3 enables should result in the best outcome. Trying to disable some of the -O2 optimizations could help too, though it's most probably not worth the effort.

    Have you tried any -march/-mtune switches? (As far as I know, -march picks the instruction set and also implies -mtune, while -mtune alone only affects tuning.)
    Last edited by Calmatory; 08-26-2010 at 04:09 PM.

  24. #524
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by Calmatory View Post
    It should be possible to get the CPU info without any assembly. At least by reading /proc/cpuinfo and parsing the data from there. It should include everything you'd be interested in.

    I would guess the NTFS support isn't the greatest on Linux; it used to be very experimental a few years ago. I would suggest formatting one of the drives with the ext4 filesystem. It seems to be the fastest all-around filesystem currently, but since there are lots of candidates (xfs, ext2, ext3, reiserfs etc.) it could be that there is a more suitable filesystem for the work yc does.

    Using -O2 and cherry-picking the best optimizations (if any) from those that -O3 enables should result in the best outcome. Trying to disable some of the -O2 optimizations could help too, though it's most probably not worth the effort.

    Have you tried any -march/-mtune switches? (As far as I know, -march picks the instruction set and also implies -mtune, while -mtune alone only affects tuning.)
    So it says -O3 includes:

    -finline-functions
    -funswitch-loops
    -fpredictive-commoning
    -fgcse-after-reload
    -ftree-vectorize

    I looked at each of those, and it looks like -finline-functions is the only one that will affect performance-critical code. And since most of the stuff that is worth inlining is already inlined (by abusing macros)... that probably explains why it backfires.
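One way to test the flags individually is to build one binary per flag on top of -O2 and benchmark each. The sketch below only generates the command lines; the source file name, output names, and the choice of g++ are placeholders, not the actual y-cruncher build setup.

```python
# The extra flags listed above as being enabled by -O3.
O3_EXTRAS = [
    "-finline-functions",
    "-funswitch-loops",
    "-fpredictive-commoning",
    "-fgcse-after-reload",
    "-ftree-vectorize",
]

def build_commands(source="main.cpp", compiler="g++"):
    """One command line per variant: plain -O2, then -O2 plus a single extra flag."""
    variants = [[]] + [[flag] for flag in O3_EXTRAS]
    commands = []
    for i, extra in enumerate(variants):
        commands.append([compiler, "-O2", *extra, source, "-o", f"build_{i}"])
    return commands

for cmd in build_commands():
    print(" ".join(cmd))
```

Timing each resulting binary against the plain -O2 baseline shows which flag, if any, is responsible for a slowdown or a gain.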

    The list of other optimizations in GCC is huge... I haven't looked at all of them yet. One that seems interesting is -funsafe-math-optimizations, but I'm not sure if it's able to optimize around messy SSE instructions that mix stuff like addsubpd and haddpd/hsubpd into the normal addpd/subpd instructions.
    I'll try this later once I'm through with the raw I/O stuff.
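For readers unfamiliar with the SSE3 instructions mentioned here, this is a small model of what they compute, with each 128-bit register represented as a (low, high) pair of doubles. The complex multiply shows a typical use of addsubpd; whether y-cruncher's kernels use it in exactly this way is an assumption.

```python
def addsubpd(a, b):
    """SSE3 ADDSUBPD: subtract in the low lane, add in the high lane."""
    return (a[0] - b[0], a[1] + b[1])

def haddpd(a, b):
    """SSE3 HADDPD: horizontal add within each operand."""
    return (a[0] + a[1], b[0] + b[1])

def hsubpd(a, b):
    """SSE3 HSUBPD: horizontal subtract (low minus high) within each operand."""
    return (a[0] - a[1], b[0] - b[1])

def complex_mul(x, y):
    """(re, im) product of x = a+bi and y = c+di using one addsubpd."""
    t1 = (x[0] * y[0], x[1] * y[0])   # [a*c, b*c]
    t2 = (x[1] * y[1], x[0] * y[1])   # [b*d, a*d]
    return addsubpd(t1, t2)           # [a*c - b*d, b*c + a*d]

print(complex_mul((1.0, 2.0), (3.0, 4.0)))
```

Because the two lanes do different operations, a compiler cannot freely reassociate these the way it can with plain addpd/subpd chains.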

    EDIT:
    I'm actually pretty surprised at how many "unsafe" optimizations I can turn on and still keep the program working... but none of them seem to produce any noticeable speedup.


    I'm not a fan of compiler tuning/profiling since it tends to give inconsistent speeds between different versions of the program...
    Last edited by poke349; 08-26-2010 at 08:16 PM.

  25. #525
    Registered User
    Join Date
    Dec 2008
    Posts
    67
    a little pic, that helps a lot


    btw, why not just use tmpfs (it's almost like a RAM disk), put a 50 GB file there and test your raw I/O from there?
    I think there is no need to run the computational part and create the same 50 GB Pi file over and over again...

    tmpfs hint:
    add one line to your /etc/fstab
    something like:
    tmpfs /testdir tmpfs size=50G,mode=0775,uid=YOURUSERNAME,gid=YOURGROUP 0 0

    edit yourusername and yourgroup to match yours
    (or just set mode to 0777 and make that directory writeable for everyone)
    be sure not to delete the other lines in fstab

    and then run "mount -a"
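Once a directory like that is mounted, an I/O test can be run against a pre-made file instead of repeating the whole computation. The sketch below is a generic timing harness using ordinary buffered reads (not raw I/O); it demonstrates on a small file in a temporary directory, and the path and sizes are placeholders to be swapped for the tmpfs mount and a realistic size.

```python
import os
import tempfile
import time

def make_test_file(path, size_bytes, chunk=1 << 20):
    """Fill `path` with size_bytes of zeros, written chunk by chunk."""
    block = bytes(chunk)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(block[:n])
            remaining -= n

def timed_read(path, chunk=1 << 20):
    """Read the whole file sequentially; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total, time.perf_counter() - start

# Demo on a small file in a temp dir; point `directory` at the tmpfs mount
# and raise the size for a real test.
directory = tempfile.mkdtemp()
test_path = os.path.join(directory, "io_test.bin")
make_test_file(test_path, 8 << 20)
nbytes, seconds = timed_read(test_path)
print(f"read {nbytes} bytes in {seconds:.4f} s")
os.remove(test_path)
os.rmdir(directory)
```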

    btw, do you use ncurses for colored output?

    With NTFS I recommend ntfs-3g. It's a userspace implementation and works very well (last time I used it was about a year ago and it was flawless).
    In Ubuntu, search with aptitude for ntfs3g or ntfs-3g (not sure what it's called there...)

    PS:
    You are not OCing, you are not Xtreme
    EDIT: ok, just noticed you are OCing your i7, good boy
    Last edited by Havis; 09-07-2010 at 02:31 AM.

