Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

  1. #11
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by alpha754293 View Post
    I don't think that there's numactl for cygwin. I tried looking for it, and I also tried compiling it from source, but I get the following error:

    Code:
    $ make
    Makefile:179: .depend: No such file or directory
    cc -MM -DDEPS_RUN -I. bitops.c libnuma.c distance.c memhog.c numactl.c numademo.c numamon.c shm.c stream_lib.c stream_main.c syscall.c util.c mt.c clearcache.c test/*.c > .depend.X && mv .depend.X .depend
    cc -g -Wall -O2  -I.   -c -o numactl.o numactl.c
    cc -g -Wall -O2  -I.   -c -o util.o util.c
    util.c: In function `memsize':
    util.c:80: warning: array subscript has type `char'
    cc -g -Wall -O2  -I.   -c -o shm.o shm.c
    shm.c: In function `attach_shared':
    shm.c:129: error: storage size of `st' isn't known
    shm.c:140: warning: implicit declaration of function `fstat64'
    shm.c:143: warning: implicit declaration of function `ftruncate64'
    shm.c:157: warning: implicit declaration of function `mmap64'
    shm.c:157: warning: assignment makes pointer from integer without a cast
    shm.c:129: warning: unused variable `st'
    make: *** [shm.o] Error 1
    Source from http://oss.sgi.com/projects/libnuma/
    I'm getting speedups of up to 30% using numactl interleave on Linux. The speedup depends on the computation size: for smaller sizes, interleaving sometimes backfires.
    I'll accept any settings as valid benchmarks. Tuning the OS like this is part of the game, lol.
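
    For anyone scripting their runs, here is a minimal sketch of wrapping a benchmark launch with interleaving. It assumes the standard `numactl --interleave=all` option on Linux; the helper name `interleave_command` and the executable name `./pi_benchmark` are placeholders for illustration, not part of the actual program. If numactl is not installed (as on Cygwin), the command is returned unchanged.

```python
import shutil


def interleave_command(cmd, numactl_path=None):
    """Prefix cmd with `numactl --interleave=all` when numactl is available.

    cmd is a list of arguments, e.g. ["./pi_benchmark"] (placeholder name).
    """
    if numactl_path is None:
        # Look numactl up on PATH; returns None when it is not installed.
        numactl_path = shutil.which("numactl")
    if numactl_path is None:
        # No numactl on this system: fall back to the plain command.
        return list(cmd)
    return [numactl_path, "--interleave=all", *cmd]


if __name__ == "__main__":
    # Only prints the command; pass it to subprocess.run() to launch it.
    full_cmd = interleave_command(["./pi_benchmark"])
    print("Would run:", " ".join(full_cmd))
```

    Since interleaving can backfire at smaller sizes, it is worth timing both the wrapped and the plain command before picking one for a given size.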

    As far as I can tell, only Windows Server has any sort of NUMA-awareness support. But you need Win Server anyway to get more than 2 sockets in the first place.
    There might be an OS setting somewhere that forces interleaved memory allocation. If there isn't, maybe there's a special WinAPI counterpart to malloc() that does interleaved allocation. If I find it, I'll try it out.

    But in any case, interleaved memory allocation isn't a "solution" to NUMA.
    It lets the program get the full memory bandwidth, but it doesn't get rid of the interconnect contention and latency.
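
    A rough way to see why the latency problem remains: a toy model, assuming pages are spread evenly across all nodes and each thread touches memory uniformly. Under those assumptions a thread finds only 1/N of its data on its own node, so (N-1)/N of its accesses cross the interconnect. The latency figures in the example are made-up placeholders, not measurements.

```python
def remote_fraction(num_nodes):
    """Fraction of accesses that land on another node under even interleaving."""
    return (num_nodes - 1) / num_nodes


def average_latency_ns(num_nodes, local_ns, remote_ns):
    """Expected access latency when pages are interleaved across num_nodes."""
    f = remote_fraction(num_nodes)
    return (1 - f) * local_ns + f * remote_ns


if __name__ == "__main__":
    # 100 ns local and 150 ns remote are illustrative placeholders only.
    for nodes in (1, 2, 4, 8):
        frac = remote_fraction(nodes)
        avg = average_latency_ns(nodes, 100.0, 150.0)
        print(f"{nodes} node(s): {frac:.0%} remote, ~{avg:.1f} ns average")
```

    In this model the remote share only grows with the node count, which is why interleaving spreads the bandwidth load but leaves most traffic on the interconnect.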

    The program will still need to be redesigned to run well on NUMA.


    EDIT:
    Got my B3 stepping today. I finally get my SATA ports back. (I still don't get how I managed to kill them in only 2 months...)
    Gonna have to re-test my OC tomorrow after the TIM sets. That thermal pad on the H50 was great, but it's only good for the first use. Afterwards it's too uneven, so I had to scrape it all off and use some of my leftover Arctic Silver.
    Last edited by poke349; 03-29-2011 at 02:51 AM.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)
