Thread: Haswell and AVX2, the massive boost WCG needs

  1. #1
    Xtreme Member
    Join Date
    Jan 2010
    Posts
    323

    Haswell and AVX2, the massive boost WCG needs

    http://www.extremetech.com/computing...-to-nvidia-amd
    Haswell is a logical extension of the microarchitectural improvements Intel first introduced in Sandy Bridge. The new chip adds support for Intel's second-generation Advanced Vector eXtensions (AVX2), which doubles the core's peak FPU throughput. L1 and L2 bandwidth have been doubled to ensure the execution units stay fed, and the integer and FPU register files have all been enlarged. Branch prediction efficiency also gets a boost. Haswell's real-world single-threaded performance in unoptimized code is expected to improve by 10-15%. In optimized, AVX2 code, the leap will be much larger; AVX2 includes support for integer vectorization that AVX lacks.

    The increased FPU capability and additional AVX2 functionality make a huge difference in Haswell's floating-point performance. The CPU is capable of up to 32 single-precision and 16 double-precision floating point operations per cycle per core. That's twice what Sandy Bridge could achieve; a theoretical eight-core Haswell clocked at 3.8GHz will offer 972.8 gigaflops of SP and 486.4 gigaflops of DP performance.
    If WCG were to integrate AVX/AVX2 instructions in WCG projects... imagine the boost it would give... it would be quite epic.

    A 3770K at 4.4 GHz is what... 120 gflops max? It would be at least 3-8 times the performance...
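
    For reference, the quoted peak numbers can be reproduced with simple arithmetic. This is only a rough sketch: it assumes the 32 SP / 16 DP figures are per clock cycle per core (that is the reading under which 972.8 comes out), and that a 4-core 3770K does half of Haswell's per-cycle rate, as the quote implies for Sandy Bridge class chips. The function name is made up for illustration.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak = cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle

# Hypothetical eight-core Haswell at 3.8 GHz, as in the quoted article.
haswell_sp = peak_gflops(8, 3.8, 32)
haswell_dp = peak_gflops(8, 3.8, 16)

# Assumed: 4-core 3770K at 4.4 GHz with half of Haswell's per-cycle rate.
ivy_sp = peak_gflops(4, 4.4, 16)
ivy_dp = peak_gflops(4, 4.4, 8)

print(round(haswell_sp, 1), round(haswell_dp, 1))
print(round(ivy_sp, 1), round(ivy_dp, 1))
```

    These are paper peaks only. At the same core count and clock the gain from AVX2 works out to about 2x, and real WCG code would reach some fraction of that at best.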

    BTW this isn't an anti-AMD post... I've only owned AMD GPUs and I'm running two AMD rigs right now... I'm sure AMD will include AVX2 in their next CPUs...
    Last edited by vitchilo; 03-12-2013 at 01:19 PM.

  2. #2
    Xtreme Member
    Join Date
    May 2008
    Location
    Sydney, Australia
    Posts
    242
    I think that most of the WCG projects don't use any of the multimedia extension instructions (SSE, AVX etc) at all.
    At least that was the case in earlier days. Reasons stated were that their code is generated by compilers, while the SIMD instructions needed to be hand-coded in assembler. The other major reason given was that there may be differences in the way that the SIMD hardware of some CPU models handles floating-point roundoff of the least significant binary bit. This complicates the WU verification process.
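
    A small illustration of why roundoff complicates verification: floating-point addition is not associative, so code that adds the same numbers in a different order (as vectorised code typically does) can differ in the last bit. This is only a sketch in plain Python, not anything taken from WCG code.

```python
# The same three numbers summed with two different groupings.
sequential = (0.1 + 0.2) + 0.3   # order a simple left-to-right scalar loop would use
regrouped = 0.1 + (0.2 + 0.3)    # same additions, grouped differently

print(sequential == regrouped)       # the two sums are not bit-identical
print(abs(sequential - regrouped))   # the gap is tiny, but a bitwise comparison fails
```

    Two machines returning results like these would both be "right", yet a validator that compares results bit for bit would flag a mismatch.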

    However, recently I used a bit of water-boarding etc on WCG and managed to coax out the information that the Q-Chem program, used by CEP2, uses optimised floating-point libraries such as Intel® Math Kernel Library (Intel® MKL) and ATLAS. MKL automatically uses the best instruction set for the particular CPU and guarantees to return identical results from all CPUs.
    Links: Q: http://www.worldcommunitygrid.org/fo...ad?post=412208, A: http://www.worldcommunitygrid.org/fo...ad?post=412395
    http://software.intel.com/en-us/intel-mkl
    http://math-atlas.sourceforge.net/
    [ BlindFreddie --> <--WCG ]

    When AVX2 CPUs come along, CEP2 should/might use it automatically, provided that the version of MKL distributed with the CEP2 program has AVX2 code in it.

    It is possible that other WCG projects that run off-the-shelf programs also use these libraries. CHARMM, used earlier in CEP Phase 1 and DDT2 (now seems finished too) probably uses these libraries too. Rosetta, used by HPF2, is another off-the-shelf program but I have no idea whether it uses any SIMD instructions.
    A clue can be found by observing whether your computer draws more power and runs hotter when running instances of the program in question compared to a program which probably does not use SIMD instructions, such as FAAH. I forget which of my machines I tested, but I found power went up by about 5W per CEP2 thread.

    Please keep in mind that not every part of a science program can use SIMD/vector instructions, and even the parts that can may only run for a small proportion of the time. For example, if 50% of the time had to be spent doing non-vector (scalar) sequential operations, the biggest speedup that could be achieved would be a factor of 2, and that's assuming that the vector operations all happen instantaneously!
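
    The factor-of-2 limit above is Amdahl's law. A minimal sketch, where the function name and the 8x vector speedup are made up purely for illustration:

```python
import math

def amdahl_speedup(vector_fraction, vector_speedup):
    """Overall speedup when only vector_fraction of the run time is sped up."""
    scalar_fraction = 1.0 - vector_fraction
    return 1.0 / (scalar_fraction + vector_fraction / vector_speedup)

# 50% scalar work, vector part taking no time at all: the upper limit.
print(amdahl_speedup(0.5, math.inf))

# 50% scalar work, vector part running 8x faster (hypothetical figure).
print(amdahl_speedup(0.5, 8))
```

    With half the time stuck in scalar code, even an 8x faster vector unit gives less than a 1.8x overall gain.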

    If you want to suggest to WCG that they use optimised libraries more widely, please feel free. I think they're sick of my suggestions at this stage.
    Last edited by BlindFreddie; 03-13-2013 at 06:02 AM.

  3. #3
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    Please keep in mind that WCG is first and foremost a BOINC host for all of the different universities that 'own' the projects.
    They do not write the code themselves. At best they will offer security reviews and may help port to a different OS, but that's it, so any suggestion that WCG recode to take advantage of AVX is not useful. Perhaps you could consider contacting the project scientists themselves, but even then, if they are using a framework created by someone else (Q-Chem, CHARMM, VINA), the scientists don't really have anything to say about it either ... you really need to find out who owns the source code itself.

  4. #4
    StitchExperimen
    Guest
    PrimeGrid's Proth Prime Search (LLR) is able to use AVX on Intel processors.
    I don't know about other projects under PrimeGrid.

    As for Haswell, the PC market is losing ground to other devices as the "money" makers, and AMD now has a processor that can compare with the i7 3770K. But Intel has the cash and a large field to diversify into. Back before 2004 AMD had their own silicon wafer plants, but now Intel is almost 2 generations ahead on die shrinkage for processors, and AMD is only at 28nm on video cards. That is a concern for businesses that have to pay the power and cooling bills, with AMD products putting out ~2x the heat. AMD also has fewer than 12,000 employees, so they have to diversify into other fields to make money, such as unique server relationships and other products.

  5. #5
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    Quote Originally Posted by StitchExperimen View Post
    ... AMD now has a processor that can compare with the i7 3770K.

  6. #6
    Xtreme Member
    Join Date
    Jul 2012
    Posts
    219
    I'm guessing HSA has a better chance of giving home computing a boost in performance, and should make the CPU/GPU WU separation disappear. If they are going to reprogram anything, I'm sure there will be an effort to include GPU power after this HCC run.
    As for the AVX2 Haswell peak, my overclocked A10 has a theoretical peak of 981 gflops, thanks to the power of the GPU.
    Richland 6790K @ 4.713 Ghz / 2208 NB / 1123 gpu / 2304 Ram [96 Bclk]
    F2A85-M Pro, Mushkin Black 2133, iGPU (8760D)
    9.7L case (excluding 230mm fan) or 11.6L w/2nd rad fan
