Sorry guys. I've been busy for a while and kinda neglected both the program and this thread.

I'll try to get the lists back up to date in the next week or so.

Quote: Originally posted by st0ned
Any chance of a Windows 8 Update ? I mean to support AVX on windows 8 :P
The issue with it not detecting AVX support in Windows 8 is a known problem. I've had multiple reports of this.
I don't currently have a Windows 8 machine to test the fix on, so I'm not gonna bother with it in the meantime.

For now, the work-around is to simply go into the "Binaries" folder and run the "x64 - AVX ~ Hina" binary manually. The "y-cruncher.exe" binary in the main folder is just a launcher that detects the environment and tries to pick the best binary to run. You are free to override what it chooses.
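
To illustrate the general idea, here is a minimal sketch in Python. It is not the actual launcher code and it says nothing about what goes wrong on Windows 8. It only shows the usual two-part AVX check: the CPU has to report AVX (CPUID leaf 1, ECX bit 28), and the OS has to have enabled saving the AVX registers (OSXSAVE in ECX bit 27, plus XCR0 bits 1 and 2 as read by XGETBV). The function names, the raw register values passed in, and the fallback folder name are all assumptions made up for the example. Only "x64 - AVX ~ Hina" is a real name from the "Binaries" folder.

```python
# Illustrative sketch only; not the real y-cruncher launcher.
# Inputs are raw register values that a native helper would read
# with the CPUID and XGETBV instructions.

CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_AVX = 1 << 28      # CPU implements AVX
XCR0_SSE_AVX_STATE = 0x6      # bits 1 and 2: OS saves XMM and YMM state

AVX_BINARY = "x64 - AVX ~ Hina"          # name taken from the Binaries folder
FALLBACK_BINARY = "x64 - SSE3 ~ Kasumi"  # hypothetical spelling of the fallback


def avx_usable(cpuid1_ecx, xcr0):
    """Return True only if both the CPU and the OS support AVX."""
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # XGETBV is unavailable, so xcr0 cannot be trusted
    return (xcr0 & XCR0_SSE_AVX_STATE) == XCR0_SSE_AVX_STATE


def pick_binary(cpuid1_ecx, xcr0, override=None):
    """Pick a binary name; an explicit override always wins."""
    if override is not None:
        return override
    return AVX_BINARY if avx_usable(cpuid1_ecx, xcr0) else FALLBACK_BINARY
```

The `override` argument mirrors running a binary from the "Binaries" folder by hand: whatever the detection decides, the user's choice takes priority.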

Quote: Originally posted by Utroz
Nice times and multi-core efficiency CRFX. I am curious why you(CRFX) ran x64 see3 kasumi code path as opposed to the x64 AVX Hina code path? Maybe they can release a FMA code path that would be even faster. (leaning towards FMA3 because it will be supported by future intel haswell and current Amd piledriver cores as opposed to FMA4 which is AMD bulldozer and piledriver only afaik but if it is not to hard to make both it would be cool to compare on piledriver and see whats faster FMA3 or FMA4)
Quote: Originally posted by CRFX
I found the AVX Hina version way slower on both my bulldozer and piledriver chips. The Kasumi version is the fastest of all the executable included, for me at least.
I noticed Y-cruncher 6.1 will include FMA4, so that should speed things up a bit.
This is also a known "problem". On Bulldozer and family, the FPU can sustain either 2 x 128-bit instructions or 1 x 256-bit instruction per cycle. In other words, 256-bit AVX gives no throughput advantage over 128-bit code. Furthermore, there are hardware "optimizations*" that only apply to 128-bit instructions.

Combine that with the extra overhead of packing/unpacking 256-bit SIMD and it results in a significant net slowdown.
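
A back-of-the-envelope version of that argument, as a rough Python sketch. The 2 x 128-bit and 1 x 256-bit issue rates are the ones stated above. The `overhead` parameter is a made-up knob standing in for the packing/unpacking cost, and the 0.1 used below is purely hypothetical, not a measurement.

```python
# Toy throughput model; everything except the issue rates is illustrative.

def doubles_per_cycle(instr_per_cycle, width_bits, overhead=0.0):
    """Peak 64-bit lanes processed per cycle, scaled down by an
    assumed fraction of work lost to packing/unpacking."""
    lanes = width_bits // 64
    return instr_per_cycle * lanes * (1.0 - overhead)


sse_128 = doubles_per_cycle(2, 128)  # 2 x 128-bit instructions per cycle
avx_256 = doubles_per_cycle(1, 256)  # 1 x 256-bit instruction per cycle
avx_256_with_overhead = doubles_per_cycle(1, 256, overhead=0.1)  # hypothetical cost

print(sse_128, avx_256, avx_256_with_overhead)
```

With zero overhead the two cases tie, so any nonzero overhead on the 256-bit side tips it into a loss.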

Currently, my AVX, FMA, and XOP codepaths are all 256-bit. I'm somewhat torn on whether to make 128-bit codepaths just for Bulldozer and family, or to just leave it and hope AMD eventually brings 256-bit up to par.

As for FMA3 vs. FMA4: I plan to set all the FMA codepaths to use FMA3 and all the XOP codepaths to use FMA4. That said, I currently don't have the hardware to properly test either one of these. Whether or not v0.6.1 will have them will depend on whether I finish it before or after I get my hands on the needed hardware.

*For those familiar with low-level details, I'm specifically talking about register move renaming. Bulldozer has it for 128-bit SIMD, but not for 256-bit.