Sorry guys. I've been busy for a while and kinda neglected both the program and this thread.
I'll try to get the lists back up to date in the next week or so.
The issue with it not detecting AVX support in Windows 8 is a known problem. I've had multiple reports of this.
Currently, I don't have a Windows 8 machine to test the fix for this. So I'm not gonna bother with it in the meantime.
For now, the work-around is to simply go into the "Binaries" folder and run the "x64 - AVX ~ Hina" binary manually. The "y-cruncher.exe" binary in the main folder is just a launcher that detects the environment and tries to pick the best binary to run. You are free to override what it chooses.
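For anyone curious what "detects the environment" involves, here is a minimal sketch of the standard AVX usability check. This is not the launcher's actual code, and it says nothing about what goes wrong on Windows 8. The function names and the fallback binary name are made up for illustration; only "x64 - AVX ~ Hina" comes from the Binaries folder. The inputs are raw register values (ECX from CPUID leaf 1, and XCR0 as read by XGETBV) so the logic can be checked on any machine.

```cpp
#include <cstdint>

// Bit positions in ECX as returned by CPUID leaf 1.
constexpr std::uint32_t kOsxsaveBit = 1u << 27;  // OS has enabled XSAVE/XGETBV
constexpr std::uint32_t kAvxBit     = 1u << 28;  // CPU implements AVX

// XCR0 bits 1 (XMM state) and 2 (YMM state) must both be enabled by the OS.
constexpr std::uint64_t kXmmYmmState = 0x6;

// AVX is only usable when the CPU has it AND the OS saves the YMM registers.
// xcr0 is only meaningful when the OSXSAVE bit is set, so that is checked first.
constexpr bool avx_usable(std::uint32_t cpuid1_ecx, std::uint64_t xcr0) {
    return (cpuid1_ecx & kOsxsaveBit) != 0
        && (cpuid1_ecx & kAvxBit) != 0
        && (xcr0 & kXmmYmmState) == kXmmYmmState;
}

// Hypothetical launcher decision: AVX binary if usable, otherwise a fallback.
enum class Binary { Avx, Fallback };

constexpr Binary choose_binary(std::uint32_t cpuid1_ecx, std::uint64_t xcr0) {
    return avx_usable(cpuid1_ecx, xcr0) ? Binary::Avx : Binary::Fallback;
}

// Only the AVX name is real; the fallback name is a placeholder.
constexpr const char* binary_name(Binary b) {
    return b == Binary::Avx ? "x64 - AVX ~ Hina"
                            : "(hypothetical non-AVX fallback)";
}
```

If either the CPU bit or the OS bits are reported wrong, a check like this falls through to the non-AVX choice, which is why running the AVX binary by hand is a valid override.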
This is also a known "problem". On Bulldozer and family, the FPU can sustain either 2 x 128-bit instructions or 1 x 256-bit instruction per cycle. In other words, there is no throughput benefit to using AVX. Furthermore, there are hardware "optimizations*" that only apply to 128-bit instructions.
Combine that with the extra overhead of packing/unpacking 256-bit SIMD and it results in a significant net slowdown.
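To put numbers on the "no benefit" part, here is the arithmetic as a tiny sketch. It assumes double-precision (64-bit) elements and uses only the issue rates quoted above; the function name is made up.

```cpp
// Elements processed per cycle = instructions per cycle * lanes per instruction.
// Assumes 64-bit (double-precision) elements.
constexpr int doubles_per_cycle(int instructions_per_cycle, int vector_bits) {
    return instructions_per_cycle * (vector_bits / 64);
}

// Bulldozer figures from the post: 2 x 128-bit or 1 x 256-bit per cycle.
static_assert(doubles_per_cycle(2, 128) == 4, "128-bit path: 2 * 2 lanes");
static_assert(doubles_per_cycle(1, 256) == 4, "256-bit path: 1 * 4 lanes");
```

Both paths peak at the same 4 doubles per cycle, so the 256-bit path has nothing to offset its extra overhead.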
Currently, my AVX, FMA, and XOP codepaths are all 256-bit. I'm somewhat torn on whether to make 128-bit codepaths just for Bulldozer and family, or to just leave it and hope AMD eventually brings 256-bit up to par.
As for FMA3 vs. FMA4: I plan to set all the FMA codepaths to use FMA3 and all the XOP codepaths to use FMA4. That said, I currently don't have the hardware to properly test either one of these. Whether or not v0.6.1 will have them will depend on whether I finish it before or after I get my hands on the needed hardware.
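The plan above can be sketched as a small lookup. This is an illustration only, not code from the program: the enum and function names are made up, and the check covers just the extra CPUID feature bits (FMA3 in leaf 1 ECX bit 12; XOP and FMA4 in leaf 0x80000001 ECX bits 11 and 16). It assumes AVX and OS support have already been verified separately.

```cpp
#include <cstdint>

enum class Codepath { Avx, Fma, Xop };   // the three 256-bit codepaths in the post
enum class FmaFlavor { None, Fma3, Fma4 };

constexpr std::uint32_t kFma3Bit = 1u << 12;  // CPUID leaf 1, ECX
constexpr std::uint32_t kXopBit  = 1u << 11;  // CPUID leaf 0x80000001, ECX
constexpr std::uint32_t kFma4Bit = 1u << 16;  // CPUID leaf 0x80000001, ECX

// Planned pairing: FMA codepath -> FMA3, XOP codepath -> FMA4.
constexpr FmaFlavor fma_flavor(Codepath p) {
    return p == Codepath::Fma ? FmaFlavor::Fma3
         : p == Codepath::Xop ? FmaFlavor::Fma4
         : FmaFlavor::None;
}

// Checks only the extra feature bits a codepath needs beyond plain AVX.
constexpr bool cpu_supports(Codepath p, std::uint32_t leaf1_ecx,
                            std::uint32_t ext_ecx) {
    return p == Codepath::Avx ? true
         : p == Codepath::Fma ? (leaf1_ecx & kFma3Bit) != 0
         : (ext_ecx & kXopBit) != 0 && (ext_ecx & kFma4Bit) != 0;
}
```

Under this pairing, a chip that reports XOP and FMA4 but not FMA3 would only qualify for the XOP codepath.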
*For those familiar with low-level details, I'm specifically talking about the register move renaming. Bulldozer has it for 128-bit SIMD, but not for 256-bit.