
Originally Posted by Calmatory
What if there are multiple threads with lots of AVX instructions? A single module can only feed one AVX instruction at a time, or two 128-bit SSEx instructions, or four 64-bit FPU instructions, right?
With up to <number of modules> threads running AVX, there should be no performance penalty as long as no other FP instructions are in flight. The more of those there are, the lower the AVX performance will be. And if one adds more AVX threads, the FPU units will just starve and there will be no performance improvement?
In short: if I want to do lots of AVX, I can only run <number of modules> threads for improved performance?
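
To make the arithmetic in the question concrete, here is a rough toy model in Python. It only encodes the premise stated above (one 256-bit AVX instruction per module per cycle, no other FP work in flight) plus an extra assumption that the scheduler spreads threads evenly across modules. The function names are made up for illustration, and none of this is measured hardware behaviour.

```python
def avx_ops_per_cycle(threads: int, modules: int) -> int:
    """Aggregate AVX issue rate under the assumed one-op-per-module-per-cycle limit."""
    if threads < 0 or modules < 0:
        raise ValueError("threads and modules must be non-negative")
    # Assumption: each module's shared FPU accepts at most one 256-bit AVX op
    # per cycle, so extra threads beyond the module count add nothing.
    return min(threads, modules)


def avx_ops_per_thread(threads: int, modules: int) -> float:
    """Average per-thread AVX rate, assuming threads are spread evenly."""
    if threads == 0:
        return 0.0
    return avx_ops_per_cycle(threads, modules) / threads


if __name__ == "__main__":
    modules = 4  # hypothetical 4-module part
    for threads in (1, 2, 4, 6, 8):
        total = avx_ops_per_cycle(threads, modules)
        each = avx_ops_per_thread(threads, modules)
        print(f"{threads} threads: {total} AVX ops/cycle total, {each:.2f} per thread")
```

If the premise holds, the total flattens out once the thread count reaches the module count, and each additional thread only lowers the per-thread rate.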