Looking at Mark's answer again with a little more time, I see that by double pumping he means using 128-bit-wide EUs (Eric's point 1) for 2 consecutive clock cycles to execute a single 256-bit SIMD instruction. This is a different kind of double pumping, one that uses base clock cycles as the smallest unit.
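To make the distinction concrete, here is a toy model of that kind of double pumping. It is only a sketch under my own assumptions: a 256-bit vector is treated as eight 32-bit lanes, and `eu_add_128`, `add_256_double_pumped` and `add_256_two_eus` are made-up names, not anything from Intel or AMD documentation. The second variant is just one hypothetical way to get 1-cycle throughput without a monolithic 256-bit unit; it does not claim to describe any real chip.

```python
LANE_BITS = 32
MASK = (1 << LANE_BITS) - 1  # lane results wrap around at 32 bits


def eu_add_128(a, b):
    """One pass through a hypothetical 128-bit EU: four 32-bit lane adds."""
    assert len(a) == len(b) == 4
    return [(x + y) & MASK for x, y in zip(a, b)]


def add_256_double_pumped(a, b):
    """256-bit add on ONE 128-bit EU: low half in cycle 1, high half in cycle 2."""
    assert len(a) == len(b) == 8
    result, cycles = [], 0
    for half in (slice(0, 4), slice(4, 8)):
        result += eu_add_128(a[half], b[half])
        cycles += 1  # the same EU is occupied for another base clock cycle
    return result, cycles


def add_256_two_eus(a, b):
    """Same add on TWO 128-bit EUs side by side (assumed layout): one cycle."""
    assert len(a) == len(b) == 8
    lo = eu_add_128(a[:4], b[:4])
    hi = eu_add_128(a[4:], b[4:])
    return lo + hi, 1
```

Both functions return the same eight lanes; only the cycle count differs (2 for the double-pumped version, 1 for the paired version), which is the whole point of the argument.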
Ok. Let me cite another Intel employee who posted in the same thread (and who has already been quoted here, IIRC):
With "misleading" you mean the first AVX LO/HI chart? I think there simply was not enough time between publishing the first chart and the corrected one for it to have any meaningful effect. It seems point 1) may have assumed it requires monolithic 256-bit hardware to achieve 1 cycle throughput for 256-bit AVX instructions. That's not true.
OTOH, saving die area and thus leakage (which would otherwise limit performance) doesn't look like a bad choice. So why are you defending a monolithic implementation so heavily?
@Hans:
Thanks for the links.




