Your post is pure speculation. You don't actually know who worked on the SB architecture. And even if we assume they were the same people who worked on Netburst, you still have no information at all about how AVX was really implemented. Let me remind you that "real" 128-bit SSE was first implemented by the Haifa team, which is currently working on SB.
Your assumptions are not necessarily true. While a general multiplication algorithm looks relatively easy to serialize, that is not necessarily the case in real life, since multiple heuristics may be added to a HW algorithm to make it use less power/space, make it faster, etc. I heard that it was a big challenge for Intel to implement a fast radix-16 divider in Penryn. Also, while I don't know how much space an FP multiplier consumes, I would guess that hyperpipelining may consume more space (for example, to save intermediate results in a multiplication loop) than implementing an additional multiplier. Anyway, even in Netburst Intel implemented "double pumping" only for some integer ops and decided not to implement a double-pumped ALU for complex integer operations such as divide/multiply.

The SIMD units are the easiest (of all units) to hyperpipeline. All instructions
which could cause problems for hyperpipelining have been systematically
left out of the AVX and LNI specifications. (for instance data shuffles
crossing 128 bit boundaries)
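
To make the "shuffles crossing 128 bit boundaries" point concrete, here is a rough Python model of an in-lane shuffle. It is only a sketch: the function name `in_lane_permute` is made up for illustration, and the selector encoding is modeled on the immediate form of AVX's VPERMILPS as I understand it (four 2-bit fields, reused for every 128-bit lane). It says nothing about how the hardware is actually built.

```python
LANE_ELEMS = 4  # four 32-bit elements fit in one 128-bit lane


def in_lane_permute(src, imm8):
    """Permute 32-bit elements independently inside each 128-bit lane.

    Hypothetical helper for illustration. Bits [2*i+1 : 2*i] of imm8 pick
    which element of the SAME lane goes to position i, so no element can
    ever move from one 128-bit lane to another.
    """
    if len(src) % LANE_ELEMS != 0:
        raise ValueError("vector length must be a multiple of the lane size")
    out = []
    for base in range(0, len(src), LANE_ELEMS):
        for i in range(LANE_ELEMS):
            sel = (imm8 >> (2 * i)) & 0b11  # 2-bit index within the lane
            out.append(src[base + sel])
    return out


# A 256-bit vector modeled as eight 32-bit elements (two lanes).
vec = [0, 1, 2, 3, 4, 5, 6, 7]
# Reverse the element order inside each lane; the lanes themselves stay put.
reversed_in_lane = in_lane_permute(vec, 0b00011011)
```

With this selector each lane is reversed on its own, so elements 4..7 never end up in positions 0..3. Since the two 128-bit halves need nothing from each other, they could in principle be processed one after the other.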



