Quote, originally posted by kl0012:
Your post is pure speculation. You really don't know who actually worked on the architecture of SB (Sandy Bridge). Also, even if we assume those were the same people who worked on NetBurst, you still don't have even a little information about how AVX was really implemented. Let me remind you that a "real" 128-bit SSE was first implemented by the Haifa team, which is currently working on SB.

Your assumptions are not necessarily true. A general multiplication algorithm looks relatively easy to serialize, but that is not necessarily the case in real life, since multiple heuristics may be added to a hardware algorithm to make it use less power or area, make it faster, etc. I heard that implementing the fast radix-16 divider in Penryn was a big challenge for Intel. Also, while I don't know how much area an FP multiplier consumes, I would assume that hyperpipelining may cost more area (for example, to save intermediate results in a multiplication loop) than implementing an additional multiplier. Anyway, even in NetBurst Intel implemented "double pumping" only for some integer ops, and decided not to implement a double-pumped ALU for complex integer operations such as divide/multiply.
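To make the "intermediate results in a multiplication loop" point concrete, here is a minimal sketch of a serialized shift-and-add multiplier that consumes `chunk` bits of one operand per pass. It is purely illustrative and assumes nothing about how Intel actually built any unit; the function name `serial_multiply` and the `width`/`chunk` parameters are hypothetical. The accumulator `acc` is the state a serialized hardware unit would have to latch between passes.

```python
def serial_multiply(a: int, b: int, width: int = 64, chunk: int = 16) -> tuple[int, int]:
    """Hypothetical serialized multiplier (illustration only, not a real design).

    Multiplies two unsigned `width`-bit integers by consuming `chunk` bits of b
    per pass. Returns (product, passes).
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must be unsigned and fit in `width` bits")
    mask = (1 << chunk) - 1
    acc = 0      # running partial-product sum: the intermediate state held between passes
    passes = 0
    for shift in range(0, width, chunk):
        digit = (b >> shift) & mask     # next chunk-bit digit of b
        acc += (a * digit) << shift     # narrow (width x chunk) multiply, shifted into place
        passes += 1
    return acc, passes


product, passes = serial_multiply(123456789, 987654321)
assert product == 123456789 * 987654321
print(passes)  # 64 / 16 = 4 passes
```

A wider `chunk` means fewer passes but a wider multiplier array per pass; a narrower one means more passes plus the extra storage and control for `acc`. That trade-off is why "just serialize it" is not automatically cheaper in area.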
These observations about hyperpipelining are really far beyond any
reasonable doubt.

I designed IEEE-compatible floating-point units myself, not only the usual
multiply/add ones but also much more complicated ones, such as a fully pipelined
FP complex-function unit that can output, each cycle, the result of any of
square root, reciprocal, exponent, logarithm, sine/cosine or arcsine/arccosine,
while having any mix of these in the pipeline simultaneously.

So I think you may trust me on this...


Regards, Hans