Originally posted by kl0012:
While I really do believe you designed all that stuff (I designed some functional blocks myself, including an adder and a multiplier, during a university VLSI course), it is still a bit cocky to think that no one else can do it better (or at least in a different way) than you.
I also trust Intel's engineers about the difficulty of designing efficient execution units.
http://www.intel.com/technology/itj/...er/8-radix.htm
If a design is "feedback free", then hyperpipelining is a fully automated
process. Synopsys Design Compiler has had such features for years.
The software has to find the locations that the signals have
reached after 1/2 cycle (or 1/N cycle in general) and then place
flip-flops there; these form the intermediate pipeline stages.
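
To make the idea concrete, here is a rough sketch in Python. It is only
a toy model, not how Design Compiler or any real tool works: the netlist
format, the function names and the gate delays are all made up for
illustration. It computes when each gate output settles, sorts the gates
into N time slots, and reports the wires that cross a slot boundary,
which is where the flip-flops would go.

```python
def assign_stages(gates, total_delay, n_stages):
    """Assign each gate to a pipeline stage by its output arrival time.

    gates: dict name -> (delay, [input gate names]), in topological order
           (hypothetical netlist format, primary inputs arrive at t = 0).
    Returns dict name -> stage index, 0-based.
    """
    slot = total_delay / n_stages      # length of one 1/N-cycle slot
    arrival = {}
    stage = {}
    for name, (delay, inputs) in gates.items():
        start = max((arrival[i] for i in inputs), default=0.0)
        arrival[name] = start + delay
        # A gate belongs to the slot in which its output settles.
        # The small epsilon keeps an output that settles exactly on a
        # boundary in the earlier slot.
        stage[name] = min(n_stages - 1, int((arrival[name] - 1e-9) // slot))
    return stage


def register_cuts(gates, stage):
    """List the wires that cross a stage boundary (flip-flops go here)."""
    cuts = []
    for name, (_, inputs) in gates.items():
        for src in inputs:
            if stage[src] < stage[name]:
                # third field: number of flip-flops needed on this wire
                cuts.append((src, name, stage[name] - stage[src]))
    return cuts


# Toy example: a chain of four gates, one time unit each, cut in half.
chain = {
    "g1": (1.0, []),
    "g2": (1.0, ["g1"]),
    "g3": (1.0, ["g2"]),
    "g4": (1.0, ["g3"]),
}
stages = assign_stages(chain, total_delay=4.0, n_stages=2)
cuts = register_cuts(chain, stages)
```

In this toy case the only cut is the wire from g2 to g3. A gate whose
delay straddles a slot boundary simply lands in the later slot here; a
real retiming tool balances the stages much more carefully.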

Intel (that is, the process guys) has probably developed its own
software, specifically adapted to Intel's own process physics model.


"Feedback free" is when the logic doesn't need to know what it just
calculated in the previous cycle. The P4 ALU was not feedback free
because it needs results from the previous cycle:

non-hyperpipelined:

----> C = A + B;
----
----> E = C + D;
----

hyperpipelined:

----> C = A + B;
----> E = C + D;
---->

The problem is that the result C is not yet known after a 1/2 cycle,
so you have to design logic which works with the intermediate result
instead. There exist tricks which do so. In general these tricks are
different for different functions and sometimes there are no tricks.
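
As an illustration of such a trick for addition, here is a small Python
model of a staggered add, in which the low 16 bits of a 32-bit sum are
produced in one half cycle and the high 16 bits in the next. This is my
own sketch of the general idea, as it is commonly described for the P4
fast ALU, and not Intel's actual circuit; the function names and the
16/16 split are assumptions. The point is that the dependent add
E = C + D can start on the low half of C before the high half exists.

```python
MASK16 = 0xFFFF


def low_half(a, b):
    """Add the low 16 bits of a and b; return (sum16, carry_out)."""
    s = (a & MASK16) + (b & MASK16)
    return s & MASK16, s >> 16


def high_half(a, b, carry):
    """Add the high 16 bits of a and b plus the carry from the low half."""
    s = ((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry
    return s & MASK16


def staggered_chain(a, b, d):
    """Compute E = (A + B) + D (mod 2**32) in three half cycles."""
    # Half cycle 1: low 16 bits of C = A + B.
    c_lo, c_carry = low_half(a, b)

    # Half cycle 2: high 16 bits of C and, in parallel, the low 16 bits
    # of E, which need only the low half of C.
    c_hi = high_half(a, b, c_carry)
    e_lo, e_carry = low_half(c_lo, d)

    # Half cycle 3: high 16 bits of E, now that c_hi is available.
    e_hi = (c_hi + ((d >> 16) & MASK16) + e_carry) & MASK16

    return (e_hi << 16) | e_lo


result = staggered_chain(0x0001FFFF, 0x00000001, 0x0000FFFF)
```

The two dependent adds finish after three half cycles instead of four.
The same split does not help for, say, a right shift, where the low
result bits depend on the high input bits, which is why each function
needs its own trick.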

Hyperpipelining for circuits which are not "feedback free" is a science
in its own right. The risk is that all these tricks blow up the size of
the circuits as well as their power consumption.


Regards, Hans