I admit I used a simplification. So the transistors populating said die area will leak less.
But why do you mention faster transistors? Pipelining per se is about cutting work into smaller pieces, not about doing it with faster transistors. If I had faster transistors, I probably wouldn't need to pipeline the circuit in question any further.
Example: an FP multiplier has a latency of 1000 ps, i.e. 1 ns. So if I use it as a single pipeline stage (for simplification we leave out operand catching etc.), I could clock it at 1 GHz, feed it once per ns and get results at the same rate. Latency is just one cycle.
With 2 pipeline stages, some additional latch overhead and some inefficiency from cutting the multiplier into two roughly equally fast pieces, the overall latency could become 1100 ps. But I could clock it at about 1.8 GHz with two stages of 550 ps each. I could feed the multiplier at that rate and get results at that rate. Latency would be 2 cycles. Another slight disadvantage would be that up to 2 multiplications could be in flight at any time vs. one in the 1-cycle version. Two muls in flight mean more power consumption. OTOH I don't increase energy per instruction.
That's the principle.
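
To put numbers on the example, here is a rough Python sketch of the arithmetic. The function name `pipeline_stats` is made up for illustration, and it assumes the stages are perfectly balanced and that the latch overhead is already included in the total latency (as in the 1100 ps figure above).

```python
def pipeline_stats(total_latency_ps, stages):
    """Return (stage_ps, clock_ghz, max_in_flight) for an evenly split pipeline.

    Assumption: all stages are equally long and total_latency_ps already
    contains any latch overhead added by pipelining.
    """
    stage_ps = total_latency_ps / stages   # time budget per stage
    clock_ghz = 1000.0 / stage_ps          # a 1000 ps period corresponds to 1 GHz
    max_in_flight = stages                 # at most one operation per stage
    return stage_ps, clock_ghz, max_in_flight


if __name__ == "__main__":
    # The two cases from the post: unpipelined vs. split into two stages.
    for label, total_ps, stages in [("1 stage", 1000, 1), ("2 stages", 1100, 2)]:
        stage_ps, clock_ghz, in_flight = pipeline_stats(total_ps, stages)
        print(f"{label}: {stage_ps:.0f} ps/stage, {clock_ghz:.2f} GHz, "
              f"latency {total_ps} ps ({stages} cycle(s)), "
              f"up to {in_flight} mul(s) in flight")
```

For the 2-stage case this works out to 1000/550, i.e. roughly 1.82 GHz, which is the "about 1.8 GHz" above. Throughput nearly doubles while latency only grows from 1000 ps to 1100 ps.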




