AMD cuts to the core with 'Bulldozer' Opterons

**Chumbucket843** · 08-11-2010, 10:55 AM

Originally Posted by Dresdenboy

I admit I used a simplification. So the transistors populating that die area will leak less.

But why do you mention faster transistors? Pipelining per se is about cutting work into smaller pieces, not doing it with faster transistors. If I had faster transistors, I probably wouldn't need to pipeline the circuit in question any further.

using die area as an estimate for static power can be very misleading. leakage increases linearly with transistor count and exponentially with drive current.

Example: an FP multiplier has a latency of 1000 ps or 1 ns. So if I use it in one pipeline stage (for simplification we leave out operand catching etc.) I could clock it at 1 GHz, being able to feed it once per 1 ns and get a result at the same rate. Latency is just one cycle.

With 2 pipeline stages, some additional latch overhead and some inefficiency due to cutting the multiplier into two roughly equally fast pieces, the overall latency could become 1100 ps. But I could clock it at about 1.8 GHz with two stages of 550 ps each. I could feed the multiplier at that rate and get results at that rate. Latency would be 2 cycles. Another slight disadvantage would be that there could be up to 2 multiplications in flight at any time vs. one in the 1-cycle version. Two muls in flight mean more power consumption. OTOH I don't increase energy per instruction.

That's the principle.
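
For anyone who wants to check the numbers in the example above, here is a minimal Python sketch. It assumes the stages are perfectly balanced and that the total latency figure already includes the latch overhead, as in the post. The function name `pipeline_stats` and the dictionary keys are made up for illustration.

```python
def pipeline_stats(total_latency_ps: float, stages: int) -> dict:
    """Clock rate and latency for an evenly split pipeline.

    total_latency_ps is the end-to-end latency, latch overhead included.
    Assumption: every stage takes the same time.
    """
    stage_ps = total_latency_ps / stages   # time per pipeline stage
    clock_ghz = 1000.0 / stage_ps          # a 1000 ps period equals 1 GHz
    return {
        "stage_ps": stage_ps,
        "clock_ghz": clock_ghz,
        "latency_cycles": stages,          # one cycle per stage
        "latency_ps": total_latency_ps,
        "max_in_flight": stages,           # at most one operation per stage
    }


# The two cases from the post.
single = pipeline_stats(1000.0, 1)   # unpipelined FP multiplier
two = pipeline_stats(1100.0, 2)      # two stages, 100 ps total overhead

for name, s in (("1 stage", single), ("2 stages", two)):
    print(f"{name}: {s['clock_ghz']:.2f} GHz, "
          f"{s['latency_cycles']} cycle(s), {s['latency_ps']:.0f} ps latency, "
          f"up to {s['max_in_flight']} op(s) in flight")
```

The two-stage case comes out at roughly 1.82 GHz, which matches the 1.8 GHz quoted above: throughput nearly doubles while the absolute latency grows by 100 ps.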

i understand the concept of pipelining.

if speed is critical you might want to use a pulsed latch.
