The FPU is its own unit. Neither core owns it, or owns half of it. Threads from either integer core can send FP commands to the FPU, but only one thread at a time. The commands are decoded, renamed, scheduled, and buffered until they can execute on the appropriate pipe.
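To make the "one thread per cycle in, any thread out" behaviour concrete, here is a toy Python model. It is not AMD's actual scheduler. The 4-wide receive and the split into two FMAC pipes and two packed-integer pipes come from the AMD guide quoted in this post. Everything else is an assumption: the FPU picks a thread round-robin when both cores have work (the guide doesn't state the policy), issue is oldest-first, and latencies and dependencies are ignored. `simulate` and the "fmac"/"int" op kinds are made-up names for illustration.

```python
from collections import deque

# Toy model only. Widths and pipe counts follow the guide's summary;
# round-robin thread selection and oldest-first issue are assumptions.
ACCEPT_WIDTH = 4                      # ops received per cycle, all from one thread
PIPE_LIMITS = {"fmac": 2, "int": 2}   # two FMAC pipes, two packed-integer pipes

def simulate(streams):
    """streams: dict thread_id -> list of op kinds ("fmac" or "int").
    Returns one (accepted_thread, issued) entry per cycle, where issued
    is a list of (thread_id, kind) pairs."""
    queues = {t: deque(ops) for t, ops in streams.items()}
    threads = sorted(queues)
    scheduler = []   # ops already received by the FPU, oldest first
    turn = 0         # round-robin pointer (assumed policy)
    trace = []
    while scheduler or any(queues.values()):
        # Issue: oldest first, limited per pipe type; threads may mix here.
        used = {kind: 0 for kind in PIPE_LIMITS}
        issued, waiting = [], []
        for thread, kind in scheduler:
            if used[kind] < PIPE_LIMITS[kind]:
                used[kind] += 1
                issued.append((thread, kind))
            else:
                waiting.append((thread, kind))
        scheduler = waiting
        # Receive: up to ACCEPT_WIDTH ops, all from a single thread.
        accepted_from = None
        for i in range(len(threads)):
            t = threads[(turn + i) % len(threads)]
            if queues[t]:
                accepted_from = t
                turn = (turn + i + 1) % len(threads)
                for _ in range(min(ACCEPT_WIDTH, len(queues[t]))):
                    scheduler.append((t, queues[t].popleft()))
                break
        trace.append((accepted_from, issued))
    return trace

trace = simulate({0: ["fmac"] * 4, 1: ["int"] * 4})
for cycle, (accepted, issued) in enumerate(trace):
    print(cycle, accepted, issued)
```

In this run, cycle 0 receives only thread 0's ops and cycle 1 only thread 1's. In cycle 2, two FMAC ops from thread 0 and two integer ops from thread 1 issue together, because the leftover FMAC ops share the scheduler with the newly received integer ops.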
From page 37 of the Bulldozer optimization guide I linked above:
Skipping over a few points to page 38: FPU Features Summary and Specifications:
•The FPU can receive up to four ops per cycle. These ops can only be from one thread, but the thread may change every cycle. Likewise the FPU is four wide, capable of issue, execution and completion of four ops each cycle. Once received by the FPU, ops from multiple threads can be executed.
•Within the FPU, up to two loads per cycle can be accepted, possibly from different threads.
•There are four logical pipes: two FMAC and two packed integer. For example, two 128-bit FMAC and two 128-bit integer ALU ops can be issued and executed per cycle.
•Two 128-bit FMAC units. Each FMAC supports four single-precision or two double-precision ops.
•Only one 256-bit operation can issue per cycle; however, an extra cycle can be incurred, as in the case of a FastPath Double, if both micro-ops cannot issue together.

Please examine Figure 3 on page 38 to help understand the first and last bullet points. I'll check back later if you want further clarification.
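The FMAC bullet also gives the usual back-of-the-envelope peak throughput for the shared FPU. The sketch below assumes both FMAC pipes start a fused multiply-add every cycle and counts each FMA as two floating-point operations (a common convention, not something stated in the quoted text).

```python
FMAC_UNITS = 2         # two 128-bit FMAC pipes in the shared FPU
SP_LANES = 128 // 32   # four single-precision values per 128-bit op
DP_LANES = 128 // 64   # two double-precision values per 128-bit op
FLOPS_PER_FMA = 2      # multiply and add counted as two operations

def peak_flops_per_cycle(lanes):
    """Peak FLOPs per cycle if every FMAC pipe issues an FMA each cycle."""
    return FMAC_UNITS * lanes * FLOPS_PER_FMA

sp = peak_flops_per_cycle(SP_LANES)
dp = peak_flops_per_cycle(DP_LANES)
print(f"single precision: {sp} FLOPs/cycle, double precision: {dp} FLOPs/cycle")
```

That works out to 16 single-precision or 8 double-precision FLOPs per cycle. Because the FPU is shared, that peak is for the whole module and is split between the two cores' threads, not available to each core separately.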
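One way to read the 256-bit bullet: a 256-bit op is handled as two 128-bit micro-ops (the FastPath Double case), and each half needs a free pipe slot. Here is a minimal sketch of that reading. It assumes each half needs one FMAC pipe slot, and that a half that can't get a slot simply waits for the next cycle. `issue_256bit` and its per-cycle free-slot list are hypothetical illustrations, not anything from the guide.

```python
def issue_256bit(free_fmac_slots):
    """Return the cycle in which each 128-bit half of one 256-bit op issues.

    free_fmac_slots[c] is the number of FMAC pipes (0-2) not used by
    other ops in cycle c. Halves that find no free slot wait a cycle."""
    halves_left = 2
    issue_cycles = []
    for cycle, free in enumerate(free_fmac_slots):
        n = min(free, halves_left)          # halves that can go this cycle
        issue_cycles.extend([cycle] * n)
        halves_left -= n
        if halves_left == 0:
            return issue_cycles
    raise ValueError("ran out of cycles before both halves issued")

print(issue_256bit([2]))     # both pipes free: halves issue together
print(issue_256bit([1, 2]))  # one pipe busy: second half slips a cycle
```

The second call shows the extra cycle the guide mentions. In this model, a 256-bit op whose halves issue together occupies both FMAC pipes, which is consistent with only one 256-bit op issuing per cycle.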