Originally Posted by Formula350:
So I'm able to conclude:
a. Each core's FPU can process its own information on a single cycle, but still only has a single scheduler. So I assume a core will receive its orders ever so slightly delayed after the other core.
The FPU is its own unit. Neither core really owns it or owns half of it. Threads from either INT core can send FP commands to the FPU, but only one thread at a time. The commands are decoded, renamed, scheduled, and buffered to wait for execution on the appropriate pipe.
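
To make the "one thread at a time" part concrete, here is a rough toy model in Python. It assumes the four-ops-per-cycle limit from the guide excerpt quoted below, and it assumes the two threads simply take turns when both have work. That alternating policy is my own simplification; the excerpt only says the thread may change every cycle. The names `dispatch` and `FPU_DISPATCH_WIDTH` are made up for illustration.

```python
from collections import deque

FPU_DISPATCH_WIDTH = 4  # ops the FPU can receive per cycle (from the guide excerpt)


def dispatch(thread0_ops, thread1_ops):
    """Toy model: each cycle the FPU accepts up to four ops from ONE thread.

    Assumption: threads alternate when both have pending ops; the real
    arbitration policy is not described in the excerpt.
    Returns a list of (cycle, thread_id, accepted_ops) tuples.
    """
    queues = [deque(thread0_ops), deque(thread1_ops)]
    log = []
    cycle = 0
    turn = 0
    while queues[0] or queues[1]:
        # If the thread whose turn it is has nothing pending, use the other one.
        if not queues[turn]:
            turn ^= 1
        q = queues[turn]
        accepted = [q.popleft() for _ in range(min(FPU_DISPATCH_WIDTH, len(q)))]
        log.append((cycle, turn, accepted))
        cycle += 1
        turn ^= 1  # the accepting thread may change every cycle
    return log


if __name__ == "__main__":
    for entry in dispatch(["a0", "a1", "a2", "a3", "a4", "a5"], ["b0", "b1"]):
        print(entry)
```

Note that this only models ops being received by the FPU. Once buffered, ops from both threads can be in flight together, which the toy does not try to show.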

From page 37 of the Bulldozer optimization guide I posted a link to above:

FPU Features Summary and Specifications:
• The FPU can receive up to four ops per cycle. These ops can only be from one thread, but the thread may change every cycle. Likewise the FPU is four wide, capable of issue, execution and completion of four ops each cycle. Once received by the FPU, ops from multiple threads can be executed.
• Within the FPU, up to two loads per cycle can be accepted, possibly from different threads.
• There are four logical pipes: two FMAC and two packed integer. For example, two 128-bit FMAC and two 128-bit integer ALU ops can be issued and executed per cycle.
• Two 128-bit FMAC units. Each FMAC supports four single precision or two double-precision ops.
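
The "four single precision or two double-precision" figure falls straight out of the 128-bit width. A quick sketch of the arithmetic in Python, assuming both FMAC pipes are kept busy with full-width packed ops every cycle (a best case, not a typical one). Counting one fused multiply-add as two floating-point operations is a common convention I am applying here, not something the excerpt states, and the function names are my own.

```python
FMAC_WIDTH_BITS = 128  # width of each FMAC unit, from the guide excerpt
FMAC_UNITS = 2         # two FMAC pipes, from the guide excerpt


def lanes_per_fmac(element_bits):
    """Number of packed elements one 128-bit FMAC handles per op."""
    return FMAC_WIDTH_BITS // element_bits


def peak_fma_results_per_cycle(element_bits):
    """Best-case FMA results per cycle with both FMAC pipes busy."""
    return FMAC_UNITS * lanes_per_fmac(element_bits)


def peak_flops_per_cycle(element_bits):
    """Assumption: one FMA is counted as two FLOPs (multiply + add)."""
    return 2 * peak_fma_results_per_cycle(element_bits)


if __name__ == "__main__":
    print(lanes_per_fmac(32), lanes_per_fmac(64))                          # 4 2
    print(peak_fma_results_per_cycle(32), peak_fma_results_per_cycle(64))  # 8 4
    print(peak_flops_per_cycle(32), peak_flops_per_cycle(64))              # 16 8
```

Keep in mind these numbers are for the one shared FPU, so they are split between whatever the two threads are sending it.
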
Skipping over a few points to page 38:
• Only 1 256-bit operation can issue per cycle, however an extra cycle can be incurred as in the case of a FastPath Double if both micro ops cannot issue together.
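
The way I read that bullet, a 256-bit op is handled as two 128-bit micro ops, and the extra cycle shows up when only one suitable pipe is free. A minimal Python sketch of that reading; the function name and the "free pipes" parameter are hypothetical and only there to illustrate the one-cycle versus two-cycle cases.

```python
def issue_cycles_256bit(free_pipes):
    """Cycles to issue one 256-bit op, modeled as two 128-bit micro ops.

    Assumption: each micro op needs its own 128-bit pipe. With two pipes
    free both halves issue together; with one free, the second half
    waits a cycle.
    """
    if free_pipes not in (1, 2):
        raise ValueError("this toy model only covers 1 or 2 free pipes")
    return 1 if free_pipes == 2 else 2


if __name__ == "__main__":
    print(issue_cycles_256bit(2))  # 1: both micro ops issue together
    print(issue_cycles_256bit(1))  # 2: the extra cycle from the bullet above
```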
Please examine Figure 3 on page 38 to help understand the first and last bullet points. I'll be back later if you want further clarification.