Instruction set design for CPUs and GPUs

  1. nn_step
    Register count has a point of diminishing returns when it comes to architectural registers. In fact, the ideal register count is 88.5, but since we can't have half a register and we tend to allocate 2^n registers, the ideal number is 64 architectural registers. Anything more will just increase cycle time without any performance improvement.

    You must remember that choosing a register count is always a compromise: more registers mean fewer spills to memory, but a larger register file lengthens the cycle time.

    That was the optimal point, given intensive analysis of over 68,000 software packages. Any additional registers yielded virtually zero additional performance.

    Now, we made some additional assumptions, such as a pipelined architecture with bypassing and a 12-stage instruction pipeline. Reducing the pipeline stage count did reduce the register demand, but never below 43. [It would be extremely difficult to implement any multi-GHz CPU in only 1 stage.]
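The rounding from 88.5 to 64 can be checked with a few lines of Python. This is only a sketch: the post does not say how the rounding was done, so the helper below assumes "nearest power of two by absolute distance", and the function names `nearest_power_of_two` and `operand_bits` are made up for illustration.

```python
import math


def nearest_power_of_two(ideal: float) -> int:
    """Return the power of two closest to `ideal` (assumed rounding rule)."""
    lower = 2 ** math.floor(math.log2(ideal))  # largest power of two <= ideal
    upper = lower * 2                          # next power of two above it
    # Pick whichever neighbour is closer; ties go to the smaller file.
    return lower if (ideal - lower) <= (upper - ideal) else upper


def operand_bits(register_count: int) -> int:
    """Bits needed to name one register in an instruction encoding."""
    return math.ceil(math.log2(register_count))


registers = nearest_power_of_two(88.5)
print(registers, operand_bits(registers))
```

With 64 registers, each register operand costs 6 bits of encoding, so a three-operand instruction spends 18 bits on register names. Going to 128 registers would cost 7 bits per operand.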
  2. nn_step
    We have also found that unifying the register file greatly improves the power and efficiency of the instruction set, at only the small cost of a mirrored set of register files. This scheme was used because it reduces the number of read and write ports each file needs to serve operands and receive results, which reduces the physical size of the register file and lets the microprocessor operate at higher clock frequencies.

    Writes to any of the register files then have to be synchronized, which takes one clock cycle to complete and costs about one percent of performance. That loss is compensated in two ways. First, the higher achievable clock frequency offsets it. Second, the instruction issue logic avoids situations where the register files have to be synchronized by issuing, where possible, instructions that do not depend on data held in the other register file.
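The mirrored scheme described above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: two copies of the file (one per execution cluster), a one-cycle synchronization delay, and the class and method names. It is not a description of any particular chip.

```python
class MirroredRegisterFile:
    """Toy model: two register file copies kept in sync with a delay."""

    def __init__(self, num_regs: int = 64, sync_delay: int = 1):
        # One copy per cluster (assumed: exactly two clusters).
        self.copies = [[0] * num_regs, [0] * num_regs]
        self.sync_delay = sync_delay
        self.cycle = 0
        # Pending cross-copy updates: (ready_cycle, copy_index, reg, value)
        self.pending = []

    def write(self, cluster: int, reg: int, value: int) -> None:
        # The writing cluster sees its own result immediately.
        self.copies[cluster][reg] = value
        # The other copy only sees it after the synchronization delay.
        other = 1 - cluster
        self.pending.append((self.cycle + self.sync_delay, other, reg, value))

    def read(self, cluster: int, reg: int) -> int:
        # Each cluster reads only its local copy, so each copy needs
        # fewer read ports than one shared file would.
        return self.copies[cluster][reg]

    def tick(self) -> None:
        # Advance one clock cycle and apply any updates that are now due.
        self.cycle += 1
        still_pending = []
        for ready, copy, reg, value in self.pending:
            if ready <= self.cycle:
                self.copies[copy][reg] = value
            else:
                still_pending.append((ready, copy, reg, value))
        self.pending = still_pending


rf = MirroredRegisterFile()
rf.write(cluster=0, reg=5, value=42)
same_cycle_local = rf.read(0, 5)   # visible in the writing cluster
same_cycle_remote = rf.read(1, 5)  # stale in the other cluster
rf.tick()
next_cycle_remote = rf.read(1, 5)  # visible after one cycle
```

In this model, an issue stage that keeps a dependent instruction in the same cluster as its producer never sees the extra cycle, which is the second compensation mentioned above.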