The front end uses vertical multithreading: in any given cycle it decodes instructions from only one thread.
From cycle to cycle it could work like this:
Decode 4 instructions of thread 0
Decode 4 instructions of thread 1
Decode 3 instructions of thread 0 (some suboptimal instruction mix)
Decode 4 instructions of thread 1
Decode 4 instructions of thread 0
Decode 4 instructions of thread 0 (thread 1 has to wait for memory)
Decode 3 instructions of thread 0 (suboptimal instruction mix)
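
To make the sequence above concrete, here is a rough Python sketch of how such a decoder might pick a thread each cycle. It is only an illustration under assumptions that are not in the post: two threads, a decode width of 4, a simple "alternate, but fall back to the other thread if the preferred one is waiting" policy, and thread 1's memory wait lasting through the last two cycles. The names DECODE_WIDTH, schedule, stalled and decodable are made up for this example and do not describe any real hardware interface.

```python
DECODE_WIDTH = 4  # assumed maximum number of instructions decoded per cycle


def schedule(num_cycles, stalled, decodable):
    """Return one (thread, count) pair per cycle for a two-thread front end.

    stalled:   dict mapping cycle -> set of threads that cannot be decoded
    decodable: dict mapping cycle -> how many instructions the instruction
               mix allows in that cycle (defaults to DECODE_WIDTH)
    """
    trace = []
    last = 1  # pretend thread 1 went last so that thread 0 starts
    for cycle in range(num_cycles):
        preferred = 1 - last  # alternate between the two threads
        blocked = stalled.get(cycle, set())
        if preferred not in blocked:
            thread = preferred
        elif last not in blocked:
            thread = last  # the other thread gets the slot instead
        else:
            trace.append((None, 0))  # both threads waiting: nothing decoded
            continue
        count = min(DECODE_WIDTH, decodable.get(cycle, DECODE_WIDTH))
        trace.append((thread, count))
        last = thread
    return trace


# Assumed inputs that reproduce the seven cycles listed above:
# thread 1 waits for memory in cycles 5 and 6, and the instruction mix
# only allows 3 decodes in cycles 2 and 6.
trace = schedule(7, stalled={5: {1}, 6: {1}}, decodable={2: 3, 6: 3})
for cycle, (thread, count) in enumerate(trace):
    print(f"cycle {cycle}: decode {count} instructions of thread {thread}")
```

The point of the sketch is that only one thread owns the decoders in a cycle, so a waiting thread costs nothing as long as the other thread has instructions ready.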