I am not sure I understand whether you follow what I am trying to say...
A shared resource clocked at one speed, serving 4 other resources clocked at different speeds, necessitates asynchronous communication... there is no other way. Thus AMD must provide functionality to account for the floating clock relationships between 4 cores and one memory pool, the L3... and just adding the circuits to do this work will incur latency.
Add on top of that, 1:1 divider latency < 3:2 divider latency < 2:1 divider latency... hence the 'observed' latency from any core is variable. At least if you read Kanter's article, this is what the FIFO buffers do... he did not mention the x-bar.
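To make the divider point concrete, here is a toy Python model of a dual-flop synchronizer crossing between a core clock domain and a slower northbridge/L3 domain. The 2-flop depth, the half-period average wait, and all the periods are my own illustrative assumptions, not K10 measurements:

```python
# Toy model of a clock-domain crossing through a dual-flop synchronizer.
# One-way cost = average wait for the next destination-clock edge
# (half a destination period) plus sync_flops destination cycles to settle.
# All numbers are illustrative, not measured K10 values.

def crossing_latency(src_period, dst_period, sync_flops=2):
    """Approximate one-way latency of sending a value into dst's domain."""
    return dst_period / 2 + sync_flops * dst_period

def round_trip(core_period, nb_period):
    """Core -> L3 request crossing plus L3 -> core response crossing."""
    return (crossing_latency(core_period, nb_period) +
            crossing_latency(nb_period, core_period))

core = 1.0  # core clock period, arbitrary units
for ratio, label in [(1.0, "1:1"), (1.5, "3:2"), (2.0, "2:1")]:
    print(f"{label} divide: round trip = {round_trip(core, core * ratio):.2f}")
```

Even this crude model reproduces the ordering above: the wider the divide between the core and the shared domain, the more latency every crossing pays.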
There is ongoing research into supporting connections that need both low bandwidth and low latency in asynchronous networks, but there has always been this fundamental trade-off:
http://www.ee.technion.ac.il/courses...OC-async05.pdf

"Previously published NoCs which provide GS are ÆTHEREAL [18][9] and NOSTRUM [14]. Both are synchronous and employ variants of time division multiplexing (TDM) for providing per connection bandwidth (BW) guarantees. TDM has the drawback of the connection latency being inversely proportional to the BW, thus connections with low BW and low latency requirements, e.g. interrupts, are not supported."
Not quite the paper I would use, but the most recently written one I could find that summarizes the issue at hand and that I could quote as a source, so you don't have to take my word for it... i.e., connection latency is hard to get very low in networks where a global clock is not real. Here he discusses time division multiplexing, a type of clock dividing.
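The quoted TDM drawback is easy to see with a few lines of Python. Assume a frame of n_slots cycles and a connection granted k evenly spaced slots (the frame size and slot counts here are made-up illustrative values):

```python
# TDM sketch: a connection owning k evenly spaced slots out of an
# n_slots-cycle frame gets bandwidth k/n_slots, but a flit arriving
# just after its slot waits up to one full slot spacing.
# Frame size and slot counts are illustrative, not from the paper.

def tdm_stats(n_slots, k):
    """Return (bandwidth_fraction, worst_case_wait_in_cycles)."""
    assert n_slots % k == 0, "assume slots divide evenly for simplicity"
    spacing = n_slots // k           # cycles between this connection's slots
    return k / n_slots, spacing      # worst-case wait ~ one spacing

frame = 64
for k in (1, 2, 8, 32):
    bw, wait = tdm_stats(frame, k)
    print(f"slots={k:2d}  BW={bw:.3f}  worst-case wait={wait} cycles")
```

The product of bandwidth and worst-case wait is constant, which is exactly the "latency inversely proportional to BW" problem: a low-BW connection like an interrupt gets few slots and therefore a long worst-case wait.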
Edit: Found another paper which is much more detailed, and has some info on the FIFO implementation over a global clock:
http://www.collectionscanada.ca/obj/...11/MQ34126.pdf

"Simulation results for the FIFO and the two versions of the adder are given in Table 1. The optimized adder has 2-input C-elements while the other adder is using 4-input C-elements. The operations/second indicate the number of logic evaluations done per second in each basic cell. Cycle time is the fastest time at which the pipeline can send out successive data values. Latency is the time it takes for data to go from the input of the circuit until it is finally ready at the output. Pipelined systems work on the principle of reducing the cycle time at the cost of increased latency. The next section examines how an enhancement to the system can reduce the latency even further."
(See page 73.) This is an old paper, but he is showing 18 ns latency for a straight-up FIFO buffer. That is a large number, and not to be taken as true or accurate wrt the K10.
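The cycle-time-vs-latency principle the quote describes can be sketched numerically. Assume a fixed amount of combinational logic split across FIFO stages, with a fixed per-stage latch/handshake overhead (the 10 ns logic depth and 1 ns overhead below are made-up numbers, not from the paper):

```python
# Toy pipelining trade-off: splitting fixed logic across more FIFO stages
# shortens the cycle time, but each stage adds latch/handshake overhead,
# so end-to-end latency grows. The ns figures are illustrative only.

LOGIC_NS = 10.0     # total combinational delay to split across stages
OVERHEAD_NS = 1.0   # per-stage latch/handshake overhead

def cycle_time(stages):
    return LOGIC_NS / stages + OVERHEAD_NS

def latency(stages):
    # stages * cycle_time = LOGIC_NS + stages * OVERHEAD_NS
    return stages * cycle_time(stages)

for s in (1, 2, 5, 10):
    print(f"stages={s:2d}  cycle={cycle_time(s):5.2f} ns  "
          f"latency={latency(s):5.2f} ns")
```

Throughput improves monotonically with stage count while latency only gets worse, which is why a deeply pipelined synchronizing FIFO between clock domains hurts observed memory latency even when bandwidth looks fine.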
Jack