Explanation of Erratum 298
Virtualization exacerbates the issue, since resource sharing changes context depending on how heavily the cores are loaded. However, the erratum itself can be triggered just by running multithreaded code in which a page ends up labeled incorrectly. This is part of a cache coherency problem and is not restricted solely to virtualization.

The processor operation to change the accessed or dirty bits of a page translation table entry in the L2 from 0b to 1b may not be atomic. A small window of time exists where other cached operations may cause the stale page translation table entry to be installed in the L3 before the modified copy is returned to the L2. In addition, if a probe for this cache line occurs during this window of time, the processor may not set the accessed or dirty bit and may corrupt data for an unrelated cached operation. The system may experience a machine check event reporting an L3 protocol error has occurred. In this case, the MC4 status register (MSR 0000_0410) will be equal to B2000000_000B0C0F or BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be equal to 26h.
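If you have the MC4 register values from a logged machine check event, a quick way to tell whether they match the signature the erratum describes is a simple comparison. This is a minimal sketch: the constants are transcribed from the erratum text, the function name is hypothetical, and actually obtaining the raw values (from whatever logged the machine check) is not shown.

```python
# Signature values transcribed from the erratum text; nothing else is assumed.
ERRATUM_298_MC4_STATUS = {0xB2000000_000B0C0F, 0xBA000000_000B0C0F}  # MC4 status (MSR 0000_0410)
ERRATUM_298_MC4_ADDR = 0x26                                          # MC4 address (MSR 0000_0412)


def matches_erratum_298(mc4_status: int, mc4_addr: int) -> bool:
    """Return True if an MC4 status/address pair matches the erratum's L3 protocol error signature."""
    return mc4_status in ERRATUM_298_MC4_STATUS and mc4_addr == ERRATUM_298_MC4_ADDR


if __name__ == "__main__":
    # Hypothetical logged values, for illustration only.
    print(matches_erratum_298(0xBA000000000B0C0F, 0x26))  # True
    print(matches_erratum_298(0xBA000000000B0C0F, 0x30))  # False
```

Both registers have to match; a status match with a different address is some other L3 error.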
What this basically says is that there is a short window between the time the processor starts setting the accessed or dirty bit in a page table entry and the time the modified entry is written back to L2. If another core touches that cache line during the window (say, a multithreaded program whose threads share the same page tables), a stale copy of the entry, without the new bit set, can be installed in L3.
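A toy model of that window may help. This is not how the hardware works internally: "memory" is just a Python dict standing in for the cache hierarchy, the second core's probe is an ordinary function call placed between the read and the write-back, and the entry value and helper names are hypothetical. The only architectural facts used are the x86 page-table-entry flag positions (Present bit 0, Accessed bit 5, Dirty bit 6). It contrasts a read-modify-write with a gap against one with no gap.

```python
# x86 page table entry flag bits (architectural positions).
PTE_PRESENT = 1 << 0
PTE_ACCESSED = 1 << 5
PTE_DIRTY = 1 << 6

PTE_ADDR = 0x1000  # hypothetical location of the entry


def set_bits_nonatomic(memory: dict, addr: int, bits: int, in_window) -> None:
    """Read-modify-write with a gap: `in_window` runs after the read, before the write-back."""
    value = memory[addr]          # 1. fetch the entry
    in_window()                   # 2. another agent acts while the update is in flight
    memory[addr] = value | bits   # 3. write the modified entry back


def set_bits_atomic(memory: dict, addr: int, bits: int, afterwards) -> None:
    """The same update with no gap, so no intermediate state can be observed."""
    memory[addr] |= bits
    afterwards()


def demo(update) -> tuple[int, int]:
    """Return (final entry, copy cached by a second agent) for one update strategy."""
    # Hypothetical entry: frame 0x5000, present and accessed, not yet dirty.
    memory = {PTE_ADDR: 0x5000 | PTE_PRESENT | PTE_ACCESSED}
    cached = {}

    def probe():
        # Stand-in for another core pulling the line into a shared cache (the "L3").
        cached[PTE_ADDR] = memory[PTE_ADDR]

    update(memory, PTE_ADDR, PTE_DIRTY, probe)
    return memory[PTE_ADDR], cached[PTE_ADDR]


if __name__ == "__main__":
    final, stale = demo(set_bits_nonatomic)
    print(bool(final & PTE_DIRTY), bool(stale & PTE_DIRTY))  # True False
    final, seen = demo(set_bits_atomic)
    print(bool(final & PTE_DIRTY), bool(seen & PTE_DIRTY))   # True True
```

With the gap, the final entry is correct but the copy grabbed during the window is stale, which is the situation the erratum describes. The model does not capture the erratum's second failure mode (a probe causing the bit not to be set and corrupting an unrelated operation).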
EDIT: But a strong point needs to be made: that window of time is likely very small, so small that the probability of hitting it is slim to none. The erratum does not say how many cycles the update takes, and even if it is tens of cycles, that is such a small window that for typical desktop (DT) usage one may never actually trigger it. And if it does happen, the frequency of M$ (Windows) blunders would overwhelm the signal.
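To put the "slim to none" argument in back-of-envelope form: assume, and this is an assumption the erratum does not make, that conflicting probes arrive independently at a constant average rate λ per cycle (a Poisson process). Then the chance that at least one lands inside a window of w cycles is 1 − e^(−λw), which is roughly λw when λw is tiny. Neither w nor λ is given anywhere in the erratum, so both are parameters you would have to supply; the function names below are hypothetical.

```python
import math


def p_hit_per_update(window_cycles: float, probe_rate_per_cycle: float) -> float:
    """P(at least one conflicting probe lands in a single window), Poisson assumption."""
    if window_cycles < 0 or probe_rate_per_cycle < 0:
        raise ValueError("window and rate must be non-negative")
    # 1 - exp(-x), computed with expm1 so very small x keeps full precision.
    return -math.expm1(-probe_rate_per_cycle * window_cycles)


def p_hit_over_updates(window_cycles: float, probe_rate_per_cycle: float, updates: int) -> float:
    """P(at least one hit across `updates` independent windows), same assumption."""
    if updates < 0:
        raise ValueError("updates must be non-negative")
    # Independent Poisson windows add up: total mean = updates * rate * window.
    return p_hit_per_update(window_cycles * updates, probe_rate_per_cycle)
```

The per-update figure is tiny for any short window, but it has to be multiplied across every accessed/dirty-bit update a workload performs, so the overall answer depends heavily on the rates you plug in.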