Originally posted by looncraz:
Actually, we already know of such an issue on Bulldozer, and NO benchmarked system has the patch installed!

The L1 instruction cache shared by the two cores of a compute module suffers cross-invalidations, so instruction data already in the cache is invalidated in too many cases and must be fetched again. The fix is a "simple" memory alignment and (possibly) tagging scheme in the Windows/Linux kernel.
Here is the patch in question:
http://thread.gmane.org/gmane.linux..../focus=1171713
From: Borislav Petkov <bp <at> amd64.org>
Subject: [PATCH] x86, AMD: Correct F15h IC aliasing issue
Newsgroups: gmane.linux.kernel
Date: 2011-07-22 13:15:47 GMT

From: Borislav Petkov <borislav.petkov <at> amd.com>

This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.

This excessive amount of cross-invalidations can be observed if cache
lines backed by shared physical memory alias in bits [14:12] of their
virtual addresses, as those bits are used for the index generation.

This patch addresses the issue by zeroing out the slice [14:12] of
the file mapping's virtual address at generation time, thus forcing
those bits the same for all mappings of a single shared library across
processes and, in doing so, avoids instruction cache aliases.

It also adds the kernel command line option
"unalias_va_addr=(32|64|off)" with which virtual address unaliasing
can be enabled for 32-bit or 64-bit x86 individually, or be completely
disabled.

This change leaves virtual region address allocation on other families
and/or vendors unaffected.
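
To illustrate what the patch description says, here is a minimal Python sketch (not kernel code; every name in it is hypothetical). It assumes, as the patch states, that virtual-address bits [14:12] feed the instruction-cache index, so two page-aligned mappings of the same physical page alias when those bits differ. The `unalias` helper models "zeroing out the slice [14:12]" by rounding a candidate address up to the next address with those bits clear; the rounding direction is an assumption, not necessarily what the patch itself does.

```python
PAGE_SHIFT = 12
ALIAS_MASK = 0b111 << PAGE_SHIFT  # 0x7000: virtual-address bits [14:12]


def alias_bits(vaddr: int) -> int:
    """Bits [14:12] of a virtual address (0..7): the VA-derived part of the
    instruction-cache index, per the patch description."""
    return (vaddr & ALIAS_MASK) >> PAGE_SHIFT


def aliases(va1: int, va2: int) -> bool:
    """Two mappings of the same physical page alias in the IC if their
    bits [14:12] differ, i.e. they index different cache sets."""
    return alias_bits(va1) != alias_bits(va2)


def unalias(vaddr: int) -> int:
    """Hypothetical helper: round a page-aligned candidate address up to
    the next address whose bits [14:12] are zero."""
    return (vaddr + ALIAS_MASK) & ~ALIAS_MASK


# The same shared library mapped at different page-aligned addresses
# in two processes (example addresses chosen for illustration).
va_a, va_b = 0x7F0000003000, 0x7F1000005000
print(aliases(va_a, va_b))                    # True: bits 3 vs 5
print(aliases(unalias(va_a), unalias(va_b)))  # False: both become 0
```

Forcing the same value of bits [14:12] for every mapping is what makes all processes index a shared library's code into the same IC sets, at the cost of coarser (32 KiB) placement of file mappings.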

And Linus' response (http://article.gmane.org/gmane.linux.kernel/1170744):
From: Linus Torvalds <torvalds <at> linux-foundation.org>
Subject: Re: [PATCH] x86, AMD: Correct F15h IC aliasing issue
Newsgroups: gmane.linux.kernel
Date: 2011-07-24 16:04:27 GMT
Argh. This is a small disaster, you know that, right? Suddenly we have
user-visible allocation changes depending on which CPU you are running
on. I just hope that the address-space randomization has caught all
the code that depended on specific layouts.

And even with ASLR, I wouldn't be surprised if there are binaries out
there that "know" that they get dense virtual memory when they do
back-to-back allocations, even when they don't pass in the address
explicitly.

How much testing has AMD done with this change and various legacy
Linux distros? The 32-bit case in particular makes me nervous, that's
where I'd expect a higher likelihood of binaries that depend on the
layout.

You guys do realize that we had to disable ASLR on many machines?

So at a MINIMUM, I would say that this is acceptable only when the
process doing the allocation hasn't got ASLR disabled.
...
Anyway, I seriously think that this patch is completely unacceptable
in this form, and is quite possibly going to break real applications.
Maybe most of the applications that had problems with ASLR only had
trouble with anonymous memory, and the fact that you only do this for
file mappings might mean that it's ok. But I'd be really worried.
Changing address space layout is not a small decision.
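
To make Linus' point about "dense" back-to-back allocations concrete, here is a toy model in Python (hypothetical names, not kernel code) of a bottom-up allocator placing consecutive one-page file mappings. It assumes the unaliasing step simply rounds each candidate address up until bits [14:12] are zero; under that assumption, consecutive mappings land 32 KiB apart instead of adjacent. The `should_unalias` helper is only a sketch of the condition Linus proposes: apply the change to file mappings, and only for processes that have not disabled ASLR.

```python
from typing import List

PAGE_SIZE = 0x1000   # 4 KiB pages
ALIAS_MASK = 0x7000  # virtual-address bits [14:12], per the patch description


def should_unalias(file_backed: bool, aslr_enabled: bool) -> bool:
    """Hypothetical policy along the lines Linus suggests: only touch file
    mappings, and only in processes that have not disabled ASLR."""
    return file_backed and aslr_enabled


def place_mappings(start: int, count: int, unalias: bool) -> List[int]:
    """Toy bottom-up allocator: each one-page mapping goes at the first
    address after the previous one. With unalias=True the candidate is
    rounded up until bits [14:12] are zero (assumes page-aligned input)."""
    addrs = []
    addr = start
    for _ in range(count):
        if unalias:
            addr = (addr + ALIAS_MASK) & ~ALIAS_MASK
        addrs.append(addr)
        addr += PAGE_SIZE
    return addrs


base = 0x10000000
dense = place_mappings(base, 3, unalias=False)
spread = place_mappings(base, 3, unalias=should_unalias(True, True))
print([hex(a) for a in dense])   # ['0x10000000', '0x10001000', '0x10002000']
print([hex(a) for a in spread])  # ['0x10000000', '0x10008000', '0x10010000']
```

In this model, three back-to-back one-page mappings span 68 KiB instead of 12 KiB, which is the kind of user-visible layout change a binary that "knows" it gets dense virtual memory could trip over.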