lol... I thought I was always bad at explaining stuff.
Also, I don't always know what I'm talking about. So take my posts with a grain of salt.
What you're seeing is load imbalance due to NUMA. Some nodes run faster than others depending on where the data sits in memory, so one or two nodes end up lagging far behind the rest. When the threads on the faster nodes finish first, they sit idle while the slower threads keep running for a lot longer. That idle time is why you see low CPU usage.
I was confused as well when I first noticed this on the 4 x 4 that Skycrane sent.
I suspected this was the cause from the beginning, but I couldn't confirm it until I wrote some mini-benchmarks to test for it specifically.
There may be more to it, but so far that's the only explanation I have.
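
To put rough numbers on the NUMA imbalance, here's a back-of-the-envelope sketch in Python. It is not one of the mini-benchmarks, just arithmetic: the function name and the timings are made up for illustration, and it assumes every thread starts at the same time and is either 100% busy or completely idle.

```python
def average_cpu_utilization(finish_times):
    """Fraction of available thread-time spent doing work.

    Assumes all threads start at t = 0 and the job ends when the
    slowest thread finishes; a finished thread sits idle until then.
    """
    wall_time = max(finish_times)   # the job lasts as long as the slowest thread
    busy_time = sum(finish_times)   # each thread is busy until it finishes
    return busy_time / (len(finish_times) * wall_time)


# Hypothetical timings: 16 threads, 12 finish in 10 s,
# and the 4 threads on the lagging node take 25 s.
finish_times = [10.0] * 12 + [25.0] * 4
print(f"{average_cpu_utilization(finish_times):.0%}")  # prints 55%
```

So even though every thread ran flat out while it had work, a single slow node drags the average for the whole run down to about half.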






