Quote Originally Posted by Solus Corvus
No, I'm dismissing them because they are an obvious attempt to rationalize your position.
And rationalisation is bad how?
I'm not simply defining something for the sake of defining it like you are.

That a shared scheduler can operate on independent threads, for example, is irrelevant to the point of what actually constitutes a core.
That doesn't even make sense. How could it not be relevant in the context of Bulldozer?

The definition of what is part of the core varies by who you are talking to (intel, amd, my CS professors, etc) and over time.

Early patents related to BD had the core and module terms reversed, so clearly it isn't as cut and dried as you suggest. Traditionally, everything that wasn't part of the IO, memory controller, and cache hierarchy was part of the core. Integration changes all of that and makes the existing terms ambiguous. If future versions of the Bulldozer arch allow separate int cores to work on the same thread, or eager execution allows two cores to work on one thread, are we to consider a module only one core?
All you're doing is arguing definitions.

If you're going down that route, you COULD technically define 1+1=3 as being true.

A BD module functions the same as two cores, with the added ability to schedule two 128-bit micro-ops from one AVX instruction on the two 128-bit FP pipelines.
So each half of the FP unit can work on only half of an AVX instruction while the other half can process something separate?
Yes.
EVERY 256b AVX instruction is decoded into two 128-bit micro-ops, then sent to the FP scheduler.
The FP scheduler can send both 128b micro-ops for that AVX instruction to a single FP pipeline, or split them across both pipelines, depending on what's available.

It is not possible to process a single 256b micro-op on 128-bit pipelines, even if there are two of them. The instruction has to be decoded into two 128-bit micro-ops.
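As a software analogy (just a sketch in C with AVX intrinsics, the function name is mine, and this is obviously not the actual hardware datapath), this is effectively what the decoder hands the FP scheduler for one 256b add: two independent 128b operations.

#include <immintrin.h>

/* toy analogy, not the real hardware path: one 256b AVX add expressed as
   the two independent 128b halves BD's decoder would crack it into */
__m256 add256_as_two_128b_ops(__m256 a, __m256 b)
{
    __m128 a_lo = _mm256_castps256_ps128(a);   /* low 128b lane  */
    __m128 a_hi = _mm256_extractf128_ps(a, 1); /* high 128b lane */
    __m128 b_lo = _mm256_castps256_ps128(b);
    __m128 b_hi = _mm256_extractf128_ps(b, 1);

    /* the two halves are independent, so a scheduler is free to issue them
       to one 128b pipeline back-to-back or to both pipelines in one go */
    __m128 lo = _mm_add_ps(a_lo, b_lo);
    __m128 hi = _mm_add_ps(a_hi, b_hi);

    /* recombine the lanes into the 256b result */
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
}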
No, I don't think so. To process a 256-bit instruction, both halves are obligatory according to John's description. They can't process half of an AVX instruction and leave the other half for later.
You clearly don't understand how x86 decoding works.
A 256b AVX instruction IS NOT processed as one 256b micro-op. It is processed as two separate 128b micro-ops.
It is quite the opposite. You are saying that you have the only proper definition of core.
I'm not even arguing definitions. I'm actually arguing functionality.

Except that one pipeline can't do both halves of an AVX instruction.
It can do both of them. It can't do both of them at the same time, but one after the other.

That would mean that the circuitry for computing all parts of an AVX instruction is present in both halves of the FP unit. That would be counterproductive and defeat the purpose of sharing it in the first place.
You won't always have both FP pipelines available. And it does save transistors: each 128b pipeline just executes ordinary 128b micro-ops, so nothing 256b-wide is being duplicated, and the two cores still share one FP unit instead of each having its own.

If the two 128b micro-ops get processed simultaneously on both 128b pipelines even just 50% of the time, you still get better performance than if it happened 0% of the time (as it would with separate schedulers).
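To put rough numbers on that (just a toy model I'm making up, assuming each 128b micro-op ties up one pipeline for exactly one cycle and ignoring everything else the shared FP unit is doing):

/* toy model: average cycles per 256b AVX instruction, assuming each 128b
   micro-op occupies one pipeline for one cycle and nothing else interferes */
double avx_cycles_per_insn(double pair_rate)
{
    /* pair_rate = fraction of instructions whose two micro-ops issue to
       both pipelines in the same cycle; the rest go one after the other */
    return pair_rate * 1.0 + (1.0 - pair_rate) * 2.0;
}

/* avx_cycles_per_insn(0.0) = 2.0 cycles, avx_cycles_per_insn(0.5) = 1.5,
   so pairing even half the time is ~33% more throughput than never pairing. */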
Like I said, you have set yourself on one definition when, frankly, it is a really ridiculous thing to argue over. I don't see the point of discussing it further. Does a 4870 have 1 core, 160 cores, or 800 cores?
Irrelevant, since we are talking about x86 cores and BD.
What would we call a cpu if it had a hundred INT units, 20 FP units, and one frontend/backend?
That would be completely different, and not analogous to BD. In a BD module each integer core has its own scheduler, its own execution pipelines, and its own L1 data cache; only the front end, the FP unit, and the L2 are shared.