BTW, they're saying it brings +xx% with only 33% more area, right? Well, that's +33% over a single-thread BD module. But what about 10h cores? A 2-thread BD module is almost twice the size of a 10h core! I haven't done the math, but it's roughly as much more powerful (c2c) as it is bigger, isn't it?
So what's the rationale behind it, one could ask? I think these:
- It allows for higher clocks, and so higher performance. In theory, at least. I wonder how much...
- It allows for turning off the whole FPU, gaining even more frequency headroom. We'll see how much there, too.
- It allows for replacing the FPU with some GPU-like compute units. I don't know if that's the plan, but I think this one is the most encouraging.
- ...?
BTW 2, I was speculating on the possibility of clocking the integer clusters higher than the rest of the module. When they were about to reveal some "BIG secrets" a few days back, I wondered whether it was perhaps about this...?

All the more so since I've heard something to the effect that they were trying to solve some issues but have failed, for now. I still wonder if it makes sense...? We know the prefetch and decode logic can handle 4 instructions per cycle, forwarding 4 macro-ops. Each integer unit, however, can receive 4 micro-ops, which is essentially 2 macro-ops, so half of what the front-end forwards. And it's always half. (In contrast to SMT...) But what if there are 3 or even 4 independent instructions in a thread? I know it's rare, and that the average IPC is around 1.00 (often even less); I've played with performance counters myself. But still, that's an average, with highs and lows...
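To put a rough number on that "average with highs and lows" point, here is a toy sketch. Everything in it is an assumption of mine: the distribution of independent ops per cycle is made up (picked only so that the uncapped average lands near 1.00), and `sustained_ipc` is a hypothetical helper, not a model of the real scheduler. It also ignores that ops which can't issue in one cycle would normally wait and issue later.

```python
def sustained_ipc(ilp_distribution, width):
    """Average ops issued per cycle when at most `width` ops can issue.

    ilp_distribution maps "independent ops ready this cycle" to probability.
    Simplification: ops above the cap are dropped, not carried over.
    """
    return sum(p * min(ilp, width) for ilp, p in ilp_distribution.items())


# Hypothetical distribution (NOT measured): most cycles have 0 or 1 ready ops,
# and 3 or 4 independent ops show up only rarely.
ILP_DIST = {0: 0.35, 1: 0.40, 2: 0.15, 3: 0.07, 4: 0.03}

narrow = sustained_ipc(ILP_DIST, 2)  # each integer cluster takes 2 macro-ops
wide = sustained_ipc(ILP_DIST, 4)    # a cluster that could take all 4

print(f"capped at 2: {narrow:.2f}  capped at 4: {wide:.2f}")
print(f"lost to the cap: {wide - narrow:.2f} IPC")
```

With these made-up numbers the cap costs about 0.13 IPC, so the rare wide bursts matter less than one might fear, but they aren't nothing either. Change `ILP_DIST` to whatever your own performance counter runs suggest.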
Of course, I know how important the front-end is for achieved IPC, and I hope Bulldozer is advanced in this respect as well.
All in all, I'm still counting on Bulldozer, and I know it can be a surprise (a positive one). Well, if not the current implementation, then the next one (Piledriver).

(Unless it turns out to be too weak at its very foundations, but I don't think so.)