Friends shouldn't let friends use Windows 7 until Microsoft fixes Windows Explorer (link)
![]()
OK, so let me get the gist of all of this whole thread down to two statements:
1. People are claiming Bulldozer will be slower than existing products because they are sharing resources in the processor and sharing is inherently worse.
2. People are claiming that even though Bulldozer has dedicated resources relative to the old architecture that shares them, this is worse.
OK, I got it now.
When AMD had 64-bit and Intel had only 32-bit, they tried to tell the world there was no need for 64-bit. Until they got 64-bit.
When AMD had IMC and Intel had FSB, they told the world "there is plenty of life left in the FSB" (actual quote, and yes, they had *math* to show it had more bandwidth). Until they got an IMC.
When AMD had dual core and Intel had single core, they told the world that consumers don't need multi core. Until they got dual core.
When intel was using MCM, they said it was a better solution than native dies. Until they got native dies. (To be fair, we knocked *unconnected* MCM, and still do, we never knocked MCM as a technology, so hold your flames.)by John Fruehe
I'll make it short and easy to understand. Original quote:
Which is 100% true, as K10 has more execution units. I don't see the words perfomance, shared or dedicated in this post. Then you say:
Which is wrong, based on the above. I just pointed it out, but it seems it was a perfect excuse to ignore what the guy is actually saying (as you like to do) and repeat the same post you've been repeating how many times now?
I hope you properly get it now.
Friends shouldn't let friends use Windows 7 until Microsoft fixes Windows Explorer (link)
![]()
Quotes taken out of context can be true, but in context they can mean something different. Your quote was a response to my post about pipes. He is trying to make 3 pipelines appear like 6 pipes. Which is a twist to the truth.
BD has more resources since it can use 2 ALUs and 2 AGUs every clock, Phenom II averages at 1.5 ALUs and 1.5 AGUs since the share pipe. Again, if you can't use it, it isn't a resource. 2+2=4 (3+3)/2=3.
The discussion is still around IPC. Even if you try to make it look different. And it's still about BDs integer execution capacity compared to k8 (10h), we are pointing out that BDs 4 pipes seems a bit stronger than K8s 3 pipes.
And by adding the different parts of K8s pipeline together some people here are trying to make them look twice as strong.
4 pipes equals more resources than 3.
Last edited by -Boris-; 08-31-2010 at 06:24 AM.
The thing is that it uses it. If the CPU can't use all 6 at the same time that's another thing. All 6 will get used at some point. Either way, they are on the die, they're connected, and they are used. Alternatively, not at the same time, whatever. But they are there, they are used and thus they are a resource. K10 has more resources than BD (integer "clusters").
Instructions per clock (compared to K10). Frequency doesn't matter, this is per clock:
IPC (CPU level) --> Will be higher, more "modules", double integer resources per "module", less resources per integer "cluster", better use of available resources per integer "cluster".
IPC ("module" level) --> Will be higher, double integer resources per "module", less resources per integer "cluster", better use of available resources per integer "cluster".
IPC (single integer "cluster") --> Less resources, better use of available resources. Higher or lower instructions per clock?
The bold part is likely lower, and that's exactly what savantu, terrace and others are discussing here. IPC per integer "cluster". We don't know for sure, since JF just says "IPC will be higher". At what of the previous levels? After all the BS, bans, etc. he still hasn't answered this question.
Now, if you throw frecuency in the mix, knowing that it will be higher than current K10 CPUs, of course you can say single integer "cluster" perfomance is higher. Just notice how he never uses IPC+higher+per integer "cluster" in the same sentence. The only info we know about single thread perfomance is that it will "be higher". Of course, because of the higher frequency, not because IPC is higher.
JF just has to answer the question and this debate is going to end fast: IPC per integer cluster has been increased or not? No BS, just yes or no.
Friends shouldn't let friends use Windows 7 until Microsoft fixes Windows Explorer (link)
![]()
2 pages is a few?
And finally we got a statement, its the first time he explicit mentioned this and the question was answered, ironically after the one that asked the question first was banned...
If he would have done so much earlier we could have at least saved 15 pages of nonsense... anyway im satisfied with the answer and there is nothing more to ask.
1) As per what percentage improvement could be seen... JF has already said that with 33% cores 50% performance gain at server workloads could be seen. This is the only information JF is willing to share and unless you hold Intel stock or work for them, i see no reason why'd you press so much for that information... which he already explained that he couldn't share owing to product being some time away from launch (i assume about a good 2 quarters or so...). Personally speaking AMD wouldn't want Intel to have information on an upcoming product, as it will give Intel an edge and possibly a chance to outmaneuver them. It works the same the when it comes to the opposite... The only time Intel leaked information (remember C2D) on an upcoming architecture was when AMD was kicking them around left right and center and in all segments of market... Now if Intel finds out stuff, they could possibly evolve a new pricing strategy (given their scale and market share its easier now) or something else, to counter a competitive product. Competitive BD is...
2) IPC is higher...
3) IPC compared to previous architectures of AMD is higher... he said as much... and many a times over...
to be fair, historically companies that hide information until days before launch tend to have problems with their product, especially the ones who have many delays. Even if BD does have a sizeable increase over k10.5, which it should, I honestly don't think it will be enough to compete with Sandy Bridge.
That preview by Intel was a red cape for AMD to charge at, and I'm willing to bet if they had a better product they would have released their own preview, challenging for the top spot. My guess is that BD will be a fine product, just still not has a powerful as Intel's in terms of pure performance. To me it seems it's more about power efficiency, as JF keeps mentioning 50% off 33% more cores. Well why not 100% off 33% more cores? That's because the thermal envelopes would just be too high not to mention the power draw would be astronomical considering they don't have a working 32nm process.
At least from my perspective, it seems to me that AMD is done challenging for the top enthusiast performance spot. They seem to have shifted onto a new direction, trying to offer the most performance per dollar, especially over the long run when you consider electricity bills. That's quite reasonable, as Intel has far more money spent on their fabrication process, and thus have denser, faster caches which seriously helps out on applications like Super Pi.
Hans wrote for the K8:
http://chip-architect.com/news/2003_...it_Core.html#3Each Scheduler can launch one ALU and one AGU operation per cycle. The ALU operation may come from one x86 instruction while the AGU operation may come from another.
That is no 1.5, that is 3 ... maybe u missed the fact, that the MacroOps are splitted into µOps at that stage ?
No, it is not "either" it is both ... what do you not understand in the quote of Hans' article ?
That is correct, the current IPCs of usual code is around 1, I think Nehalem achievs 1.5-1.7 in best cases, thus: 2 pipes are enoughbut as I understand it is more efficient due to improved prefetchers and smaller die sizes to use a 2+2 simplified design
Yes you are right, but I never said anything against that point ;-)
Maybe one note on that, because I red it earlier: The AGU results are not retired, they go immediately into the LD/STR units, so the waiting µOp can get its mem-data ;-) Later, after the calculation of the µOp is finished, that µOp is retired.
So in short the retire / ExU ratio is 1:2 for both, not 1:3. For K10 it's (3:6) and for BD it's (4:8).
Bookmarks