Don't bother with the Ars Technica article.
New compilers like Open64 and GCC 4.7 improved BD's performance. Here's why:
http://www.phoronix.com/scan.php?pag...r_open64&num=3
http://www.phoronix.com/scan.php?pag...c_open64&num=3
I said on 32nm! And I know, we don't exactly know how Phenom II would perform on 32nm, but it wouldn't be worse than on 45nm. How do I know? GloFo's 32nm isn't that bad, since it already manages two really large dies. I think it's fairly safe to say that Thuban would reach somewhat higher frequencies even on early 32nm, at almost half the size of BD, and with BD's or Llano's better IMC and Llano's IPC improvements, that alone would be worth a few hundred MHz of extra performance. There you have at least 10% higher performance than Thuban at almost half the size of BD, and with plenty of headroom to grow!
And as I've said many times before, you can't use Llano as an example of the performance of GloFo's 32nm. It's a 1.45-billion-transistor monster with almost none of the "easy" transistors and die area like caches; it's all complex logic. And it has trade-offs we don't know anything about. A GPU design originally made for TSMC's low-power, low-frequency processes, for larger dies with more shaders, wouldn't work very well on a high-power, high-frequency process like the ones CPUs are made on. Likewise, a CPU would have a hard time reaching high frequencies if made on a process tuned for wide, low-frequency chips like a GPU. Another reason for Llano's poor overclocking is that it is locked and has no frequency limiters, so raising the bus raises a lot of frequencies that shouldn't be touched. No one says Intel's 32nm is bad just because locked SB chips don't overclock well at all.
Seriously? Quote mining much? You left out the last part of the sentence: "Phenom II has higher performance per watt, twice(!) the performance per mm² (taking processes into account)." I honestly thought our discussion was above quote mining. And your chart just proves my point: Phenom II DOES have higher performance per watt! And if shrunk to 32nm it would be almost half as big, yet even cooler and capable of even higher performance. There you have twice the performance per mm². And no, no one has presented any proof that 32nm is very bad at all. Of course it might not be the best process right now, but if it's capable right from the start of making two gargantuan chips, it can't be too bad, and it would most likely do much better on smaller chips like Thuban.
First, AMD's current projections for Piledriver don't show it being that much better. It just can't magically double its performance per mm², which it would need to be competitive in the long run. And no process scales very well to frequencies above 3-4GHz. That's the reason BD fails: they made huge trade-offs for frequency, and those trade-offs carry a very high price. The changes needed in an architecture or a process to gain an extra GHz at these levels are huge! In the past we could see a 50-100% frequency increase with each process, sometimes even more; today a new process doesn't give you that. You can still make larger and more complex chips, but not much has happened with frequencies since the 3GHz barrier was broken many years ago. So if a step that used to give you a lot of frequency headroom no longer does, how much do you have to do to gain the 1-2GHz AMD needs right now? When closing in on 4GHz, Intel's speed-demon P4 couldn't go higher, while designs made for lower frequencies kept rising in speed until the high-IPC A64 and Core 2 were capable of almost the same frequencies. Higher IPC has the same costs it always had, but higher frequencies today require larger trade-offs than ever before. That's why the relatively huge trade-offs in BD haven't bought more than a few hundred MHz, which Thuban on 32nm might have reached just as well.
So, you simply can't make BD at 5GHz and 95W, and that's where it needs to be to be at least competitive with mid-range SB, not even taking BD's enormous die size into account. It would be easier to make a 32nm Thuban with IPC improvements at 95W, and it would have room to grow. So yes, BD is pretty close to the ceiling; the ceiling might be just a bit over 4GHz base clock, and it needs to be much, much higher.
And no, no design can eradicate transistor-level leakage. In the old days, before leakage was a big problem, you could design for higher frequencies, but not today: both low-IPC and high-IPC designs suffer the same leakage at the same high frequencies. The larger the die, the larger the problem, as it usually means a voltage increase. In a leakage-free world, pipeline stages half as long could mean twice the frequency, but if you run into massive leakage problems that grow exponentially with voltage, and therefore with the frequencies that voltage enables, then both designs suffer from them at the same frequencies. So tuning down IPC to get more frequency today means generating more heat for the same performance. This is the same story all over again as when Prescott had its problems: people blamed the process, even though Dothan shined on the same process. The difference was that the speed-demon Prescott was already pushing against the ceiling.
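To put rough numbers on the heat argument, here is a minimal Python sketch of the textbook dynamic-power relation P ∝ C·V²·f. It assumes, purely for illustration, that core voltage has to rise linearly with clock speed, and it uses a 125W at 3.6GHz baseline (the FX-8150's rated TDP and base clock) as a starting point. It ignores leakage entirely, which per the argument above would only make the high-clock case worse.

```python
def scaled_dynamic_power(base_power_w: float, base_clock_ghz: float,
                         target_clock_ghz: float,
                         voltage_tracks_clock: bool = True) -> float:
    """Estimate dynamic power at a new clock using P ~ C * V^2 * f.

    Assumption (illustrative only): if voltage_tracks_clock is True, voltage
    scales linearly with frequency, so power scales with the cube of the
    clock ratio. Otherwise voltage is held constant and power scales
    linearly with clock. Leakage power is ignored.
    """
    ratio = target_clock_ghz / base_clock_ghz
    voltage_ratio = ratio if voltage_tracks_clock else 1.0
    return base_power_w * ratio * voltage_ratio ** 2


if __name__ == "__main__":
    # Hypothetical baseline: 125 W at 3.6 GHz.
    for clock in (3.6, 4.2, 5.0):
        p = scaled_dynamic_power(125.0, 3.6, clock)
        print(f"{clock:.1f} GHz -> ~{p:.0f} W (dynamic power only)")
```

Under these crude assumptions, going from 3.6GHz to 5GHz multiplies dynamic power by roughly (5/3.6)³ ≈ 2.7, which illustrates why a 5GHz part inside a 95W envelope is such a tall order.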
Simply put, to be even a bit competitive with SB, BD would need to run at 5GHz at 95W with a much smaller die. No process can fix that! And what about IB, which seems to gain an even larger performance-per-watt advantage over PD? The situation might be even worse between PD and IB!
<sarcasm>My word, I have never seen so many formidable CPU architects posting in one place at the same time in all my time on the Interwebs....</sarcasm>
My, my, what are some of you people like?
Yes, things aren't as expected
Yes, things could have been done better
Yes, there is/was a lot of smoke and mirrors
But some of the stuff being posted here is borderline insanity!
I don't know much about CPU architecture myself, but I sure know it isn't simply plug and pray.
First, some of you need to grasp how tiny tiny tiny tiny tiny to the power of a million (simple speak :D) the parts that make up a CPU are.
Then you need to have some humility and understand that it takes years of knowledge, research and experience to even be in a position to know somewhat what is going on.
Then you need to understand that you are still reliant on old knowledge and information.
Then you need to understand that even if you know all of the above, the thing still needs to be physically made.
Some people posting here need a reality check and need to learn to chill out and look at things a little more pragmatically, as you DON'T (I speak for the majority here, including myself) have the skill set to discuss these things in any other way.
Express your opinions, yes.
But stating things as fact, well, absurd isn't the word.............
@ Boris: I've never suggested 95W @ 5 GHz, but the 32nm process is borked. I think AMD has a much better clue about what they did than anyone on enthusiast forums. There have been designs with long pipelines and high frequencies that worked before; IBM's Power6 reached over 5 GHz on a 65nm process. IPC by itself is irrelevant; it's the relation between IPC and clock frequency that really matters. The advantage, in theory, is fewer transistors and less die area, so fewer transistors need to be powered, but you need higher frequencies. Obviously AMD dropped the ball here, but the concept nonetheless works if properly executed.
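The point that neither IPC nor clock matters on its own can be written as single-thread performance ≈ IPC × frequency. A minimal Python sketch, with made-up IPC figures that are placeholders rather than measurements of any real chip, shows how much extra clock a lower-IPC design needs just to break even.

```python
def relative_performance(ipc: float, clock_ghz: float) -> float:
    """Single-thread throughput as instructions per cycle times clock
    (in units of 1e9 instructions per second)."""
    return ipc * clock_ghz


def clock_needed(target_perf: float, ipc: float) -> float:
    """Clock (GHz) a design with the given IPC needs to reach target_perf."""
    return target_perf / ipc


if __name__ == "__main__":
    # Hypothetical designs: a "wide" core, and a "narrow, high-clock" core
    # with 20% lower IPC. These numbers are placeholders, not measurements.
    wide = relative_performance(ipc=1.0, clock_ghz=3.6)
    narrow_ipc = 0.8
    print(f"narrow core needs {clock_needed(wide, narrow_ipc):.2f} GHz to match")
```

With these placeholder numbers, a 20% IPC deficit has to be paid back with a 25% higher clock, which is exactly the kind of frequency gain the earlier posts argue is now expensive.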
I am not expecting miracles from Piledriver, only that the path AMD chose will start making sense compared to K10. And the manufacturing process they are using is severely borked, so there are gains in frequency and power efficiency to be had there.
No, I said 95W at 5GHz is needed to be even a bit competitive with SB; I never said it was your opinion. Power6 had one thing AMD doesn't: IBM. IBM used some very interesting techniques to combat leakage that AMD doesn't have. And it was an in-order processor, cutting away a lot of heat-generating logic. And I know IPC is nothing without frequency. But today, when you have to make huge sacrifices to gain a few hundred MHz, IPC in itself is more important than ever.
And I still haven't seen how the manufacturing process can play such a big role here. Not even Intel could make Bulldozer nearly as fast, cool and small as even a mid-range SB. Besides, no one has given any proof yet that the process is that bad. I know there are supply problems, but 32nm is still a lot better than 45nm, considering that a huge monstrosity like BD is even doable; it wouldn't work at all at 45nm. So even if 32nm can get better than it is today, it's already better than 45nm, which would make a Phenom III on 32nm much more attractive.
You are kidding, right? Instead of looking at cherry-picked benchmarks from someone with an agenda, consisting of only single-threaded benchmarks and great sites like neoseeker :p:, why don't you look at the complete reviews from the best tech sites out there?
Dispute this. I will be quietly lmao as you try. It's 72 wins for the 8150 against 21 for Thuban. Good luck.
Quote:
I saw 5 or 6 reviews, and my impression was that Zambezi won the large majority of the tests vs Thuban, but reading the comments in this thread I doubted myself, so I had to double-check and go over the reviews I'd seen. I stopped at the third; it was pointless to go on: Techreport 20-5, X-bit labs 21-6, TomsHardware 31-10, for a total of 72-21 benchmarks in favor of the FX-8150. It's not even close. How does that translate into the FX-8150 being 40% slower? Or the 1100T being quite a bit faster? Or a worse launch than Barcelona, for that matter? The Phenom 9600 lost the majority of its benchmarks to the X2 6400.
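The 72-21 total is just the sum of the three per-review counts. A trivial Python check, using only the numbers quoted above, confirms the arithmetic; note that it counts wins only and says nothing about the margins of those wins.

```python
# Per-review win counts quoted above: (FX-8150 wins, 1100T wins).
REVIEW_WINS = {
    "Techreport": (20, 5),
    "X-bit labs": (21, 6),
    "TomsHardware": (31, 10),
}


def tally(wins: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum head-to-head wins for each chip across all reviews."""
    fx = sum(a for a, _ in wins.values())
    thuban = sum(b for _, b in wins.values())
    return fx, thuban


if __name__ == "__main__":
    fx, thuban = tally(REVIEW_WINS)
    share = fx / (fx + thuban)
    print(f"FX-8150 {fx} - 1100T {thuban} ({share:.0%} of decided tests)")
```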
A recent review by the best tech site out there:
http://techreport.com/articles.x/21987
The 8150 wins 25, the 1100T wins 7, and even the 8120 often beats the 1100T. How do you reconcile that with Thuban being faster?
Oh bu-bu-bu-but Thuban has better IPC... and? Bulldozer has better turbo, can handle 8 threads and will hopefully reach much higher frequencies. Why do Intel fanboys keep bringing up iTunes? The first cherry-picked benchmark, go figure, is iTunes... are they trying to say Bulldozer can't handle iTunes? Why don't they show a single-thread benchmark of Windows Calculator? It would be just as useful.
And if you do manually what the OS should do, the difference between the 8150 and the 1100T is even bigger:
http://techreport.com/articles.x/21865/2
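One common manual approach on a Bulldozer system is to keep threads on separate modules, so that they don't share a module's front end and FPU. Whether that is exactly what the linked test did is not stated here. The sketch below is hypothetical: it assumes logical CPUs 2k and 2k+1 belong to the same module (verify this against your OS's reported topology) and uses `os.sched_setaffinity`, which exists only on Linux.

```python
import os


def one_core_per_module(num_cores: int, cores_per_module: int = 2) -> set[int]:
    """Pick the first core of each module.

    Assumption: logical CPUs are numbered so that cores
    [k*cores_per_module, (k+1)*cores_per_module) share one module.
    Check /sys/devices/system/cpu/cpu*/topology before relying on this.
    """
    return set(range(0, num_cores, cores_per_module))


def pin_current_process(cpus: set[int]) -> None:
    """Restrict this process to the given CPUs (Linux-only API)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    # For an 8-core FX-8150 (4 modules) this selects {0, 2, 4, 6}.
    print(sorted(one_core_per_module(8)))
```

The pinning call is kept separate so the core selection can be inspected before anything is applied to a running process.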
Fact: the broken Phenom was clearly beaten by K8; a broken Bulldozer clearly beats Phenom.
http://www.electroiq.com/articles/ss...nitiative.html
Intel will have FinFETs on the market next year with 22nm; everyone else will have them with 14nm, god knows when. A consortium including everyone but Intel can't keep up with Intel, and people want AMD alone to compete with and beat Intel...
This post shows pretty well that you aren't familiar with the BD uarch at all.
Shrunk? Please show me that shrink. Your statement is just based on a simple theory.
Or not. That shrink would probably be worse than Thuban because of the crappy 32nm tech. If not, then prove it please.
There is Llano. It's perfect proof: neither AMD nor overclockers can reach frequencies similar to what we saw on Propus or Deneb.
Is that so?
Is it performing at stock frequencies in comparison to its counterparts at similar clocks? No.
Other than the MASSIVE heat and power draw when it's overclocked, does it compete favorably with an overclocked SB? No.
Not sure what you want me to say: clock for clock it just doesn't get it done, and there are hundreds of benchmarks out there that prove this.
You said Bulldozer needs to be at 5GHz to be a "BIT" competitive, and now you are asking for valid points? Priceless.
You just said "I've seen" in a tech forum and you want to be taken seriously?
Quote:
I've seen i3s at standard frequency beat BD at 4.7GHz+ in games
I'm gonna use the same review I used above; it's the most recent:
http://techreport.com/articles.x/21987/17
A 4.4GHz 8150 quite often beats the stock 2700K, yet somehow 5GHz is needed to be a "BIT" competitive.
Game benchmarks, 8150 vs i3 2100: the 8150 wins 7, the i3 wins 3, with 2 draws... 2 of the i3's 3 wins were by 1 frame, and at low resolutions the 8150 wins by a large margin, but hey, "you've seen"... how can I dispute that.
Now I will stop feeding trolls and get on with my life. Thank you, sir.
A shrink typically brings you an area scaling factor of 0.6-0.7 (~220mm² for a 32nm Thuban vs. the 45nm Thuban) and a 10-20% higher clock (3.5-3.9GHz), ceteris paribus (uarch-wise). So would a 220mm², 3.8GHz Thuban be better than today's BD? Most likely yes, in both ST and MT workloads.
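The estimate is simple multiplication. A minimal Python sketch applies the rule-of-thumb factors quoted above; the inputs, a ~346 mm² 45nm Thuban die and a 3.2 GHz base clock, are assumptions not stated in the post. With them it lands in the same ballpark, roughly 210-240 mm² and 3.5-3.8 GHz.

```python
def shrink_estimate(area_mm2: float, clock_ghz: float,
                    area_factor: tuple[float, float] = (0.6, 0.7),
                    clock_gain: tuple[float, float] = (0.10, 0.20)
                    ) -> tuple[tuple[float, float], tuple[float, float]]:
    """Apply rule-of-thumb shrink factors to a die.

    Returns ((min_area, max_area), (min_clock, max_clock)).
    The default factors are the 0.6-0.7 area and 10-20% clock figures
    from the post; they are rules of thumb, not process data.
    """
    areas = (area_mm2 * area_factor[0], area_mm2 * area_factor[1])
    clocks = (clock_ghz * (1 + clock_gain[0]), clock_ghz * (1 + clock_gain[1]))
    return areas, clocks


if __name__ == "__main__":
    # Assumed inputs: ~346 mm^2 45nm Thuban die, 3.2 GHz base clock.
    (a_lo, a_hi), (c_lo, c_hi) = shrink_estimate(346.0, 3.2)
    print(f"area {a_lo:.0f}-{a_hi:.0f} mm^2, clock {c_lo:.2f}-{c_hi:.2f} GHz")
```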
The problem is that such a Thuban would have several issues:
- I do not know how speed-path-limited K10.5 is with its 3-cycle L1 and 12-cycle L2; in other words, getting to 3.5-4GHz might have required AMD to relax the cache latencies (to something like 4 and 14-15 cycles respectively).
- It lacks AVX and FMA. I can only assume an approach similar to BD's: use the existing 128-bit FPUs and split 256-bit AVX into 2 halves to minimize area and complexity. I do not know the increase in FPU area and power needed to support AVX and FMA, but I don't think it's trivial.
- It maintains the status quo vs. Intel. Thuban needs almost 2x the core count to match Intel Xeons. BD did not improve on this; it definitely needs at least 2x the core count to match Intel Xeons.
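Why relaxing cache latencies might be needed: assuming a cache's access time in nanoseconds stays roughly fixed by its circuits, a higher clock means more cycles for the same access. The small sketch below converts the latencies mentioned in the first bullet from cycles to nanoseconds at two hypothetical clock points (3.2 GHz and 4.0 GHz).

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a cache latency in core cycles to nanoseconds."""
    return cycles / clock_ghz


if __name__ == "__main__":
    # Hypothetical operating points: current latencies at 3.2 GHz versus
    # relaxed latencies at 4.0 GHz.
    print(f"L1: {latency_ns(3, 3.2):.2f} ns -> {latency_ns(4, 4.0):.2f} ns")
    print(f"L2: {latency_ns(12, 3.2):.2f} ns -> {latency_ns(15, 4.0):.2f} ns")
```

At these assumed clocks, a 12-cycle L2 at 3.2 GHz and a 15-cycle L2 at 4.0 GHz both take 3.75 ns, which is the kind of trade-off the first bullet describes.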
Given BD's failures, it could be that at least vs. BD ver 1, a 32nm Thuban might have performed better.
Imagine we are in late 2008 / early 2009. BD simulations show the CPU to be too large and too slow on 45nm vs. competitor CPUs. At the same time, Intel announces it will not use SSE5 and XOP but will go for AVX and FMA3. This raises an interesting question: what if AMD had planned a 32nm Thuban for summer 2011, delayed BD to 2012, dropped FMA4 support and focused only on AVX and FMA3?
The first one shouldn't have been that difficult to do (?) and would have bought time to polish BD. AVX and FMA4 support is more or less irrelevant now, and by the time they become widespread, BDver1 will be history anyway.
Do you have an example of such a shrink besides Deneb at 45nm? The history of AMD shrinks (90nm, 65nm) teaches that they give 10-20% lower clocks at the start. And speaking of Deneb, it seems to me that was more Agena's failure than Deneb's win. AMD had plans for a 3GHz Phenom, but the TLB bug left no time to develop a frequency-optimized stepping of Agena before Deneb. So a 3GHz Phenom vs a 3GHz Phenom II would mean no clock increase with the shrink at the start.
A theory, yes, not a hypothesis. BD wouldn't be possible on 45nm, but it is on 32nm. The fact that their 32nm can produce BD is proof enough that it's not crap; it might not be the best around, but it's good enough.
No one has yet shown any proof that 32nm is that bad; that BD is alive and kicking is proof enough that 32nm works. Thuban is not nearly as hard to produce as BD, so if you can make BD, Thuban would be easy. Thuban would probably be below 200mm².
For me the existence of beasts like Llano and BD is proof enough that 32nm is way better than 45nm, and Thuban would be better off at 32nm.
Not proof at all! Are locked SBs proof that Intel's 32nm sucks because they don't overclock? No! Llano suffers from similar problems since it isn't unlocked, and what's worse, you can't lock frequencies like PCIe. We have no clue how Llano would clock if it were unlocked. Besides, the integrated GPU isn't made for that kind of process, which means there will be trade-offs in process choice when putting it on the die. The proof we do have is that Llano consumes 5-20W less power than a comparable Athlon II in different tests, and that's with an extra GPU in the Llano! What does that say about GloFo's 32nm? It's better than 45nm!
So, you have no valid proof whatsoever that CPUs fare worse on 32nm than 45nm. I on the other hand have numbers that show lower power consumption, and the fact that BD exists is a strong indicator that a much simpler chip would perform quite well on 32nm.
There are only a few tests that compare Athlon II with Llano; here is one. It's in Swedish, but I hope you understand charts.
http://www.sweclockers.com/recension...no/25#pagehead
The i3 was a worst-case example. I'm fully aware that an i3 isn't close to an 8150, but when it actually scores better in some games despite the 8150 being heavily overclocked, I use that as an example that there is a long way left to beat i7 in games.
And how often does an overclocked 8150 beat a stock i7 when you look at anything other than heavily multithreaded benchmarks? How often does a BD at any frequency beat a stock SB i7 in games? Show me!
At what frequency can a BD match a stock i7 across the board?
EDIT:
For some reason, game benchmarks with an overclocked BD seem to be rare, so I'll give you the ones I found:
http://www.neoseeker.com/Articles/Ha...x-8150/11.html
http://www.overclockers.com/amd-fx-8...ocessor-review <-- Graphics limited, so differences appear smaller.
http://www.vortez.net/articles_pages...review,13.html
http://www.madshrimps.be/articles/ar...#axzz1eXh3RmmC
Again, at what frequency can Bulldozer match this?! You are free to supply reviews of your own comparing gaming performance of an i7 and an overclocked BD.
It's a general rule of thumb in the industry. Moving to a new process brings you two advantages:
- die size reduction: at most ~50% (0.7 × 0.7 = 0.49)
- 20% more frequency for the same power
New processes usually claim a 20-50% power reduction or, alternatively, 20-40% higher clocks for the same power consumption.
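The ~50% figure is just the square of the linear shrink. A short sketch makes the arithmetic explicit, including the actual 45nm to 32nm ratio. It assumes ideal scaling; real shrinks typically land nearer the 0.6-0.7 area factor mentioned earlier in the thread, since not every structure on a die scales ideally.

```python
def ideal_area_factor(old_node_nm: float, new_node_nm: float) -> float:
    """Ideal area scaling: linear dimensions shrink by new/old, area by its square."""
    linear = new_node_nm / old_node_nm
    return linear ** 2


if __name__ == "__main__":
    print(f"0.7 linear shrink -> area x{0.7 ** 2:.2f}")
    print(f"45nm -> 32nm -> area x{ideal_area_factor(45, 32):.3f}")
```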
So I take it you're still ignoring the fact that AMD has said openly that GlobalFoundries' 32nm didn't meet AMD's performance expectations and that yields are bad? They've talked about it with media and investors on the last two quarterly calls, and they also issued a press release before their Q3 results saying that projections for that quarter would be lower because of bad yields at their 32nm node.
Llano was also projected to enter the market at 3.0+ GHz, yet only retailed at 2.9 GHz, and it was supposed to launch in late 2010, not mid-2011. Llano is also in extremely short supply, both at retail and with OEMs. 32nm is horrid right now, and the fact is that AMD isn't happy with it.
There is no doubt 32nm "works", but it's still a dog with horrible yields, which needs to be fixed and is reflected in both of their 32nm products. As for Bulldozer, it is also quite possible that AMD is running specific functions at lower clocks, which can impact performance greatly.
Consider that half the results lean heavily toward Thuban being the better architecture, and the other half show FX matching Sandy Bridge (in MT performance only... losing by up to 80% in single thread) while using 25% more power to do it.
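Matching performance while drawing 25% more power works out to a 20% performance-per-watt deficit rather than 25%, since perf/W scales with the inverse of power. A minimal sketch using only the ratio stated above:

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt of chip A vs chip B,
    given A's performance and power as multiples of B's."""
    return perf_ratio / power_ratio


if __name__ == "__main__":
    # Equal multithreaded performance at 25% more power, as stated above.
    r = perf_per_watt_ratio(1.0, 1.25)
    print(f"perf/W relative to Sandy Bridge: {r:.2f} ({1 - r:.0%} lower)")
```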
Facts on 32nm Thuban/Agena? Changed direction after being called out on it?
Llano's refined core was supposed to gain up to 5% IPC, correct?
Let's say we shrunk Thuban but used Llano's core... a 6-core would be 269mm² like I stated before, correct? Assuming that the 32nm process can produce chips that work at least as well as on 45nm (or at least something like the 90nm > 65nm transition), we would have chips with a much smaller die and lower power consumption than the current BD, delivering much more performance per mm² even if you ignore power consumption. I didn't say "Add two cores for a Phenom II X8 and set it at 4 GHz" like informal thought I did. Anyway, the X6 performs very close to BD in real-world apps when both are overclocked (to 4.2 and 4.8 GHz respectively). Also, "STARS" is very bandwidth-starved: the more you overclock the RAM and the CPU-NB, the better it performs. What if it had the kind of bandwidth available to BD? More IPC improvement.
The only comment I made about an eight-core with the old uarch was that the die size would be around 330-340mm², only slightly bigger than BD is today (~5-10%). Anyway, who knows whether they couldn't have added two more cores AND increased clocks? Even if clock speed had to be reduced, it would still perform better than BD. Let's say we could only get 3.8 GHz out of the architecture on 32nm with 8 cores. Would that not perform better than BD? Look at what they did going from X4 to X6: the X6 chips overclocked just as well as the quads, and still do compared to recent quads. Would it be hard to prove that shrinking Thuban would have brought more performance per mm² than BD on 32nm? No, not at all. I believe the answer is quite clear from the first paragraph of this post.
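The "~5-10%" figure can be checked with one division. The sketch below assumes a Bulldozer die of about 315 mm², a figure not given in the post, and uses the 330-340 mm² estimate above.

```python
def percent_larger(area_mm2: float, reference_mm2: float) -> float:
    """How much larger (in percent) one die is than a reference die."""
    return (area_mm2 / reference_mm2 - 1.0) * 100.0


if __name__ == "__main__":
    BD_DIE_MM2 = 315.0  # assumed Bulldozer (Orochi) die size
    for est in (330.0, 340.0):
        print(f"{est:.0f} mm^2 is {percent_larger(est, BD_DIE_MM2):.1f}% larger than BD")
```

Under that assumption the estimate comes out at roughly 5-8% larger, consistent with the claim.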
Yield and performance are different things. Having bad yields doesn't prevent working parts from operating at high frequency. The question is still open whether the uarch or the process is to blame for the high power consumption at high clocks. I'd say it is a bit of both, but the process isn't completely broken.
Llano isn't a good indicator since the GPU is causing all the issues apparently.
I don't ignore anything, but lower-than-expected yields are not the same thing as finished chips performing worse than their 45nm counterparts. On the contrary, we have numbers showing that Llano consumes less power than Athlon II despite having a GPU. Llano or BD would most likely not even be feasible on 45nm. So even if 32nm yields aren't where AMD wants them, I think it's safe to say that a 32nm Thuban would perform better than a 45nm Thuban. You are forgetting that the chips that currently have yield problems are record breakers when it comes to transistor count. It's not surprising that yields are bad so far. Yields would be better with smaller chips, so that's just another reason why a 32nm Thuban would be better off. You still can't blame GloFo for BD's shortcomings as some people do; the clock speed needed to make BD competitive isn't possible on any process, especially not when taking thermals into account.
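Why smaller dies yield better can be illustrated with the simple Poisson yield model, Y = exp(−A·D₀), where A is die area and D₀ is defect density. This is a textbook approximation, not GlobalFoundries' actual data: the defect density below is a hypothetical placeholder for an immature process, and the die sizes are the rough figures from this thread (the ~315 mm² BD size is an assumption).

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    D0 = 0.5  # hypothetical defect density (defects per cm^2)
    for name, area in (("~220 mm^2 shrunk Thuban", 220.0), ("~315 mm^2 BD", 315.0)):
        print(f"{name}: {poisson_yield(area, D0):.0%} defect-free")
```

The absolute percentages depend entirely on the made-up D₀; the point is only that, at any fixed defect density, the smaller die yields a larger fraction of good chips.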
Even if 32nm isn't where AMD expected it to be, it's still most likely to give cooler chips and/or more frequency headroom, considering the evidence we have.
If so, and if these functions cripple BD considerably, and AMD expects yields to improve over the next two years, allowing those functions to run at full speed, shouldn't we expect a successor with radically improved performance? AMD's current projections aren't too promising.
So nothing, still, points to Thuban on 32nm performing worse than Thuban on 45nm. It should perform much better, and with Llano's IPC improvements you could call it a day. Thuban still has higher performance per mm² than BD, taking processes into account, and that doesn't bode well for the future.