Don't bother with the Ars Technica article.
New compilers like Open64 and GCC 4.7 improved BD's performance. Here is the reason:
http://www.phoronix.com/scan.php?pag...r_open64&num=3
http://www.phoronix.com/scan.php?pag...c_open64&num=3
I said on 32nm! And I know, we don't know exactly how Phenom II would perform on 32nm, but it wouldn't be worse than on 45nm. How do I know? GloFo's 32nm isn't that bad, since it already manages two really large dies. I think it's fairly safe to say that Thuban would reach slightly higher frequencies on early 32nm at almost half the size of BD, and with BD's or Llano's better IMC and Llano's IPC improvements, that alone would be worth a few hundred MHz of extra performance. There you have at least 10% higher performance than Thuban at almost half the size of BD, and with plenty of headroom to grow!
And as I said many times before, you can't use Llano as an example of the performance of GloFo's 32nm. It's a 1.45 billion transistor monster with almost no "easy" transistors and die space like caches; it's all complex logic. And it has tradeoffs we don't know anything about. A GPU design originally made for TSMC's low-power, low-frequency processes, which suit larger dies with more shaders, wouldn't work very well on a high-power, high-frequency process like the ones CPUs are made on. Just as a CPU would have a hard time reaching high frequencies on a process tuned for wide, low-frequency chips like GPUs. Another reason for Llano's bad overclocking is that it is locked and has no way to hold individual clocks fixed, so raising the bus raises a lot of frequencies that shouldn't be touched. No one says Intel's 32nm is bad just because locked SBs don't overclock well.
Seriously? Quote mining much? You left out the last part of the sentence: "Phenom II has higher performance per watt, twice(!) the performance per mm² (taking processes into account)." I honestly thought our discussion would be above quote mining tactics. And your chart just proves my point. Phenom II DOES have higher performance per watt! And if shrunk to 32nm it would be almost half as big, yet even cooler and capable of even higher performance! There you have twice the performance per mm². And no, no one has presented any proof that 32nm is very bad at all. Of course it might not be the best process right now, but if it's capable of making two gargantuan chips right from the start it can't be too bad, and it would most likely perform much better on smaller chips, like Thuban.
First, AMD's current projections for Piledriver don't show it being that much better. It just can't magically get the twice-as-high performance per mm² it needs to be competitive in the long run. And no process scales very well with frequency above 3-4GHz. That's the reason BD fails: they made huge tradeoffs for frequency, and those come at a very high price. The differences needed in an architecture or a process to earn an extra GHz at these levels are huge! In the past we could see a 50-100% frequency increase with each process, sometimes even more. Today a new process doesn't give you that. You can still make larger and more complex chips, but not much has happened with frequencies since the 3GHz barrier was broken many years ago. So if a step that used to give you a lot of frequency headroom no longer does, how much do you have to do to earn the 1-2GHz that AMD needs right now? When closing in on 4GHz, Intel's speed demon P4 couldn't go higher, while designs made for lower frequencies kept rising in speed until the high-IPC A64 and Core 2 were capable of almost the same frequencies. Higher IPC has the same costs it always had, but higher frequencies today require larger tradeoffs than ever before. That's why the relatively huge tradeoffs in BD haven't given more than a few hundred MHz, which Thuban on 32nm might have reached just as well.
So, you simply can't make BD at 5GHz and 95W, and that's where it needs to be to at least be competitive with mid-range SB, not even taking BD's enormous die size into account. It would be easier to make a 32nm Thuban with IPC improvements at 95W, and it would have room to grow. So yes, BD is pretty close to the roof. The roof might be just a bit over 4GHz in base clock, and it needs to be much, much higher.
And no, no design can eradicate transistor-level leakage. In the old days, before leakage was a big problem, you could design for higher frequencies, but not today: both low-IPC and high-IPC designs suffer from the same leakage at the same high frequencies. The larger the die, the larger the problem, as it usually means a voltage increase. In a leakage-free world, pipeline stages half as long could mean twice the frequency, but if you run into massive leakage problems that grow exponentially with frequency and voltage, then both designs suffer from this at the same frequencies. So tuning down IPC to get more frequency today is asking for more heat at the same performance. This is the same story all over again as when Prescott had its problems. People blamed the process, even though Dothan shined on the same process. The difference was that the speed demon Prescott was already pushing the roof.
Simply put, to be even a bit competitive with SB, BD would need to run at 5GHz at 95W with a much smaller die. No process can fix that! And how will it go against IB, which seems to gain an even larger performance per watt advantage over PD? The situation might be even worse between PD and IB!
<Sarcasm>My word, I have never seen so many formidable CPU architects posting in one place at the same time in all my time on the Interwebs....< /sarcasm>
My, my, what are some of you people like?
Yes, things aren't as expected
Yes, things could have been done better
Yes, there was a lot of smoke and mirrors
But some of the stuff being posted here is borderline insanity!
I don't know much about CPU architecture myself, but I sure know it isn't simply plug and pray.
First some of you need to grasp the concept of how tiny tiny tiny tiny tiny to the power of a million (simple speak :D) the parts that constitute a CPU are.
Then you need to have some humility and understand that it takes years of knowledge, research and experience to even be in a position to know somewhat what is going on.
Then you need to understand that you are still reliant on old knowledge and information.
Then you need to understand that even if you know the above, the item still needs to be physically made.
Some people posting here need a reality check and need to learn to chill out and look at things a little more pragmatically, as you DON'T (I speak for the majority here, including myself) have the skillset to be discussing these things in any other way.
Express your opinions, yes
But stating things as fact, well, absurd isn't the word.............
@ Boris: I've never suggested 95W @ 5 GHz, but the 32nm process is borked. I think AMD has a much bigger clue about what they did than anyone on enthusiast forums. There have been designs before with long pipelines and high frequencies that worked: IBM's Power6 reached over 5 GHz on a 65nm process. IPC alone is irrelevant; it's the combination of IPC and clock frequency that really matters. The advantage is, in theory, a lower transistor count and die area, thus fewer transistors to power, but you need higher frequencies. Obviously AMD dropped the ball here, but the concept nonetheless works if properly executed.
I am not expecting miracles from Piledriver, only that the path AMD chose will start making sense compared to K10. And the manufacturing process they are using is severely borked, so there are increases in frequency and power efficiency to be had here.
No, I say 95W at 5GHz is needed to be even a bit competitive with SB. I never said it was your opinion. Power6 had one thing AMD doesn't: IBM. IBM used some very interesting techniques to combat leakage that AMD doesn't have. And it was an in-order processor, which cuts away a lot of heat-generating logic. And I know IPC is nothing without frequency. But today, when you have to make huge sacrifices to make an architecture gain a few hundred MHz, IPC in itself is more important than ever.
And I still haven't seen how the manufacturing process can play such a big role here. Not even Intel could make Bulldozer nearly as fast, cool and small as even mid-range SB. Besides, no one has given any proof yet that the process is that bad. I know there are supply problems, but 32nm is still a lot better than 45nm, considering that a huge monstrosity like BD is even doable; it wouldn't work at all on 45nm. So even if 32nm can get better than it is today, it's still better than 45nm, which would make a Phenom III on 32nm much more attractive.
You are kidding, right? Instead of looking at cherry-picked benchmarks by someone with an agenda, consisting of only single-thread benchmarks, and great sites like neoseeker :p:, why don't you look at the complete reviews of the best tech sites out there?
Dispute this. I will be quietly lmao as you try. It's 72 wins for the 8150 against 21 for Thuban. Good luck.
Quote:
I saw 5 or 6 reviews. My impression was that Zambezi won the large majority of the tests vs Thuban, but when reading the comments on this thread I doubted myself, so I had to double check and review the reviews I'd seen. I stopped at the third, as it was pointless to go on: Techreport 20-5, X-bit labs 21-6, TomsHardware 31-10, bringing the total to 72-21 benchmarks in favor of the FX-8150. It's not even close. How does that translate to the FX-8150 being 40% slower? Or the 1100T being quite a bit faster? Or a worse launch than Barcelona, for that matter? Phenom 9600 lost the majority of the benchmarks to the X2 6400.
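The 72-21 total can be re-added from the three per-review scores quoted above. A minimal sketch in Python; the dictionary and the `tally` helper are illustrative names only, and the numbers are simply the ones given in the post:

```python
# Per-review (FX-8150 wins, 1100T wins) as quoted in the post above.
reviews = {
    "Techreport": (20, 5),
    "X-bit labs": (21, 6),
    "TomsHardware": (31, 10),
}

def tally(results):
    """Sum per-review (wins, losses) pairs into one overall (wins, losses) pair."""
    wins = sum(w for w, _ in results.values())
    losses = sum(l for _, l in results.values())
    return wins, losses

fx_wins, thuban_wins = tally(reviews)
print(f"FX-8150 {fx_wins} - {thuban_wins} 1100T")
```

This only counts wins; it says nothing about the margin of each win.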
A recent review made by the best tech site out there:
http://techreport.com/articles.x/21987
8150 wins 25, 1100T wins 7, and even the 8120 often beats the 1100T. How do you reconcile this with Thuban being faster?
Oh bu-bu-bu-but Thuban has better IPC... and? Bulldozer has better turbo, can handle 8 threads and hopefully will get a lot higher frequencies. Why do Intel fanboys keep bringing up iTunes? First cherry-picked benchmark, go figure, iTunes... are they trying to say Bulldozer can't handle iTunes? Why don't they show a single-thread benchmark of Windows Calculator? It would be as useful.
And if you do manually what the OS should do, the difference between the 8150 and 1100T is even bigger:
http://techreport.com/articles.x/21865/2
Fact: a broken Phenom was clearly beaten by K8, while a broken Bulldozer clearly beats Phenom.
http://www.electroiq.com/articles/ss...nitiative.html
Intel will have FinFETs on the market next year with 22nm; everyone else will have them at 14nm, god knows when. A consortium including everyone but Intel can't keep up with Intel, and people want AMD alone to compete and win against Intel...
This post reflects pretty well that you aren't familiar with the BD uarch at all.
Shrunk? Please show me that shrink. Your statement is just based on a simple theory.
Or not. That shrink would probably be worse than Thuban because of the crappy 32nm tech. If not, then prove it please.
There is Llano. It's perfect proof. AMD and even the overclockers can't reach frequencies similar to what we saw on Propus or Deneb.
Is that so?
Is it performing at stock frequencies in comparison to its counterparts at similar clocks? No
Other than the MASSIVE heat and power draw, when it's overclocked, does it compete favorably to an overclocked SB? No
Not sure what you want me to say, clock for clock it just doesn't get it done, and there are hundreds of benches out there that prove this to be correct.
You said Bulldozer needs to be at 5GHz to be a "BIT" competitive and now you are asking for valid points? Priceless.
You just said "I've seen" in a tech forum and you want to be taken seriously?
Quote:
I've seen i3s at standard frequency beat BD at 4.7GHz+ in games
I'm gonna use the same review I used above, it's the most recent:
http://techreport.com/articles.x/21987/17
A 4.4GHz 8150 beats the stock 2700K quite often, yet somehow 5GHz is needed to be a "BIT" competitive.
Game benches, 8150 vs i3 2100: 8150 wins 7, i3 wins 3, 2 draws... 2 of the i3's 3 wins were by 1 frame, while at low resolutions the 8150 wins by a large margin, but hey, "you've seen"... how can I dispute that.
Now I will stop feeding trolls and get on with my life. Thank you sir.
A shrink typically brings you a 0.6-0.7x area scaling ( ~220mm^2 for a 32nm Thuban vs. the 45nm Thuban ) and a 10-20% higher clock ( 3.5-3.9GHz ), ceteris paribus ( uarch wise ). So would a 220mm^2 3.8GHz Thuban be better than today's BD? Most likely yes, in both ST and MT workloads.
The problem is that such a Thuban would have several issues :
- I do not know how speed-path limited K10.5 is with its 3-cycle L1 and 12-cycle L2. In other words, getting to 3.5-4GHz might have required AMD to relax the cache latencies ( to something like 4 and 14-15 cycles respectively ).
- It lacks AVX and FMA. I can only assume an approach similar to BD's: use the existing 128-bit FPUs and split 256-bit AVX operations into 2 halves to minimize area and complexity. I do not know the increase in FPU area and power needed to support AVX and FMA, but I don't think it's trivial.
- It maintains the status quo vs. Intel. Thuban needs almost 2x the core count to match Intel Xeons. BD did not improve on this; it definitely needs at least 2x the core count to match Intel Xeons.
Given BD's failures, it could be that at least vs. BD ver 1, a 32nm Thuban might have performed better.
Imagine we are in late 2008 / early 2009. BD simulations show the CPU to be too large and too slow on 45nm vs. competitor CPUs. At the same time, Intel announces they will not use SSE5 and XOP but go for AVX and FMA3. This raises an interesting point: what if AMD had planned a 32nm Thuban for summer 2011, delayed BD to 2012, dropped FMA4 support and focused only on AVX and FMA3?
The first one should not have been that difficult to do (?) and would have bought time to polish BD. AVX and FMA4 support is more or less irrelevant now, and by the time they become widespread, BDver1 will be history anyway.
Do you have an example of such a shrink besides Deneb at 45nm? The history of AMD shrinks (90nm, 65nm) teaches that they give 10-20% lower clocks at the start. And speaking of Deneb, it seems to me it was more Agena's failure than Deneb's win. AMD had plans for a 3GHz Phenom, but the TLB bug left no time to develop a frequency-optimized stepping of Agena before Deneb. So a 3GHz Phenom vs a 3GHz Phenom II would have meant no clock increase from the shrink at the start.
A theory yes, not a hypothesis. BD wouldn't be possible on 45nm, but it is on 32nm. The fact that their 32nm is capable of BD is proof enough that it's not crap, it might not be the best around, but it's good enough.
No one has yet shown any proof that 32nm is that bad; that BD is alive and kicking is proof enough that 32nm works. Thuban is not nearly as hard to produce as BD, so if you can make BD, then Thuban would be easy. Thuban would probably be below 200mm².
For me the existence of beasts like Llano and BD is proof enough that 32nm is way better than 45nm, and Thuban would be better off at 32nm.
Not proof at all! Are locked SBs proof that Intel's 32nm sucks because they don't overclock? No! Llano suffers from similar problems since it isn't unlocked, and what's worse, you can't lock frequencies like PCIe. We have no clue how Llano would clock if it was unlocked. Besides, the integrated GPU isn't made for that kind of process, which means there will be tradeoffs in process choice when putting it on die. The proof we do have is that Llano consumes 5-20W less power than a comparable Athlon II in different tests, and that with an extra GPU in the test for Llano! What does that say about GloFo's 32nm? It's better than 45nm!
So, you have no valid proof whatsoever that CPUs fare worse on 32nm than 45nm. I on the other hand have numbers that show lower power consumption, and the fact that BD exists is a strong indicator that a much simpler chip would perform quite well on 32nm.
There are few tests that compare Athlon II with Llano. Here is one; it's in Swedish, but I hope you understand charts.
http://www.sweclockers.com/recension...no/25#pagehead
The i3 was a worst-case thing. I'm fully aware that an i3 isn't close to an 8150, but when it actually scores better in some games even though the 8150 is heavily overclocked, I use that as an example that there is a long way left to beat the i7 in games.
And how often does an overclocked 8150 beat a stock i7 when you look at anything other than heavily multithreaded benches? How often does a BD at any frequency beat a stock SB i7 in games? Show me!
At what frequency can a BD match a stock i7 across the board?
EDIT:
For some reason game benches with overclocked BD seem to be rare. So I'll give you the ones I found:
http://www.neoseeker.com/Articles/Ha...x-8150/11.html
http://www.overclockers.com/amd-fx-8...ocessor-review <-- Graphics limited, so differences appear smaller.
http://www.vortez.net/articles_pages...review,13.html
http://www.madshrimps.be/articles/ar...#axzz1eXh3RmmC
Again, at what frequency can Bulldozer match this?! You are free to supply reviews of your own to show gaming performance between i7 and overclocked BD.
It's a general rule of thumb in the industry. Moving to a new process brings you two advantages :
- die size reduction, 50% at most ( 0.7*0.7 )
- 20% more frequency for the same power
All new processes usually claim a 20-50% power reduction or, alternatively, 20-40% higher clocks for the same power consumption.
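The rule of thumb above is just two multiplications. A small Python sketch, assuming a linear shrink factor of 0.7 per node and using a made-up 3.0 GHz base clock purely for illustration (the function names are hypothetical too):

```python
def area_factor(linear_factor):
    """Area scales with the square of the linear shrink (width times height)."""
    return linear_factor ** 2

def scaled_clock(base_ghz, gain):
    """Clock after applying a fractional frequency gain, e.g. 0.20 for +20%."""
    return base_ghz * (1 + gain)

# 0.7 linear shrink: about half the area at best
print(f"area factor: {area_factor(0.7):.2f}")

# +20% and +40% on an illustrative 3.0 GHz part (not a real product clock)
for gain in (0.20, 0.40):
    print(f"+{gain:.0%}: {scaled_clock(3.0, gain):.2f} GHz")
```

Real shrinks rarely hit the ideal area factor, since I/O, analog blocks and pad rings don't scale like logic and cache do.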
So I take it you're still ignoring the fact that AMD has said openly that GlobalFoundries' 32nm didn't meet AMD's performance expectations and that yields are bad? They've talked about it with media and investors on the last two quarterly calls, and they also issued a press release before their Q3 results saying projections for that quarter would be lower because of bad yields on the 32nm node.
Llano was also projected to enter the market at 3.0+ GHz yet only retailed at 2.9 GHz, and it was supposed to launch in late 2010, not mid-2011. Llano is also in extremely short supply, both in the retail space and with OEMs. 32nm is horrid right now and the fact is that AMD isn't happy with it.
There is no doubt 32nm "works", but it's still a dog with horrible yields, which needs to be fixed and shows in both of their 32nm products. Talking about Bulldozer, it is also very possible AMD is running specific functions at lower clocks, which can impact performance greatly.
Half the results lean heavily toward Thuban being the better architecture, and the other results show FX matching Sandy Bridge (in MT performance only... losing up to 80% in single thread) while using 25% more power to do it.
Facts on 32nm Thuban/Agena? Changed direction after being called out on it?
Llano's refined core was supposed to gain up to 5% IPC, correct?
Let's say we shrunk Thuban but used Llano's core... a 6-core would be 269mm^2 like I stated before, correct? Assuming that the 32nm process can produce chips that function at least as well as 45nm ones (or that it goes at least something like the 90nm > 65nm transition did), we would have chips with a much smaller die and lower power consumption than current BD, producing much more performance per mm^2 even if you ignore the power consumption. I didn't say "Add two cores for a Phenom II X8 and set it at 4 GHz" like informal thought I did. Anyway, the X6 performs very close to BD in real-world apps when both are overclocked to 4.2/4.8. Also, "STARS" is very bandwidth starved: the more you overclock the RAM and the CPU-NB, the better it performs. What if it had the kind of bandwidth that BD has available? More IPC improvement.
The only comment I made about an eight-core with the old uarch was that the die size would be around 330-340mm^2, only slightly bigger than BD is today (~5-10%). Anyway, who knows whether they could have added two more cores AND increased the clock? Even if clock speed had to be reduced, it would still perform better than BD. Let's say we could only get 3.8 GHz out of the architecture on 32nm with 8 cores. Would that not perform better than BD? Look at what they did going from X4 to X6: the CPUs overclocked just as well, and still do, compared to recent quads. Would it be hard to prove that shrinking Thuban would have brought more performance per mm^2 than BD on 32nm? No, not at all. I believe the answer is quite clear in the first paragraph of this post.
Yield and performance are different things. Having bad yields doesn't preclude working parts from operating at high frequency. The question is still open whether the uarch or the process is to blame for the high power consumption at high clocks. I'd say it is a bit of both, but the process isn't completely broken.
Llano isn't a good indicator since the GPU is causing all the issues apparently.
I don't ignore anything, but lower yields than expected is not the same thing as the finished chips performing worse than their 45nm counterparts. On the contrary, we have numbers showing that Llano consumes less power than Athlon II despite its GPU. Llano or BD would most likely not even be feasible on 45nm. So even if 32nm yields aren't where AMD wants them to be, I think it's safe to say that a 32nm Thuban would perform better than a 45nm Thuban. You are forgetting that the chips that currently have yield problems are record breakers when it comes to transistor count. It's not surprising that yields are bad so far. Yields would be better with smaller chips, so that's just another reason why a 32nm Thuban would be better off. You still can't blame GloFo for BD's shortcomings as some people do; the amount of speed needed to make BD competitive isn't possible on any process, especially not when taking thermals into account.
Even if 32nm isn't where AMD expected, it's still most likely to give cooler chips and/or higher frequency headroom, considering the evidence that we have.
If so, and if these functions cripple BD considerably, and they expect yields to improve over the next two years allowing them to run these functions at full speed, shouldn't we expect a successor with radically improved performance? AMD's current projections aren't too promising.
So, still, nothing indicates that Thuban on 32nm would perform worse than Thuban on 45nm. It should perform much better, and with Llano's IPC improvements you could call it a day. Thuban still has higher performance per mm² than BD taking processes into account, and that doesn't bode well for the future.
-Boris-:shakes: so basically you can't show any proof, because you ignore the current reality and instead run to an alternative reality where you can find a 6C Llano at ~3.7GHz with <=125W TDP on a working 32nm process, but reality is way different..
I will now comment once more on your false comments
1. perf/w (Full load (Linpack))
Quote:
Phenom II has higher performance per watt, twice(!) the performance per mm² (taking processes into account) and higher IPC and is capable of almost the same - if not the same or higher - frequencies on the same process.
Power consumption - total system
FX 4M/8C 32nm vs Thuban 6C 45nm vs Deneb 4C 45nm
AMD Phenom II X4 980 [3.7 GHz, 4 cores] 209W 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3+] 213W 102%
AMD FX-8150 [3.6 GHz, 4 modules, CMT, turbo] 231W 111%
FX 2M/4C vs FX 3M/6C vs Llano 4C everything on 32nm
AMD A8-3850 [2.9 GHz, 4 cores] 165W 100%
AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] 165W 100%
AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] 172W 104%
------------------------------------------------------------------------------------------------------
Power consumption - CPU including converter
FX 4M/8C 32nm vs Thuban 6C 45nm vs Deneb 4C 45nm
AMD Phenom II X4 980 [3.7 GHz, 4 cores] 126W 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3+] 124W 98%
AMD FX-8150 [3.6 GHz, 4 modules, CMT, turbo] 137W 109%
FX 2M/4C vs FX 3M/6C vs Llano 4C everything on 32nm
AMD A8-3850 [2.9 GHz, 4 cores] 89W 100%
AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] 77W 87%
AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] 83W 93%
--------------------------------------------------------------------------------------------------------
Performance-Index
AMD A8-3850 [2.9 GHz, 4 core] 100%
AMD FX-4100 [3.6 GHz, 2 Module, CMT, Turbo] 105%
AMD FX-6100 [3.3 GHz, 3 Module, CMT, Turbo] 115%
AMD Phenom II X4 980 [3.7 GHz, 4 core] 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 core, Turbo] 106%
AMD FX-8150 [3.6 GHz, 4 Module, CMT, Turbo] 114%
--------------------------------------------------------------------------------------------------------
Final conclusion
link to the results I used in this summary http://translate.googleusercontent.c...Tl4pCGBZpRIH9g
perf/W based on 1. total system or 2. CPU including converter
1. AMD A8-3850 100%; 2. AMD A8-3850 100%
1. AMD FX-4100 105%; 2. AMD FX-4100 121%
1. AMD FX-6100 111%; 2. AMD FX-6100 124%
1. AMD Phenom II X4 980 100%; 2. AMD Phenom II X4 980 100%
1. AMD Phenom II X6 1100T 104%; 2. AMD Phenom II X6 1100T 108%
1. AMD FX-8150 103%; 2. AMD FX-8150 105%
As you can see, almost every BD model has a better ratio than Llano on 32nm or Deneb on 45nm.
The FX-8150 falls short on perf/W vs Thuban, but it's kinda understandable: it needs 8-threaded applications to perform at its best, whereas Thuban needs 6 threads and Deneb only 4. The same can be said about the FX-6100, which needs 6 threads while the rest of its group needs only 4, and the tests were a mix ranging from lightly threaded to highly threaded applications.
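The perf/W percentages above follow from dividing each performance index by the matching relative power figure. A minimal Python sketch using only the percentages from the tables in this post; the `data` layout and the `perf_per_watt` helper are illustrative names:

```python
# (performance index %, total-system power %, CPU-only power %), each relative
# to the baseline of its own group (A8-3850 or Phenom II X4 980), taken from
# the tables above.
data = {
    "FX-4100": (105, 100, 87),
    "FX-6100": (115, 104, 93),
    "Phenom II X6 1100T": (106, 102, 98),
    "FX-8150": (114, 111, 109),
}

def perf_per_watt(perf_pct, power_pct):
    """Relative perf/W in percent, rounded to a whole percent like the post does."""
    return round(100 * perf_pct / power_pct)

for cpu, (perf, total, cpu_only) in data.items():
    print(f"{cpu}: 1. {perf_per_watt(perf, total)}%  2. {perf_per_watt(perf, cpu_only)}%")
```

Each group is normalized to its own baseline, so the 32nm group and the 45nm group can't be compared directly from these ratios alone.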
To be continued....:yepp:
I never said anything about 6-core Llanos. You are using straw man arguments here. Why do you feel that you need such tricks?
Deneb and Llano aren't interesting here. I'm talking about Thuban, and Thuban has higher performance per watt than BD. How do you think Thuban on 32nm would perform?
Sorry, it seems my math was wrong. You are right, I calculated the area wrong, neglecting a simple formula.
I had calculated 0.7111 * 346, but the correct formula is 0.7111 * 0.7111 * 346 (A = L*W), meaning Thuban's die on 32nm, if the cache structure and IMC were left the same, would be ~175 mm^2.
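The corrected calculation can be written out as a short Python sketch. It assumes a perfect optical shrink where every dimension scales by 32/45, which real designs don't achieve; `ideal_shrink_area` is a hypothetical helper name:

```python
def ideal_shrink_area(area_mm2, old_nm, new_nm):
    """Area after an ideal shrink: both width and height scale by new/old."""
    linear = new_nm / old_nm
    return area_mm2 * linear ** 2

linear = 32 / 45                           # about 0.7111, the factor in the post
thuban_32nm = ideal_shrink_area(346, 45, 32)

print(f"linear factor: {linear:.4f}")
print(f"Thuban at 32nm (ideal): {thuban_32nm:.1f} mm^2")
```

Applying the linear factor only once, as in the first attempt, would give the area of a die shrunk in one dimension only.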
A "theoretical" eight core STARS design couldn't be much bigger than 250mm^2...giving 9.69mm^2 (x2) for extra (Llano's) cores and a generous 55mm^2 for extra L2 cache and other improvements. It would be impossible for this CPU to be larger than 300mm^2.
Considering Thuban beats BD in EVERY x86-64 single-threaded application I've seen yet except WinRAR and AES encryption benchmarks (if they happened to run in a single thread, that is), both stock and overclocked, and is also near BD's performance at equal or lower power usage while at a deficit of 2 cores, it seems Thuban would be about 80% better in performance per mm^2, ignoring power consumption, as that would be an unknown at 32nm.
Also, one would have to think that yields would be much better on 32nm with the older, smaller architecture. Smaller dies are easier (not to mention cheaper!) to produce, and chances are that the chips would perform better as well, as AMD has worked with K10 for 4 years now.
On the server side, since Magny Cours is an MCM package with 2 Istanbul dies, its area is 724mm^2. On 32nm, this would translate to ~366mm^2...
A twelve-core Magny Cours CPU, just 40mm^2 (about 15%) larger than the current 8-core Bulldozer design, has a four-thread benefit (50% more cores/threads for 15% more size, and that is the desktop chip)... this defeats Tomasis's argument about BD being "designed for server".
In fact, that CPU already performs almost as well as, and sometimes even better than, the 16-core MCM Orochi design while a whole node behind.
AMD was able to pull 2.3 GHz on 45nm with just a 140W TDP on the old architecture, and 2.5 GHz at 140W now if you look at numbers before process improvements. 2.2 GHz was possible with a 115W TDP. (Opteron 6176 SE, more recent 6180 SE, 6174.)
To sum up, with (correct me if I'm wrong, like Tomasis said I am a "kid") correct math:
Thuban @ 32nm would be around 175mm^2, up to 80% improvement in performance per mm^2 (315mm^2 being 80% larger than 175mm^2)...no less than 40-50% in worst case scenario.
Magny Cours @ 32nm would be only 40mm^2 (<15%) larger than the current Orochi design, and performs in the best case equal to the 16-core Orochi MCM design and in the worst case 33% lower. The Orochi MCM design would be 1.7x the size of this "theoretical" Magny Cours.
A "theoretical" 8/16 core "STARS" MCM design would be no larger than 250mm^2/500mm^2, so we end up with a 16 core STARS design at ~500mm^2, 130mm^2 smaller than Orochi 16 core MCM. This design would be smaller, more efficient per mm^2, and keep the same performance as Orochi MCM in worst case scenarios (where Orochi MCM has pulled ahead of Magny Cours by 33%) even if clocked at a mere 1.8 Ghz due to GloFo's 32nm process.
Yields would also be better, since die sizes would be smaller, chips would be produced much cheaper and AMD/GloFo has been producing K10 for 4 years.
Did I mention that the old uarch runs much cooler as well? (Not known for sure, since a smaller node means heat is more concentrated, but less should be produced)
I'm sure wez, TESKATLIPOKA, Tomasis, informal and others will still find a way to blame the process for all of this. If AMD hadn't let go of the fab it would still be AMD's fault and nobody would give a :banana::banana::banana::banana: about that argument. I did the math, where is yours?
WTF is going on with this thread? I don't see any Trinity news, instead there's lots of BS.
2. perf/mm2
Savantu did pretty much the same calculations
4M/8C BD FX 8150, 32nm, 315mm2
6C Thuban, 45nm, 346mm2
Ideal shrink: 346*(32^2/45^2)= 175mm2
http://www.xtremehardware.it/images/..._die_Llano.jpg
Llano ~228mm2
The link shows you a Llano die shot. If you remove the IGP and add what you want with the same amount of cache, you will end up with something like this
http://img521.imageshack.us/img521/5651/002diellano.jpg
and that is ~210-220mm2.
I used a 6C Llano on 32nm with the same amount of cache as Thuban.
And now back to perf/mm2
AMD Phenom II X6 1100T [3.3 GHz, 6 core, Turbo] 100%
AMD FX-8150 [3.6 GHz, 4 Module, CMT, Turbo] 108%
Llano vs Deneb is on average 3.23% better according to this link. I had to compute the average myself, what a hassle :D
http://www.anandtech.com/bench/Product/403?vs=85
So it's 103.23% vs 108%. Add the 3MB L3 cache and you have ~105%, but you are still under BD's performance, not to mention the same problem as before with the mix of differently threaded applications.
So the reality is not 2x as you said but rather 315/210=1.5, so better by 50% in area, but BD performs better.
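To put the area ratio and the performance gap into a single perf/mm2 figure, here is a rough Python sketch. It takes the ~210mm2 estimate and the ~105% vs 108% performance indices from this post at face value; the 32nm six-core K10 part is hypothetical, and `perf_per_mm2` is an illustrative helper:

```python
def perf_per_mm2(perf_index, area_mm2):
    """Performance index points per square millimetre of die."""
    return perf_index / area_mm2

bd = perf_per_mm2(108, 315)        # FX-8150: index 108, 315 mm^2
k10_32nm = perf_per_mm2(105, 210)  # hypothetical 6C K10 on 32nm: ~105, ~210 mm^2

area_ratio = 315 / 210
advantage = k10_32nm / bd

print(f"area ratio: {area_ratio:.2f}x")
print(f"perf/mm2 advantage of the hypothetical part: {advantage:.2f}x")
```

Under those assumptions the hypothetical part lands a bit under 1.5x in perf/mm2, since it gives back a few percent of performance.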
3. frequencies
APU 32nm
AMD A8-3850 [2.9 GHz, 4 cores] no turbo
ES model Trinity 2M/4C 3.8GHz turbo 4.1GHz
CPUs
45nm AMD Phenom II X4 980 [3.7 GHz, 4 cores] no turbo
45nm AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3+] turbo 3.6GHz
32nm AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] turbo 3.9Ghz
32nm AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] turbo 3.9Ghz
As you can see, the IGP doesn't affect CPU frequencies. If it did, you wouldn't see Trinity at the same speed as classic CPU models without any IGP. It only affects TDP, because some of it needs to be reserved for the IGP, so you need to lower default clocks, but turbo can make up for it if the IGP is idling.
Basically, K10 on the same process can't run at the same frequencies while staying within the same TDP: Llano draws the same or more than a higher-clocked BD (default +25%, turbo +34%) in Linpack (with the IGP power gated).
really:shakes: and what is thisQuote:
I never said anything about 6core Llanos. You are using straw man arguments here. Why do you feel that you need such tricks?
I think thats a 6C Llano, of course I meant without IGP. I don't think I am using straw man arguments here:confused:.Quote:
I think it's fairly safe that Thuban would reach a bit higher frequencies at early 32nm at almost half the size of BD, and with BDs or Llanos better IMC and Llanos IPC improvements it would already there equal a few hundred MHz extra performance. There you have at least 10% higher performance than Thuban at almost half the size of BD, and that with plenty of headroom to grow in!
Why not? You never said just Thuban, you said Phenom II.
Quote:
Deneb and Llano isn't interesting here, I talk Thuban, and Thuban has higher performance per watt than BD? How do you think Thuban on 32nm would perform?
Deneb is also Phenom II, and I wanted to compare Llano, which is practically a better Deneb, against BDs on the same 32nm node.
Quote:
"Phenom II has higher performance per watt, twice(!) the performance per mm² (taking processes in to account)."
Which Thuban do you mean, just a shrink or a Thuban based on Llano cores? In my opinion frequencies would be lower on the current 32nm compared to 45nm.
I still don't see the point of talking about something that will never be released :shrug:.
BeepBeep2 Thanks for remembering me :down: and your claim about shrinking to 170mm2 is wrong, look at this
http://img521.imageshack.us/img521/5651/002diellano.jpg
It's >200mm2 and not 170mm2, which by the way would be better than even the ideal shrink of 175mm2.
The reality is GLOFO's 32nm should be working better and BD needs much tweaking. With Trinity we will see what they did or didn't do.
Quote:
I'm sure wez, TESKATLIPOKA, Tomasis, informal and others will still find a way to blame the process for all of this. If AMD hadn't let go of the fab it would still be AMD's fault and nobody would give a about that arguement. I did the math, where is yours?
now back to piledriver trinity.
Immersion lithography is what made 45nm good. I was wondering whether 32nm still makes use of it?
That is Llano's die with some crude photoshop work. I'm talking about shrinking the existing Thuban die. You copied and pasted extra cores and L2 on Llano's die, which makes absolutely no sense considering it is of different shape and L2 capacity.
The chip is around 22.2mm long and 15.6mm wide (equaling 346mm^2).
32nm / 45nm = 0.7111..., so (22.2 * 0.7111) * (15.6 * 0.7111) = 15.7842 * 11.09316 = 175mm^2, so you are right. I rounded to 0.7 in my original calculations.
If Llano's core is 9.69 mm^2, and they got the "ideal shrink", the 45nm core would be 19.16mm^2...but Llano's core isn't exactly the same as the 45nm core, I estimate the 45nm core to be around 16mm^2 looking at images. (Estimated by overlaying a ruler on thuban's die and looking at core perimeter)
Llano's core is also more square than Thuban's core, due to refinements made for IPC gain...I would have to guess that the extra length/width (depending on how you look at it) is what accounts for the 3mm^2 difference. Even if Llano's core IS the same as Thuban's (I know it's not), then you are looking at a 18.75% increase. 1.1875 * 175 does leave us at 207 mm^2 for a Thuban die shrink.
Still, Llano's core is noticeably taller, so there isn't really any way the shrink could be more than 200mm^2 regardless of circumstances.
Still, everything else I said stands, take a few percent in regards to what I said about performance per mm^2. I'll update my post with correct math.
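The shrink arithmetic in this post can be sketched in a few lines of Python. This is only a rough check using the figures quoted above (22.2mm x 15.6mm die, 9.69mm^2 Llano core, ~16mm^2 estimated 45nm core); the function name is made up for illustration and an "ideal" shrink is assumed to scale area by the square of the node ratio.

```python
# Ideal (purely geometric) die shrink: area scales with the square of the
# linear feature-size ratio. All figures are the ones quoted in the post.

def ideal_shrink_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Scale an area by (new_nm / old_nm) ** 2 (hypothetical helper)."""
    return area_mm2 * (new_nm / old_nm) ** 2

# Thuban at 45nm: length x width as measured in the post (~346 mm^2).
thuban_45nm = 22.2 * 15.6
thuban_ideal_32nm = ideal_shrink_area(thuban_45nm, 45, 32)

# Llano's 32nm core scaled back up to 45nm, to compare against the
# ~16 mm^2 estimated for the real 45nm core.
llano_core_at_45nm = ideal_shrink_area(9.69, 32, 45)

# The post's allowance for the larger Llano-style core: 3 mm^2 on 16 mm^2.
core_growth = 1 + 3 / 16
thuban_with_bigger_cores = thuban_ideal_32nm * core_growth

print(round(thuban_ideal_32nm, 1))
print(round(llano_core_at_45nm, 2))
print(round(thuban_with_bigger_cores, 1))
```

Without rounding the intermediate steps this lands at roughly 175mm^2 for the ideal shrink and roughly 208mm^2 with the core-growth allowance, in line with the 175 and 207 figures above.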
@demonkevy
They have to, they went to immersion lithography to help them shrink easier. Intel uses it on their 32nm now as well.
If anyone is guessing how good BD is, it's bad enough that it's making ardent AMD Fanboys gnaw at each other.
I must say that I've never seen that before, so that should tell ya something.
We are talking about AMD, not about the industry as a whole. So do you have an example of such a shrink in AMD/GF history besides Deneb on 45nm?
Anyway, it is not the reduced transistor size that brings improvements, it is the R&D done during the 2 years between two nodes. If this R&D is applied to the old process too, then the difference will be smaller. And defect rate is in direct correlation with process performance, as defects also cause process variability, but manufacturers' claims are for defectless transistors. If you have only one "slow" transistor on a die you should regard the whole die as slow. So with a high defect rate it really doesn't matter how fast your process's defectless transistor is. It is strange that you have not mentioned it, as you pretend to understand the industry.
BeepBeep2
I know it's crude, the changes were done in Windows Paint. I just added 2 cores and cache, and it makes much more sense than just calculating the ideal die size shrink. The sum of L2 and L3 is 9MB (L2 6MB and L3 3MB), the same as Thuban (L2 3MB and L3 6MB), so there is no problem.
Quote:
That is Llano's die with some crude photoshop work. I'm talking about shrinking the existing Thuban die. You copied and pasted extra cores and L2 on Llano's die, which makes absolutely no sense considering it is of different shape and L2 capacity.
All this is no longer important, because I made some calculations based on a real shrink to prove my point.
Here is an interesting comparison between Deneb and Agena :shocked:
http://img.tomshardware.com/us/2008/...phenom-die.jpg
http://www.xtremesystems.org/forums/...8&d=1308242603
Deneb 45nm 258mm2
Agena 65nm 285mm2
The ideal factor would have been 45^2/65^2=0.48, so an ideal Agena shrink would mean 285*0.48=136.8mm2, but it ended up at 285*0.905=258mm2.
It's true Deneb has +4MB L3 cache, so let's look at what happens after removing it from Deneb, or adding it to Agena and then doing the shrink.
Deviation from Ideal Scaling: 90nm-> 0%, 65nm->14%, 45nm->39%
Equal Die Size Cache: 90nm 1MB, 65nm 1.75MB, 45nm 2.89MB
http://people.ac.upc.edu/rcanal/pdf/Liang-intel08.pdf
Quote:
Our first exposure to the Athlon 64 X2 came in the form of the 4800+ model. That chip is code-named "Toledo," and it packs 1MB of L2 cache per processor core, as do the dual-core Opterons. Toledo-core chips sport a transistor count of about 230 million, all crammed into a die size of 199 mm2.
AMD also makes several models of Athlon 64 X2 that have only 512K of L2 cache. In the past, CPUs with smaller caches have sometimes been based on the exact same chip as the ones with more cache, but they'd have half of the L2 cache disabled for one reason or another. That's not the case with the X2 3800+. AMD says this "Manchester"-core part has about 154 million transistors and a die size of 147 mm2, so it's clearly a different chip.
1MB L2 die size is 52mm2 on 90nm
Quote:
What you may or may not have noticed in that paragraph above is that the 3800+ features a "Manchester" core, not the "Toledo" core used in the rest of the X2 line. The difference? The Manchester core features fewer transistors (154M compared to the Toledo's 233.2M) and a smaller die size (147mm^2 compared to the Toledo's 199mm^2), which also definitely gives it far better thermal numbers than its siblings (89W as opposed to 110W).
Agena on 65nm with 6MB L3 cache
4/1.75*52mm2=119mm2
285+119=404mm2 -> 258/404=0.64 die shrink instead of ideal 0.48
Deneb on 45nm with 2MB L3 cache
4/2.89*52mm2=72mm2
258-72=186mm2 -> 186/285=0.65 die shrink instead of ideal 0.48
So back to the Thuban shrink: ideal is 32^2/45^2=0.51, so it's a bit worse than 65nm->45nm: 0.51-0.48=0.03
ideal Thuban shrink 346mm2*0.51= 176.5mm2
close-to-reality Thuban shrink 346mm2*((0.64 or 0.65)+0.03)=232~235mm2 :yepp:
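For anyone who wants to re-run this estimate, here is a minimal Python sketch of the same steps. It assumes exactly the inputs used above (52mm2 per 1MB of L2 at 90nm, the "equal die size cache" figures of 1.75MB at 65nm and 2.89MB at 45nm, and the 285/258/346mm2 die sizes); the helper name is invented for illustration. It does not round the intermediate ratios, so the result differs slightly from the hand calculation.

```python
# Area of 1 MB of L2 at 90nm, taken as Toledo minus Manchester die size.
L2_1MB_90NM_MM2 = 199 - 147  # 52 mm^2

# Cache (MB) that fits in the area of 1 MB at 90nm, per node (figures from the post).
EQUAL_AREA_CACHE_MB = {65: 1.75, 45: 2.89}

def cache_area_mm2(megabytes: float, node_nm: int) -> float:
    """Estimated cache area at a node (hypothetical helper)."""
    return megabytes / EQUAL_AREA_CACHE_MB[node_nm] * L2_1MB_90NM_MM2

AGENA_65NM, DENEB_45NM, THUBAN_45NM = 285.0, 258.0, 346.0
EXTRA_L3_MB = 4  # Deneb has 4 MB more L3 than Agena

# Method 1: add the extra L3 to Agena, then compare with Deneb.
ratio_add = DENEB_45NM / (AGENA_65NM + cache_area_mm2(EXTRA_L3_MB, 65))
# Method 2: remove the extra L3 from Deneb, then compare with Agena.
ratio_remove = (DENEB_45NM - cache_area_mm2(EXTRA_L3_MB, 45)) / AGENA_65NM

# 45nm->32nm is a slightly smaller ideal step than 65nm->45nm.
penalty = (32 / 45) ** 2 - (45 / 65) ** 2

low = THUBAN_45NM * (ratio_add + penalty)
high = THUBAN_45NM * (ratio_remove + penalty)
print(f"observed 65->45 shrink factor: {ratio_add:.2f} to {ratio_remove:.2f}")
print(f"estimated 32nm Thuban: {low:.0f} to {high:.0f} mm^2")
```

With unrounded ratios the range comes out at roughly 230 to 235mm2 instead of 232 to 235mm2, so the rounding in the hand calculation does not change the conclusion.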
I made my point and I will no longer continue this debate, which is just killing my time. I would much rather have a debate about Trinity.
nice that this thread went to war instead of talk about Piledriver :)
Maybe it did 45nm good, but I don't think it was the only key to success. Immersion is a must-have for 45nm and denser transistor technology, because they (and all modern/up-to-date semiconductors) use a 193 nm light source for patterning. You can't build 45nm structures with that alone, you must use immersion (ultra-pure water) to focus the light. Even with immersion, they have to use double-patterning too for critical circuitry. Both things bring advantages, and of course more costs.
The next two 'big things' for the semiconductor industry seem a bit far away for now (450mm wafers and EUV technology). AMD already has an EUV tool at NY for testing purposes, but I haven't heard anything about them and 450mm wafers.
Fine, we use your die-size estimates, you do have a point there. So we agree that Thuban is more like 50% more effective per mm². Higher single thread performance and smaller size, making more cores possible for better multithread performance, is a winner in my eyes.
That's an unfair comparison since Llano is a real quad core. It's no surprise if a dual module gets higher frequencies. But on the other hand, if they manage to get Piledriver to outperform Llano with lower power consumption I guess you are right. We'll have to wait and see. Even so, I do think Llano is capable of higher frequencies, especially in the spring when Piledriver arrives. It's not unusual that hardware makers hold old tech back to give room for successors.
Now that's a straw man! I said THUBAN with Llanos IPC-improvements and with BD's or Llanos IMC. That's not the same thing. The GPU is defining Llano more than some IPC improvements. A Thuban with some IPC-improvements is not a Llano. But I understand it's tempting to call it a Llano since Llanos integration of NB has made it very hard to overclock, so it's tempting to make my suggestion look bad by comparing it with crippled products.
I've said Thuban many times during this discussion, and Deneb is an older and less performing version of Phenom II. You can't just use a product with lesser performance when we are talking about how capable a line up is. It's like saying Fords are faster than Ferraris just because Fords fastest car is faster than Ferraris worst. And that's definitely a straw man argument. Let us compare the best of Phenom II to the best of Bulldozer!
And I still see no reason why Phenom II would have lower frequencies on 32nm. Llano shows that 32nm brings a big drop in power consumption. Why would the frequencies be worse? We can't tell until we have a truly unlocked Llano, and even then we still don't know if Llano suffers from tradeoffs from being coupled with a GPU not made for the same type of process.
Because the question was why people didn't like what AMD did with BD, and the answer is that we don't feel it's the best thing they could have done. Of course it's easy to say that in retrospect, but that is what many of us feel.
-Boris-
I don't agree about perf/mm2.
Quote:
Fine, we use your die-size estimates, you do have a point there. So we agree that Thuban is more like 50% more effective per mm². Higher single thread performance and smaller size making more cores possible for better multithread performance is a winner in my eyes.
I made new calculations and it ended with
6C Thuban cache 9MB 32nm 232mm2
4M/8C BD cache 16MB 32nm 315mm2
That's a 36% difference in die size, but BD is 8% faster.
The thing is, you are talking about the whole chip, and I think you know cache gives nowhere near as much performance as the die area it occupies.
Second, if you want to compare it that badly, compare just the cores vs modules, it would be more accurate :cool:. 1M/2C is more or less equal in size to 2C Llano ~ 2 Deneb cores.
K10 can have better perf/mm2, actually I think it does, but nowhere near as much as you want (think).
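To make the perf/mm2 argument concrete, here is a small Python sketch using only the numbers from this post: the estimated 232mm2 for a 32nm Thuban, 315mm2 for BD, and BD being 8% faster. The 10-20% clock uplift in the last loop is purely hypothetical, and how to combine the percentages (multiplying rather than adding) is my assumption.

```python
# Figures quoted in the thread; the 232 mm^2 Thuban is an estimate, not a real chip.
THUBAN_32NM_MM2 = 232.0
BD_32NM_MM2 = 315.0
THUBAN_PERF = 1.00
BD_PERF = 1.08  # BD is 8% faster than the 45nm Thuban

def perf_per_mm2(perf: float, area_mm2: float) -> float:
    """Relative performance per unit of die area."""
    return perf / area_mm2

size_ratio = BD_32NM_MM2 / THUBAN_32NM_MM2
density_ratio = (perf_per_mm2(THUBAN_PERF, THUBAN_32NM_MM2)
                 / perf_per_mm2(BD_PERF, BD_32NM_MM2))

print(f"BD die is {size_ratio - 1:.0%} larger")
print(f"Thuban perf/mm^2 advantage at equal clocks: {density_ratio - 1:.0%}")

# Hypothetical clock uplift for a 32nm Thuban (an assumption, not a measurement).
for uplift in (0.10, 0.20):
    advantage = density_ratio * (1 + uplift) - 1
    print(f"with +{uplift:.0%} clocks: {advantage:.0%}")
```

At equal clocks this works out to roughly a 26% perf/mm2 advantage for the shrunk Thuban over the whole chip, which supports "better, but nowhere near 2x". It still counts cache area, so it says little about cores vs modules.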
What you are doing is trying to convince us that Thuban with Llano IMC and Llano improvements is not Llano just because it doesn't have the IGP :shakes:. You can call it Thuban 2 if you prefer, I don't care, because it's not important. BTW I clearly wrote 6C Llano without IGP, and not deactivated, so you had enough time to comprehend what I meant in my original post, and that part wasn't even important compared to the rest.
Quote:
Now that's a straw man! I said THUBAN with Llanos IPC-improvements and with BD's or Llanos IMC. That's not the same thing. The GPU is defining Llano more than some IPC improvements. A Thuban with some IPC-improvements is not a Llano.
Yeah, it's really tempting and unfair to call a 6 core chip with Llano cores and IMC a Llano :shakes:
Quote:
But I understand it's tempting to call it a Llano since Llanos integration of NB has made it very hard to overclock, so it's tempting to make my suggestion look bad by comparing it with crippled products.
Your point about OC is pointless because I never said anything about that, I was always comparing at default frequency.
For your information, Deneb and Thuban have the same core and IMC. Everything is the same: L1 and L2 cache per core, even the L3 cache. The only difference is that Thuban has 2 more cores with their L2 cache, nothing more.
Quote:
I've said Thuban many times during this discussion, and Deneb is an older and less performing version of Phenom II. You can't just use a product with lesser performance when we are talking about how capable a line up is. It's like saying Fords are faster than Ferraris just because Fords fastest car is faster than Ferraris worst. And that's definitely a straw man argument. Let us compare the best of Phenom II to the best of Bulldozer!
BTW I still don't know what your problem is. Did I compare 4M BD versus 6C Thuban? Yes I did, but I had the audacity to include the highest (best) Deneb and even compare lower BD models to Llano (IGP was power-gated) because they are on the same process. The best thing would have been if I had also compared Deneb vs FX4100 (4 threads vs 4 threads) and Thuban vs FX6100 (6 threads vs 6 threads).
I will tell you one last time, and you don't need an unlocked Llano.
Quote:
And I still see no reason why Phenom II would have lower frequencies on 32nm. Llano shows that 32nm brings a big drop in power consumption. Why would the frequencies be worse? We can't tell until we have an truly unlocked Llano, and even then we still don't know if Llano suffers from tradeoffs from being coupled with a GPU not made for the same type of process.
4C 32nm Llano 2.9Ghz TDP 100W (has higher power draw than FX4100 with TDP 95W while the IGP is power-gated so the TDP 100W should be correct for CPU)
4C 45nm Phenom II X4 B99(Deneb) 3.3Ghz TDP 95W (+400Mhz)
4C 45nm Deneb 3.7Ghz TDP 125W (+800Mhz)
TDP 95->125W is 400Mhz for +30W in TDP
So increasing TDP to 130W would mean 3.3Ghz Llano(+400mhz), the difference is still 400Mhz between the older and current process on the same architecture.
I don't think the IGP has a noticeable impact on frequency. If it had, the already produced ES Trinity, with a bigger IGP than Llano, wouldn't have a default clock +200Mhz (turbo +300Mhz) above the FX 4100 on the same process, and I wouldn't be surprised if that's not even the final frequency at launch.
savantu
I think 50% is too much. The difference in size is 36% as you said, but if you add 10-20% from frequency on Thuban you still need to subtract 8% for BD, so you end up with 2-12%, and that makes 38-48%. But first of all you would need a 6C Thuban on 32nm with default clocks of 3.63-3.96Ghz (turbo: 4070-4440Mhz) to have 38-48% better perf/mm2, and that's unreasonable on the current 32nm process. When it's mature enough you will probably get that high, but then BD will have higher clocks as well.
Quote:
BD is 8% faster than the 45nm Thuban. For 32nm, you'd have 36% size difference and likely 10-20% performance difference in Thuban's favour ( higher clocks at the same power if nothing else ). So 50% more performance/sq mm isn't unreasonable.
Look at the last part of my previous post, if you think I am right or not is up to you.
P.S. comparing Thuban vs BD will result just in perf/mm2 between different models but nowhere near the perf/mm2 between architectures, you would need to have at least the same amount of cache, same process and the same TDP to be more or less accurate. I almost forgot about the same thread count, 2module vs 4core and so on.
It depends, I think 32nm would bring some nice frequency improvements. A Thuban with 6% higher IPC from Llano optimizations, 3% from the BD IMC, and a base clock of 3.8GHz seems extremely reasonable to me. Add Turbo Core 2.0 and it will beat Bulldozer even more in single thread performance. You can read further down in this post about why I think 32nm has such potential.
You did not write the part without IGP, that's just lies, you can't just pretend that you said that from the beginning. Here it is:
And that's just a straw man. The whole thing with Llano is the IGP; if you talk about Llanos without IGP you have to say that. If you pretend or insinuate that my arguments are about something totally different from what they really are about, that is making a straw man. And don't dare get me a quote where you say you meant without IGP if it's from a post after I made my complaint about your straw man tactics.
No you didn't say it, but Llano, due to its design, clocks like turds. So by calling my suggestion a Llano without mentioning you meant without the parts that cripple Llano's frequency potential, you made it seem like my argument was about full Llano APUs with extra cores and higher frequency. Which of course is pretty stupid. A straw man is to "misinterpret" someone's argument as something stupid and then argue against that made-up stupid standpoint instead of the real one. That's why I protested. But if you want we can call it Thuban II or Phenom III from now on. But then we have to be clear what we are arguing about. My definition is a shrunk Thuban core, with the same caches, and with Llano's IPC improvements and IMC.
When I talk about how efficient Phenom II is, I of course mean the most efficient model. And Thuban has more performance per mm², higher performance per watt and higher IPC for the whole die than Deneb. It should be obvious that I talk about Thuban, since I mentioned it tens of times in this thread already. So please stop making my arguments seem to be about things they aren't.
Some very creative math there with lots of things that can go wrong. You can't calculate that way, I can show math that 700MHz is 0w extra TDP. Or that 2 extra cores is 30W less TDP! So drop that please.
The thing is, Llano uses 5-20W less power than Athlon II on 45nm! So your argument is invalid. And power gated doesn't mean it doesn't affect performance: if you make a large chip and turn half of the chip off, it won't be nearly as cool and fast as a chip made half as big from the beginning. That said, Llano was the first of its kind, so it's not surprising if tradeoffs have been made. And you tend to forget that Llano is faster than FX-4100.
And again, we don't know anything about Llano's capabilities, since it's locked and is an architecture that doesn't allow high buses. Just like locked SB! And it's not unreasonable that Llano is held back because of Piledriver. If they released Llano at 3.4GHz and it beat Piledriver, how would that look? It might be the Tualatin syndrome all over again.
Since 32nm uses less power than 45nm despite an extra GPU, you will have a hard time proving that 32nm wouldn't work well with Thuban. And that math of yours can "prove" just about anything, try harder.
So discussion went from constructive to "who wrote what, and you're using straw man arguments". The most important thing is being right, not actually bringing something to the table :cool:
GO GO THE INTERNETZ :horse:!
+1 internet for Smartidiot
Despite TESKATLIPOKA's more "correct" calculations (though I measured the 45nm core and compared it to the 32nm Llano core), I'm pretty sure even TESKATLIPOKA agrees that performance is currently better on STARS than BD. It seems we've settled around 30% per mm2, in the worst case. What if we did have that 8 core STARS? Is it even possible for single thread to be slower than BD? By the way, in some single threaded applications, BD needs ~6 Ghz+ to keep up with the stock 2600K...
So, what improvements do we expect to see for Piledriver? :D
when people look at perf/mm2, are they counting all that extra space that we never understood, or are they only using the modules space?
i can blow all your minds and say that Thuban and BD and Deneb all suck compared to Propus in perf/mm2. L3 for amd takes up a huge space and offers only a minor increase in perf. however due to perf/watt, its a fine addition. so they eat a smaller marginal profit, to have a higher performing chip.
from newegg
Athlon 640 (3ghz no L3) 100$
Phenom x4 945 (3ghz and L3) 110$ (but its out of stock so it might be a little off)
Phenom x4 960T (3ghz, L3 and turbo) 125$
the Athlon kills them all in price/mm2 and perf/mm2, yet 2 more products that are slightly faster exist.
-Boris- After this comment of yours I am convinced any more debate with you is pointless, and from now on I will utterly ignore you. You can say what you want, I don't care. So the last 2 things I will do are defend myself from your false accusations and make a quick correction.
I don't need to pretend because I didn't lie.
Quote:
You did not write the part without IGP, that's just lies, you can't just pretend that you said that from the beginning. Here it is:
Quote:
Originally Posted by TESKATLIPOKA
what you are doing is trying to convince us that Thuban with Llano IMC and Llano Improvements is not Llano just because it doesn't have the IGP, you can call It Thuban 2 if you prefer I don't care because its not important and BTW I clearly wrote 6C Llano without IGP and not deactivated so you had enough time to comprehend what I meant in my original post and that part wasn't even important compared to the rest.
And that's just a straw man. The whole thing with Llano is the IGP, if you talk about Llanos without IGP you have to say that. If you pretend or insinuate that my arguments is about something totally different from what they really is about. That is making a straw man. And don't dare get me a quote where you say you meant without IGP if it's from a post after I made my complaint about your straw man tactics.
Quote:
Originally Posted by TESKATLIPOKA
you run to an alternative reality where you can find a 6C Llano ~3.7Ghz with <=125W TDP on working 32nm process but reality is way different..
You even started being rude, accusing me of lying and ordering me what I can or can't use to prove my innocence. If I had known you were smart enough to think I meant a 6C Llano with IGP instead of a 6C Llano without IGP, I would have included it in my first post, but I only realized your smartness after reading your straw man comment, and that's why it was included in my second post as a reaction to that comment.
I never said I wrote Llano without IGP in my first (original) post, posted Yesterday 12:06 PM. Why would I do that if I knew it was in my second comment :shrug:. I was referring to this
6C Llano ~3.7Ghz with <=125W TDP on working 32nm process and that was really in my original comment
then one of your comments to me was posted Yesterday 12:25 PM Last edited by -Boris-; Yesterday at 12:27 PM.
I made it clear that I meant Llano without IGP in my second comment posted Yesterday 01:39 PM and last edited Yesterday at 01:50 PM.
Then you posted another comment Today 01:50 AM
In this comment you had to know how I meant It, because you quoted my statement about Llano without IGP
Then I posted another comment Today 05:07 AM Last edited by TESKATLIPOKA; Today at 05:10 AM.
Quote:
Now that's a straw man! I said THUBAN with Llanos IPC-improvements and with BD's or Llanos IMC. That's not the same thing. The GPU is defining Llano more than some IPC improvements. A Thuban with some IPC-improvements is not a Llano. But I understand it's tempting to call it a Llano since Llanos integration of NB has made it very hard to overclock, so it's tempting to make my suggestion look bad by comparing it with crippled products.
Quote:
Originally Posted by TESKATLIPOKA
really and what is this
I think thats a 6C Llano, of course I meant without IGP. I don't think I am using straw man arguments here.
where I wrote a sentence you thought was false
So you had enough time to comprehend what I meant, from my comment posted Yesterday at 01:50 PM to my next comment posted Today 05:07 AM, and you really understood what I meant in your comment posted Today 01:50 AM.
Quote:
BTW I clearly wrote 6C Llano without IGP and not deactivated so you had enough time to comprehend what I meant in my original post and that part wasn't even important compared to the rest.
Then the last comment came from you, where you started accusing me of lying, and it was posted Today 08:03 AM
So thanks a bunch for your false accusation, and I hope next time you won't accuse me or anyone else of lying unless it's true :mad:.
P.S. a quick correction before I start to ignore you
I already wrote in my original comment that the FX4100 (3.6Ghz) is 5% better on average than the strongest Llano (2.9Ghz)
Quote:
And you tend to forget that Llano is faster than FX-4100.
link as proof
http://translate.googleusercontent.c...Qk--HLUQ2nZIsw
beautiful life irony!
the kid plays amd engineer and writes BIG LETTERS to prove his intelligence with green behaviour.
I'm sure the thread would be better if we had a sensible discussion. I don't think I needed to be called out for thanking a guy for an opinion.
Again I say, we all have different opinions and there is no need to attack even if you disagree with each other. Just agree to disagree. That's all.
BeepBeep2 BD's performance and frequencies are not what I was hoping for, and lower IPC than K10 was also a cold shower for me, after so many statements about better IPC :(
If you compare at the same thread count and don't look at Llano, then K10 really is better than BD, and I have no problem acknowledging that, because K10 has higher IPC and no penalty from sharing.
examples
4 threads Deneb 980 is 15% faster than FX 4100
6 threads Thuban 1100 is 10% faster than FX 6100
link as proof
http://translate.googleusercontent.c...Qk--HLUQ2nZIsw
Which one do you mean? Piledriver core with L3, or without L3 in Trinity?
Quote:
So, what improvements do we expect to see for Piledriver?
Higher clocks for sure, and I would like to see at least the same IPC as Thuban for Trinity's Piledriver, but who knows :shrug:. The info from the Chinese leak about Cinebench performance would suggest a ~10% IPC improvement, and that is a lot considering Trinity doesn't have the 8MB L3 cache, but it's suspicious. I hope we can see some more leaks before new year.
You PM'ed me today, I replied back.
How about you stop judging me due to my age, assuming I think I know everything and looking at my behaviour from an opposite standpoint.
Everything you need to know about me, even my views on life and where I stand as a person will be in your inbox, thanks.
Thanks for being reasonable with my argument...unlike Tomasis, with essential name calling and looking down upon myself as a lesser being due to my age. Hopefully he will take back his comments after reading my PM, he is being as equally insensible as I.
Piledriver with L3, the replacement for Orochi OR without in Trinity is alright with me. Since Trinity is closer, why don't we start the train up again with that. I'd like to see a 10% IPC improvement for sure. Couple that with a bit more clockspeed and better performance per watt and I'll start seeing this architecture as something useful instead of a waste of money. Right now AMD can not compete with the old arch. in single threads, and unfortunately that arch couldn't compete with intel...
Hopefully Trinity leaks out faster than Orochi did, or at least from more credible sources :p That "O" guy was right...had we believed leaks, children like I would not have been so butthurt :rolleyes:
I apologize to everyone for derailing this thread, though it takes two (or more!) men to start a fight. ;)
xbitlab : AMD to Start Production of Desktop "Trinity" APU in March
No 125W SKU at initial production?
Quote:
Starting from early and middle March, 2012, AMD intends to mass produce its A-series "Trinity" accelerated processing units with 65W thermal design power (TDP), according to an AMD document seen by X-bit labs. In early May, 2012, the chip designer wants to initiate mass production of A-series "Trinity" APUs with 100W TDP and higher performance.
The 65W chips will belong to A10-5700, A8-5500, A6-5400 and A4-5300 families, whereas 100W microprocessors will only fit into A10-5800 and A8-5600 series.
http://www.xbitlabs.com/news/cpu/dis..._Document.html
undone I don't think AMD will release a 125W part, Llano didn't have a TDP higher than 100W either.
What I want to know is if there will be some with unlocked multiplier or not and what speeds will be present for mobile versions.
http://www.xbitlabs.com/news/cpu/dis..._Document.html
no idea why they dont build a 140w version, if they are not at the maximum clocks for the process, they are missing an opportunity to sell a few of the same chips for 100$ more
Production doesn't mean it will also be on the market. Probably 1-2 months afterwards, just like we saw with Llano. So going by that, a realistic timeframe for mobile Trinity parts and locked desktop parts is probably April/May, with 100W unlocked parts showing up in July/Aug. Given AMD's track record as of late, I'd say this is a pretty optimistic estimate.
Unlocked Llanos in January 2012.... I think the first mention of them was August/September and we still see nothing out there. At least the x4 651 has started showing up on Provantage/ShopBLT ... so maybe in a few weeks I am hoping. I really want to put together an overclocked Llano rig to play around with. Its either the x4 651 or a8-3870 / a6-3670
I'm not very optimistic about this comment either: "The 20% speed improvement represents AMD's projections "using digital media workload" and actual performance advantage over currently available Fusion A-series "Llano" vary depending on the applications and usage models." Of course Trinity will be a better fit for, say, transcoding, where the GPU portion can be leveraged, but it says nothing about raw CPU IPC. Llano may very well still beat it in that department. Wouldn't that be sad...
Miwo
They said +20% uplift vs Llano, so it means CPU IPC+clocks, but it's questionable how accurate that statement is; for example an older slide said +30%. Still, I think it will be better than Llano, not worse.
http://tof.canardpc.com/view/41eb1b2...78d0da2ca6.jpg
Even if Llano had better IPC as long as Trinity has high enough clocks it will win.
example is A8 3850 vs FX4100
BD has lower IPC and sharing penalty yet is 5% faster because it has +24% clock speed.
I disagree. FX 4100 is more like a dual core with CMT and FX 6100 is more like a tri-core with CMT. There is no true hex core BD. There is no true octo core BD. AMD marketing really screwed the pooch with the whole 8 core thing and it really makes BD look a lot worse than it really is.
It's the same as if Intel released their first single core hyperthreaded processor and called it a dual core and everyone went around and said that the dual core is no where near double the performance and sometimes it's even slower so it's a terrible processor. The original Intel P4s weren't all that great, but Intel saved themselves a ton of fail by not calling it a dual core.
AMD decided that catering to less informed individuals and calling a quad core with CMT an octo core would be a better business move, and arguably it is as it's still selling a lot of processors. If you were to compare FX8120 to a quad core Phenom 2, you'd see massive gains in multithreaded apps and that's probably the fairest way to compare these processors.
If single thread performance were higher that argument would be valid. However, these CPUs aren't "more like" "with CMT", and you need to look at the die size. 8 BD threads take more space on the die than 8 STARS threads.
Can't really call BD a quad core by that logic, you'd expect single threads to perform much better.
sdlvx
I know what a BD is, so your explanation wasn't necessary.
Comparing the FX8120 to Deneb is not what I would call fairest. What should Thuban be compared to then? There is no 6 module BD and there won't be for a long time.
For me the fairest performance comparison is at the same thread count, and then between the best of the current and previous generation.
BTW 2 and 3 module parts could be at least as powerful as their 125W predecessors, but the problem is they are only 95W parts. If AMD released models with 125W TDP and higher clocks, let's say
FX 41** ~4.3Ghz and FX 61** ~3.8Ghz then they should be faster by a small margin in the test you quoted.
The biggest problem for BD is relatively low clocks and lower IPC. If Piledriver improves these then it could be a good product if the price is right, but we can forget about high-end AMD on the desktop.
Quite an interesting find. What do you say about the ES mobile Trinity, 2M/4C, default 2.5GHz, turbo 3.2GHz? :clap:
http://pics.computerbase.de/3/8/0/2/4/2.jpg
Here you can see Device ID 9900 is for mobile segment
http://www.rage3d.com/board/showthread.php?t=33982884
Quote:
AMD9901.1 = "TRINITY DEVASTATOR DESKTOP"
AMD9904.1 = "TRINITY DEVASTATOR LITE DESKTOP"
AMD9903.2 = "TRINITY DEVASTATOR LITE MOBILE"
AMD9900.2 = "TRINITY DEVASTATOR MOBILE"
AMD9991.1 = "TRINITY SCRAPPER DESKTOP"
AMD9990.2 = "TRINITY SCRAPPER MOBILE"
FX-4100 has 24% higher clocks than A8-3850 and is 5% faster. If Trinity's IPC stays at BD level, then this ES Trinity, with 56% higher clocks than A6-3410, should end up about 33% faster.
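The extrapolation above can be checked with a few lines of arithmetic. This is only a rough sketch, assuming performance scales as clock times per-clock throughput (which ignores memory bandwidth, turbo and the missing L3). The 2.9GHz, 1.6GHz and 2.5GHz base clocks come from this thread, 3.6GHz is the FX-4100 stock base clock, and the function names are made up for illustration.

```python
def relative_ipc(perf_ratio, clock_ratio):
    """Per-clock throughput of chip B relative to chip A, given B/A ratios."""
    return perf_ratio / clock_ratio


def projected_speedup(ipc_ratio, clock_ratio):
    """Projected B/A performance ratio, assuming perf = IPC * clock."""
    return ipc_ratio * clock_ratio


# FX-4100 (3.6 GHz) vs A8-3850 (2.9 GHz): about +24% clock, 5% faster overall.
bd_clock_ratio = 3.6 / 2.9
bd_vs_llano_ipc = relative_ipc(1.05, bd_clock_ratio)

# ES Trinity (2.5 GHz base) vs A6-3410 (1.6 GHz base): about +56% clock.
trinity_clock_ratio = 2.5 / 1.6
trinity_speedup = projected_speedup(bd_vs_llano_ipc, trinity_clock_ratio)

print(f"BD per-clock throughput vs Llano: {bd_vs_llano_ipc:.2f}")
print(f"Projected ES Trinity vs A6-3410: {trinity_speedup - 1:+.0%}")
```

Under these assumptions the projection lands at roughly a third faster, in line with the ~33% figure, and it only holds if Trinity's per-clock throughput really matches BD's.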
Your discovery seems logical because there was already news that this 9900 sample is nice compared to A8-3850.
http://www.xtremesystems.org/forums/...ady-Runs-Well&
33% faster than 1.6-2.3Ghz A6-3410 points to the level of A8-3850(2.9Ghz), everything looks right if true.
Quote:
One of the AMD Linux engineering systems for Trinity is running nicely even on Ubuntu 11.04 with the Linux 2.6.38 kernel. The CPU string is AMD Eng Sample 2M252057C4450_32/25/16_9900_609 and its graphics are the Trinity Devastator Mobile with 512MB of video memory and an AMD Pumori motherboard. The PCI ID on the Trinity Devastator appears to be 0x9900. This Trinity APU is quad-core and running at 2.50GHz. The current quad-core Llano offerings are clocked at 2.6GHz (A6-3650) and 2.9GHz (A8-3850), while this Trinity part is clocked slower, it's numbers are nice compared to my A8-3850 Linux system.
undone
33% faster doesn't point to A8-3850 levels, more like to a model with a default clock of 2.2GHz and 3GHz turbo.
Quote:
33% faster than 1.6-2.3Ghz A6-3410 points to the level of A8-3850(2.9Ghz), everything looks right if true.
33% faster would hold if the IPC is the same as BD with L3, but we don't know that yet, nor even whether they will release a mobile Trinity clocked at 2.5GHz or more.
I remember that article, but who knows what "nice" means to him; it could be the same performance, slower or even higher :shrug: and the scores were under Linux, not Windows.
Quote:
Your discovery seems logical because there was already news that this 9900 sample is nice compared to A8-3850.
Let's say 2.5Ghz(3.2Ghz turbo) Trinity offers the same performance as A8-3850(2.9Ghz) then I would be totally surprised.
Honestly, I don't think that will happen, because Trinity without L3 would need ~10-15% better IPC than Llano to perform like that and the best ES 3.8Ghz version would perform like 4.2-4.4Ghz Llano, I think that's too good to be true when BD+L3 IPC is at least ~5-10% worse than Llano.
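To put numbers on the estimate above, here is a small sketch under the same simplifying assumption that performance is clock times IPC. The helper names are hypothetical, and only base clocks are used, so turbo residency is deliberately ignored.

```python
def required_ipc_gain(target_clock_ghz, own_clock_ghz):
    """IPC advantage needed to match a rival that runs at a higher clock."""
    return target_clock_ghz / own_clock_ghz - 1.0


def equivalent_clock(own_clock_ghz, ipc_gain):
    """Clock the rival would need to match us, given our IPC advantage."""
    return own_clock_ghz * (1.0 + ipc_gain)


# 2.5 GHz Trinity matching a 2.9 GHz A8-3850 on base clocks alone.
needed = required_ipc_gain(2.9, 2.5)

# A 3.8 GHz Trinity ES expressed as a Llano-equivalent clock.
low = equivalent_clock(3.8, 0.10)
high = equivalent_clock(3.8, 0.15)

print(f"IPC gain needed at base clocks: {needed:.0%}")
print(f"3.8 GHz Trinity ~ {low:.2f}-{high:.2f} GHz Llano")
```

On base clocks alone the requirement comes out slightly above the 10-15% range, so that range presumably assumes the chip spends some time in turbo; the Llano-equivalent clocks agree with the 4.2-4.4GHz figure.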
I thought the turbo is working on each core even in full load, isn't it?
Yes, it would be a surprise, because Trinity is 2M4C and would be at a disadvantage compared to a true quad core.
Quote:
Let's say 2.5Ghz(3.2Ghz turbo) Trinity offers the same performance as A8-3850(2.9Ghz) then I would be totally surprised.
Honestly, I don't think that will happen, because Trinity without L3 would need ~10-15% better IPC than Llano to perform like that and the best ES 3.8Ghz version would perform like 4.2-4.4Ghz Llano, I think that's too good to be true when BD+L3 IPC is at least ~5-10% worse than Llano.
There was a leak which said A1 Trinity is on par with A8-3850 in Cinebench, but the frequency is still in doubt. Even if it's the top ES, Trinity must have much higher per-thread performance. According to this chart (http://www.hardware.fr/articles/842-...s-3-2-ghz.html), 2M4C is at a 25% disadvantage compared to Deneb, which may imply that ES Trinity's IPC is nearly the same as or lower than Llano's, because the frequency difference is about 30%.
EDIT: I find people are not that interested in Trinity, right? It's sad that there was so much argument about Zambezi but not much about Trinity, even now.
undone
Even if some turbo worked on each core, it doesn't work all the time, and certainly not at 2.3GHz when 4C are under load. A8-3850 runs at 2.9GHz without any turbo; that's why I said the other model should perform 33% better, and to match A8-3850 you would need ~15% more.
Quote:
I thought the turbo is working on each core even in full load, isn't it?
I think at best it will be at Llano-level IPC.
Quote:
Yes, it would be a surprise, because Trinity is 2M4C and would be at a disadvantage compared to a true quad core.
There was a leak which said A1 Trinity is on par with A8-3850 in Cinebench, but the frequency is still in doubt. Even if it's the top ES, Trinity must have much higher per-thread performance. According to this chart (http://www.hardware.fr/articles/842-...s-3-2-ghz.html), 2M4C is at a 25% disadvantage compared to Deneb, which may imply that ES Trinity's IPC is nearly the same as or lower than Llano's, because the frequency difference is about 30%.
For me Trinity is way more interesting than BD was, even before we found out about the lacking performance, because I need an affordable notebook but don't know which Trinity to go with: 2M/4C in a 13-14" or an 11.6" with 1M/2C at 17W TDP :shrug:.
Quote:
EDIT: I find people are not that interested in Trinity, right? It's sad that there was so much argument about Zambezi but not much about Trinity, even now.
The bad thing about Trinity in a notebook is that you need to swap the default 1333MHz memory for something faster. Kingston offers 1866MHz SO-DIMMs, and this will cost me +60 euro if I buy just the 4GB kit :down:
There'll be some fix when launched, even now ES 4.0Ghz Trinity is way more than my usual workload.
I think that's why AMD got into the RAM business: they could lower the platform price with their own high-compatibility RAM.
Quote:
For me Trinity is way more interesting than BD was, even before we found out about the lacking performance, because I need an affordable notebook but don't know which Trinity to go with: 2M/4C in a 13-14" or an 11.6" with 1M/2C at 17W TDP :shrug:.
The bad thing about Trinity in a notebook is that you need to swap the default 1333MHz memory for something faster. Kingston offers 1866MHz SO-DIMMs, and this will cost me +60 euro if I buy just the 4GB kit :down:
(Latest news: AMD has prepared 1866 Fusion RAM for users, maybe we'll see some affordable 2000MHz next year)
undone
There isn't any 4GHz ES Trinity, unless you meant turbo.
Quote:
There'll be some fix when launched, even now ES 4.0Ghz Trinity is way more than my usual workload.
What fix did you mean?
Some fix like B0 to B2 Zambezi, or B2 to B3 (which is unknown yet). I personally guess that this round there may be around a 15% difference between ES Trinity and final silicon.
(btw I still doubt the problems with Zambezi will be totally solved in Trinity; more discussion about this once some actual Trinity benchmarks leak.)
undone
IPC won't change, and even if they release some scheduling patch I think it will help 5% at most, so 10-15% would have to come from clocks for that to be true. I don't think they can clock it at 4.2-4.4GHz and still stay within a 100W TDP. Remember the still-unreleased FX-4170? That is a 2M/4C BD clocked at 4.2GHz with a 125W TDP, and it doesn't have a big IGP, just a lot of cache ;).
but IIRC the performance difference between B0 and B2 is larger than the clock difference; maybe there's some problem with abnormal clocking in B0.
Yields could improve, we need to wait.
Quote:
I don't think they can clock it at 4.2-4.4Ghz and still stay in 100W TDP, remember the still unreleased FX4170? That is a 2M/4C BD clocked at 4.2GHz 125W TDP and doesn't have a big IGP just a lot of cache.
undone
What is your source? I don't remember any comparison made between B0 and B2.
Quote:
but IIRC the performance difference between B0 and B2 is larger than the clock difference; maybe there's some problem with abnormal clocking in B0.
That's true, we will see.
Quote:
Yields could improve, we need to wait.
B0 performance was pretty close to B2 performance according to OBR's results.
I like how threads got cleaned up because of his "bull:banana::banana::banana::banana:" rumors, results...talking down of AMD, but after release, everyone here is perfectly fine with the amount of performance given by these chips.
AFAIK FX-4000's are not native dual modules. Trinity comes with a real dual-module design.
Seems it's my bad again... FX-8130P scores around 9xxx in Fritz and FX-8150 around 11xxx, so it could be just the frequency difference. I remembered incorrectly that FX-8150 scores around 13xxx.
yes, native design should have somewhat lower tdp.
(btw ES Zambezi frequency is 15% lower than fx8150, so we could expect turbo up to near 4.7Ghz with retail Trinity.)
undone
ES 3100MHz, turbo 4.1GHz
Quote:
(btw ES Zambezi frequency is 15% lower than fx8150, so we could expect turbo up to near 4.7Ghz with retail Trinity.)
Fx8150 3600Mhz, turbo 4.2Ghz
default clocks increased +500Mhz but turbo just +100Mhz.
and now here we have an ES Trinity 3.8Ghz, turbo 4.1Ghz
I think the best case is we will see a model with default 3.9-4.1Ghz, turbo 4.2-4.3Ghz, basically Trinity with FX4170 clocks but TDP 100W instead of 125W.
I am pretty sure turbo 4.6-4.7Ghz won't happen even if they released a 125W model, too much voltage needed, I don't think they are willing to set ~1.5V for max turbo even if they stayed in TDP.
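The two turbo estimates in this exchange can be compared directly. The sketch below assumes the ES-to-retail clock change seen on Zambezi would carry over to Trinity, which nobody here actually knows. It uses the clocks quoted above (ES Zambezi 3.1/4.1GHz, FX-8150 3.6/4.2GHz, ES Trinity 3.8/4.1GHz) and applies the "15%" figure as a straight multiplier.

```python
es_zambezi = {"base": 3.1, "turbo": 4.1}   # GHz, as quoted in the thread
fx8150 = {"base": 3.6, "turbo": 4.2}       # GHz, retail part
es_trinity = {"base": 3.8, "turbo": 4.1}   # GHz, best ES mentioned so far


def gain(es, retail, key):
    """Fractional clock increase from engineering sample to retail."""
    return retail[key] / es[key] - 1.0


base_gain = gain(es_zambezi, fx8150, "base")
turbo_gain = gain(es_zambezi, fx8150, "turbo")

# Optimistic view: scale the ES Trinity turbo by a flat 15%.
optimistic_turbo = es_trinity["turbo"] * 1.15
# Conservative view: scale it by what Zambezi's turbo actually gained.
scaled_turbo = es_trinity["turbo"] * (1.0 + turbo_gain)

print(f"Zambezi ES -> retail: base {base_gain:+.1%}, turbo {turbo_gain:+.1%}")
print(f"Retail Trinity turbo: {scaled_turbo:.2f} GHz vs {optimistic_turbo:.2f} GHz")
```

The gap between the two results is the whole disagreement: Zambezi's base clock rose a lot between ES and retail, but its turbo barely moved.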
When it comes to Piledriver, we can expect good news since:
1) AMD lists IPC and power improvements (~10-15%; the uncertainty probably comes from the process node's capability at the moment production starts). This can imply around a 5% IPC boost and a 10% clock boost. If the process allows, they can boost clocks further.
2) AMD stated they are working with MS on a Win7 scheduler update. Win8 already supports optimal thread scheduling for BD. This can lead to another ~5% more performance over the current level of Zambezi.
All summed up: a 15% to 20% improvement is quite possible with top-end PD (8270?) vs FX-8150. This would be roughly enough to put the new PD model at around 2600K or 2700K level. This probably won't be enough to beat 8T IB @ 3.5/3.9GHz but should be pretty close overall. Remember that the 2600K/2700K level, on desktop workloads, is overall very close to the 980X/990X Westmere (which is still the number two desktop chip out there).
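As a quick sanity check on the summed-up range, independent gains multiply rather than add. The sketch below uses the guessed split from the list above (5% IPC, 10% clock, 5% scheduler); these are estimates from this post, not AMD figures, and `combined_gain` is just an illustrative helper.

```python
from math import prod


def combined_gain(*gains):
    """Compound fractional gains: (1 + g1) * (1 + g2) * ... - 1."""
    return prod(1.0 + g for g in gains) - 1.0


ipc_gain = 0.05        # guessed share of the ~10-15% improvement
clock_gain = 0.10      # guessed share of the ~10-15% improvement
scheduler_gain = 0.05  # guessed benefit of the Win7/Win8 scheduler work

without_patch = combined_gain(ipc_gain, clock_gain)
with_patch = combined_gain(ipc_gain, clock_gain, scheduler_gain)

print(f"IPC + clock: {without_patch:+.1%}")
print(f"IPC + clock + scheduler: {with_patch:+.1%}")
```

Compounding puts the two cases at about 15.5% and 21%, so the 15% to 20% range is, if anything, slightly conservative for these inputs.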
Informal- I'll believe it when I see it, and I honestly hope we do....
But I have my doubts.
We'll see.
Remind me/us when desktop Piledriver will be released? This is called "Vishera", correct?
Edit: dang, looks like the Sept 2012 time frame. I don't think I can wait that long. lol gotta take advantage of this Crosshair 5 investment....
Hopefully AMD and MS are working on a nice patch for Win7, a fix for Steam game hiccups/BSODs, and maybe a better stepping for FX-8170
here's to hoping :/
Yes, I didn't mention it since 6C SB-E is a $550-900 product. There is a 4C variant too, but it's not faster than 2600K and costs the same or more (plus the additional cost of an s2011 board).
So looking at 6C SB-E models, you are looking at 2x-3x the price (CPU only; the board price difference adds on top of that) for 15-20% more performance on desktop. It just doesn't make any sense.
Best of luck to all of us who are waiting.....
It can only be faster than the current 8150. It should have at least SOME IPC improvement. It should get at least SOME boost from the Win7 scheduler patch or Win8's new scheduler. It SHOULD at least have SOME clock advantage over 8150 or even 8170. All summed up: it will be faster than the 8150 we have today, without any shadow of a doubt. How much faster? It's anybody's guess, but if you ask me it can be between 15% and 25% faster than the 8150 level we have today. The bottom of this range is close enough to put it in the 2600K zone (overall), while the top of this range is enough to put it in the 3770K (Ivy Bridge) overall performance zone. Single-thread performance is Intel's territory for the time being (and will remain so for the foreseeable future). The good news is that single-thread performance of the PD core, with Turbo, will be good enough for 99.9% of desktop users, even those who consider themselves "enthusiasts".
You realize, if they had good competition...that would be mainstream :shrug:
That tiny high margin part of the market is rather quite large when you consider that Intel owns 100% of it and 80% of the overall market.
(That was after Phenom II caught AMD up in market share, I'm sure it will slip back a few percent)
Not necessarily. Intel's high end has always been expensive, even when they were getting beaten by the original FX their high-end chips were still expensive, and people still bought them.
What do you think matters more to AMD: OEM's who order in the thousands, or a tiny sliver of enthusiasts who moan about 10% in a benchmark? The only OEM's ordering high-end parts are specialist custom builders who would be lucky to sell 1,000 units a year. It's not "rather quite large", it absolutely pales in comparison.
True that! This was due to brand recognition... Think BOSE audio... Bah, man, my BOSE home theater sounds best in the world... Sorry for the little rant there.
Intel has always had the majority due to the facts I mentioned before. Companies like mine buy primarily Intel-based HP units, well, because that's all our sales reps know. Who's AMD, they ask. Regardless of which chip is better at what price etc., companies will mostly buy Intel because everyone knows who they are.
Sennheiser!
To be honest, I've been recommending intel chips to consumers after BD's launch...of course, it depends on what you use your machine for and I explain all of that. I tell them that I even own a BD chip myself...but 2500K is much more responsive in all ST apps and much better for gaming (a lot of my friends are gamers like me) despite being a little slower in MT than Bulldozer. I built a Phenom II X4 840 (Propus) machine with 5670 for my girlfriend's father a few months ago, (it was $40 cheaper than comparative Llano route) that machine will likely last them several more years than the Pentium 4 HT machine it replaced. It would have been pointless to build an FX-4100 for that type of build, though.
I hope PD can bring improvements over BD, process should be much improved and hopefully the architecture is worked on considerably.
No, it wouldn't in a million years. The platform is crazy expensive to develop and manufacture, and SB-E isn't even a desktop chip to begin with. It's a server platform ported over to desktop to satisfy the hunger of the tiny few. Simply put, no matter the competition, Intel wouldn't want to (or in all honesty be able to) price it in the mainstream.
More than likely they won't. CPU's are at the point now where they can do everything the average Joe needs. Most people look at say a 2700K for $300 and a 3930K for $700 (rough prices just for example) and would rather spend the $400 on an iProduct so they can check Friendface 20 times more per day. OEM's know this, so they won't even bother putting the option there. Alienware fits into the specialist custom builders, they just happen to be owned by a large company now. It's a niche market, not where the big bucks are.
Alright, I understand now :)
Hope this turns out to be true
:)
Quote:
Originally Posted by Charlie - Semiaccurate
wow, I really hope, it can be true. I wish :).
I am not going to pretend I have any insight into how Trinity performs or its power consumption, but this is what I've said since Bulldozer launched... It was an architecture built with power efficiency in mind, so obviously something had to go wrong along the way. Looking forward to Trinity and Piledriver :)