AMD "Piledriver" refresh of Zambezi - info, speculations, test, fans

**TESKATLIPOKA** · 11-23-2011, 12:06 PM

-Boris-

So basically you can't show any proof, because you ignore the current reality and retreat to an alternative one where a 6C Llano runs at ~3.7GHz within a <=125W TDP on a working 32nm process. Reality is quite different.
I will now respond once more to your false claims.

Phenom II has higher performance per watt, twice(!) the performance per mm² (taking processes in to account) and higher IPC and is capable of almost the same- if not the same or higher - frequencies on the same process.

1. perf/w (Full load (Linpack))
Power consumption - total system
FX 4M/8C 32nm vs Thuban 6C 45nm vs Deneb 4C 45nm
AMD Phenom II X4 980 [3.7 GHz, 4 cores] 209W 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3 +] 213W 102%
AMD FX-8150 [3.6 GHz, 4 modules, CMT, turbo] 231W 111%
FX 2M/4C vs FX 3M/6C vs Llano 4C everything on 32nm
AMD A8-3850 [2.9 GHz, 4 cores] 165W 100%
AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] 165W 100%
AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] 172W 104%
------------------------------------------------------------------------------------------------------
Power consumption - CPU including converter
FX 4M/8C 32nm vs Thuban 6C 45nm vs Deneb 4C 45nm
AMD Phenom II X4 980 [3.7 GHz, 4 cores] 126W 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3 +] 124W 98%
AMD FX-8150 [3.6 GHz, 4 modules, CMT, turbo] 137W 109%
FX 2M/4C vs FX 3M/6C vs Llano 4C everything on 32nm
AMD A8-3850 [2.9 GHz, 4 cores] 89W 100%
AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] 77W 87%
AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] 83W 93%
--------------------------------------------------------------------------------------------------------
Performance-Index
AMD A8-3850 [2.9 GHz, 4 core] 100%
AMD FX-4100 [3.6 GHz, 2 Module, CMT, Turbo] 105%
AMD FX-6100 [3.3 GHz, 3 Module, CMT, Turbo] 115%

AMD Phenom II X4 980 [3.7 GHz, 4 core] 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 core, Turbo] 106%
AMD FX-8150 [3.6 GHz, 4 Module, CMT, Turbo] 114%
-------------------------------------------------------------------------------------
Final conclusion
link to the results I used in this summary http://translate.googleusercontent.c...Tl4pCGBZpRIH9g

perf/W relative to 1. total system power or 2. CPU power including converter
1. AMD A8-3850 100%; 2. AMD A8-3850 100%
1. AMD FX-4100 105%; 2. AMD FX-4100 121%
1. AMD FX-6100 111%; 2. AMD FX-6100 124%

1. AMD Phenom II X4 980 100%; 2. AMD Phenom II X4 980 100%
1. AMD Phenom II X6 1100T 104%; 2. AMD Phenom II X6 1100T 108%
1. AMD FX-8150 103%; 2. AMD FX-8150 105%

As you can see, almost every BD model has a better perf/W ratio than Llano on 32nm or Deneb on 45nm.
The FX-8150 falls short of Thuban on perf/W, but that is understandable: it needs 8-threaded applications to perform at its best, whereas Thuban needs only 6 threads and Deneb only 4. The same goes for the FX-6100, which needs 6 threads while the rest of its group needs only 4, and the tests were a mix of lightly and heavily threaded applications.
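
For anyone who wants to re-check the perf/W ratios, here is a minimal Python sketch that recomputes them from the performance index and the wattages listed above. The dictionary layout and the function name are only illustrative. Because it divides by the raw wattages rather than by the rounded power percentages, the FX-6100 comes out one point lower (110% and 123%) than the 111% and 124% in the summary; the other entries match.

```python
# Figures copied from the tables in the post: relative performance (%),
# total-system power (W) and CPU-plus-converter power (W) under Linpack.
GROUPS = {
    "32nm": {
        "A8-3850": (100, 165, 89),
        "FX-4100": (105, 165, 77),
        "FX-6100": (115, 172, 83),
    },
    "45nm vs 32nm": {
        "Phenom II X4 980": (100, 209, 126),
        "Phenom II X6 1100T": (106, 213, 124),
        "FX-8150": (114, 231, 137),
    },
}


def relative_perf_per_watt(group, baseline, power_index):
    """Return perf/W of each CPU as a percentage of the baseline CPU.

    power_index is 1 for total-system power, 2 for CPU-plus-converter power.
    """
    base = group[baseline]
    result = {}
    for name, row in group.items():
        ratio = (row[0] * base[power_index]) / (row[power_index] * base[0])
        result[name] = round(100 * ratio)
    return result


for label, baseline in (("32nm", "A8-3850"), ("45nm vs 32nm", "Phenom II X4 980")):
    for power_index, power_name in ((1, "total system"), (2, "CPU incl. converter")):
        print(label, "|", power_name, "|",
              relative_perf_per_watt(GROUPS[label], baseline, power_index))
```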

To be continued....

**-Boris-** · 11-23-2011, 12:25 PM

Originally Posted by TESKATLIPOKA

-Boris-

so basically you can't show any proof because you ignore the current reality and instead you run to an alternative reality where you can find a 6C Llano ~3.7Ghz with <=125W TDP on working 32nm process but reality is way different..
I will now comment once more your false comments

1. perf/w (Full load (Linpack))
Power consumption - total system
FX 4M/8C 32nm vs Thuban 6C 45nm vs Deneb 4C 45nm
AMD Phenom II X4 980 [3.7 GHz, 4 cores] 209 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3 +] 213W 102%
AMD FX-8150 [3.6 GHz, 4 modules, CMT, turbo] 231W 111%
FX 2M/4C vs FX 3M/6C vs Llano 4C everything on 32nm
AMD-3850 A8 [2.9 GHz, 4 cores] 165W 100%
AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] 165W 100%
AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] 172W 104%
------------------------------------------------------------------------------------------------------
Power consumption - CPU including converter
FX 4M/8C 32nm vs Thuban 6C 45nm vs Deneb 4C 45nm
AMD Phenom II X4 980 [3.7 GHz, 4 cores] 126W 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3 +] 124W 98%
AMD FX-8150 [3.6 GHz, 4 modules, CMT, turbo] 137W 109%
FX 2M/4C vs FX 3M/6C vs Llano 4C everything on 32nm
AMD-3850 A8 [2.9 GHz, 4 cores] 89W 100%
AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] 77W 87%
AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] 83W 93%
--------------------------------------------------------------------------------------------------------
Performance-Index
AMD A8-3850 [2.9 GHz, 4 core] 100%
AMD FX-4100 [3.6 GHz, 2 Module, CMT, Turbo] 105%
AMD FX-6100 [3.3 GHz, 3 Module, CMT, Turbo] 115%

AMD Phenom II X4 980 [3.7 GHz, 4 core] 100%
AMD Phenom II X6 1100T [3.3 GHz, 6 core, Turbo] 106%
AMD FX-8150 [3.6 GHz, 4 Module, CMT, Turbo] 114%
-------------------------------------------------------------------------------------Final conclusion
link to the results I used in this summary http://translate.googleusercontent.c...Tl4pCGBZpRIH9g

perf/w with 1.total system or 2. CPU including converter
1. AMD A8-3850 100%; 2. AMD A8-3850 100%
1. AMD FX-4100 105%; 2. AMD FX-4100 121%
1. AMD FX-6100 111%; 2. AMD FX-6100 124%

1. AMD Phenom II X4 980 100%; 2. AMD Phenom II X4 980 100%
1. AMD Phenom II X6 1100T 104%; 2. AMD Phenom II X6 1100T 108%
1. AMD FX-8150 103%; 2. AMD FX-8150 105%

As you can see almost every BD model has a better ratio compared to Llano 32nm or Deneb 45nm.
BD FX 8150 is short on perf/W vs Thuban but its kinda understandable because it needs 8 threaded applications to perform best but Thuban 6 threads and Deneb only 4 threads, the same can be said about FX 6100 because it needs 6 threads while the rest of the group only 4 while the tests were a mix from low threaded applications to highly threaded.

To be continued....

I never said anything about 6-core Llanos. You are using straw man arguments here. Why do you feel that you need such tricks?
Deneb and Llano aren't interesting here; I'm talking about Thuban, and doesn't Thuban have higher performance per watt than BD? How do you think Thuban on 32nm would perform?

**BeepBeep2** · 11-23-2011, 12:36 PM

Originally Posted by -Boris-

How do you calculate die size? 32nm Thuban would be much smaller, in theory as small as half the size of 45nm Thuban.

Sorry, it seems my math is wrong. You are right, I calculated area wrong, neglecting a simple formula.

I had calculated 0.7111 * 346; however, the correct formula is 0.7111 * 0.7111 * 346 (A = L*W), meaning that Thuban's die on 32nm, with the cache structure and IMC left the same, would be ~175 mm^2.
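
The corrected formula is easy to check. A small Python sketch, assuming an ideal shrink in which both die dimensions scale with the node ratio (the function names are made up for illustration):

```python
def ideal_shrink_area(area_mm2, old_nm, new_nm):
    """Ideal shrink: both length and width scale by new_nm / old_nm."""
    linear = new_nm / old_nm
    return area_mm2 * linear ** 2


def linear_only_area(area_mm2, old_nm, new_nm):
    """The mistaken version: scaling the area by the linear factor only once."""
    return area_mm2 * (new_nm / old_nm)


thuban_45nm = 346  # mm^2, as quoted in the thread
print(f"ideal shrink: {ideal_shrink_area(thuban_45nm, 45, 32):.1f} mm^2")
print(f"linear-only:  {linear_only_area(thuban_45nm, 45, 32):.1f} mm^2")
```

The first line lands at roughly 175 mm^2, the second at roughly 246 mm^2, which is the gap between the corrected and the original calculation.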

A "theoretical" eight core STARS design couldn't be much bigger than 250mm^2...giving 9.69mm^2 (x2) for extra (Llano's) cores and a generous 55mm^2 for extra L2 cache and other improvements. It would be impossible for this CPU to be larger than 300mm^2.

Considering Thuban is beating BD in EVERY x86-64 single-threaded application I've seen yet except WinRAR and AES-encryption benchmarks (if they happened to run in a single thread, that is), both stock and overclocked, and is also near BD performance at equal or lesser power usage while at a deficit of 2 cores, it seems Thuban would be about 80% better in performance per mm^2, ignoring power consumption, as that would be an unknown at 32nm.

Also, one would have to think that yields would be much better at 32nm with the older, smaller architecture. Smaller dies are easier (not to mention cheaper!) to produce, and chances are that the chips would perform better as well, as AMD has worked with K10 for 4 years now.

On the server side, since Magny Cours is an MCM package with 2 Istanbul dies, its area is 724mm^2. On 32nm, this would translate to ~366mm^2...
A twelve core Magny Cours CPU, just 40mm^2 (about 15%) larger than the current 8 core Bulldozer design, has a four thread benefit (50% more cores/threads for 15% size, and that is the desktop chip)...this defeats Tomasis's argument about BD being "designed for server".

In fact, that CPU already performs almost as well, sometimes even greater than the 16 core MCM Orochi design while a whole node behind.

AMD was able to pull 2.3 Ghz on 45nm with just a 140w TDP on the old architecture, and 2.5 Ghz at 140w now if you look at numbers before process improvements. 2.2 Ghz was possible with 115w TDP. (Opteron 6176 SE, more recent 6180 SE, 6174.)

To sum up, with (correct me if I'm wrong, like Tomasis said I am a "kid") correct math:

Thuban @ 32nm would be around 175mm^2, up to 80% improvement in performance per mm^2 (315mm^2 being 80% larger than 175mm^2)...no less than 40-50% in worst case scenario.

Magny Cours @ 32nm would be only 40mm^2 (<15%) larger than the current Orochi design, and performs in best case scenario equal to the 16 core Orochi MCM design and worst case 33% lesser. The Orochi MCM design would be 1.7x size of this "theoretical" Magny Cours.

A "theoretical" 8/16 core "STARS" MCM design would be no larger than 250mm^2/500mm^2, so we end up with a 16 core STARS design at ~500mm^2, 130mm^2 smaller than Orochi 16 core MCM. This design would be smaller, more efficient per mm^2, and keep the same performance as Orochi MCM in worst case scenarios (where Orochi MCM has pulled ahead of Magny Cours by 33%) even if clocked at a mere 1.8 Ghz due to GloFo's 32nm process.

Yields would also be better: since die sizes would be smaller, chips would be much cheaper to produce, and AMD/GloFo has been producing K10 for 4 years.

Did I mention that the old uarch runs much cooler as well? (Not known for sure, since smaller node means heat is more concentrated, but less should be produced)

I'm sure wez, TESKATLIPOKA, Tomasis, informal and others will still find a way to blame the process for all of this. If AMD hadn't let go of the fab it would still be AMD's fault and nobody would give a

about that argument. I did the math, where is yours?

**undone** · 11-23-2011, 01:08 PM

WTF is going on with this thread? I don't see any Trinity news; instead there's lots of BS.

**TESKATLIPOKA** · 11-23-2011, 01:39 PM

2. perf/mm2
Savantu did pretty much the same calculations
4M/8C BD FX 8150, 32nm, 315mm2
6C Thuban, 45nm, 346mm2
Ideal shrink: 346*(32^2/45^2)= 175mm2

http://www.xtremehardware.it/images/..._die_Llano.jpg
LLano ~228mm2
The link shows you a Llano die shot. If you remove the IGP and add what you want, with the same amount of cache, you will end up with something like this:
http://img521.imageshack.us/img521/5651/002diellano.jpg
and that is ~210-220mm2.
I used a 6C Llano on 32nm with the same amount of cache as Thuban.
And now back to perf/mm2
AMD Phenom II X6 1100T [3.3 GHz, 6 core, Turbo] 100%
AMD FX-8150 [3.6 GHz, 4 Module, CMT, Turbo] 108%
Llano vs Deneb on average from this link is 3.23% better, I had to do an average value, what a hassle

http://www.anandtech.com/bench/Product/403?vs=85
So it's 103.23% vs 108%; add the 3MB L3 cache and you have ~105%, but that is still under BD performance, not to mention the same problem as before with the mix of differently threaded applications.
So the reality is not 2x as you said but rather 315/210=1.5: better by 50% per mm2, but BD performs better.

3. frequencies
APU 32nm
AMD A8-3850 [2.9 GHz, 4 cores] no turbo
ES model Trinity 2M/4C 3.8Ghz turbo 4.1Ghz
CPUs
45nm AMD Phenom II X4 980 [3.7 GHz, 4 cores] no turbo
45nm AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3 +] turbo 3.6Ghz
32nm AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] turbo 3.9Ghz
32nm AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] turbo 3.9Ghz

As you can see, the IGP doesn't affect CPU frequencies; if it did, you wouldn't see Trinity at the same speeds as classic CPU models without any IGP. It only affects TDP, because part of the budget has to be reserved for the IGP, so default clocks must be lowered, but turbo can make up for it when the IGP is idle.
Basically, K10 on the same process can't reach the same frequencies within the same TDP: Llano draws the same power as, or more than, a higher-clocked BD (default +25%, turbo +34%) in Linpack (with the IGP power gated).

I never said anything about 6core Llanos. You are using straw man arguments here. Why do you feel that you need such tricks?

really

and what is this

I think it's fairly safe that Thuban would reach a bit higher frequencies at early 32nm at almost half the size of BD, and with BDs or Llanos better IMC and Llanos IPC improvements it would already there equal a few hundred MHz extra performance. There you have at least 10% higher performance than Thuban at almost half the size of BD, and that with plenty of headroom to grow in!

I think that's a 6C Llano; of course I meant without the IGP. I don't think I am using straw man arguments here.

Deneb and Llano isn't interesting here, I talk Thuban, and Thuban has higher performance per watt than BD? How do you think Thuban on 32nm would perform?

Why not? you never said just Thuban, you said Phenom 2

"Phenom II has higher performance per watt, twice(!) the performance per mm² (taking processes in to account)."

Deneb is also Phenom II, and I wanted to compare Llano, which is practically a better Deneb, against BDs on the same 32nm node.
Which Thuban do you mean: just a shrink, or a Thuban based on Llano cores? In my opinion frequencies would be lower on the current 32nm process than on 45nm.
I still don't know what good it does to talk about something that will never be released.

**TESKATLIPOKA** · 11-23-2011, 01:48 PM

BeepBeep2 Thanks for remembering me

and your claim about shrinking to 170mm2 is wrong, look at this
http://img521.imageshack.us/img521/5651/002diellano.jpg
It's >200mm2, not 170mm2, which by the way would be better than the ideal shrink of 175mm2.

I'm sure wez, TESKATLIPOKA, Tomasis, informal and others will still find a way to blame the process for all of this. If AMD hadn't let go of the fab it would still be AMD's fault and nobody would give a about that arguement. I did the math, where is yours?

The reality is GLOFO's 32nm should be working better and BD needs much tweaking. With Trinity we will see what they did or didn't.

**muzz** · 11-23-2011, 02:08 PM

Originally Posted by freeloader

Clock throttling due to heat.

That probably should have been mentioned, don't you think?

**demonkevy666** · 11-23-2011, 03:03 PM

now back to piledriver trinity.

Immersion lithography is what made 45nm good. I was wondering whether 32nm still makes use of it?

**BeepBeep2** · 11-23-2011, 03:34 PM

Originally Posted by TESKATLIPOKA

BeepBeep2 Thanks for remembering me

and your claim about shrinking to 170mm2 is wrong, look at this
http://img521.imageshack.us/img521/5651/002diellano.jpg
its >200mm2 and not 170mm2 what is by the way better than the ideal shrink 175mm2

The reality is GLOFO's 32nm should be working better and BD needs much tweaking. With Trinity we will see what they did.

That is Llano's die with some crude photoshop work. I'm talking about shrinking the existing Thuban die. You copied and pasted extra cores and L2 on Llano's die, which makes absolutely no sense considering it is of different shape and L2 capacity.

The chip is around 22.2mm long and 15.6mm wide (equaling 346mm^2).
32nm / 45nm = .71 repeating, (22.2 * .7111)(15.6 * .7111) ... 15.7842 * 11.09316 =175mm^2 so you are right. I rounded to .7 in my original calculations.

If Llano's core is 9.69 mm^2, and they got the "ideal shrink", the 45nm core would be 19.16mm^2...but Llano's core isn't exactly the same as the 45nm core, I estimate the 45nm core to be around 16mm^2 looking at images. (Estimated by overlaying a ruler on thuban's die and looking at core perimeter)
Llano's core is also more square than Thuban's core, due to refinements made for IPC gain...I would have to guess that the extra length/width (depending on how you look at it) is what accounts for the 3mm^2 difference. Even if Llano's core IS the same as Thuban's (I know it's not), then you are looking at a 18.75% increase. 1.1875 * 175 does leave us at 207 mm^2 for a Thuban die shrink.
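
The core-size reasoning above can be replayed in a few lines. A rough Python sketch, assuming the 16mm^2 figure for the 45nm core is only the eyeballed estimate from the post and that the implied core size is rounded to 19mm^2 the way the post does:

```python
AREA_SHRINK = (32 / 45) ** 2   # ideal area scaling from 45nm to 32nm

llano_core_32nm = 9.69         # mm^2, figure quoted in the thread
estimated_45nm_core = 16.0     # mm^2, the poster's estimate from die photos
ideal_thuban_32nm = 175.0      # mm^2, ideal shrink of the 346 mm^2 die

# Size a 45nm core would need to have for Llano's core to be its ideal shrink.
implied_45nm_core = llano_core_32nm / AREA_SHRINK

# The post rounds the implied size to 19 mm^2 before comparing with 16 mm^2.
growth = round(implied_45nm_core) / estimated_45nm_core

adjusted_thuban_32nm = growth * ideal_thuban_32nm
print(f"{implied_45nm_core:.2f} mm^2, x{growth:.4f}, {adjusted_thuban_32nm:.1f} mm^2")
```

This is an upper-bound style estimate: it applies the core growth factor to the whole die, cache and uncore included.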

Still, Llano's core is noticeably taller; there isn't really any way the shrink could be more than 200mm^2 regardless of circumstances.

Still, everything else I said stands, take a few percent in regards to what I said about performance per mm^2. I'll update my post with correct math.

@demonkevy
They have to, they went to immersion lithography to help them shrink easier. Intel uses it on their 32nm now as well.

**muzz** · 11-23-2011, 03:40 PM

If anyone is guessing how good BD is, it's bad enough that it's making ardent AMD Fanboys gnaw at each other.
I must say that I've never seen that before, so that should tell ya something.

**freeloader** · 11-23-2011, 04:02 PM

Originally Posted by muzz

That probably should have been mentioned, don't you think?

You would be surprised how many reviewers don't even know where to disable throttling in the BIOS. It's a good possibility that's what happened.

**muzz** · 11-23-2011, 04:35 PM

Originally Posted by freeloader

You would be surprised how many reviewers don't even know where to disable throttling in the BIOS. It's a good possibility that's what happened.

Hence the reason why I said what I said about them.

**sergiojr** · 11-23-2011, 08:19 PM

Originally Posted by savantu

It's a general rule of thumb in the industry. Moving to a new process brings you two advantages :
-die size reduction, maximum is 50% (0.7*0.7 )
-20% more frequency for the same power

All new processes ussually claim 20-50% power reduction or alternatively 20-40% more clocks for the same power consumption.

We are talking about AMD, not about the industry as a whole. So do you have an example of such a shrink in AMD/GF history besides Deneb on 45nm?
Anyway, it is not the reduced transistor size that brings improvements; it is the R&D done during the 2 years between two nodes. If this R&D is applied to the old process too, the difference will be smaller. And the defect rate correlates directly with process performance, since defects also cause process variability, while manufacturers' claims are for defect-free transistors. If you have only one "slow" transistor on a die, you have to regard the whole die as slow. So with a high defect rate it really doesn't matter how fast your process's defect-free transistor is. It is strange that you have not mentioned this, as you claim to understand the industry.

**TESKATLIPOKA** · 11-24-2011, 12:00 AM

BeepBeep2

That is Llano's die with some crude photoshop work. I'm talking about shrinking the existing Thuban die. You copied and pasted extra cores and L2 on Llano's die, which makes absolutely no sense considering it is of different shape and L2 capacity.

I know it's crude; the changes were done in Windows Paint. I just added 2 cores and cache, and it makes much more sense than just calculating an ideal die shrink. The sum of L2 and L3 is 9MB (L2 6MB and L3 3MB), the same as Thuban (L2 3MB and L3 6MB), so there is no problem.
All this is no longer important, because I made some calculations based on a real shrink to prove my point.

Here is an interesting comparison between Deneb and Agena

http://img.tomshardware.com/us/2008/...phenom-die.jpg
http://www.xtremesystems.org/forums/...8&d=1308242603
Deneb 45nm 258mm2
Agena 65nm 285mm2
It should have been 45^2/65^2=0.48, so an ideal Agena shrink would mean 285*0.48=136.8mm2, but it ended up at 285*0.905=258mm2.
It's true that Deneb has +4MB of L3 cache, so let's look at what happens after removing it from Deneb, or adding it to Agena, and then doing the shrink.

Deviation from Ideal Scaling: 90nm-> 0%, 65nm->14%, 45nm->39%
Equal Die Size Cache: 90nm 1MB, 65nm 1.75MB, 45nm 2.89MB
http://people.ac.upc.edu/rcanal/pdf/Liang-intel08.pdf

Our first exposure to the Athlon 64 X2 came in the form of the 4800+ model. That chip is code-named "Toledo," and it packs 1MB of L2 cache per processor core, as do the dual-core Opterons. Toledo-core chips sport a transistor count of about 230 million, all crammed into a die size of 199 mm2.

AMD also makes several models of Athlon 64 X2 that have only 512K of L2 cache. In the past, CPUs with smaller caches have sometimes been based on the exact same chip as the ones with more cache, but they'd have half of the L2 cache disabled for one reason or another. That's not the case with the X2 3800+. AMD says this "Manchester"-core part has about 154 million transistors and a die size of 147 mm2, so it's clearly a different chip.

What you may or may not have noticed in that paragraph above is that the 3800+ features a "Manchester" core, not the "Toledo" core used in the rest of the X2 line. The difference? The Manchester core features fewer transistors (154M compared to the Toledo's 233.2M) and a smaller die size (147mm^2 compared to the Toledo's 199mm^2), which also definitely gives it a far better thermal numbers than its siblings (89W as opposed to 110W).

1MB L2 die size is 52mm2 on 90nm

Agena on 65nm with 6MB L3 cache
4/1.75*52mm2=119mm2
285+119=404mm2 -> 258/404=0.64 die shrink instead of ideal 0.48
Deneb on 45nm with 2MB L3 cache
4/2.89*52mm2=72mm2
258-72=186mm2 -> 186/285=0.65 die shrink instead of ideal 0.48
So back to Thuban shrink, ideal is 32^2/45^2=0.51 so its a bit worse than 65nm->45nm -> 0.51-0.48=0.03
ideal Thuban shrink 346mm2*0.51= 176.5mm2
close to reality thuban shrink 346mm2* ((0.64 or 0.65) +0.03)=232~235mm2
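
The whole estimate can be replayed step by step. A Python sketch, assuming the same inputs as the post (52mm2 per 1MB of L2 at 90nm, the equal-die-size cache figures from the linked paper) and rounding at the same points the post does; the helper names are invented for illustration:

```python
L2_PER_MB_90NM = 199 - 147                   # mm^2: Toledo die minus Manchester die
EQUAL_AREA_CACHE_MB = {65: 1.75, 45: 2.89}   # MB that fit in the area of 1 MB at 90nm


def cache_area(megabytes, node_nm):
    """Estimated area of a cache of the given size on the given node."""
    return megabytes / EQUAL_AREA_CACHE_MB[node_nm] * L2_PER_MB_90NM


def ideal_factor(old_nm, new_nm):
    """Ideal area scaling factor between two nodes."""
    return (new_nm / old_nm) ** 2


# Agena (65nm, 285 mm^2) with 4 MB of L3 added, compared with Deneb (45nm, 258 mm^2).
agena_with_6mb_l3 = 285 + cache_area(4, 65)
factor_from_agena = round(258 / agena_with_6mb_l3, 2)

# Deneb with 4 MB of L3 removed, compared with the real Agena.
deneb_with_2mb_l3 = 258 - cache_area(4, 45)
factor_from_deneb = round(deneb_with_2mb_l3 / 285, 2)

# 45nm -> 32nm is a slightly smaller step than 65nm -> 45nm, hence the offset.
offset = round(round(ideal_factor(45, 32), 2) - round(ideal_factor(65, 45), 2), 2)

low = 346 * (factor_from_agena + offset)
high = 346 * (factor_from_deneb + offset)
print(f"Thuban on 32nm: {low:.0f}-{high:.0f} mm^2")
```

Without the intermediate rounding the range shifts down by a millimetre or two, so the result should be read as a ballpark rather than a precise figure.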

I have made my point and will no longer continue this debate, which is just killing my time; I would much rather have a debate about Trinity.

**behrouz** · 11-24-2011, 12:15 AM

nice that this thread went to war instead of talk about Piledriver

**Sunfire** · 11-24-2011, 12:27 AM

Originally Posted by demonkevy666

now back to piledriver trinity.

Immersion lithography is what made 45nm good I was wonder if 32nm is still making use of it?

Maybe it did 45nm good, but I don't think it was the only key to success. Immersion is a must-have for 45nm and denser transistor technology, because they (like all modern/up-to-date semiconductors) use a 193 nm light source for patterning. You can't build 45nm structures with that alone; you must use immersion (ultra-pure water) to focus the light. Even with immersion, they have to use double patterning as well for critical circuitry. Both bring advantages, and of course higher costs.

The next two 'big things' for the semiconductor industry (450mm wafers and EUV technology) seem a bit far away for now. AMD already has an EUV tool in NY for testing purposes, but I haven't heard anything about them and 450mm wafers.

**-Boris-** · 11-24-2011, 01:50 AM

Originally Posted by TESKATLIPOKA

2. perf/mm2
Savantu did pretty much the same calculations
4M/8C BD FX 8150, 32nm, 315mm2
6C Thuban, 45nm, 346mm2
Ideal shrink: 346*(32^2/45^2)= 175mm2

http://www.xtremehardware.it/images/..._die_Llano.jpg
LLano ~228mm2
The link shows you a Llano die shot, If you remove the IGP and add what you want with the same amount of cache you will end up with something like this
http://img521.imageshack.us/img521/5651/002diellano.jpg
and that is ~210-220mm2.
I used a 6C Llano on 32nm with the same amount of cache as Thuban.
And now back to perf/mm2
AMD Phenom II X6 1100T [3.3 GHz, 6 core, Turbo] 100%
AMD FX-8150 [3.6 GHz, 4 Module, CMT, Turbo] 108%
Llano vs Deneb on average from this link is 3.23% better, I had to do an average value, what a hassle

http://www.anandtech.com/bench/Product/403?vs=85
so its 103.23% vs 108% add the 3MB L3 cache and you have ~105% but you are still under BD performance not to mention the same problem as before with the mix of differently threaded applications.
So the reality is not 2x as you said but rather 315/210=1.5, so better by 50% but BD performs better.

Fine, we'll use your die-size estimates; you do have a point there. So we agree that Thuban is more like 50% more effective per mm². Higher single-thread performance and a smaller size, making more cores possible for better multithread performance, is a winner in my eyes.

Originally Posted by TESKATLIPOKA

3. frequencies
APU 32nm
AMD-3850 A8 [2.9 GHz, 4 cores] no turbo
ES model Trinity 2M/4C 3.8Ghz turbo 4.1Ghz
CPUs
45nm AMD Phenom II X4 980 [3.7 GHz, 4 cores] no turbo
45nm AMD Phenom II X6 1100T [3.3 GHz, 6 cores, turbo, AM3 +] turbo 3.6Ghz
32nm AMD FX-4100 [3.6 GHz, 2 modules, CMT, turbo] turbo 3.9Ghz
32nm AMD FX-6100 [3.3 GHz, 3 modules, CMT, turbo] turbo 3.9Ghz

As you can see IGP doesn't affect the cpu frequencies if it affected it then you wouldn't see Trinity with the same speed as classic cpu models without any IGP, it only affects TDP because some needs to be reserved for IGP so you need to lower default clocks but turbo can make up for it if IGP is idling.

That's an unfair comparison, since Llano is a real quad core. It's no surprise if a dual-module chip gets higher frequencies. But on the other hand, if they manage to get Piledriver to outperform Llano with lower power consumption, I guess you are right. We'll have to wait and see, even if I do think Llano is capable of higher frequencies, especially by the spring when Piledriver arrives. It's not unusual for hardware makers to hold old tech back to leave room for successors.

Originally Posted by TESKATLIPOKA

really

and what is this

I think thats a 6C Llano, of course I meant without IGP. I don't think I am using straw man arguments here

.

Now that's a straw man! I said THUBAN with Llano's IPC improvements and with BD's or Llano's IMC. That's not the same thing. The GPU defines Llano more than some IPC improvements do. A Thuban with some IPC improvements is not a Llano. But I understand it's tempting to call it a Llano: Llano's integration of the NB has made it very hard to overclock, so it's tempting to make my suggestion look bad by comparing it with crippled products.

Originally Posted by TESKATLIPOKA

Why not? you never said just Thuban, you said Phenom 2
Deneb is also Phenom II and I wanted to compare Llano what is practically a better Deneb vs BDs on the same 32nm node.
Which Thuban you mean?, just a shrink or Thuban based on Llano cores? In my opinion frequencies would be lower at the current 32nm compared to 45nm.

I've said Thuban many times during this discussion, and Deneb is an older and lower-performing version of Phenom II. You can't just use a product with lesser performance when we are talking about how capable a lineup is. It's like saying Fords are faster than Ferraris just because Ford's fastest car is faster than Ferrari's slowest. And that's definitely a straw man argument. Let us compare the best of Phenom II to the best of Bulldozer!
And I still see no reason why Phenom II would have lower frequencies on 32nm. Llano shows that 32nm brings a big drop in power consumption. Why would the frequencies be worse? We can't tell until we have a truly unlocked Llano, and even then we still don't know whether Llano suffers from tradeoffs from being coupled with a GPU not made for the same type of process.

Originally Posted by TESKATLIPOKA

I still don't know what good is talking about something what will be never released

.

Because the question was why people didn't like what AMD did with BD, and the answer is that we don't feel it's the best thing they could have done. Of course it's easy to say that in retrospect, but that is what many of us feel.

**Lightman** · 11-24-2011, 02:00 AM

Originally Posted by Sunfire

Maybe it did 45nm good, but I don't think that was the key to success only. Immersion is a musthave for 45nm and denser transistor technology, because they (and all modern/up-to-date semiconductors) are using 193 nm light source for patterning. You can't build 45nm structures with that, you must use immersion (ultra-pure water) to focus the light. Even with immersion, they use (they have to use) double-patterning too for critical circuitry. Both things bring you advantages, and of course more costs.

The two next 'big thing' for the semiconductor industry seems a bit far away from now (450mm wafers and EUV technology). AMD already has an EUV-tool at NY for testing purposes, but I didn't heard anything about them and 450mm wafers.

Thanks for that!
One small correction, though: Intel didn't use immersion litho for 45nm, only dry litho with double patterning.

**TESKATLIPOKA** · 11-24-2011, 05:07 AM

-Boris-

Fine, we use your die-size estimates, you do have a point there. So we agree that Thuban is more like 50% effective per mm². Higher single thread performance and smaller size making more cores possible for better multithread performance is a winner in my eyes.

I don't agree about perf/mm2.
I made new calculations and ended up with:
6C Thuban cache 9MB 32nm 232mm2
4M/8C BD cache 16MB 32nm 315mm2
That's a 36% difference in die size, but BD is 8% faster.
The thing is, you are talking about the whole chip, and I think you know that cache gives nowhere near as much performance as the die area it occupies.
Second, if you want to compare that badly, compare just the cores vs the modules; it would be more accurate. 1M/2C is more or less equal in size to 2 Llano cores ~ 2 Deneb cores.
K10 can have better perf/mm2, and actually I think it does, but nowhere near as much as you think.

Now that's a straw man! I said THUBAN with Llanos IPC-improvements and with BD's or Llanos IMC. That's not the same thing. The GPU is defining Llano more than some IPC improvements. A Thuban with some IPC-improvements is not a Llano.

What you are doing is trying to convince us that a Thuban with Llano's IMC and Llano's improvements is not a Llano just because it doesn't have the IGP. You can call it Thuban 2 if you prefer; I don't care, because it's not important. BTW, I clearly wrote 6C Llano without IGP, not with it deactivated, so you had enough time to comprehend what I meant in my original post, and that part wasn't even important compared to the rest.

But I understand it's tempting to call it a Llano since Llanos integration of NB has made it very hard to overclock, so it's tempting to make my suggestion look bad by comparing it with crippled products.

Yeah, it's really tempting and unfair to call a 6-core chip with Llano cores and IMC a Llano.

Your point about OC is moot, because I never said anything about that; I was always comparing at default frequencies.

I've said Thuban many times during this discussion, and Deneb is an older and less performing version of Phenom II. You can't just use a product with lesser performance when we are talking about how capable a line up is. It's like saying Fords are faster than Ferraris just because Fords fastest car is faster than Ferraris worst. And that's definitely a straw man argument. Let us compare the best of Phenom II to the best of Bulldozer!

For your information, Deneb and Thuban have the same core and IMC. Everything is the same: L1 and L2 cache per core, even the L3 cache. The only difference is that Thuban has 2 more cores with their L2 cache, nothing more.
BTW, I still don't know what your problem is. Did I compare 4M BD versus 6C Thuban? Yes I did, but I had the audacity to include the highest (best) Deneb and even to compare the lower BD models to Llano (IGP power gated), because they are on the same process. The best thing would have been to also compare Deneb vs FX-4100 (4 threads vs 4 threads) and Thuban vs FX-6100 (6 threads vs 6 threads).

And I still see no reason why Phenom II would have lower frequencies on 32nm. Llano shows that 32nm brings a big drop in power consumption. Why would the frequencies be worse? We can't tell until we have an truly unlocked Llano, and even then we still don't know if Llano suffers from tradeoffs from being coupled with a GPU not made for the same type of process.

I will tell you one last time and you don't need an unlocked Llano.

4C 32nm Llano 2.9Ghz TDP 100W (has higher power draw than FX4100 with TDP 95W while the IGP is power-gated so the TDP 100W should be correct for CPU)
4C 45nm Phenom II X4 B99(Deneb) 3.3Ghz TDP 95W (+400Mhz)
4C 45nm Deneb 3.7Ghz TDP 125W (+800Mhz)
TDP 95->125W is 400Mhz for +30W in TDP
So increasing the TDP to 130W would mean a 3.3GHz Llano (+400MHz); the difference is still 400MHz between the older and the current process on the same architecture.
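
The extrapolation above boils down to a straight line through the two Deneb data points. A short Python sketch, assuming (as the post does) that frequency scales linearly with TDP, which is a simplification rather than how real silicon behaves:

```python
def mhz_per_watt(freq_lo_mhz, tdp_lo_w, freq_hi_mhz, tdp_hi_w):
    """Slope of the frequency/TDP line through two known parts."""
    return (freq_hi_mhz - freq_lo_mhz) / (tdp_hi_w - tdp_lo_w)


# 45nm Deneb: 3.3 GHz at 95 W and 3.7 GHz at 125 W.
slope = mhz_per_watt(3300, 95, 3700, 125)

# 32nm Llano: 2.9 GHz at 100 W, extrapolated to 130 W with the same slope.
llano_at_130w = 2900 + slope * (130 - 100)

# Remaining gap to the 3.7 GHz Deneb.
gap_mhz = 3700 - llano_at_130w
print(round(llano_at_130w), "MHz at 130 W, gap", round(gap_mhz), "MHz")
```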

I don't think the IGP has a noticeable impact on frequency. If it did, the already-produced ES Trinity, with a bigger IGP than Llano, wouldn't have a default clock 200MHz higher (turbo +300MHz) than the FX-4100 on the same process, and I wouldn't be surprised if that isn't even the final frequency at launch.

**savantu** · 11-24-2011, 05:44 AM

Originally Posted by TESKATLIPOKA

-Boris-
I don't agree about perf/mm2
I made new calculations and it ended with
6C Thuban cache 9MB 32nm 232mm2
4M/8C BD cache 16MB 32nm 315mm2
That's 36% difference in die size but BD is 8% faster.

BD is 8% faster than the 45nm Thuban. For 32nm, you'd have 36% size difference and likely 10-20% performance difference in Thuban's favour (higher clocks at the same power if nothing else). So 50% more performance/sq mm isn't unreasonable.
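
As a rough check of the performance/sq mm figure, here is a minimal Python sketch using only the numbers quoted in this exchange. It assumes performance is normalized to the 45nm Thuban (= 1.00) and that the 232mm2 estimate for a 32nm Thuban holds; the helper name is hypothetical.

```python
THUBAN_MM2 = 232.0   # estimated 6C Thuban die size on 32nm (from the quote)
BD_MM2 = 315.0       # 4M/8C Bulldozer die size on 32nm (from the quote)
BD_PERF = 1.08       # BD performance relative to 45nm Thuban = 1.00


def thuban_perf_per_mm2_ratio(thuban_perf):
    """Thuban perf/mm2 divided by BD perf/mm2 (1.0 means equal)."""
    return (thuban_perf / THUBAN_MM2) / (BD_PERF / BD_MM2)


# 0% = Thuban at its 45nm performance, 10-20% = the assumed 32nm clock gain
for uplift in (0.0, 0.10, 0.20):
    ratio = thuban_perf_per_mm2_ratio(1.0 + uplift)
    print(f"Thuban +{uplift:.0%} performance: {ratio - 1:+.0%} perf/mm2 vs BD")
```

Multiplying the ratios through gives roughly +26% with no clock gain, and roughly +38% to +51% with the 10-20% gain, so the 50% figure sits at the top of that range.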

**TESKATLIPOKA** · 11-24-2011, 06:25 AM

savantu

BD is 8% faster than the 45nm Thuban. For 32nm, you'd have 36% size difference and likely 10-20% performance difference in Thuban's favour (higher clocks at the same power if nothing else). So 50% more performance/sq mm isn't unreasonable.

I think 50% is too much. The difference in size is 36% as you said, but if you add 10-20% from frequency on Thuban you still need to subtract 8% for BD, so it ends up at 2-12%, and that makes 38-48%. But first of all you would need a 6C Thuban on 32nm with default clocks of 3.63-3.96GHz (turbo: 4070-4440MHz) to get 38-48% better perf/mm2, and that's unreasonable on the current 32nm process. When it's mature enough you will probably get that high, but then BD will have higher clocks as well.
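
The clock targets in the paragraph above follow from scaling the 1100T's clocks by 10-20%. A small Python sketch, assuming performance scales linearly with clock speed; the 3.7GHz turbo value is inferred from the 4070-4440MHz figures rather than stated in this thread.

```python
BASE_MHZ = 3300    # Phenom II X6 1100T base clock
TURBO_MHZ = 3700   # ASSUMED turbo clock, inferred from the 4070-4440MHz range


def scaled_clock(mhz, uplift_pct):
    """Clock needed for a given performance uplift, assuming perf ~ clock."""
    return mhz * (100 + uplift_pct) / 100


for pct in (10, 20):
    base = scaled_clock(BASE_MHZ, pct)
    turbo = scaled_clock(TURBO_MHZ, pct)
    print(f"+{pct}%: base {base:.0f} MHz, turbo {turbo:.0f} MHz")
```

Real workloads rarely scale perfectly with clock, so the actual clocks needed would be at least this high.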
Look at the last part of my previous post, if you think I am right or not is up to you.

P.S. Comparing Thuban vs BD only gives you perf/mm2 between different models, which is nowhere near the perf/mm2 between architectures. For that you would need at least the same amount of cache, the same process and the same TDP to be more or less accurate. I almost forgot the same thread count too: 2 modules vs 4 cores and so on.

**-Boris-** · 11-24-2011, 08:03 AM

Originally Posted by TESKATLIPOKA

-Boris-
I don't agree about perf/mm2
I made new calculations and it ended with
6C Thuban cache 9MB 32nm 232mm2
4M/8C BD cache 16MB 32nm 315mm2
That's 36% difference in die size but BD is 8% faster.
The thing is you are talking about the whole chip, and I think you know cache gives nowhere near as much performance as the die area it occupies.
Second, if you want to compare it that badly, compare just the cores vs modules; it would be more accurate. 1M/2C is more or less equal to 2C Llano ~ 2 Deneb cores in size.
K10 can have better perf/mm2, and actually I think it has, but nowhere near as much as you want (think).

It depends. I think 32nm would bring some nice frequency improvements. A Thuban with 6% higher IPC from Llano optimizations, 3% from the BD IMC and a base clock of 3.8GHz seems extremely reasonable to me. Add Turbo Core 2.0 and it will beat Bulldozer even more in single-thread performance. You can read further down in this post about why I think 32nm has such potential.

Originally Posted by TESKATLIPOKA

what you are doing is trying to convince us that Thuban with Llano IMC and Llano improvements is not Llano just because it doesn't have the IGP, you can call it Thuban 2 if you prefer, I don't care because it's not important. And BTW I clearly wrote 6C Llano without IGP, and not deactivated, so you had enough time to comprehend what I meant in my original post, and that part wasn't even important compared to the rest.

You did not write the part about it being without IGP. That's just lies; you can't pretend that you said that from the beginning. Here it is:

Originally Posted by TESKATLIPOKA

you run to an alternative reality where you can find a 6C Llano ~3.7Ghz with <=125W TDP on working 32nm process but reality is way different..

And that's just a straw man. The whole thing with Llano is the IGP; if you talk about Llanos without IGP you have to say so. If you pretend or insinuate that my arguments are about something totally different from what they really are about, that is making a straw man. And don't dare get me a quote where you say you meant without IGP if it's from a post after I made my complaint about your straw man tactics.

Originally Posted by TESKATLIPOKA

Yeah its really tempting and unfair to call a 6 core chip with Llano cores and IMC as Llano

Your point about OC is pointless because I never said anything about that, I was always comparing on default frequency.

No, you didn't say it, but Llano, due to its design, clocks like turds. So by calling my suggestion a Llano without mentioning you meant without the parts that cripple Llano's frequency potential, you made it seem like my argument was about full Llano APUs with extra cores and higher frequency, which of course is pretty stupid. A straw man is to "misinterpret" someone's argument as something stupid and then argue against that made-up stupid standpoint instead of the real one. That's why I protested. But if you want we can call it Thuban II or Phenom III from now on. But then we have to be clear about what we are arguing about. My definition is a shrunk Thuban core, with the same caches, and with Llano's IPC improvements and IMC.

Originally Posted by TESKATLIPOKA

For your information, Deneb and Thuban have the same core and IMC. Everything is the same: L1 and L2 cache per core, even the L3 cache. The only difference is that Thuban has 2 more cores with their L2 cache, nothing more.
BTW I still don't know what your problem is. Did I compare 4M BD versus 6C Thuban? Yes I did, but I had the audacity to include the highest (best) Deneb and even compare lower BD models to Llano (IGP was power-gated) because they are on the same process. The best thing would have been if I had also compared Deneb vs FX-4100 (4 threads vs 4 threads) and Thuban vs FX-6100 (6 threads vs 6 threads).

When I talk about how efficient Phenom II is, I of course mean the most efficient model. And Thuban has more performance per mm², higher performance per watt and higher IPC for the whole die than Deneb. It should be obvious that I am talking about Thuban, since I have mentioned it tens of times in this thread already. So please stop making my arguments seem to be about things they aren't.

Originally Posted by TESKATLIPOKA

I will tell you one last time and you don't need an unlocked Llano.

4C 32nm Llano 2.9GHz TDP 100W (has higher power draw than the FX-4100 with TDP 95W while the IGP is power-gated, so the 100W TDP should be correct for the CPU)
4C 45nm Phenom II X4 B99 (Deneb) 3.3GHz TDP 95W (+400MHz)
4C 45nm Deneb 3.7GHz TDP 125W (+800MHz)
TDP 95->125W is 400MHz for +30W in TDP
So increasing TDP to 130W would mean a 3.3GHz Llano (+400MHz); the difference is still 400MHz between the older and current process on the same architecture.

I don't think the IGP has a noticeable impact on frequency. If it did, the already-produced ES Trinity, with a bigger IGP than Llano, wouldn't have a default clock 200MHz higher (turbo +300MHz) than the FX-4100 on the same process, and I wouldn't be surprised if that isn't even the final frequency at launch.

Some very creative math there, with lots of things that can go wrong. You can't calculate that way; I can show math that 700MHz is 0W extra TDP, or that 2 extra cores is 30W less TDP! So drop that please.
The thing is, Llano uses 5-20W less power than Athlon II on 45nm! So your argument is invalid. And power-gated doesn't mean it doesn't affect performance: if you make a large chip and turn half of the chip off, it won't be nearly as cool and fast as a chip made half as big from the beginning. That said, Llano was the first of its kind, so it's not surprising if tradeoffs have been made. And you tend to forget that Llano is faster than FX-4100.
And again, we don't know anything about Llano's capabilities, since it's locked and is an architecture that doesn't allow high bus clocks. Just like locked SB! And it's not unreasonable that Llano is held back because of Piledriver. If they released Llano at 3.4GHz and it beat Piledriver, how would that look? It might be the Tualatin syndrome all over again.

Since 32nm uses less power than 45nm despite an extra GPU, you will have a hard time proving that 32nm wouldn't work well with Thuban. And that math of yours can "prove" just about anything, try harder.

**Smartidiot89** · 11-24-2011, 08:32 AM

So discussion went from constructive to "who wrote what, and you're using straw man arguments". The most important thing is being right, not actually bringing something to the table

GO GO THE INTERNETZ!

**desnudopenguino** · 11-24-2011, 08:37 AM

Originally Posted by Smartidiot89

So discussion went from constructive to "who wrote what, and you're using straw man arguments". The most important thing is being right, not actually bringing something to the table

GO GO THE INTERNETZ!

I agree. And isn't this thread supposed to be about Piledriver, not BD and Thuban anyway?

**BeepBeep2** · 11-24-2011, 08:40 AM

Originally Posted by Smartidiot89

So discussion went from constructive to "who wrote what, and you're using straw man arguments". The most important thing is being right, not actually bringing something to the table

GO GO THE INTERNETZ!

+1 internet for Smartidiot

Despite TESKATLIPOKA's more "correct" calculations (though I measured the 45nm core and compared it to the 32nm Llano core), I'm pretty sure even TESKATLIPOKA agrees that performance is currently better on STARS than BD. It seems we've settled on around 30% better performance per mm2, in the worst case. What if we did have that 8-core STARS? Is it even possible for single thread to be slower than BD? By the way, in some single-threaded applications, BD needs ~6GHz+ to keep up with the stock 2600K...

So, what improvements do we expect to see for Piledriver?
