AMD Zambezi news, info, fans !

10-05-2011, 12:34 AM
bamtan2

unless the 6100 unlocks to be 8100, and unless 8150 is lower voltage than 8120, everybody is just going to get 8120 :)

15% extra money is worth 30% extra cores. and higher stock speed is not worth $40 if they both overclock the exact same.
10-05-2011, 01:05 AM
Opteron146

Quote:

Originally Posted by tifosi

Well yes, the parts are indeed different. However, your point was that it would be shipped to OEMs and retail wouldn't see any (if at all). Unless it is an OEM special like the 960T, where AMD clarified as much, I don't have a reason to believe that the 95W chip won't be seen in retail. If you look at it, the Phenom II 945 is merely a locked version of the 940 at lower TDP. This being FX, I don't think AMD would muck about.

Well, in the later C3 revision the 945 was a plain, normal 95W part. The first 940/945 models with 125W were still C2. Nothing exciting about that; it's the normal benefit of process/stepping optimization. If you want to compare it to FX, then my point would be that the 8120 95W model will be as scarce as hen's teeth in etail, because it will be an OEM-only model, and for etail that means we have to live with the 125W part or the 8100 95W model instead. Then next year, when they'll launch 8170, there's maybe enough 95W 8120 for everybody. We'll see...
10-05-2011, 02:10 AM
wez

Quote:

Originally Posted by bamtan2

unless the 6100 unlocks to be 8100, and unless 8150 is lower voltage than 8120, everybody is just going to get 8120 :)

15% extra money is worth 30% extra cores. and higher stock speed is not worth $40 if they both overclock the exact same.

With the previous launches (Agena/Thuban) there was a notable difference in overclocking between the top and lower models. And with the rumored, somewhat limited production of BD, I don't think it's too far-fetched to assume the top model will be the one to get for max clocks this time around as well.

And with 8 cores, the chance of getting a crap core or two is not exactly smaller than before. So you'd assume binning plays an even bigger role than before. But yeah, we'll see soon enough :D
10-05-2011, 02:55 AM
Leeghoofd

Anyone heard about Windows 8 seeming to steer the cores better than Windows 7? The current task scheduler seems to mess up the performance of our beloved BD a bit... Looking forward to firing it up tonight...
10-05-2011, 02:59 AM
dess

Quote:

Originally Posted by Leeghoofd

Anyone heard about Windows 8 seeming to steer the cores better than Windows 7? The current task scheduler seems to mess up the performance of our beloved BD a bit... Looking forward to firing it up tonight...

An update to fix it is expected sometime soon.
10-05-2011, 03:06 AM
wez

Quote:

Originally Posted by Leeghoofd

Anyone heard about Windows 8 seeming to steer the cores better than Windows 7? The current task scheduler seems to mess up the performance of our beloved BD a bit... Looking forward to firing it up tonight...

You are such a tease... :stick:
10-05-2011, 03:16 AM
Smartidiot89

Quote:

Originally Posted by dess

An update to fix it is expected sometime soon.

Really, where have you heard this? Not supposed to be "fixed" until you get Windows 8:shakes:
10-05-2011, 03:19 AM
undone

http://tipidpc.com/viewtopic.php?tid=172787&page=4183

http://i31.photobucket.com/albums/c3...r/P1000373.jpg

:D:D:D
10-05-2011, 03:45 AM
Opteron146

Quote:

Originally Posted by Smartidiot89

Really, where have you heard this? Not supposed to be "fixed" until you get Windows 8:shakes:

I assume he meant the fix will come sometime for Win7, but Win8 will have it from the start. Just guessing from the statement of the guy we don't speak about that Win8 is registering BD as a 4-core/8-thread machine.
10-05-2011, 04:18 AM
informal

I think we should call Zambezi 8150 an 8-threaded CPU. This way we avoid any core vs not-a-core arguments. It has 8 strong threads, according to AMD, so let's call it an 8T-capable CPU. Thuban is a 6T chip, SB 2600K is an 8T chip. Whether it's a weak or strong thread is debatable though.

On another note, we have some shops listing FX models and mobo and FX bundles. So it seems the 12th is the date. Won't be long now :).
10-05-2011, 04:45 AM
Dimitriman

Quote:

Originally Posted by informal

I think we should call Zambezi 8150 an 8-threaded CPU. This way we avoid any core vs not-a-core arguments. It has 8 strong threads, according to AMD, so let's call it an 8T-capable CPU. Thuban is a 6T chip, SB 2600K is an 8T chip. Whether it's a weak or strong thread is debatable though.

On another note, we have some shops listing FX models and mobo and FX bundles. So it seems the 12th is the date. Won't be long now :).

I just call it a 4 module. I like the term and it "implies" 8 threads (or integer units).

So 2 Cores>Module>Hyperthreading, works for me.
10-05-2011, 04:45 AM
FlanK3r

oh, I like this package....
10-05-2011, 04:51 AM
dess

Quote:

Originally Posted by Smartidiot89

Really, where have you heard this? Not supposed to be "fixed" until you get Windows 8:shakes:

There was some talk earlier that an update is needed to optimize task scheduling for BD, and it's to come.

BTW, AFAIK there must also be an update coming to enable FMA4 and XOP under Win7 (just like AVX was enabled by SP1).
10-05-2011, 04:58 AM
undone

Quote:

Originally Posted by dess

There was some talk earlier that an update is needed to optimize task scheduling for BD, and it's to come.

BTW, AFAIK there must also be an update coming to enable FMA4 and XOP under Win7 (just like AVX was enabled by SP1).

If it's true, I would assume such patches won't be released until Zambezi ships to the reviewers. Wait and see.....
10-05-2011, 05:33 AM
Particle

Windows itself doesn't need to be aware of FMA4 and XOP for developers to use them of course. A Windows patch would only address what Windows itself uses which can increase core Windows performance.

--

Can we not get into the core vs thread thing again?
10-05-2011, 05:56 AM
Opteron146

Quote:

Originally Posted by dess

BTW, AFAIK there must also be an update coming to enable FMA4 and XOP under Win7 (just like AVX was enabled by SP1).

No, it was needed for AVX because it introduced new/wider registers. The OS has to save these between context switches now, too. Think about what would happen if the OS forgets to save your data ... *g*

XOP and FMA4, however, do not introduce new registers; they use the SSE or AVX registers, so no extra patch is needed. As long as the AVX patch is in place, everything is fine.
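
To make the register-state point a bit more concrete, here is a minimal sketch. It assumes the usual XSAVE convention, where the OS advertises which register sets it saves through the XCR0 feature mask (bit 1 for the XMM registers, bit 2 for the upper halves of the YMM registers). Python cannot read XCR0 itself, so the mask values below are made-up inputs and the function name is hypothetical.

```python
# Toy check: does the OS save the register state that AVX-encoded code needs?
# Bit positions follow the XSAVE feature mask (XCR0); the sample masks are illustrative.
XCR0_X87 = 1 << 0  # x87 FPU state
XCR0_SSE = 1 << 1  # XMM registers
XCR0_AVX = 1 << 2  # upper halves of the YMM registers


def os_saves_ymm_state(xcr0: int) -> bool:
    """Return True if both XMM and YMM state are enabled in the given mask."""
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed


old_os = XCR0_X87 | XCR0_SSE             # hypothetical OS without YMM support
new_os = XCR0_X87 | XCR0_SSE | XCR0_AVX  # hypothetical OS with YMM support

print(os_saves_ymm_state(old_os))  # False
print(os_saves_ymm_state(new_os))  # True
```

Since XOP and FMA4 operate on the same XMM/YMM registers, the same two bits cover them; no additional state bit is involved in this model.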
10-05-2011, 05:59 AM
undone

Quote:

Originally Posted by Opteron146

No, it was needed for AVX because it introduced new/wider registers. The OS has to save these between context switches now, too. Think about what would happen if the OS forgets to save your data ... *g*

XOP and FMA4, however, do not introduce new registers; they use the SSE or AVX registers, so no extra patch is needed. As long as the AVX patch is in place, everything is fine.

What about the new registers of the second integer core?
10-05-2011, 06:12 AM
PatRaceTin

I see a bundle package in the Extreme News section.
10-05-2011, 06:14 AM
tifosi

Quote:

Originally Posted by Opteron146

...there's maybe enough 95W 8120 for everybody. We'll see...

Well, process is only going to improve with time. This will allow some headroom and a chip with a lower TDP is not so far fetched an idea, as you seem to propose. There were also server MC chips, some of which now are there with lower TDP.
10-05-2011, 06:40 AM
Opteron146

Quote:

Originally Posted by tifosi

Well, process is only going to improve with time. This will allow some headroom and a chip with a lower TDP is not so far fetched an idea, as you seem to propose. There were also server MC chips, some of which now are there with lower TDP.

Hmm exactly what I meant.
You wrote "improve with time", that's correct, hence I wrote "next year":

Quote:

Then next year, when they'll launch 8170, there's maybe enough 95W 8120 for everybody. We'll see...

I don't see a problem here, only a misunderstanding ;-)

Quote:

Originally Posted by undone

What about the new registers of the second integer core?

As long as these cores are only using good old standard x86 registers, no problem at all ;-)
10-05-2011, 08:09 AM
chew*

Quote:

Originally Posted by undone

http://tipidpc.com/viewtopic.php?tid=172787&page=4183

http://i31.photobucket.com/albums/c3...r/P1000373.jpg

:D:D:D

Boxxes are so last year.

I want to see tray shots full of chips ;)
10-05-2011, 09:18 AM
Dumo

Quote:

Originally Posted by chew*

Boxxes are so last year.

I want to see tray shots full of chips ;)

Does that mean we should bin retail?:D
10-05-2011, 09:21 AM
Mechanical Man

Quote:

Originally Posted by chew*

Boxxes are so last year.

I want to see tray shots full of chips ;)

Then show us some!
10-05-2011, 09:39 AM
Voodoo²

Look at the production date of that chip: "1136". That is September, right?
10-05-2011, 09:51 AM
dess

Quote:

Originally Posted by Opteron146

No, it was needed for AVX because it introduced new/wider registers. The OS has to save these between context switches now, too. Think about what would happen if the OS forgets to save your data ... *g*

XOP and FMA4, however, do not introduce new registers; they use the SSE or AVX registers, so no extra patch is needed. As long as the AVX patch is in place, everything is fine.

I know, but it could be that some state attributes need to be stored. Can't find where I read about it.

Regarding task-scheduling, it's from JF-AMD:

Quote:

Performance is based on:
The silicon
The microcode in the silicon
The BIOS
The compiler updates
The drivers
The OS optimizations
Performance tuning by engineers

Also, there were some slides on how Windows' scheduler needs to be changed to accommodate BD, and IIRC it was about Win7. Can't find them now, either.
10-05-2011, 09:53 AM
undone

Quote:

Originally Posted by Voodoo²

Look at the production date of that chip: "1136". That is September, right?

Some other thing is more weird and surprising:

http://semiaccurate.com/forums/showp...5&postcount=13

Quote:

stepping A1, week 36, year 2011.

A1 chip production under way in September? wtf?
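
For anyone who wants to check the "1136" reading, here is a small sketch that turns a year/week code into calendar dates. It assumes the code is two digits of year (20YY) followed by two digits of week, and that weeks follow ISO 8601 numbering; the actual marking scheme may count weeks differently, and the function name is just for illustration.

```python
from datetime import date


def decode_date_code(code: str) -> tuple[date, date]:
    """Split a 'YYWW' code and return the Monday and Sunday of that week.

    Assumes 20YY and ISO 8601 week numbering (an assumption, not a known spec).
    """
    year = 2000 + int(code[:2])
    week = int(code[2:])
    monday = date.fromisocalendar(year, week, 1)
    sunday = date.fromisocalendar(year, week, 7)
    return monday, sunday


start, end = decode_date_code("1136")
print(start, end)  # 2011-09-05 2011-09-11
```

Under those assumptions week 36 of 2011 does fall in early September.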
10-05-2011, 10:17 AM
Voodoo²

I'm not sure what "FA1" stands for, but as far as I remember the steppings were always indicated by one of the letters here: "FD8150FRW8KGU". For example:

Phenom II X4 955 C2 stepping HDZ955FBK4DGI

Phenom II X4 955 C3 stepping HDZ955FBK4DGM
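
A quick way to see which character carries the stepping in the two part numbers above is to compare them position by position. This is only a sketch; the helper name is made up, and it says nothing about how the FX part number is laid out.

```python
def differing_positions(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (index, char_in_a, char_in_b) for every position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("part numbers must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


c2 = "HDZ955FBK4DGI"  # Phenom II X4 955, C2 stepping (from the post)
c3 = "HDZ955FBK4DGM"  # Phenom II X4 955, C3 stepping (from the post)

print(differing_positions(c2, c3))  # [(12, 'I', 'M')]
```

Only the last character differs between the two 955 part numbers, which fits the idea that the stepping is encoded in the part number itself.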
10-05-2011, 10:38 AM
imamage

Quote:

Originally Posted by Opteron146

I read it, but do you have a BD? So why should I believe you?
Cinebench 11.5 scores are rather bad; even CB10 is doing better. Hence I do not believe that the FPU is maxed out at all, especially as neither FMAC nor XOP/"MMX" code (the other 2 pipes in the FPU) is used. Thus I think there is enough headroom for the 3.9GHz Turbo stage. Anyway, we'll know in less than 1 week ;-)

Dang, I wish I had one before retail launch!!!
10-05-2011, 11:43 AM
BeepBeep2

A1 is not the stepping unless they changed the naming scheme...

CACDC on Deneb corresponds to FA1 on these new processors...
10-05-2011, 11:52 AM
undone

http://www.abload.de/img/fx4pricefutw.png

http://www.shopblt.com/cgi-bin/shop/...r_id=296538691
10-05-2011, 11:57 AM
Manicdan

What's the difference between box and tray? The 4100 for $121 sounds very desirable, but it just seems a little strange looking.
10-05-2011, 12:03 PM
undone

Quote:

Originally Posted by Manicdan

What's the difference between box and tray? The 4100 for $121 sounds very desirable, but it just seems a little strange looking.

Someone guessed it's a copy-paste typo.

http://www.planet3dnow.de/vbulletin/...&postcount=625
10-05-2011, 12:15 PM
Manicdan

yeah i was expecting something like that. the 189$ should be 4C, and the 121$ should be an x4 non FX model cpu, for it all to make sense
10-05-2011, 12:23 PM
Mechanical Man

Quote:

Originally Posted by Manicdan

What's the difference between box and tray? The 4100 for $121 sounds very desirable, but it just seems a little strange looking.

I don't think that is the real price. It's some kind of error.

But the difference between box and tray is the cooler: the boxed version comes with a cooler, the tray version does not.
10-05-2011, 12:52 PM
bamtan2

I get concerned when I see people touting review units that only include one chip.

there are 4 chips to review. if we get only one chip reviewed on the 12th I will be smashing things.
10-05-2011, 01:12 PM
Apokalipse

Quote:

Originally Posted by Musho

Also, the cores are running at 80% efficiency when both cores in a single module are loaded

AMD said 80% more performance than single threaded (in the same module, also presuming same frequency), meaning 180% of single threaded, or (180/2) 90% for each core.
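
The arithmetic behind that, written out as a tiny sketch. It takes the "80% more" reading at face value and splits the module's throughput evenly between its two cores, which is a simplification; the function name is only for illustration.

```python
def per_core_share(gain_from_second_thread: float) -> float:
    """Per-core throughput relative to one thread running alone in the module.

    Module throughput is 1 + gain; it is split evenly over the two cores.
    """
    module_throughput = 1.0 + gain_from_second_thread
    return module_throughput / 2


print(f"{per_core_share(0.80):.0%}")  # 90%
```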
10-05-2011, 01:20 PM
Leeghoofd

start smashing then, till now I only know about 8150 models being shipped in press kits...
10-05-2011, 01:25 PM
informal

Quote:

Originally Posted by Apokalipse

AMD said 80% more performance than single threaded (in the same module, also presuming same frequency), meaning 180% of single threaded, or (180/2) 90% for each core.

Actually they officially said (in presentations) 80% of a CMP design, which was presumably a CMP-type Bulldozer with nothing shared (except maybe L3). But we have been over this before.
I suspect the biggest hit will come when running FP-heavy code, and that the 80% figure comes from that. It's logical when you think about it: instead of replicating "full" cores in order to get 8 FPUs, you invest more resources in each FPU, increase the BW to the unit and make it shareable between 2 integer cores. In the process you build the unit so that it uses SMT for 2 threads running on 2 dedicated pieces of hardware inside it. This way you have 4 new FPUs, now shared, that produce only 25% less throughput than 8 "full" ones in CMP (probably without SMT), and all this saves you considerable die area and grants you some TDP and clock headroom. Pretty neat idea, isn't it? :)
10-05-2011, 06:37 PM
duron

look at all those goodies that come with it :D

hmmm some got them early(scroll down to last part)
http://www.tipidpc.com/viewtopic.php...2787&page=4183
10-05-2011, 06:44 PM
tbone8ty

results anybody? nda?

plenty of press kits around, give us a tease
10-05-2011, 09:54 PM
Daveburt714

You know, I'm really excited that FX is so close now.... :clap:

Regardless of final performance compared to Intel, I can't help but think that once we get our hands on these chips all the crazy fud/benchies are going to seem ridiculous....

I've been reading all this stuff for the last 9 months, and you wouldn't believe how bad I've been biting my tongue. :rolleyes:

Some may have been right, some may have been wrong, but once I (we) can test for ourselves all the questions will finally be answered! :up:

I'm sure there's some Firmware/Software/Hardware/OS tweaks that need to be done to get the best results from this new uARCH, but at least it will finally be out there and worked on.

BRING'EM ON BABY..... :D

If nothing else, I need a new adventure, and this chip looks like fun!
10-05-2011, 10:06 PM
BeepBeep2

Quote:

Originally Posted by Daveburt714

You know, I'm really excited that FX is so close now.... :clap:

Regardless of final performance compared to Intel, I can't help but think that once we get our hands on these chips all the crazy fud/benchies are going to seem ridiculous....

I've been reading all this stuff for the last 9 months, and you wouldn't believe how bad I've been biting my tongue. :rolleyes:

Some may have been right, some may have been wrong, but once I (we) can test for ourselves all the questions will finally be answered! :up:

I'm sure there's some Firmware/Software/Hardware/OS tweaks that need to be done to get the best results from this new uARCH, but at least it will finally be out there and worked on.

BRING'EM ON BABY..... :D

If nothing else, I need a new adventure, and this chip looks like fun!

This architecture seems very voltage friendly as well :)

So is Llano, GF's 32nm is very voltage hungry...2v+ for CPU-Z validations on LN2/LHe with BD.

It will be fun :D
10-05-2011, 10:23 PM
Apokalipse

Quote:

Originally Posted by informal

Actually they officially said (in presentations) 80% of a CMP design, which was presumably a CMP-type Bulldozer with nothing shared (except maybe L3). But we have been over this before.

CMP (Chip Multi Processor) is two "full" cores, and CMT (Cluster-based MultiThreading) is what they call the modules idea:
http://data5.blog.de/media/732/3663732_9bc35365d1_l.png
That's the slide I was referencing, where they said 80% gain

Although in retrospect it also says 50% area investment, so I'm not sure if that exactly describes the actual BD modules used in Zambezi, which AMD said have 12% larger die area than a "full" core (hypothetical BD "full" core, not K10.5).

Quote:

Originally Posted by informal

I suspect the biggest hit will come when running FP-heavy code, and that the 80% figure comes from that. It's logical when you think about it: instead of replicating "full" cores in order to get 8 FPUs, you invest more resources in each FPU, increase the BW to the unit and make it shareable between 2 integer cores. In the process you build the unit so that it uses SMT for 2 threads running on 2 dedicated pieces of hardware inside it. This way you have 4 new FPUs, now shared, that produce only 25% less throughput than 8 "full" ones in CMP (probably without SMT), and all this saves you considerable die area and grants you some TDP and clock headroom. Pretty neat idea, isn't it? :)

Bulldozer's FlexFP:
http://blogs.amd.com/work/2010/10/25/the-new-flex-fp/
Basically it is two 128-bit FMAC's with a shared scheduler, which works alongside two integer cores.
Quote:
The Flex FP unit is built on two 128-bit FMAC units. The FMAC building blocks are quite robust on their own. Each FMAC can do an FMAC, FADD or a FMUL per cycle. When you compare that competitive solutions that can only do an FADD on their single FADD pipe or an FMUL on their single FMUL pipe, you start to see the power of the Flex FP – whether 128-bit or 256-bit, there is flexibility for your technical applications. With FMAC, the multiplication or addition commands don’t start to stack up like a standard FMUL or FADD; there is flexibility to handle either math on either unit. Here are some additional benefits:

Non-destructive DEST via FMA4 support (which helps reduce register pressure)
Higher accuracy (via elimination of intermediate round step)
Can accommodate FMUL OR FADD ops (if an app is FADD limited, then both FMACs can do FADDs, etc), which is a huge benefit

The new AES instructions allow hardware to accelerate the large base of applications that use this type of standard encryption (FIPS 197). The “Bulldozer” Flex FP is able to execute these instructions, which operate on 16 Bytes at a time, at a rate of 1 per cycle, which provides 2X more bandwidth than current offerings.

By having a shared Flex FP the power budget for the processor is held down. This allows us to add more integer cores into the same power budget. By sharing FP resources (that are often idle in any given cycle) we can add more integer execution resources (which are more often busy with commands waiting in line). In fact, the Flex FP is designed to reduce its active idle power consumption to a mere 2% of its peak power consumption.

The Flex FP gives you the best of both worlds: performance where you need it yet smart enough to save power when you don’t need it.

The beauty of the Flex FP is that it is a single 256-bit FPU that is shared by two integer cores. With each cycle, either core can operate on 256 bits of parallel data via two 128-bit instructions or one 256-bit instruction, OR each of the integer cores can execute 128-bit commands simultaneously. This is not something hard coded in the BIOS or in the application; it can change with each processor cycle to meet the needs at that moment. When you consider that most of the time servers are executing integer commands, this means that if a set of FP commands need to be dispatched, there is probably a high likelihood that only one core needs to do this, so it has all 256-bit to schedule.

Floating point operations typically have longer latencies so their utilization is typically much lower; two threads are able to easily interleave with minimal performance impact. So the idea of sharing doesn’t necessarily present a dramatic trade-off because of the types of operations being handled.

Here are the 4 likely scenarios for each cycle:

https://sites.google.com/site/apokalipse/FlexFP.png
It looks like it almost has enough FP resources to get the same performance as two "full" cores, the exception being if two 256-bit instructions were issued at once - though the capability to do that requires much more (largely unused) die area.
So I would think two thread scaling (in one module) is largely a matter of the shared front-end's capability to feed the execution resources (as well as memory bandwidth, latencies etc which needs to be improved the more cores you have)
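
Here is a toy model of that width argument, nothing more. It only counts datapath width (two 128-bit FMACs per module, as in the blog quote) and ignores the scheduler, latencies and instruction mix entirely; the names are made up for the sketch.

```python
FMAC_WIDTH_BITS = 128  # per the blog quote: two 128-bit FMAC units
FMAC_COUNT = 2


def fits_in_one_cycle(core0_bits: int, core1_bits: int) -> bool:
    """Can both cores' FP requests (0, 128 or 256 bits wide) issue in the same cycle?

    Pure width bookkeeping; real hardware has many more constraints.
    """
    for bits in (core0_bits, core1_bits):
        if bits not in (0, 128, 256):
            raise ValueError("request width must be 0, 128 or 256 bits")
    return core0_bits + core1_bits <= FMAC_WIDTH_BITS * FMAC_COUNT


for request in [(256, 0), (128, 128), (256, 256)]:
    print(request, fits_in_one_cycle(*request))
# (256, 0) True
# (128, 128) True
# (256, 256) False
```

In this model the only combination that has to be spread over more than one cycle is the one where both cores ask for more than 128 bits at once, which is the exception mentioned above.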
10-05-2011, 11:52 PM
-Boris-

Quote:

Originally Posted by informal

This way you have 4 new FPUs, now shared, that produce only 25% less throughput than 8 "full" ones in CMP (probably without SMT), and all this saves you considerable die area and grants you some TDP and clock headroom. Pretty neat idea, isn't it? :)

Only neat if you get more than 25% higher frequencies. I doubt that doubling the FPUs would make a big hit on frequencies. The FPUs count for a very small part of total die area, so saving mm² isn't worth it. And I doubt higher power consumption would lower the frequencies much.
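
As a rough sanity check on the break-even point, here is a sketch that takes the 25% figure at face value and assumes FP throughput scales linearly with clock (a simplification). The function name is hypothetical.

```python
def break_even_clock_gain(throughput_loss: float) -> float:
    """Clock increase needed to make up for a given per-clock throughput loss."""
    return 1.0 / (1.0 - throughput_loss) - 1.0


print(f"{break_even_clock_gain(0.25):.1%}")  # 33.3%
print(f"{0.75 * 1.25:.4f}")                  # 0.9375, i.e. +25% clock is still short
```

So under these assumptions the shared design would need roughly a third more clock, not a quarter, to match the FP throughput of 8 full FPUs.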
10-05-2011, 11:58 PM
Apokalipse

Quote:

Originally Posted by -Boris-

Only neat if you get more than 25% higher frequencies. I doubt that doubling the FPUs would make a big hit on frequencies.

Doubling the FPU's takes massively more die area, meaning more power usage, and you can't clock it as high if you want to remain within a certain TDP.

Quote:

The FPUs count for a very small part of total die area

Floating point units are much more complex than integer units, and take up much more die area.
instruction sets like SSE, AVX use the FPU primarily.
10-06-2011, 01:04 AM
informal

@Apokalipse
The slide about CMT is from 2005, long before AMD had any real HW in their hands. I stick with what they said at FAD 2010, and that's 80% of a CMP design in less die area. The rest of the stuff you quoted is well-known information and doesn't go against what I wrote. I even believe there won't be a massive hit in integer throughput from running 2 threads on a module. FP may see the best numbers if threads are first scheduled on different modules, but this has to be verified.

@Boris
You do realize that in order to get 8 *full* FPUs the old way, you have to replicate front ends, integer exec. units and L1 and L2 caches, right? This leaves you with wasted and doubled die area that will mostly sit idle (especially the FP unit). The beauty of Bulldozer is exactly in maximizing perf./watt/mm^2. Btw, the most power-hungry part of the core is usually the FPU...
10-06-2011, 02:02 AM
Apokalipse

Quote:

Originally Posted by informal

@Apokalipse
The slide about CMT is from 2005, long before AMD had any real HW in their hands. I stick with what they said at FAD 2010, and that's 80% of a CMP design in less die area. The rest of the stuff you quoted is well-known information and doesn't go against what I wrote. I even believe there won't be a massive hit in integer throughput from running 2 threads on a module. FP may see the best numbers if threads are first scheduled on different modules, but this has to be verified.

My point is that I don't think the FlexFP will have much less performance than two conventional "full" 256-bit FPUs if the frontend can do its job and keep the execution resources fed.
I think the only case where it is limited in execution resources is if there are two 256-bit instructions from two threads at once, but that's a very rare case.

So yes it won't be as fast as two "full" cores. I'm just saying that I don't think available execution resources is the main reason for this (for either integer or FP). The FlexFP looks very efficient and much less transistor/die-area wasteful than two conventional 256-bit FPU's in two "full" cores.

The frontend is very much beefed up vs K10.5 though; which it has to be to feed the extra execution resources for two threads.
10-06-2011, 02:44 AM
xdan

Quote:

Originally Posted by informal

@Apokalipse
The slide about CMT is from 2005, long before AMD had any real HW in their hands. I stick with what they said at FAD 2010, and that's 80% of a CMP design in less die area. The rest of the stuff you quoted is well-known information and doesn't go against what I wrote. I even believe there won't be a massive hit in integer throughput from running 2 threads on a module. FP may see the best numbers if threads are first scheduled on different modules, but this has to be verified.

@Boris
You do realize that in order to get 8 *full* FPUs the old way, you have to replicate front ends, integer exec. units and L1 and L2 caches, right? This leaves you with wasted and doubled die area that will mostly sit idle (especially the FP unit). The beauty of Bulldozer is exactly in maximizing perf./watt/mm^2. Btw, the most power-hungry part of the core is usually the FPU...

I am not against BD CMT design, but without stronger IPC it's just useless.
So with CMT we have 80% of the performance of a true core. But the problem is that it's not a 100% performance core + an 80% CMT core (comparing with Intel + HT); it's 80% performance for both cores in a module, so...
If we calculate 0.8 (80%) * 8 = 6.4, that's 6.4 true cores of performance, so a big hit. :down:
This design doesn't scale well the more cores you add.
If we assume that the IPC isn't much better (maybe the same, not to be a pessimist and say lower), then what have we got?
6.4 cores with a 10% speed bump, so maybe 6.8-7 true cores of performance.
So much for "maximizing perf./watt/mm^2"; not performance, anyway.

I have my info, and BD is a disappointment for an "8 core". As its price suggests, overall performance is between the 2500K and 2600K, and it will be hotter on air cooling than SB.
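
The "true core equivalents" arithmetic can be sketched as below, so the different readings of the 80% figure can be put side by side. The 0.8 and 10% values are from this post, the 0.9 is the other reading discussed earlier in the thread; the function name and the linear scaling with clock are assumptions for the sketch.

```python
def core_equivalents(cores: int, per_core_efficiency: float, clock_gain: float = 0.0) -> float:
    """Throughput in 'full core' units, assuming linear scaling with core count and clock."""
    return cores * per_core_efficiency * (1.0 + clock_gain)


print(round(core_equivalents(8, 0.8), 2))        # 6.4
print(round(core_equivalents(8, 0.8, 0.10), 2))  # 7.04
print(round(core_equivalents(8, 0.9), 2))        # 7.2
```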
10-06-2011, 02:51 AM
Leeghoofd

Quote:

Originally Posted by xdan

and it will be hotter on air cooling than SB.

You sure ? got data to back that statement ?
10-06-2011, 02:53 AM
dess

Quote:

Originally Posted by informal

Do you guys even read what I wrote? In floating-point-heavy code that employs all 8 threads, Turbo will almost never engage. Turbo will engage across all 8 integer cores though, but Cinebench will use the FlexFP coprocessors most of the time, where TDP will be maxed out. You can read all about BD exec. units' power draw and clock characteristics on the AMD blogs after the ISSCC event.

You have a point, but I think Cinebench uses mostly scalar math, utilizing only 1/4 or 1/2 of the 128-bit wide engines (depending on whether it uses single or double precision).

Also, it doesn't use FMA, so the underlying FADD and FMUL units in an FMAC never work at once (or at least only one execution starts per cycle).

0.5 x 0.5 = 0.25 -> 1/4 FPU utilization/thread (with MT)
0.25 x 0.5 = 0.125 -> 1/8 FPU utilization/thread (with MT)

Of course, it's quite theoretical as the sharing of the FPU is not exactly 50% per thread per module all the time, and these are the peak values.
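
Those two lines can be reproduced with a small sketch. My reading, which is an assumption, is that the first factor is the fraction of a 128-bit lane a scalar value fills (64-bit double gives 1/2, 32-bit single gives 1/4) and the second 0.5 is the halving from not using FMA; the names are made up for the example.

```python
def peak_fpu_utilization(element_bits: int, lane_bits: int = 128, halving_factor: float = 0.5) -> float:
    """Peak utilization per thread: width fraction of the lane times the 0.5 factor from the post."""
    width_fraction = element_bits / lane_bits
    return width_fraction * halving_factor


print(peak_fpu_utilization(64))  # 0.25  (scalar double precision)
print(peak_fpu_utilization(32))  # 0.125 (scalar single precision)
```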

Quote:

Originally Posted by Apokalipse

It looks like it almost has enough FP resources to get the same performance as two "full" cores, the exception being if two 256-bit instructions were issued at once - though the capability to do that requires much more (largely unused) die area.
So I would think two thread scaling (in one module) is largely a matter of the shared front-end's capability to feed the execution resources (as well as memory bandwidth, latencies etc which needs to be improved the more cores you have)

That depends on whether FMA is utilized and on whether one or two threads run in a given module, I think. AFAIK the FADD and FMUL units in the K10 cores are capable of working (or starting/finishing) in parallel. With BD, with regular code you can't have the underlying FADD and FMUL units utilized (or new execution started/finished) at once in a given FMAC, unless you use FMA code. And you have only one FMAC per thread in case both threads need them at once...

So, with a single-threaded (or one thread per module) regular code it will perform comparable to K10 (because the second FMAC can be utilized anytime), but if more than 4 threads are running scaling will be worse.

But, perhaps I'm wrong somewhere. Feel free to correct me, then.
10-06-2011, 03:12 AM
Opteron146

Quote:

Originally Posted by xdan

I am not against BD CMT design, but without stronger IPC it's just useless.

Well, phrase the sentence a bit less dramatically and you are right. Low IPC does not make your CPU useless, but it obviously hinders your applications as long as they use fewer than 8 threads. There's turbo, but it can probably only help a bit.

In the end, we have a brand new design. Nobody has done CMT before. Just because its first version is not "da über CPU" doesn't make the whole approach "useless".
They already have a 2nd and 3rd version in the queue, so let's see what will happen with IPC.

The very first P4 (Willamette, Socket 423) was really useless. Clock was still low at around 1.5GHz, and a P3 was always faster, not to mention AMD's K7. The 2nd version, Northwood (S478, later versions with Hyperthreading), was actually quite good, and SSE2 was used more often too, but then the 3rd generation was Prescott aka Preshott. That was then really the time to pull the plug.

So far in my opinion, BD is much better than the first P4. Let's see how the story will end :)
10-06-2011, 03:18 AM
Evantaur

As long as it outperforms Thuban I'm happy :up:
10-06-2011, 04:01 AM
xdan

Quote:

Originally Posted by Leeghoofd

You sure ? got data to back that statement ?

I do not have data, but I have confirmation that it will be a "radiator for cold days" compared to SB, and will need stronger cooling.
It's easy to understand that a 330 mm² chip will be hotter than a 216 mm² chip.
Think what you want, in a few days we will all see.
The fitting slogan for BD will be "Long live the super tormented price/performance ratio".
10-06-2011, 04:15 AM
xdan

Cooler in the BIOS or in the case? :)
Let's not forget that Phenom and Thuban have their sensors on the package, not on the die, so the readings are not accurate, and the normal temperature given by AMD is 62 degrees. http://products.amd.com/en-gb/Deskto...il.aspx?id=682
SB, like Nehalem, has internal sensors on the die; the max TDP temp is 72.6.
Usually when a Phenom passes 70-75C it starts throttling.

If BD uses the same type of sensors, then the air in the case and the cooler's radiator indicate the real temperature.
10-06-2011, 04:21 AM
Apokalipse

Quote:

Originally Posted by xdan

I am not against BD CMT design, but without stronger IPC it's just useless.

You seem to forget that single threaded IPC != two threaded IPC in the same module.
the figure applies when both cores in the module are used, when compared to one core in the module being used.
You could compare one module to one hyper-threaded core. Both are sort of an "extended" core, designed to process two threads, getting a significant gain in multithreaded performance for a small increase in die area, although they do it in very different ways.
But there should be more gain with a BD module than with hyper-threading (which gets about 20% gain on average).

Of course we still don't know if BD has higher IPC than SB (with a single thread per core/module), but you can't rule it out.

Although you also have to keep in mind IPC is only part of the picture; it isn't the be-all end-all. For single threaded performance, you want a good combination of IPC and frequency.
Optimising your architecture for higher frequency isn't itself a bad idea, unless you simply ignore IPC (eg Netburst).
The opposite of that is focusing entirely on IPC and ignoring frequency; you don't want to do that either - increasing IPC can often require adding additional hardware/logic, which increases power consumption and die area etc.
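
As a rough illustration of the IPC/frequency trade-off described above, single-threaded throughput can be modelled as IPC multiplied by clock speed. This is only a minimal sketch: the two designs and their numbers are hypothetical and are not measurements of BD, SB or any real CPU.

```python
def single_thread_perf(ipc: float, freq_ghz: float) -> float:
    """Instructions retired per nanosecond: IPC times clock in GHz."""
    return ipc * freq_ghz


# Hypothetical designs (assumed values, for illustration only).
wide_core = single_thread_perf(ipc=1.25, freq_ghz=3.2)  # favours IPC
fast_core = single_thread_perf(ipc=1.00, freq_ghz=4.0)  # favours clock speed

print(f"wide core: {wide_core:.2f}, fast core: {fast_core:.2f}")
```

Under these assumed numbers both designs land on the same throughput, which is the point being made: neither IPC nor frequency alone decides single-threaded performance.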
10-06-2011, 04:29 AM
xdan

As I wrote earlier, you have two cores at 80% performance, equal to 160% of a core's performance; not that much gain compared to 120% for an Intel core + HT.
10-06-2011, 04:36 AM
Apokalipse

Quote:

Originally Posted by xdan

As I wrote earlier, you have two cores at 80% performance, equal to 160% of a core's performance; not that much gain compared to 120% for an Intel core + HT.

I think we'll have to wait for benches to see what figure is most accurate (if any of them are)
10-06-2011, 04:51 AM
FlanK3r

Quote:

Originally Posted by xdan

I do not have data, but I have confirmation that it will be a "radiator for cold days" compared to SB and will need stronger cooling.
It's easy to understand that a 330mm^2 chip will be hotter than a 216mm^2 chip.
Think what you want; in a few days we will all see.
The adequate slogan for BD will be "Long live the super tormented price/performance ratio".

So... how do you explain the fact that Thuban, at 45nm with a "big" die, is cooler than most CPUs (Sandy Bridge, Lynnfield...)? :) No, your idea is wrong ;). SB topped out at more than 95C under load with the same cooler at 1.46V, Thuban at 60C (50C in CoreTemp!)... The difference between sensor locations cannot be that big.
10-06-2011, 05:02 AM
informal

@dess
This is what dresdenboy wrote on at forum regarding pipeline capability of flexfp:

Quote:

Originally Posted by Dresdenboy

According to the BD SOM, all 4 FP pipelines do integer SSE stuff with different capabilities:
Pipe 0: simd, mmx, multiplier
Pipe 1: shuffles, packs, permutes
Pipe 2: simd, mmx, ALU
Pipe 3: simd, mmx, ALU, store

And move ops are eliminated.
10-06-2011, 05:17 AM
flyck

Quote:

Originally Posted by xdan

I do not have data, but I have confirmation that it will be a "radiator for cold days" compared to SB and will need stronger cooling.
It's easy to understand that a 330mm^2 chip will be hotter than a 216mm^2 chip.
Think what you want; in a few days we will all see.
The adequate slogan for BD will be "Long live the super tormented price/performance ratio".

It is the reverse.

A bigger surface has an easier time transferring all the energy, so the chip runs cooler.
10-06-2011, 05:19 AM
Oese

I understand; however, a single thread per module is 100% (of course: how can a single thread be 80%, compared to what? It could even be 110-120% compared to a hypothetical divided-in-half BD module with only one 128-bit FMAC). The second thread will scale at 80% (compared to 20% for Intel), so that makes 180% for two threads, or 90% per thread (as I think informal said before).

"Calculated core count" will then be 8*0.9 = 7.2, which is not too far away from 8 cores...

This has nothing to do with IPC at all, only with the scaling of the second thread relative to the first, which doesn't have to share anything (and thus must be set to 100%; anything else would be strange logic).
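
The two ways of counting that have come up in this thread can be put side by side. The sketch below assumes an 8-core BD is built from 4 two-core modules; the 80% and 20% scaling figures are the ones quoted in the posts above, not measured values, and the function name is made up for illustration.

```python
def effective_cores(units: int, first_thread: float, second_thread: float) -> float:
    """Throughput of `units` modules (or HT cores), in units of one unshared thread."""
    return units * (first_thread + second_thread)


# Both threads of a module counted at 80% (the 0.8 * 8 argument).
both_at_80 = effective_cores(units=4, first_thread=0.8, second_thread=0.8)

# First thread at 100%, second thread adds 80% (the 8 * 0.9 argument).
second_adds_80 = effective_cores(units=4, first_thread=1.0, second_thread=0.8)

# A 4-core chip with Hyper-Threading adding 20% per core, for comparison.
# Note: its "core" is a different core, so this is not directly comparable.
ht_quad = effective_cores(units=4, first_thread=1.0, second_thread=0.2)

print(f"{both_at_80:.1f} vs {second_adds_80:.1f} (HT quad: {ht_quad:.1f})")
```

The gap between the first two results comes purely from what the 80% is measured against, which is why the baseline has to be stated before comparing figures.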

As for the Power/Temps... I am rather optimistic?
10-06-2011, 05:25 AM
undone

http://www.reddit.com/r/hardware/com..._gonna_say_it/

Quote:

I'm sitting in on a press briefing for AMD Bulldozer right now, and while everything is embargoed, I will say this: If you're building a gaming PC, this is going to be the way to go.

Edit 1 We're gonna be covering the normal stuff (Benchmarks, etc.) but we're also going to talk about value proposition against Intel as well as some of the exciting new advancements that Bulldozer brings to the table. On October 12th, 12:01am CST.

Edit 2 "We" means Icrontic. I'm not trying to shill my site or anything; we do have a Bulldozer on the testbench, we sat in on a press briefing tonight, and we will have a launch-day piece about it. Of course, you'll also find reviews and other awesome content at [H], AnandTech, TechReport, and so on. Please consider us in your content rotation, we're a small but very, very dedicated team who have been doing this since 2000. Thanks!

EDIT:
http://www.hartware.net/media/news/52000/52945_2b.jpg

http://www.hartware.de/news_52945.html

Quote:

http://www.planet3dnow.de/vbulletin/...&postcount=668
10-06-2011, 05:35 AM
hydr0x

Quote:

Originally Posted by undone

I'm sitting in on a press briefing for AMD Bulldozer right now, and while everything is embargoed, I will say this: If you're building a gaming PC, this is going to be the way to go.

i really hope this guy is not a Troll.

roll on October 12th
10-06-2011, 05:43 AM
imamage

Quote:

Originally Posted by tbone8ty

results anybody? nda?

plenty of press kits around, give us a tease

From what I heard, the top-end model DOES come with a water-cooling kit similar to the Antec H920.

But for those who overclock to the max, just ignore it ;)

EDIT: It looks like it is only available in the press review kit; the water-cooling kit is not for retail :(
10-06-2011, 05:43 AM
-Boris-

Quote:

Originally Posted by Apokalipse

Doubling the FPU's takes massively more die area, meaning more power usage, and you can't clock it as high if you want to remain within a certain TDP.
Floating point units are much more complex than integer units, and take up much more die area.
instruction sets like SSE, AVX use the FPU primarily.

You just repeated the things I questioned without any arguments. Each FPU takes 1% of the total die area, for a total of 4% of a full 8c BD. Of course it needs other stuff in the front end as well, but even if you say that all of that takes as much space as the FPUs themselves (which is absurd), that is still just an 8% larger die. And Turbo is made for just this kind of thing, so frequencies shouldn't be a problem. Besides, do you honestly think it would make such a large impact on frequencies? You can chop off half the power usage with lower current and a few hundred MHz lower clocks. Even if power usage rose by 25% (again absurd), it wouldn't mean too much in lost base frequency, and probably close to nothing in max frequency.

Quote:

Originally Posted by informal

You do realize that in order to get 8 *full* FPUs the old way, you have to replicate front ends, integer exec. units and L1 and L2 caches, right? This leaves you with wasted and doubled die area that will mostly sit idle (especially the FP unit). The beauty of Bulldozer is exactly in maximizing perf./watt/mm^2. Btw, the most power-hungry part of the core is usually the FPU...

No, you wouldn't need to duplicate most of the processor. SB has a full AVX unit per core and didn't need to duplicate most of the core to get that working. The same goes for when Phenom got a 128-bit FPU. Of course you need some extra circuits to make it work, but not more space than the entire FPU. And as I just said, the FPUs themselves eat up just 4% of a full module.
10-06-2011, 05:58 AM
PatRaceTin

6 day count
10-06-2011, 06:08 AM
informal

@Boris
SB cannot do two 256-bit loads per cycle, so its execution potential is just theoretical. At best, in AVX-tuned code, one can expect a 20 to max 50 percent speedup. Oh, and you shouldn't mix SSE and AVX instructions due to the way Intel implemented it. So SB in no way has a ''better'' designed FPU compared to BD.
As for implementing a double-sized FlexFP in BD vs. the current one, you have to realize that the current one IS already a beefed-up version. In order to support 2x256-bit ops, the load/store capability, and therefore complexity, would have to be dramatically increased. Who needs such an FPU if your L/S system can't feed it (ah yes, Intel made one :) )
10-06-2011, 06:26 AM
Opteron146

Quote:

Originally Posted by -Boris-

No, you wouldn't need to duplicate most of the processor. SB has a full AVX unit per core and didn't need to duplicate most of the core to get that working. The same goes for when Phenom got a 128-bit FPU. Of course you need some extra circuits to make it work, but not more space than the entire FPU. And as I just said, the FPUs themselves eat up just 4% of a full module.

Intel can reuse their INT core datapaths for AVX (which is FP only), because INT & FP are tightly coupled in their design. AMD has the opposite approach: INT and FP have been separated since the K7 days, which is what actually enabled the CMT approach. For Intel it would be rather impossible; they would need a totally new architecture. Well, maybe Haswell will deliver that.
Anyway, back to BD: because AMD's FPU is not tightly coupled, they would have needed much more space than Intel. If you compare K8/K10, you will see that K10's FPU is nearly double the size. It is a bit less than that because it was upgraded from 80-bit -> 128, not from 64 -> 128.
However, FP code is generally not used very often. Combining 2x128-bit units for one 256-bit AVX pass every cycle was definitely the best, smartest and most efficient way.
10-06-2011, 06:48 AM
xdan

Quote:

Originally Posted by FlanK3r

So... how do you explain the fact that Thuban, at 45nm with a "big" die, is cooler than most CPUs (Sandy Bridge, Lynnfield...)? :) No, your idea is wrong ;). SB topped out at more than 95C under load with the same cooler at 1.46V, Thuban at 60C (50C in CoreTemp!)... The difference between sensor locations cannot be that big.

Yes it is; the sensors on AMD indicate a temperature 15-20C lower than it really is.
Have you seen a Thuban working at 75-80C? No, because by then it has already started to throttle.

Quote:

It is the reverse.

A bigger surface has an easier time transferring all the energy, so the chip runs cooler.

Well, BD at the same surface area as Thuban will have many more transistors in it.
What you say works in a closed case for a short time, but over a long time the air inside gets hotter.
What I'm saying is that BD will need a strong cooler like the Noctua NH-D14.
10-06-2011, 06:54 AM
JF-AMD

Quote:

Originally Posted by xdan

I am not against the BD CMT design, but without stronger IPC it's just useless.
So with CMT we have 80% of the performance of a true core. But the problem is that it's not a 100% performance core + an 80% CMT core (compared to Intel + HT); it's 80% performance for both cores in the module, so...
If we calculate 0.8 (80%) * 8 = 6.4, that is the performance of 6.4 true cores, so a big hit. :down:
This design doesn't scale well the more cores you add.
If we add that the IPC isn't much better (maybe the same, not to be pessimistic and say lower), then what have we got?
6.4 cores with a 10% speed bump, maybe the performance of 6.8-7 true cores.
So much for "maximizing perf./watt/mm^2"; not performance, anyway.

I have my info, and BD is a disappointment for an "8 core". As for its price, overall performance is between the 2500K and 2600K, and it will be hotter on air cooling than SB.

Sorry, your math is not right.

And for the other guys talking about 256-bit AVX, here is an extensive list of all of the client apps that I am aware of that will utilize 256-bit AVX (please update if you know of some):
10-06-2011, 06:56 AM
Leeghoofd

Quote:

Originally Posted by undone

If you're building a gaming PC, this is going to be the way to go.

Really looking forward to your test game results...
10-06-2011, 07:13 AM
liberato87

http://semiaccurate.com/forums/showp...&postcount=118

Quote:

Originally Posted by dahakon

Some Dutch magazine accidentally put a review online prematurely.

http://translate.google.nl/translate...x-8150&act=url

Google Cache is our friend, translate too.

One point of interest: Cinebench 11.5 does not give a 5.xx score for BD FX-8150
10-06-2011, 07:28 AM
Manicdan

Quote:

Originally Posted by xdan

Yes it is; the sensors on AMD indicate a temperature 15-20C lower than it really is.
Have you seen a Thuban working at 75-80C? No, because by then it has already started to throttle.

Well, BD at the same surface area as Thuban will have many more transistors in it.
What you say works in a closed case for a short time, but over a long time the air inside gets hotter.
What I'm saying is that BD will need a strong cooler like the Noctua NH-D14.

the best sensor to test with is in a WC loop, make it small and use the water to see how much heat is coming out of the cpu and into the water, if all variables are the same besides the cpu switch, you can find the C/W ratio for each.

also ive had a Deneb chip boil water in my loop before due to the pump failing. the throttle point is ~90C for the MB sensor and ~60C for the internal sensor.
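
The water-loop measurement suggested above can be sketched as a small calculation: the heat carried away by the water follows from flow rate and the inlet/outlet temperature difference, and the C/W figure is the CPU-to-water temperature difference divided by that heat. The flow rate and temperatures below are made-up example values, and the water properties are approximations.

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water
WATER_DENSITY_KG_PER_L = 1.0             # approximate


def heat_into_water_w(flow_l_per_min: float, t_in_c: float, t_out_c: float) -> float:
    """Heat picked up by the coolant in watts: mass flow * specific heat * delta T."""
    mass_flow_kg_per_s = flow_l_per_min * WATER_DENSITY_KG_PER_L / 60.0
    return mass_flow_kg_per_s * WATER_SPECIFIC_HEAT_J_PER_KG_K * (t_out_c - t_in_c)


def c_per_w(t_cpu_c: float, t_water_c: float, heat_w: float) -> float:
    """Thermal resistance from CPU sensor reading to water, in degrees C per watt."""
    return (t_cpu_c - t_water_c) / heat_w


# Hypothetical readings from a small test loop.
heat = heat_into_water_w(flow_l_per_min=1.0, t_in_c=30.0, t_out_c=31.5)
ratio = c_per_w(t_cpu_c=55.0, t_water_c=30.0, heat_w=heat)
print(f"{heat:.1f} W into the water, {ratio:.3f} C/W")
```

Because the heat figure comes from the water rather than from the CPU's own sensor, it can be compared across chips whose sensors read differently, as long as the loop, flow and ambient conditions stay the same.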
10-06-2011, 07:52 AM
Olivon

Quote:

Originally Posted by liberato87

http://semiaccurate.com/forums/showp...&postcount=118

Cinebench 10 ST
FX-8150 : 4074
2500k/2600K : 5800
i7-965 : 4900

Cinebench 10 MT
FX-8150 : 20615
2500k : 18615
2600k : 22615

Cinebench 11.5 MT
FX-8150 : 6.01
2500k : 5.37
2600K : 6.75
i7-965 : 5.73

3DMark Vantage CPU Score :
FX-8150 : 19119
2600K : 22500

3DMark Vantage Total Score :
FX-8150 : 21949
2600K : 25500

3DMark 11 Total Score :
FX-8150 : 6616
2600K/i7 965 : 7385

Dirt 3
FX-8150 : 105avg/75min
i7-965 : 93avg/71min

Mafia II
FX-8150 : 68.3 avg
i7-965 : 76 avg

Far Cry II
FX-8150 : 111avg/23min
i7-965 : 126avg/75min
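
To make the leaked numbers above easier to compare, here is a minimal sketch that expresses each FX-8150 score as a fraction of the 2600K score from the same list. Only the rows that list a 2600K result are included; the scores are copied from the post and are unverified leak figures.

```python
# (FX-8150 score, 2600K score), copied from the list above.
leaked_scores = {
    "Cinebench 10 ST": (4074, 5800),
    "Cinebench 10 MT": (20615, 22615),
    "Cinebench 11.5 MT": (6.01, 6.75),
    "3DMark Vantage CPU": (19119, 22500),
    "3DMark Vantage Total": (21949, 25500),
    "3DMark 11 Total": (6616, 7385),
}


def relative(fx_score: float, reference_score: float) -> float:
    """FX-8150 result as a fraction of the reference result."""
    return fx_score / reference_score


ratios = {name: relative(fx, ref) for name, (fx, ref) in leaked_scores.items()}

for name, ratio in ratios.items():
    print(f"{name}: FX-8150 at {ratio:.0%} of the 2600K")
```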
10-06-2011, 08:06 AM
flyck

Quote:

Originally Posted by liberato87

http://semiaccurate.com/forums/showp...&postcount=118

As I already mentioned in that thread, a difference of 0.06 is nothing in C11.5.

More important is that BD doesn't perform any better than the 1100T in single-threaded C10!! That's with a 500MHz clock speed advantage...

The FX-4100 (3.6-3.8GHz) will be slower than the current Deneb lineup, by the looks of the Dutch review.
10-06-2011, 08:11 AM
crazydiamond

Did the review overclock? NB + RAM OC?
10-06-2011, 08:11 AM
xdan

Quote:

Originally Posted by JF-AMD

Sorry, your math is not right.

And for the other guys talking about 256-bit AVX, here is an extensive list of all of the client apps that I am aware of that will utilize 256-bit AVX (please update if you know of some):

Oh, really? You say so... :down:
Probably it's more like 0.75*8 or so.
You just keep telling lies here; I wonder what you will say on 12 October.
The numbers in the post written by Olivon are correct, let's say give or take 5%.
The sad thing is that because BD is more or less a fail, Piledriver will be too.
And so we are finished with AMD until 2013, when third-generation BD arrives.

The even more stupid thing is that an 8-core Thuban design with more L3 cache, a faster IMC and clock speeds like BD would have done better, I think, in the same die size, with the same overclocking capabilities (I mean, all Thubans can do 4.2-4.3GHz 24/7 on 45nm; on 32nm they would have done 4.5-4.7GHz 24/7), and maybe even with better yields than BD. Llano is an exception because it's an APU.
10-06-2011, 08:13 AM
Dimitriman

Quote:

Originally Posted by Olivon

Cinebench 10 ST
FX-8150 : 4074
2500k/2600K : 5800
i7-965 : 4900

Cinebench 10 MT
FX-8150 : 20615
2500k : 18615
2600k : 22615

Cinebench 11.5 MT
FX-8150 : 6.01
2500k : 5.37
2600K : 6.75
i7-965 : 5.73

3DMark Vantage CPU Score :
FX-8150 : 19119
2600K : 22500

3DMark Vantage Total Score :
FX-8150 : 21949
2600K : 25500

3DMark 11 Total Score :
FX-8150 : 6616
2600K/i7 965 : 7385

Dirt 3
FX-8150 : 105avg/75min
i7-965 : 93avg/71min

Mafia II
FX-8150 : 68.3 avg
i7-965 : 76 avg

Far Cry II
FX-8150 : 111avg/23min
i7-965 : 126avg/75min

Well that kinda sucks? :shakes:

But the post above mine reads like trolling, imo.
10-06-2011, 08:29 AM
SEA

Quote:

Originally Posted by xdan

Oh, really? You say so... :down:
Probably it's more like 0.75*8 or so.

If you took 0.8 from cinebench results - you forgot the turbo frequency impact when calculating multiprocessor speedup.
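
The turbo point can be illustrated with the leaked Cinebench 10 scores from earlier in the thread. The sketch below is a rough model only: the clock speeds are assumptions (the FX-8150's advertised 4.2GHz maximum turbo for the single-threaded run and 3.6GHz base for the multithreaded run), since the clocks the review sample actually ran at are not known.

```python
def mt_scaling(st_score: float, mt_score: float,
               st_clock_ghz: float, mt_clock_ghz: float) -> tuple[float, float]:
    """Return (raw MT/ST speedup, speedup normalised to equal clocks)."""
    raw = mt_score / st_score
    # Scale the raw speedup up by how much faster the ST run was clocked.
    normalised = raw * (st_clock_ghz / mt_clock_ghz)
    return raw, normalised


# Leaked Cinebench 10 scores; clock speeds are assumed, not taken from the review.
raw, normalised = mt_scaling(st_score=4074, mt_score=20615,
                             st_clock_ghz=4.2, mt_clock_ghz=3.6)

cores = 8
print(f"raw: {raw:.2f}x ({raw / cores:.2f} per core)")
print(f"clock-normalised: {normalised:.2f}x ({normalised / cores:.2f} per core)")
```

Under these assumptions the per-core scaling looks noticeably better once the clock difference is removed, so a scaling factor read straight off ST and MT scores understates how well the second core in a module scales.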

Quote:

I mean, all Thubans can do 4.2-4.3GHz 24/7 on 45nm; on 32nm they would have done 4.5-4.7GHz 24/7

Stop spreading bullsh1t... I have thuban ;)
10-06-2011, 08:54 AM
xdan

Quote:

Originally Posted by Dimitriman

Well that kinda sucks? :shakes:

But the post above mine reads like trolling, imo.

You say so :). Well, it will be a hard time for AMD fans, accepting that they were lied to all year and that BD is a fail.
Many people defended and made excuses for BD all summer.
JF-AMD keeps giving false hope. Nobody had the guts to tell the truth.

I am, let's say, more of an Intel fan. But I really want BD to crush SB a little, to have something new on the market and to get lower prices from Intel.
Because of that, Intel can release whatever CPUs it wants, whenever it wants, at whatever price it wants.
We can all say thank you to AMD for their "strong competition".

Quote:

If you took 0.8 from cinebench results - you forgot the turbo frequency impact when calculating multiprocessor speedup.

I was talking about overall performance, without Turbo, which anyway doesn't count in all multithreaded applications.
10-06-2011, 09:07 AM
informal

Looking at those C10 and C11.5 numbers from the 8150 and 1100T, all I want to know is how in the world Interlagos, with the same or lower clock speed, is going to have 35% higher throughput in legacy FP code?! AMD claims it can do 50% more SP FLOPS than MC, even in legacy code. With what magic?
10-06-2011, 09:10 AM
radaja

Quote:

Originally Posted by Leeghoofd

Snip

hey Lee,have you gotten it 8 cores stable at ?GHz yet? and are you using an air cooler? i know its only been a few hours
but im very curious as to how it goes:D
10-06-2011, 09:28 AM
radaja

Quote:

Originally Posted by Leeghoofd

Snip

thanks Lee for the quick update,it looks like BD will be so much fun and plenty powerful,and we will have good time ahead:D
10-06-2011, 09:54 AM
PerryR

Quote:

Well, it will be a hard time for AMD fans, accepting that they were lied to all year and that BD is a fail.

How is BD a "fail?"

Quote:

I am, let's say, more of an Intel fan.

Big surprise there.
10-06-2011, 10:15 AM
EniGmA1987

Quote:

Originally Posted by xdan

The even more stupid thing is that an 8-core Thuban design with more L3 cache, a faster IMC and clock speeds like BD would have done better, I think, in the same die size, with the same overclocking capabilities

lol wut? That is just so funny to me. How the heck do you toss in 2 more cores and more L3 and come up with the same size? And then you want more core speed and NB speed on top of that added complexity? I suppose you want a reduced TDP to top it all off too, amirite? I'll just get right on that. lol
10-06-2011, 10:17 AM
PerryR

Also, more stuff from that other leak:

http://www.reddit.com/r/hardware/com..._gonna_say_it/

Quote:

Vithren 1 point 4 hours ago
Do tell, are all the leaks we have seen so far simply a part of a one, gigantic AMD fud campaign?

Quote:

primesuspect

No, they're sites who are capitalizing on pure rumor and hype traffic

(Sigh) Just six more days.
10-06-2011, 11:01 AM
xdan

Quote:

Originally Posted by EniGmA1987

lol wut? That is just so funny to me. How the heck do you toss in 2 more cores and more L3 and come up with the same size? And then you want more core speed and NB speed on top of that added complexity? I suppose you want a reduced TDP to top it all off too, amirite? I'll just get right on that. lol

The 6-core Thuban is 346mm^2 with a TDP of 125W, but on 45nm.
On 32nm it should be 240-260mm^2; compare Lynnfield at 296mm^2 (45nm) -> SB at 216mm^2 with IGP (32nm).
So it's quite possible for a Thuban with 8 cores and, let's say, 8MB L3 cache + 8MB L2 cache on 32nm to come in at 330-346mm^2.
And why should the TDP be bigger if the die size is the same, and maybe the number of transistors would be the same too?
And if I remember correctly, AMD launched a Phenom X4 960/965 at 140W TDP, so what is the problem? The next revision will fix it.
Performance is more important.
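
The die-size estimate above can be reproduced with two simple scaling rules. This is a back-of-the-envelope sketch, not a real floorplan estimate: the ideal rule assumes area shrinks with the square of the node size, the observed rule reuses the Lynnfield-to-SB ratio quoted in the post (two different designs, one with an IGP), and the 8-core step naively assumes the whole die grows in proportion to core count.

```python
def ideal_shrink(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Area after a shrink if every structure scaled with the node size squared."""
    return area_mm2 * (new_nm / old_nm) ** 2


def observed_ratio_shrink(area_mm2: float, ref_old_mm2: float, ref_new_mm2: float) -> float:
    """Area after a shrink, using the ratio seen between two reference dies."""
    return area_mm2 * ref_new_mm2 / ref_old_mm2


THUBAN_45NM_MM2 = 346.0  # figure quoted in the post

ideal = ideal_shrink(THUBAN_45NM_MM2, old_nm=45, new_nm=32)
observed = observed_ratio_shrink(THUBAN_45NM_MM2, ref_old_mm2=296.0, ref_new_mm2=216.0)

# Naive assumption: total area grows linearly from 6 to 8 cores.
eight_core = observed * 8 / 6

print(f"ideal 32nm Thuban: {ideal:.0f} mm^2")
print(f"Lynnfield->SB ratio: {observed:.0f} mm^2")
print(f"naive 8-core estimate: {eight_core:.0f} mm^2")
```

The spread between the ideal and observed figures shows how sensitive the conclusion is to the scaling assumption; caches, I/O and analog blocks generally do not shrink as well as logic.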

Quote:

How is BD a "fail?"

Because a CPU architecture that people waited 3-4 years for fails to beat Intel's mainstream.
They are no threat even to Intel's 2010 hexa-cores, and it's soon 2012.
Because AMD is left behind again.
Because they have known SB's performance since January or even earlier, they delayed 3-4 months, and they still couldn't improve performance enough to at least equal the SB 2600K.
Because marketing BD as an 8-core is just lame when it only equals an Intel quad.
I would have been less harsh if they had called it a quad with 8 threads.

Anyway, I'm wasting my time trying to convince some hardcore AMD fans.
When IB comes, all FX-8XXX will fall under $200, like Thuban did when SB appeared.
So we will be two generations behind again, as usual.
10-06-2011, 11:09 AM
flyck

Quote:

Originally Posted by xdan

The 6-core Thuban is 346mm^2 with a TDP of 125W, but on 45nm.
On 32nm it should be 240-260mm^2; compare Lynnfield at 296mm^2 (45nm) -> SB at 216mm^2 with IGP (32nm).
So it's quite possible for a Thuban with 8 cores and, let's say, 8MB L3 cache + 8MB L2 cache on 32nm to come in at 330-346mm^2.
And why should the TDP be bigger if the die size is the same, and maybe the number of transistors would be the same too?

With their current 32nm they wouldn't be able to run 6 Llano cores at 3GHz and stay under 100W while having a decent yield.... Not sure how BD does, but the process issues also affect BD (less than Llano). The fact that BD is competitive with Intel's fastest at the moment is a lot more than what they had, or what a hypothetical 8-core Llano would be able to do with the state of their process...
10-06-2011, 11:24 AM
xdan

Why bring Llano into the comparison? 40% of Llano's die is GPU; that's why it has that TDP, not to mention that doing a GPU on SOI was very hard. Llano's problems would be much lighter on a CPU design without a GPU.
http://lab501.ro/wp-content/uploads/...ze-580x241.jpg
10-06-2011, 11:28 AM
Dimitriman

Quote:

Originally Posted by xdan

You say so :). Well, it will be a hard time for AMD fans, accepting that they were lied to all year and that BD is a fail.
Many people defended and made excuses for BD all summer.
JF-AMD keeps giving false hope. Nobody had the guts to tell the truth.

I am, let's say, more of an Intel fan. But I really want BD to crush SB a little, to have something new on the market and to get lower prices from Intel.
Because of that, Intel can release whatever CPUs it wants, whenever it wants, at whatever price it wants.
We can all say thank you to AMD for their "strong competition".
I was talking about overall performance, without Turbo, which anyway doesn't count in all multithreaded applications.

I am pretty sure I don't need to explain how so many of your arguments are purely trying to stir up some brown mud.

But anyway, I, like many in here, am not in this thread to suck up to AMD regardless of how bad/good their product is; we are actually excited that they are putting something new on the market, and we are looking at it with a critical eye.

I'm excited for Bulldozer; that doesn't mean I am going to buy it. My money goes where performance is highest for my budget.

I suppose many will be disappointed if Bulldozer doesn't beat the i7 2600K, but calling it a complete fail and making wild claims about bad future performance and what-ifs from old processors as if they were facts... they are not facts, it's your opinion. Bulldozer will be a fail for someone with a $3000 budget, but if you are looking at an i5 2500K system, you will not be able to avoid comparing it to BD, and the latter might end up being a little better bang for the buck.

All is relative.
10-06-2011, 11:30 AM
Manicdan

Quote:

Originally Posted by flyck

With their current 32nm they wouldn't be able to run 6 Llano cores at 3GHz and stay under 100W while having a decent yield.... Not sure how BD does, but the process issues also affect BD (less than Llano). The fact that BD is competitive with Intel's fastest at the moment is a lot more than what they had, or what a hypothetical 8-core Llano would be able to do with the state of their process...

i just imagine what would happen if they took 2 Llano chips and connected them together. 8 cores, dual gpu, and can run in less than 140W if they dont go all out. but also make them unlocked it could be quite a fun all-in-one chip for a not so insane price. but that also gives a pretty good idea of the clock limitations of stars cores. id also be willing to bet that overclocking such a chip would kill any motherboards VRMs. its quite clear the old architecture is getting too old. but i fear the IPC of BD is going to feel old way too quickly.
10-06-2011, 11:32 AM
flyck

Quote:

Originally Posted by xdan

Why bring Llano into the comparison? 40% of Llano's die is GPU; that's why it has that TDP, not to mention that doing a GPU on SOI was very hard. Llano's problems would be much lighter on a CPU design without a GPU.
http://lab501.ro/wp-content/uploads/...ze-580x241.jpg

Because it is the CPU that consumes the power budget, not the GPU. The GPU is actually very clean and extremely efficient (it is a complete marvel... far exceeding the efficiency of SB or any other GPU we know at the moment). They need the high voltages to get yields on the CPU, not the GPU. It would probably do better in the yield department without the GPU, but BD is also suffering issues on the 32nm node. So doubling the Llano cores and adding fast L3 cache would explode on the current process.... Currently, 50W for 4 cores @ 2.6GHz with proper yields is pushing it for Llano.... try doubling that, adding cache and 1.5GHz, and see where that would get you (most likely to a nuclear generator as power supply..).

I am not talking about the possibilities on a good working process, because that would affect BD also in a positive way.
10-06-2011, 12:03 PM
FlanK3r

Wow, if that is right, 5 GHz at only 1.45V...! With a bit of luck I could get 5.2 GHz at 1.5V :)
10-06-2011, 12:19 PM
Dr. Vodka

Talking about 24/7, how does GF's 32nm process cope with voltage? I see llano APUs everywhere at 3.6 ghz and north of 1.4v, near 1.5v for those clocks... anyway, BD will be made on the same process, how durable would that be? 5+ ghz on air is cool and a nice sign too, but is it realistic for 24/7 use at such high voltages, even with under control temps? Dunno, 1.5v seems too high for that process... yeah, AMD's (now GF) SOI 45nm is a tank when coping with voltage, but what about their 32nm? Any ideas on this based on current llano chips?

Having said this, things are looking good, really good. 6 more days! I am already impressed. 4.5 ghz 24/7 at reasonable voltages seem to be completely possible! It games well! You could just take my money now AMD :D
10-06-2011, 12:26 PM
Baam

Gaming benchmarks look pretty good and to me personally that's all
I really care about. Can't wait for BD!! :yepp:
10-06-2011, 12:38 PM
hydr0x

6 days. just 6 more days... oh someone let me pre-order it so i can do other things with my life!
10-06-2011, 12:41 PM
-Boris-

Quote:

Originally Posted by flyck

Because it is the CPU that consumes the power budget, not the GPU. The GPU is actually very clean and extremely efficient (it is a complete marvel... far exceeding the efficiency of SB or any other GPU we know at the moment). They need the high voltages to get yields on the CPU, not the GPU. It would probably do better in the yield department without the GPU, but BD is also suffering issues on the 32nm node. So doubling the Llano cores and adding fast L3 cache would explode on the current process.... Currently, 50W for 4 cores @ 2.6GHz with proper yields is pushing it for Llano.... try doubling that, adding cache and 1.5GHz, and see where that would get you (most likely to a nuclear generator as power supply..).

I am not talking about the possibilities on a good working process, because that would affect BD also in a positive way.

How do you know that it's the CPU and not the GPU that consumes the power budget? Besides, the silicon and design could be limited by the need for compatibility with the GPU.

A Phenom II X6 at 32nm would be almost half the size of a Thuban if caches scale as well as cores when shrinking the process. BD must significantly outperform Thuban to justify this change in architecture.
10-06-2011, 12:43 PM
Manicdan

Quote:

Originally Posted by -Boris-

How do you know that it's the CPU and not the GPU that consumes the power budget? Besides, the silicon and design could be limited by the need for compatibility with the GPU.

A Phenom II X6 at 32nm would be almost half the size of a Thuban if caches scale as well as cores when shrinking the process. BD must significantly outperform Thuban to justify this change in architecture.

run a cpu only benchmark
run a gpu only benchmark
run both

the power increase running a gpu only benchmark is like 20w increase, the cpu only is like 60w increase, and both is like 65w increase
these numbers are from memory and not to be considered accurate.
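
The three-run method above boils down to comparing power deltas. The sketch below uses the approximate figures from the post, which were given from memory and should not be treated as accurate; the function name is made up for illustration.

```python
def power_split(cpu_only_w: float, gpu_only_w: float, both_w: float) -> dict[str, float]:
    """Compare the CPU-only and GPU-only power increases with the combined run."""
    separate_sum = cpu_only_w + gpu_only_w
    return {
        # Share of the separately measured increase that comes from the CPU.
        "cpu_share": cpu_only_w / separate_sum,
        # How far the combined run falls short of the two runs added together.
        "shortfall_w": separate_sum - both_w,
    }


# Rough figures quoted above (from memory, not accurate measurements).
result = power_split(cpu_only_w=60, gpu_only_w=20, both_w=65)
print(f"CPU share: {result['cpu_share']:.0%}, shortfall: {result['shortfall_w']:.0f} W")
```

With these figures the CPU accounts for most of the increase, and the combined load draws less than the two separate increases added together, which would be consistent with the CPU and GPU sharing one power budget.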
10-06-2011, 12:45 PM
dess

Quote:

Originally Posted by informal

@dess
This is what dresdenboy wrote on at forum regarding pipeline capability of flexfp:

Well, there is a similar one in the Opt. Guide - one that also shows one FMAC unit on Pipe 0 and one on Pipe 1, so no separate pipes (and ports) for the FMUL and FADD units in a given FMAC. It means independent FADD and FMUL operations cannot be started per cycle per FMAC. JF also wrote in a blog that it's FADD or FMUL or FMA, not (FADD and FMUL) or FMA.

AFAIK K10's FPU is capable of it and SB definitely can do it. I don't know how much it impacts performance, though.
10-06-2011, 12:49 PM
radaja

Quote:

Originally Posted by Leeghoofd

Snip

nice,looking forward to your NB findings.BD is going to be a fun OCing chip for sure:D


All times are GMT -8. The time now is 06:41 PM.

XtremeSystems