unless the 6100 unlocks to be 8100, and unless 8150 is lower voltage than 8120, everybody is just going to get 8120 :)
15% extra money is worth 30% extra cores. and higher stock speed is not worth $40 if they both overclock the exact same.
Well, the 945 later became a plain, normal 95W part in the C3 revision. The first 940/945 models at 125W were still C2. Nothing exciting about that; it's the normal benefit of process/stepping optimization. If you want to compare it to FX, then my point would be that the 95W 8120 will be scarce as hen's teeth in e-tail, because it will be an OEM-only model, so in e-tail we'll have to live with the 125W part or the 95W 8100 instead. Then next year, when they'll launch 8170, there's maybe enough 95W 8120 for everybody. We'll see...
With the previous launches (Agena/Thuban) there was a notable difference in overclocking between the top and lower models. And with the rumored, somewhat limited production of BD, I don't think it's too far-fetched to assume the top model will be the one to get for max clocks this time around as well.
And with 8 cores, the chance of getting a crap core or two is not exactly smaller than before. So you'd assume binning plays an even bigger role than before. But yeah, we'll see soon enough :D
Has anyone heard about Windows 8 seeming to steer threads across the cores better than Windows 7? The current task scheduler seems to mess up the performance of our beloved BD a bit... Looking forward to firing it up tonight...
I think we should call the Zambezi 8150 an 8-threaded CPU. This way we avoid any core vs not-a-core arguments. It has 8 strong threads, according to AMD, so let's call it an 8T-capable CPU. Thuban is a 6T chip, the SB 2600K is an 8T chip. Whether it's a weak or strong thread is debatable though.
On another note, we have some shops listing FX models and mobo + FX bundles. So it seems the 12th is the date. Won't be long now :).
oh, I like this package....
Windows itself doesn't need to be aware of FMA4 and XOP for developers to use them of course. A Windows patch would only address what Windows itself uses which can increase core Windows performance.
--
Can we not get into the core vs thread thing again?
No, it was needed for AVX because it introduced new/wider registers. The OS has to save these between context switches now, too. Think about what would happen if the OS forgets to save your data ... *g*
XOP and FMA4 however do not introduce new registers, they use the SSE or AVX' registers, therefore no extra patch needed - as long as the AVX patch is in place everything is fine.
i see bundle package in extreme news section
Hmm exactly what I meant.
You wrote "improve with time", that's correct, hence I wrote "next year":
I don't see a problem here, only a misunderstanding ;-)
Quote:
Then next year, when they'll launch 8170, there's maybe enough 95W 8120 for everybody. We'll see...
As long as these cores are only using good,old, standard-x86 registers, no problem at all ;-)
Look at the production date of that chip. "1136", that is September, right?
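For anyone who wants to check the date code, here is a minimal sketch. It assumes the marking is a YYWW code (two-digit year, then week number) and that the weeks follow ISO-8601 numbering; the thread does not confirm which week convention AMD uses, so the exact days could be off by a few. The function name decode_date_code is made up for this example.

```python
from datetime import date

def decode_date_code(code):
    """Return (monday, sunday) of the week encoded as 'YYWW'.

    Assumption: first two digits are the year (20YY), last two are an
    ISO-8601 week number. The real marking convention may differ slightly.
    """
    year = 2000 + int(code[:2])
    week = int(code[2:])
    monday = date.fromisocalendar(year, week, 1)  # Python 3.8+
    sunday = date.fromisocalendar(year, week, 7)
    return monday, sunday

start, end = decode_date_code("1136")
print(f"Week 36 of 2011 runs from {start} to {end}")
```

Under that assumption, week 36 of 2011 does land in early September.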
I know, but it could be that perhaps some state attributes need to be stored. Can't find where I've read about it.
Regarding task-scheduling, it's from JF-AMD:
Also, there were some slides on how Windows' scheduler needs to be changed to accommodate BD, and IIRC it was about Win7. Can't find it now, either.
Quote:
Performance is based on:
The silicon
The microcode in the silicon
The BIOS
The compiler updates
The drivers
The OS optimizations
Performance tuning by engineers
Something else is weirder and more surprising:
http://semiaccurate.com/forums/showp...5&postcount=13
A1 chip production was under way in September? wtf?
Quote:
stepping A1, week 36, year 2011.
I'm not sure what "FA1" stands for, but as far as I remember the steppings were always indicated by one of the letters in a string like "FD8150FRW8KGU". For example:
Phenom II X4 955 C2 stepping HDZ955FBK4DGI
Phenom II X4 955 C3 stepping HDZ955FBK4DGM
A1 is not the stepping unless they changed the naming scheme...
CACDC for Deneb translates FA1 on these new processors...
what's the difference between box and tray? the 4100 for 121$ sounds very desirable. but it just seems a little strange looking
Someone guessed it's a copy-paste typo.
http://www.planet3dnow.de/vbulletin/...&postcount=625
yeah i was expecting something like that. the 189$ should be 4C, and the 121$ should be an x4 non FX model cpu, for it all to make sense
I get concerned when I see people touting review units that only include one chip.
there are 4 chips to review. if we get only one chip reviewed on the 12th I will be smashing things.
start smashing then, till now I only know about 8150 models being shipped in press kits...
Actually, they officially said (in presentations) 80% of a CMP design, which was presumably a CMP-type Bulldozer with nothing shared (except maybe L3). But we have been over this before.
I suspect the biggest hit will come when running FP-heavy code, and that the 80% figure comes from that. It's logical when you think about it: instead of replicating "full" cores in order to get 8 FPUs, you invest more resources in each FPU, increase the bandwidth to the unit and make it shareable between 2 integer cores. In the process you design the unit so that it uses SMT for 2 threads running on 2 dedicated pieces of hardware inside it. This way you have 4 new, now shared, FPUs that produce only 25% less throughput than 8 "full" ones in CMP (probably without SMT), and all this saves you considerable die area and grants you some TDP and clock headroom. Pretty neat idea, isn't it? :)
look at all those goodies that come with it :D
hmmm some got them early(scroll down to last part)
http://www.tipidpc.com/viewtopic.php...2787&page=4183
results anybody? nda?
plenty of press kits around, give us a tease
You know, I'm really excited that FX is so close now.... :clap:
Regardless of final performance compared to Intel, I can't help but think that once we get our hands on these
chips all the crazy fud/benchies are going to seem ridiculous....
I've been reading all this stuff for the last 9 months, and you wouldn't believe how hard I've been biting my tongue. :rolleyes:
Some may have been right, some may have been wrong, but once I (we) can test for ourselves all the questions will finally be answered! :up:
I'm sure there's some Firmware/Software/Hardware/OS tweaks that need to be done to get the best results from this new uARCH, but at least
it will finally be out there and worked on.
BRING'EM ON BABY..... :D
If nothing else, I need a new adventure, and this chip looks like fun!
CMP (Chip Multi Processor) is two "full" cores, and CMT (Cluster-based MultiThreading) is what they call the modules idea:
http://data5.blog.de/media/732/3663732_9bc35365d1_l.png
That's the slide I was referencing, where they said 80% gain
Although in retrospect it also says 50% area investment, so I'm not sure if that exactly describes the actual BD modules used in Zambezi, which AMD said have 12% larger die area than a "full" core (hypothetical BD "full" core, not K10.5).
Bulldozer's FlexFP:
http://blogs.amd.com/work/2010/10/25/the-new-flex-fp/
Basically it is two 128-bit FMAC's with a shared scheduler, which works alongside two integer cores.
It looks like it almost has enough FP resources to get the same performance as two "full" cores, the exception being if two 256-bit instructions were issued at once - though the capability to do that requires much more (largely unused) die area.
Quote:
The Flex FP unit is built on two 128-bit FMAC units. The FMAC building blocks are quite robust on their own. Each FMAC can do an FMAC, FADD or a FMUL per cycle. When you compare that competitive solutions that can only do an FADD on their single FADD pipe or an FMUL on their single FMUL pipe, you start to see the power of the Flex FP – whether 128-bit or 256-bit, there is flexibility for your technical applications. With FMAC, the multiplication or addition commands don’t start to stack up like a standard FMUL or FADD; there is flexibility to handle either math on either unit. Here are some additional benefits:
- Non-destructive DEST via FMA4 support (which helps reduce register pressure)
- Higher accuracy (via elimination of intermediate round step)
- Can accommodate FMUL OR FADD ops (if an app is FADD limited, then both FMACs can do FADDs, etc), which is a huge benefit
The new AES instructions allow hardware to accelerate the large base of applications that use this type of standard encryption (FIPS 197). The “Bulldozer” Flex FP is able to execute these instructions, which operate on 16 Bytes at a time, at a rate of 1 per cycle, which provides 2X more bandwidth than current offerings.
By having a shared Flex FP the power budget for the processor is held down. This allows us to add more integer cores into the same power budget. By sharing FP resources (that are often idle in any given cycle) we can add more integer execution resources (which are more often busy with commands waiting in line). In fact, the Flex FP is designed to reduce its active idle power consumption to a mere 2% of its peak power consumption.
The Flex FP gives you the best of both worlds: performance where you need it yet smart enough to save power when you don’t need it.
The beauty of the Flex FP is that it is a single 256-bit FPU that is shared by two integer cores. With each cycle, either core can operate on 256 bits of parallel data via two 128-bit instructions or one 256-bit instruction, OR each of the integer cores can execute 128-bit commands simultaneously. This is not something hard coded in the BIOS or in the application; it can change with each processor cycle to meet the needs at that moment. When you consider that most of the time servers are executing integer commands, this means that if a set of FP commands need to be dispatched, there is probably a high likelihood that only one core needs to do this, so it has all 256-bit to schedule.
Floating point operations typically have longer latencies so their utilization is typically much lower; two threads are able to easily interleave with minimal performance impact. So the idea of sharing doesn’t necessarily present a dramatic trade-off because of the types of operations being handled.
Here are the 4 likely scenarios for each cycle:
https://sites.google.com/site/apokalipse/FlexFP.png
So I would think two thread scaling (in one module) is largely a matter of the shared front-end's capability to feed the execution resources (as well as memory bandwidth, latencies etc which needs to be improved the more cores you have)
Only neat if you get more than 25% higher frequencies. I doubt that doubling the FPUs would make a big hit on frequencies. The FPUs account for a very small part of total die area, so saving mm² isn't worth it. And I doubt higher power consumption would lower the frequencies much.
Doubling the FPU's takes massively more die area, meaning more power usage, and you can't clock it as high if you want to remain within a certain TDP.
Floating point units are much more complex than integer units, and take up much more die area.
Quote:
The FPUs count for a very small part of total die area
Instruction sets like SSE and AVX primarily use the FPU.
@Apokalipse
The slide about CMT is from 2005, long before AMD had any real HW in their hands. I'll stick with what they said at FAD 2010, and that's 80% of a CMP design in less die area. The rest of the stuff you quoted is well-known information and doesn't go against what I wrote. I even believe there won't be a massive hit in integer throughput from running 2 threads on a module. FP may see the best numbers if threads are first scheduled on different modules, but this has to be verified.
@Boris
You do realize that in order to get 8 *full* FPUs the old way, you have to replicate the front ends, integer exec. units and L1 and L2 caches, right? This leaves you with wasted, doubled die area that will mostly sit idle (especially the FP unit). The beauty of Bulldozer is exactly in maximizing perf./watt/mm^2. Btw, the most power-hungry part of the core is usually the FPU...
My point is that I don't think the FlexFP will have much less performance than two conventional "full" 256-bit FPUs if the frontend can do its job and keep the execution resources fed.
I think the only case where it is limited in execution resources is if there are two 256-bit instructions from two threads at once, but that's a very rare case.
So yes, it won't be as fast as two "full" cores. I'm just saying that I don't think available execution resources are the main reason for this (for either integer or FP). The FlexFP looks very efficient and much less wasteful of transistors/die area than two conventional 256-bit FPUs in two "full" cores.
The frontend is very much beefed up vs K10.5 though; which it has to be to feed the extra execution resources for two threads.
I am not against BD CMT design, but without stronger IPC it's just useless.
So with CMT we have 80% of the performance of a true core. But the problem is that it's not a 100% performance core + an 80% CMT core (compare Intel + HT); it's 80% performance for both cores in a module, so...
If we calculate 0.8 (80%) * 8 = 6.4, that's 6.4 true cores' worth of performance, so a big hit. :down:
This design doesn't scale well the more cores you add.
If we assume the IPC isn't much better (maybe the same, not to be a pessimist and say lower), then what have we got?
6.4 cores with a 10% speed bump, so maybe 6.8-7 true cores' worth of performance.
So much for "maximizing perf./watt/mm^2"; it's not performance, anyway.
I have my info, and BD is a disappointment for an "8-core". As with its price, overall performance is between the 2500K and 2600K, and it will be hotter on air cooling than SB.
You have a point, but I think Cinebench uses mostly scalar maths, utilizing only 1/4 or 1/2 of the 128-bit wide engines (depending on whether it uses single or double precision).
Also, it doesn't use FMA, so the underlying FADD and FMUL units in an FMAC never work at once (or at least only one execution starts per cycle).
0.5 x 0.5 = 0.25 -> 1/4 FPU utilization/thread (with MT)
0.25 x 0.5 = 0.125 -> 1/8 FPU utilization/thread (with MT)
Of course, it's quite theoretical as the sharing of the FPU is not exactly 50% per thread per module all the time, and these are the peak values.
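The two multiplications above can be written out as a tiny sketch. This only reproduces the poster's peak-value arithmetic under his stated assumptions (scalar code uses one lane of a 128-bit unit, and non-FMA code uses only half of an FMAC's multiply+add capability); the function name and the 0.5 "no FMA" factor are taken from that reasoning, not from any AMD document.

```python
from fractions import Fraction

SIMD_WIDTH_BITS = 128  # width of one FMAC pipe, as discussed above

def peak_fpu_utilization(element_bits, uses_fma):
    """Fraction of one FMAC's peak throughput that scalar code can reach.

    Assumption from the post: a scalar op fills one lane only, and code
    without FMA keeps only the adder or the multiplier busy (factor 1/2).
    """
    lane_fraction = Fraction(element_bits, SIMD_WIDTH_BITS)
    issue_fraction = Fraction(1) if uses_fma else Fraction(1, 2)
    return lane_fraction * issue_fraction

double_precision = peak_fpu_utilization(64, uses_fma=False)
single_precision = peak_fpu_utilization(32, uses_fma=False)
print("scalar double, no FMA:", double_precision)
print("scalar single, no FMA:", single_precision)
```

As the post says, these are theoretical peak fractions, not a prediction of Cinebench results.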
It depends on whether FMA is utilized and on whether only one or two threads run in a given module, I think. AFAIK the FADD and FMUL units in the K10 cores are capable of working (or starting/finishing) in parallel. With BD, with regular code you can't have the underlying FADD and FMUL units utilized (or new execution started/finished) at once in a given FMAC, unless you use FMA code. And you have only one FMAC per thread in case both threads need them at once...
So, with single-threaded (or one thread per module) regular code it will perform comparably to K10 (because the second FMAC can be utilized anytime), but if more than 4 threads are running, scaling will be worse.
But, perhaps I'm wrong somewhere. Feel free to correct me, then.
Well, phrase the sentence a bit less dramatically and you are right. Low IPC does not make your CPU useless, but it obviously hinders your applications as long as they use fewer than 8 threads. There's turbo, but it can probably only help a bit.
In the end, we have a brand new design. Nobody has done CMT before. Just because its first version is not "da über CPU" doesn't make the whole approach "useless".
They already have a 2nd and 3rd version in the queue, so let's see what will happen with IPC.
The very first P4 (Willamette, Socket 423) was really useless. Clocks were still low at around 1.5GHz, and a P3 was always faster, not to mention AMD's K7. The 2nd version, Northwood (S478, later versions with Hyper-Threading), was actually quite good, and SSE2 was used more often, too, but then the 3rd generation was Prescott, aka Preshott. That was then really the time to pull the plug.
So far in my opinion, BD is much better than the first P4. Let's see how the story will end :)
as long as it outperforms thuban i 'm happy :up:
I do not have data, but I have confirmation that it will be a "radiator for cold days" compared to SB and will need stronger cooling.
It's easy to understand that a 330mm^2 chip will be hotter than a 216mm^2 chip.
Think what you want, in a few days we will all see.
The adequate slogan for BD will be "Long live the super-tormented price/performance ratio".
Cooler on bios or in case? :)
Let's not forget that Phenom and Thuban have the sensors on the package, not on the die, so the readings are not accurate, and that is why AMD gives the normal temperature as 62 degrees. http://products.amd.com/en-gb/Deskto...il.aspx?id=682
SB, like Nehalem, has internal sensors on the die; the max TDP temp is 72.6.
Usually when a Phenom passes 70-75C it starts throttling.
If BD uses the same type of sensors, then the air in the case and the cooler's heatsink indicate the real temperature.
You seem to forget that single threaded IPC != two threaded IPC in the same module.
the figure applies when both cores in the module are used, when compared to one core in the module being used.
You could compare one module to one hyper-threaded core. Both are sort of an "extended" core, designed to process two threads, getting a significant gain in multithreaded performance for a small increase in die area, although they do it in very different ways.
But there should be more gain with a BD module than with hyper-threading (which gets about 20% gain on average).
Of course we still don't know if BD has higher IPC than SB (with a single thread per core/module), but you can't rule it out.
Although you also have to keep in mind IPC is only part of the picture; it isn't the be-all end-all. For single threaded performance, you want a good combination of IPC and frequency.
Optimising your architecture for higher frequency isn't itself a bad idea, unless you simply ignore IPC (eg Netburst).
The opposite of that is focusing entirely on IPC and ignoring frequency; you don't want to do that either - increasing IPC can often require adding additional hardware/logic, which increases power consumption and die area etc.
As I wrote earlier, you have two cores at 80% performance, which equals 160% core performance, not that much of a gain compared to the 120% of an Intel core + HT.
So... how do you explain the fact that Thuban at 45nm, with its "big" die size, runs cooler than most CPUs (Sandy Bridge, Lynnfield...)? :) No, your idea is wrong ;). SB topped out at more than 95C under load with the same cooler and a voltage of 1.46V, Thuban at 60C (in CoreTemp 50C!)... The difference between sensor locations cannot be that big.
I understand; however, a single thread per module is 100% (of course, how can a single thread be 80%, compared to what? There is a possibility that it would be even 110-120% compared to a hypothetical divided-in-half BD module with only one 128-bit FMAC). The second thread will scale with 80% (compared to Intel's 20%), so there is 180% for two threads, equalling 90% per thread (as I think informal said before).
The "calculated core count" will then be 8*0.9 = 7.2, which is not too far away from 8 cores...
This has nothing to do with IPC at all, only with the scaling of the second thread in relation to the first, which doesn't have to share anything (and thus must be set to 100%; everything else would be strange logic).
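To put the two readings of the "80%" figure side by side, here is a small sketch. Both functions are just the arithmetic from the posts above (0.8 * 8 = 6.4 versus 8 * 0.9 = 7.2), with the roughly 20% Hyper-Threading gain mentioned in the thread added for comparison; the function names are invented for this example and none of this says anything about IPC.

```python
from fractions import Fraction

def core_equivalents_every_thread_scaled(modules, per_thread):
    """Reading 1: BOTH threads in a module run at `per_thread` of a full core."""
    return modules * 2 * per_thread

def core_equivalents_second_thread_scaled(units, second_thread_gain):
    """Reading 2: first thread is 100%, the second adds `second_thread_gain`."""
    return units * (1 + second_thread_gain)

eighty = Fraction("0.8")   # CMT figure quoted in the thread
twenty = Fraction("0.2")   # rough Hyper-Threading gain quoted in the thread

reading_1 = core_equivalents_every_thread_scaled(4, eighty)
reading_2 = core_equivalents_second_thread_scaled(4, eighty)
intel_quad_ht = core_equivalents_second_thread_scaled(4, twenty)

print("8T BD, every thread at 80%:   ", float(reading_1))
print("8T BD, 2nd thread adds 80%:   ", float(reading_2))
print("4C/8T with HT, 2nd adds 20%:  ", float(intel_quad_ht))
```

The "units" here are BD modules in one case and Intel cores in the other, so the numbers are only comparable if a lone thread on a module performs like a lone thread on the other chip's core, which is exactly the open IPC question.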
As for the Power/Temps... I am rather optimistic?
http://www.reddit.com/r/hardware/com..._gonna_say_it/
EDIT:
Quote:
I'm sitting in on a press briefing for AMD Bulldozer right now, and while everything is embargoed, I will say this: If you're building a gaming PC, this is going to be the way to go.
Edit 1 We're gonna be covering the normal stuff (Benchmarks, etc.) but we're also going to talk about value proposition against Intel as well as some of the exciting new advancements that Bulldozer brings to the table. On October 12th, 12:01am CST.
Edit 2 "We" means Icrontic. I'm not trying to shill my site or anything; we do have a Bulldozer on the testbench, we sat in on a press briefing tonight, and we will have a launch-day piece about it. Of course, you'll also find reviews and other awesome content at [H], AnandTech, TechReport, and so on. Please consider us in your content rotation, we're a small but very, very dedicated team who have been doing this since 2000. Thanks!
http://www.hartware.net/media/news/52000/52945_2b.jpg
http://www.hartware.de/news_52945.html
You just repeated the things I questioned without any arguments. Each FPU takes 1% of the total die area, for a total of 4% of a full 8c BD. Of course it needs other stuff as well in the front end, but even if you say that all that takes as much space as the FPUs themselves (which is absurd), that is still just an 8% larger die. And Turbo is made for just this kind of thing, so frequencies shouldn't be a problem. Besides, do you honestly think it would make such a large impact on frequencies? You can chop off half the power usage with lower current and a few hundred MHz lower clocks. Even if power usage rose by 25% (again absurd), it wouldn't mean too much in lost base frequencies, and probably close to nothing in max frequencies.
No, you wouldn't need to duplicate most of the processor. SB has a full AVX unit per core and didn't need to duplicate most of the core to get that working. The same goes for when Phenom got a 128-bit FPU. Of course you need some extra circuits to make it work, but not more space than the entire FPU. And as I just said, the FPUs themselves eat up just 4% of a full module.
6 day count
@Boris
SB cannot do two 256-bit loads per cycle, so its exec. potential is just theoretical. At best, in AVX-tuned code, one can expect a 20 to max 50 percent speedup. Oh, and you shouldn't mix SSE and AVX instructions, due to the way Intel implemented it. So SB in no way has a ''better'' designed FPU compared to BD.
As for implementing a double-sized FlexFP in BD vs the current one, you have to realize that the current one IS already a beefed-up version. In order to support 2x 256-bit ops, the load/store capability, and therefore complexity, would have to be dramatically increased. Who needs such an FPU if your L/S system can't feed it (ah yes, Intel made one :) )
Intel can reuse their INTcore datapaths for AVX (which is FP only), because INT&FP is tightly coupled in their design. AMD has the opposite approach, INT and FP are separated, already since the K7 days, this actually enabled the CMT approach. For Intel it would be rather impossible, they would need a totally new architecture. Well maybe Haswell will deliver that.
Anyways, back to BD: Because AMD's FPU is not tightly coupled, they would have needed much more space than Intel. If you compare K8/K10, you will see that K10's FPU is nearly double size. It is bit less than that, because it was upgraded from 80bit -> 128, not from 64-> 128.
However, FP code is generally not used very often. Combining 2x 128-bit units for one 256-bit AVX pass every cycle was definitely the best, smartest and most efficient way.
Yes it is, the sensors on AMD indicate a temperature 15-20C lower than it really is.
Have you seen a Thuban working at 75-80C? No, because by then it has already started to throttle.
Well, BD at the same surface area as Thuban will have many more transistors in it.
Quote:
It is the reverse.
A big surface has an easier time transmitting all the energy and thus being cooler.
What you say works in a closed case for a short time, but over a long time the air inside gets hotter.
What I'm saying is that BD will need a strong cooler like the Noctua NH-D14.
http://semiaccurate.com/forums/showp...&postcount=118
Quote:
Originally Posted by dahakon
the best sensor to test with is in a WC loop, make it small and use the water to see how much heat is coming out of the cpu and into the water, if all variables are the same besides the cpu switch, you can find the C/W ratio for each.
also ive had a Deneb chip boil water in my loop before due to the pump failing. the throttle point is ~90C for the MB sensor and ~60C for the internal sensor.
Cinebench 10 ST
FX-8150 : 4074
2500k/2600K : 5800
i7-965 : 4900
Cinebench 10 MT
FX-8150 : 20615
2500k : 18615
2600k : 22615
Cinebench 11.5 MT
FX-8150 : 6.01
2500k : 5.37
2600K : 6.75
i7-965 : 5.73
3DMark Vantage CPU Score :
FX-8150 : 19119
2600K : 22500
3DMark Vantage Total Score :
FX-8150 : 21949
2600K : 25500
3DMark 11 Total Score :
FX-8150 : 6616
2600K/i7 965 : 7385
Dirt 3
FX-8150 : 105avg/75min
i7-965 : 93avg/71min
Mafia II
FX-8150 : 68.3 avg
i7-965 : 76 avg
Far Cry II
FX-8150 : 111avg/23min
i7-965 : 126avg/75min
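To make the leaked numbers above easier to compare, here is a quick sketch that turns some of them into ratios. The scores are copied straight from the list (where one figure is given for "2500k/2600K" it is used as the 2600K value); nothing here verifies them, and the MT/ST ratio ignores turbo, which the thread itself points out inflates the single-threaded score.

```python
# Scores copied from the leaked list above (higher is better).
scores = {
    "Cinebench 10 ST": {"FX-8150": 4074, "2600K": 5800},
    "Cinebench 10 MT": {"FX-8150": 20615, "2600K": 22615},
    "Cinebench 11.5 MT": {"FX-8150": 6.01, "2600K": 6.75},
    "3DMark Vantage CPU": {"FX-8150": 19119, "2600K": 22500},
}

# FX-8150 result as a fraction of the 2600K result, per benchmark.
relative = {name: s["FX-8150"] / s["2600K"] for name, s in scores.items()}
for name, ratio in relative.items():
    print(f"{name}: FX-8150 at {ratio:.0%} of the 2600K")

# Multi-threaded speedup over single-threaded in Cinebench 10, per chip.
mt_over_st = {
    chip: scores["Cinebench 10 MT"][chip] / scores["Cinebench 10 ST"][chip]
    for chip in ("FX-8150", "2600K")
}
for chip, speedup in mt_over_st.items():
    print(f"{chip}: MT score is {speedup:.2f}x the ST score")
```

If the leak is accurate, the gap is much larger single-threaded than multi-threaded, which is the pattern the core vs thread scaling arguments in this thread are about.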
As I already mentioned in that thread, a difference of 0.06 is nothing in C11.5.
More important is that BD doesn't perform any better than the 1100T in single-threaded C10!! That's with a 500MHz clock speed advantage...
The FX-4100 (3.6-3.8GHz) will be slower than the current Deneb lineup, by the looks of the Dutch review.
Did the review OC? NB + RAM OC?
Oh, really? You say so... :down:
Probably it's more like 0.75*8 or so.
You just keep telling lies here; I wonder what you will say on 12 October.
The numbers in the post written by Olivon are correct, let's say give or take 5%.
The sad thing is that because BD is more or less a fail, Piledriver will be too.
And so we are finished with AMD until 2013, when the 3rd-generation BD arrives.
The even more stupid thing is that an 8-core Thuban design with more L3 cache, a faster IMC and clocks like BD would have done better, I think, in the same die size, with the same overclocking capabilities (I mean, all Thubans can do 4.2-4.3GHz 24/7 on 45nm; on 32nm they would have done 4.5-4.7GHz 24/7), and maybe even with better yields than BD. Llano is an exception because it's an APU.
You say so :). Well, it will be hard for AMD fans to accept that they were lied to all year and that BD is a fail.
Many people defended and made excuses for BD all summer.
JF-AMD kept giving false hope. Nobody had the guts to tell the truth.
I am, let's say, more of an Intel fan. But I really want BD to crush SB a little, to have something new on the market and lower prices from Intel.
Because of that, Intel can release CPUs whenever it wants, whatever it wants, at whatever price it wants.
We can all say thank you to AMD for their "strong competition".
I was talking about overall performance, without Turbo, which doesn't count in all multithreaded applications anyway.
Quote:
If you took 0.8 from cinebench results - you forgot the turbo frequency impact when calculating multiprocessor speedup.
Looking at those C10 and C11.5 numbers from the 8150 and 1100T, all I want to know is how in the world Interlagos, at the same or lower clock speed, is going to have 35% higher throughput in legacy FP code?! AMD claims it can do 50% more SP flops than MC, even in legacy code. With what magic?
How is BD a "fail?"
Quote:
Well, it will be hard for AMD fans to accept that they were lied to all year and that BD is a fail.
Big surprise there.
Quote:
I am, let's say, more of an Intel fan.
lol wut? That is just so funny to me. How the heck do you toss in 2 more cores and more L3 and come up with the same size? And then you want more core speed and NB speed on top of that added complexity? I suppose you want a reduced TDP to top it all off too, amirite? I'll just get right on that. lol
Also, more stuff from that other leak:
http://www.reddit.com/r/hardware/com..._gonna_say_it/
Quote:
Vithren 1 point 4 hours ago
Do tell, are all the leaks we have seen so far simply a part of a one, gigantic AMD fud campaign?
Quote:
primesuspect
No, they're sites who are capitalizing on pure rumor and hype traffic
(Sigh) Just six more days.
The 6-core Thuban has 346mm^2 and a TDP of 125W, but on 45nm.
On 32nm it should have 240-260mm^2; see Lynnfield at 296mm^2 (45nm) -> SB at 216mm^2 with IGP (32nm).
So it's quite possible for a Thuban with 8 cores and, let's say, 8MB L3 cache + 8MB L2 cache on 32nm to have 330-346mm^2.
And why should the TDP be bigger if the die size is the same, and maybe the number of transistors would be the same too?
And if I remember, AMD launched a Phenom X4 960/965 at 140W TDP, so what is the problem? The next revision will fix it.
The performance is more important.
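The shrink estimate above can be checked with a rough sketch. It compares two simple models: ideal geometric scaling, where area goes with the square of the node size, and the empirical Lynnfield -> SB ratio quoted in the post. Both are assumptions for illustration only: real shrinks never scale ideally (caches, I/O and analog parts shrink worse), and Lynnfield and SB are different designs (SB adds an IGP), so neither number is a real die-size prediction.

```python
THUBAN_45NM_MM2 = 346.0     # from the post above
LYNNFIELD_45NM_MM2 = 296.0  # from the post above
SB_32NM_MM2 = 216.0         # from the post above

def ideal_shrink(area_mm2, old_nm, new_nm):
    """Area after a shrink if every feature scaled perfectly with the node."""
    return area_mm2 * (new_nm / old_nm) ** 2

def empirical_shrink(area_mm2, ref_old_mm2, ref_new_mm2):
    """Area after a shrink using another product pair's observed area ratio."""
    return area_mm2 * (ref_new_mm2 / ref_old_mm2)

ideal = ideal_shrink(THUBAN_45NM_MM2, 45, 32)
empirical = empirical_shrink(THUBAN_45NM_MM2, LYNNFIELD_45NM_MM2, SB_32NM_MM2)

print(f"Thuban at 32nm, ideal scaling:          {ideal:.0f} mm^2")
print(f"Thuban at 32nm, Lynnfield->SB ratio:    {empirical:.0f} mm^2")
```

The Lynnfield -> SB ratio lands inside the 240-260mm^2 range from the post, while ideal scaling gives roughly half of Thuban's area.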
Because a CPU architecture we waited 3-4 years for fails to beat Intel's mainstream.
Quote:
How is BD a "fail?"
They are no threat even to Intel's 2010 hexa-cores, and now it's almost 2012.
Because AMD is left behind again.
Because they have had SB's performance numbers since January or even earlier, they delayed 3-4 months, and they still couldn't improve performance enough to at least equal the SB 2600K.
Because marketing BD as an 8-core is just lame when it only equals an Intel quad.
I would have been less harsh if they had called it a quad with 8 threads.
Anyway, I'm wasting my time trying to convince some hardcore AMD fans.
When IB comes, all FX 8XXX will fall under $200, like Thuban did when SB appeared.
So we will be back to being two generations behind, as usual.
With their current 32nm they wouldn't be able to run 6 Llano cores at 3GHz and stay under 100W while having a decent yield... Not sure how BD does, but the process issues also affect BD (less than Llano). The fact that BD is competitive with Intel's fastest at the moment is a lot more than what they had, or what a hypothetical 8-core Llano would be able to do given the state of their process...
Why did you bring Llano into the comparison? 40% of Llano's die is GPU, which is why it has that TDP, not to mention that doing a GPU on SOI was very hard. Llano's problems would be much lighter on a CPU design without a GPU.
http://lab501.ro/wp-content/uploads/...ze-580x241.jpg
I am pretty sure I don't need to explain how so many of your arguments are purely trying to stir up some brown mud.
But anyway, myself like many in here are not in this thread to suck up to AMD regardless of how bad/good their product is but we are actually excited that they are putting something new in the market and we are looking at it with a critical eye.
I'm excited for Bulldozer, doesn't mean I am going to buy it. My money goes where performance is higher for my budget.
I suppose many will be disappointed if Bulldozer doesn't beat the i7 2600K, but calling it a complete fail, and making wild claims about bad future performance and what-ifs from old processors as if they were facts... they are not facts, it's your opinion. Bulldozer will be a fail for someone with a $3000 budget, but if you are looking at an i5 2500K system, you will not be able to avoid comparing it to BD, and the latter might end up being a little better bang for buck.
All is relative.
i just imagine what would happen if they took 2 Llano chips and connected them together. 8 cores, dual gpu, and can run in less than 140W if they dont go all out. but also make them unlocked it could be quite a fun all-in-one chip for a not so insane price. but that also gives a pretty good idea of the clock limitations of stars cores. id also be willing to bet that overclocking such a chip would kill any motherboards VRMs. its quite clear the old architecture is getting too old. but i fear the IPC of BD is going to feel old way too quickly.
Because it is the CPU that consumes the power budget, not the GPU. The GPU is actually very clean and extremely efficient (it is a complete marvel... far exceeding the efficiency of SB or any other GPU we know at the moment). They need the high voltages to get yields on the CPU, not the GPU. While it would probably do better without the GPU in the yield department, BD is also suffering issues on the 32nm node. So doubling the Llano cores and adding fast L3 cache would explode on the current process... Currently, having 50W for 4 cores @ 2.6GHz with proper yields is pushing it for Llano... try doubling that, adding cache and 1.5GHz, and see where that would get you (most likely to a nuclear generator as power supply...).
I am not talking about the possibilities on a good working process, because that would affect BD also in a positive way.
wow, if it is right, 5 GHz with only 1.45V...! With a bit of luck I could get 5.2 GHz at 1.5V :)
Talking about 24/7, how does GF's 32nm process cope with voltage? I see llano APUs everywhere at 3.6 ghz and north of 1.4v, near 1.5v for those clocks... anyway, BD will be made on the same process, how durable would that be? 5+ ghz on air is cool and a nice sign too, but is it realistic for 24/7 use at such high voltages, even with under control temps? Dunno, 1.5v seems too high for that process... yeah, AMD's (now GF) SOI 45nm is a tank when coping with voltage, but what about their 32nm? Any ideas on this based on current llano chips?
Having said this, things are looking good, really good. 6 more days! I am already impressed. 4.5 ghz 24/7 at reasonable voltages seem to be completely possible! It games well! You could just take my money now AMD :D
Gaming benchmarks look pretty good and to me personally that's all
I really care about. Can't wait for BD!! :yepp:
6 days. just 6 more days... oh someone let me pre-order it so i can do other things with my life!
How do you know that it's the CPU and not the GPU that consumes the power budget? Besides, the silicon and design could be limited by the compatibility with the GPU.
A Phenom II X6 at 32nm would be almost half the size of a Thuban if caches scale as well as cores when shrinking the process. BD must significantly outperform Thuban to justify this change in architecture.
Well, there is a similar one in the Opt. Guide - one that also shows one FMAC unit on Pipe 0 and one on Pipe 1, so no separate pipes (and ports) for the FMUL and FADD units in a given FMAC. It means independent FADD and FMUL operations cannot be started per cycle per FMAC. JF also wrote in a blog that it's FADD or FMUL or FMA, not (FADD and FMUL) or FMA.
AFAIK K10's FPU is capable of it and SB definitely can do it. I don't know how much it impacts performance, though.