Plz get a room you two lovebirds :p
@ Flanker, while I agree the performance is "decent", it's nowhere close to what the marketing slides say. Well, even with that performance I am fine with it. But the power draw is what is now keeping me away.
yeah i'd like to see the actual test where they show 50% more performance for just 33% more cores at the same clockspeed
Well I did, and the review you linked to is nothing special... if anything it lacks some things others do... the most sophisticated bench suites are on Tom's Hardware and HT4U: Tom's because it tests video editing and compiler performance (which no one does besides hardware.fr), and HT4U especially for various audio encoding apps. I also like TechReport for testing scientific apps.
And before screaming foul better take a look at this first:
Some of the larger review sites, compared by what they bench:
Apps/Synth:
http://img847.imageshack.us/img847/2...wappswm.th.png
Games:
http://img831.imageshack.us/img831/1...gameswm.th.png
Combined:
http://img21.imageshack.us/img21/662...wfullwm.th.png
Like this one?
Attachment 121249
Thanks Hornet, good to see a summary in a nice sleek table.
Indeed, but nobody here cares; he's one of the 'illogical' redhead club that will see a light in every dark tunnel, and when you try to tell them it's the train that's going to run them over... nope, you don't understand. But you know, when they drown they will try to hold onto anything to keep them above water. I mean, come on... 8 "half cores" + OC to get 'better' performance, and suddenly a power hog appears, and it's super effective.
Someone prove me wrong. ATM, based on all reviews on all available pages, I see only a mediocre product that is good competition for the bottom-end Nehalem series.
Only done it for the full one:
http://imageshack.us/photo/my-images...appsnumbe.png/
So you are saying AMD lied when they presented BD's uarch with one 4-way decoder in the frontend, and it's really two 2-way ones in the integer clusters? And that there is also an x-way one in the FPU, or what?
Or, are you speaking about that while peak IPC/thread = peak IPC/core with SMT, it's (peak IPC/module)/2 per thread with CMT? Well, that I've also pointed out earlier in this (edit: the other) topic, and asked what could be the rationale behind it.
Anyway, I think the peak IPC really is 3.0/thread here (in terms of normal integer x86/x64 instructions), because of the code fusion Opteron146 has already mentioned.
Too bad some of Tom's tests are flawed.
BTW, five more to include, if you will: TechSpot, Legion Hardware, Hi Tech Legion, oZeros, VR-Zone.
I would never dare to say that AMD is lying. What I am saying is that something on the front end is holding back each thread's decoders to a max of 2, nothing more than that, nothing less. And no, I can't find a case where it goes up to 3. Don't put evil words in my mouth.
Please stop speculating, get a CPU and try.
Francois
It looked like you said there is a 2-way decoder in every integer core, which contradicts the official communication about the uarch. But we probably misunderstood you.
Quote:
What I am saying is that something on the front end is holding back each thread's decoders to a max of 2, nothing more than that, nothing less.
Is it the decoding or the execution, really? There are two ALUs per integer core, so obviously that's a limiting factor. Do you mean this, or that the decoder indeed behaves as if there were two separate 2-way (non-SIMD) integer decoders as well?
Quote:
And no, I can't find a case where it goes up to 3.
I was referring to the branch fusion that the Optimization Guide speaks about, but I guess I was wrong.
Quote:
Don't put evil words in my mouth.
That was a question only, but I'm sorry for the wording.
Quote:
Please stop speculating, get a CPU and try.
Well, it's not that easy to get one right now. (And I think I will wait and see for a while, anyway.)
Any Linux kernel support for BD? How about the performance of BD on Linux compared to Windows? Is there hope for BD if Linux supports it?
hornet: yes, you're right, there are more interesting reviews out there. But I also saw some "sh1t" reviews with only synthetic comparisons. Some people read one or two of those and get a different view of the product. I think it's an average product, in the range from the 1100T up to the 2500K.
Some Linux benchmarks:
http://openbenchmarking.org/result/1...LI-BULLDOZER29
Seems like OpenSSL and Gcrypt don't love Bulldozer as much as TrueCrypt for Windows loves it.
Also take into consideration, Flanker, that some websites only had the CPU for a few days... so running a full real suite is difficult. Plus, if you want to compare with older gear, synthetic is usually the way to go, as results are hardly influenced (due to patches etc...). Only with 3D can it get difficult (newer drivers). I honestly have got nothing against synthetic benchmarks, as they usually already show the strengths and weaknesses of a new CPU architecture. I think many reviewers are working on an update to include more tests... as this CPU architecture works best with newer apps...
If this FX-8150 ran at 4.5GHz stock with a power draw of less than 200W, then I would buy one myself...
Nice to see the drama continues :)
AMD slides and comments about future products aren't meant for the average Joe; they are meant for investors and partners, and to those you always give the best-case scenario, not the worst. It works that way in every company; the difference is whether you deliver or not. Look at Intel: despite being a lot better and greater CPUs, Nehalem and Sandy weren't the great leap Intel promised in their slides. I mean, Nehalem was supposed to be the most revolutionary CPU in the history of CPUs, and it was just Core 2 without the ancient FSB.
Then you have blogs and sites that, from slides and the cryptic performance comments by AMD, come to the conclusion that a monster is being built; then you have AMD fans waiting for another miracle, another K8. The outcome is inevitable. I understand the disappointment, but it's not like it didn't happen before. Barcelona was supposed to destroy Intel; even the lowly AM2 was supposed to smash Intel to oblivion with its Reverse-HT. And people are doing it again: from an AMD slide with a cryptic mention of a 10% x86 improvement, everyone is already saying Piledriver is going to have 10-15% better IPC.
Then I have to say I find it amazing how, from straightforward things like numbers and graphs, different people reach different conclusions. I saw 5-6 reviews; my impression was that Zambezi won the large majority of the tests vs Thuban. But reading the comments in this thread I doubted myself, so I had to double-check and review the reviews I'd seen. I stopped at the third; it was pointless to go on: TechReport 20-5, X-bit labs 21-6, Tom's Hardware 31-10, bringing the total to 72-21 benchmarks in favor of the FX-8150. It's not even close. How does that translate to the FX-8150 being 40% slower? Or the 1100T being quite a bit faster? Or a worse launch than Barcelona, for that matter; the Phenom 9600 lost the majority of its benchmarks to the X2 6400.
And please let K10 die already. How do people know a 32nm K10 would be better? That a K10 on 32nm would have zero issues? That it would reach higher frequencies? Let's say yes for argument's sake; what about 2012, 2013, 2014, 2015, 2016? Bulldozer is kind of modular: AMD can cut and paste modules according to the segment (server, desktop, mobile); K10 doesn't allow that. Bulldozer is future-proof, K10 isn't; increasing thread count with the Bulldozer design is a lot easier and more doable than with K10, and so on and so on... Sorry to say, but AMD makes server CPUs; even the beloved K8 was designed for servers.
Quote:
However, in determining project goals for Bulldozer, single-threaded performance was consciously sacrificed to meet what the team determined was a more optimal overall design point.
I read this in mid-2010. Why are people surprised that FX has lower IPC at the end of 2011? Especially with a borked stepping.
AMD missed the target clocks, it’s obvious and no benchmark was in fact needed to understand that.
Hint nº1: K10 has a 12-stage pipeline. We don't know how many stages Bulldozer has, but let's go with the conservative figure of 50% more, putting it one stage longer than Nehalem. So the pipeline increase, plus all the tweaks that hurt IPC to allow more frequency, plus the smaller node, allows AMD only a 300MHz faster CPU?
Hint nº2: faster models are coming in Q1 2012.
Hint nº3: in less than 6 months AMD won't be selling these models. They'll be gone faster than the original Phenoms.
The power consumption in the benchmarks was just proof. If AMD were where Intel is, there's no way they would put this stepping on the market. But AMD needs to make money, and like I said, AMD's focus is servers, and it's almost certain that the best Bulldozers are going there.
where's the UD7 review bro :D
I agree; this drama has been going on for 4+ years, and apparently it resets with every new AMD arch launch.
So K10 was supposed to be awesome, kill Intel and all, and it was crap; then came the promises in 2009 of Bulldozer with SSE5, and how it would be awesome and kill Intel.
Now it's the end of 2011, Bulldozer arrived 2 years late, and it's crap again, but we have promises that something will come which...
I can see where this is going: ever since they got Conroe'd, all they do is make wonderful slides of products that end up late and deliver the performance Intel was already offering back when the CPU was supposed to come out, years earlier.
No doubt they will fix this mess just like K10 and probably make something useful out of it, but this is already too little, too late. I will not believe another AMD slide, will not believe JF, until they start to deliver on their promises for a change.
Well, nice of you, but we have a clear binary case here: either it's 1 or it's 0. One statement has to be true: either AMD is lying, or your measurements are wrong. I thought a bit more and googled a bit.
Questions that arose:
a) How many threads did you run on a module? 1 or 2?
b) How did you order your instructions? Did you write higher-level C code and let a compiler optimize it, or did you write assembler? It looks like Bulldozer is le gourmet in the field of processors; it wants to have its data properly cooked and nicely arranged:
http://gcc.gnu.org/ml/gcc/2010-06/msg00402.html
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00717.html
Maybe you are right with unoptimized code, and AMD is right with optimized & aligned code? If you didn't write assembler, try a newer GCC version; I think 4.6 should be OK.
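To make that concrete, here is a minimal sketch of my own (the file name and loop are made up for illustration, and it assumes GCC 4.6+, which added -march=bdver1). Building the same source generically and Bulldozer-tuned, then timing both binaries, is a quick way to see whether the "properly cooked" code paths matter:

/* saxpy.c - trivial FP loop to compare generic vs. Bulldozer-tuned code.
   Build both ways (assumes GCC >= 4.6, which added -march=bdver1):
     gcc -std=gnu99 -O2 saxpy.c -o saxpy_generic
     gcc -std=gnu99 -O2 -march=bdver1 saxpy.c -o saxpy_bdver1
   The bdver1 build may use XOP/FMA4 and BD-friendly alignment/scheduling,
   which is the kind of arrangement the GCC threads above describe. */
#include <stdio.h>

#define N (1 << 20)
static float x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }
    for (int rep = 0; rep < 100; rep++)
        for (int i = 0; i < N; i++)
            y[i] = 2.5f * x[i] + y[i];  /* candidate for fusing/vectorizing */
    printf("%f\n", y[0]);               /* keep the result observable */
    return 0;
}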
Quote:
Please stop speculating, get a CPU and try.
As I said before, I would like to, but would you pay money for such a CPU? Well... you already have one; I guess I should ask somebody else ^^
now there might be issues with k10 on 32nm, but a lot less than what bulldozer is having... its pretty straightforward: you have working tech, shrink it, increase clocks, add some new features, and you have a much better product. keep the bulldozer arch back until its perfected a bit better and actually beats something.. i think this approach would have been a lot better and probably could have had a release a lot earlier with a more competitive product..
sort of like amd's 4-5-6 series.. shrink it, increase speed, add features, and the product keeps producing while becoming more efficient.
It does beat stuff, just not in the consumer market, and the margins are in enterprise products, not consumer. And I am pretty sure OEMs will still sell this and go "8 CORES SUPER PERFORMANCE", and then we've only got enthusiasts left, and they are a very slim share of sales, I suppose.
K10 is a shorter-pipeline design and there is no guarantee whatsoever that on 32nm it could clock higher than Bulldozer. Bulldozer, on the other hand, was designed with a 30% higher clock target, and the pipeline was completely redesigned to achieve this goal. AMD will fix the power draw issues in time, maybe with PD or maybe even with the next stepping (C0 8150/8170?), and they will be able to scale this core up to MUCH higher clock speeds on smaller nodes (think 28nm and lower). K10 started at 2.3GHz @ 65nm and ended up at 3.7GHz @ 45nm. Keep in mind that K10 was not designed with high clock speed in mind, while Bulldozer is. So if K10 hit 3.7GHz on a mature 45nm node, Bulldozer++ on a 28nm node should be able to hit much higher than 4GHz and stay within the 125/95W brackets. Couple this with core (IPC) and uncore improvements and AMD is set for the next 4 years, maybe even more, when it comes to competing with the giant Intel.
Another interesting thing is Zambezi's Linux performance. This is what I posted in News section thread:
How is the 8150 performing under Linux in the Phoronix test suite? This is what Michael Larabel @ Phoronix posted yesterday (thanks to dresdenboy's blog):
Quote:
Posted by Michael Larabel on October 14, 2011
Here's the first Linux benchmarks of AMD's FX-Series Bulldozer desktop CPUs that launched on Tuesday. Specifically, it's Gentoo Linux performance results for an AMD FX-8150 Bulldozer.
The AMD FX-8150 Linux benchmark results can be found on OpenBenchmarking.org. It's an eight-core AMD FX-8150 on an ASUS Sabertooth 990FX motherboard with 4GB of RAM. Gentoo Linux was used with the Linux 3.0.6 kernel and GCC 4.5.3. Unfortunately, this system is not under my control and there's no direct comparisons available for this hardware system to any other AMD processors.
While there may not be any direct comparisons and these Bulldozer Linux benchmarks are coming in from an independent user running the Phoronix Test Suite and uploading the results to OpenBenchmarking.org, you can compare your system to this FX-8150 Gentoo desktop by running phoronix-test-suite benchmark 1110131-LI-BULLDOZER29 from the latest Phoronix Test Suite client.
Though thanks to the unique OpenBenchmarking.org feature-set, the OpenBenchmarking.org Performance Classifications (OPC) and OpenBenchmarking.org Performance Classification Index (PCI), you can see how this eight-core AMD Bulldozer compares to other Linux systems. Visit this link for the performance classification of this new octal-core processor.
With the OPC results, the "Processor Tests" are the important ones. The FX-8150 results overlayed on the OPC heat-maps indicate that the performance is high-end compared to all of the other systems on OpenBenchmarking.org that have run these tests in the past 120 days. The 7-Zip, NPB, OpenSSL, Tachyon, and Smallpt results highlight this processor the best while the performance in Crafty and EP.B NPB is not as desirable.
So what you are saying is all we have to do is wait and AMD will give us good performance; can't see any reason to doubt that, lol.
8 cores, 2 billion transistors, 5 years of hype, and it's slower than not only the competition's products but also their previous gen, while consuming more power. Still the same people try to paint a rosy picture.
Remember when the Intel fanboys said the P4D was better and shouted about the one or two benchmarks it won while ignoring power and heat? Funny.
Actually, someone counted it (across all the reviews) and the 8150 is ahead of the 1100T (usually by between 15 and 30%) in 70 or 71 (cannot recall the exact number) individual benchmarks, while being behind the 1100T in 20, usually single-threaded ones spanning very few applications like LAME and iTunes. So more facts and less imagination, please.
Right, so by definition didn't IPC decrease? The question people have with this design is: despite its modularity, wouldn't an 8-core K10.5 processor have been a better option on an immature 32nm process? Clearly power leakage, transistor density, and size are major issues right now.
Given that we know AMD purposely sacrificed IPC in favor of more cores with CMT (JF himself said 180% of the performance for 35% more die space; now clearly those numbers are skewed, but that was the idea in mind), why not just wait for a node where it's easy to slap on 8 modules? I can see BD becoming intriguing for servers once you hit 16 integer cores per die, but right now it just doesn't make sense, IMO, to buy their product over an Intel one. Sales are the final goal, right? So you always put out your best lineup, not what sounds best on paper, IMO.
Now, the review here where the guy disabled all the secondary integer cores proves that IPC is taking a massive hit from CMT (and thus it has been inferred that the 2 ALUs just aren't enough). Once again, with a die shrink I'm sure AMD could go to four 2-way ALU cores instead of the four 1-way design, and that would likely fix all their IPC problems.
You're talking to the same people who expected Zambezi to give Sandy Bridge a run for its money in multi-threaded apps while being at most 5% slower in ST, so no surprise they're in full damage control now.
Zambezi is so slow in ST that even with the second core in each module disabled (4M/4T) it can't reach Deneb IPC. What a shame.
http://www.hardware.fr/medias/photos...IMG0033907.gif
There are some people expecting huge IPC jumps in Piledriver, but they seem to forget this piece of the Tom's Hardware review.
"How will Piledriver get its projected 10 to 15 percent speed up? AMD says one-third will come from IPC improvements like structure size increases (so, three to five percent) and two-thirds will come from power optimizations that reduce consumption, enabling higher frequencies at a constant TDP (another six to 10 percent)."
http://www.tomshardware.com/reviews/...fx,3043-9.html
AMD had better reach their goal of 30% higher clocks than K10.5 really fast; if they do, and lower power consumption, it may give us Thuban users a reason not to go to Intel.
Just a few years ago AMD said that Intel was going in the wrong direction with lower IPC, a long pipeline, and higher clocks. And AMD didn't just say that, they also proved it. Moreover, Intel completely understood its own mistakes and fixed its direction. Isn't it incredibly stupid to take this route again?
Quote:
AMD will fix the power draw issues in time, maybe with PD or maybe even with the next stepping (C0 8150/8170?), and they will be able to scale this core up to MUCH higher clock speeds on smaller nodes (think 28nm and lower).
I would not count on this. Not only is the tech process broken, but the "speed demon" CPU design has been proven generally inefficient. I remember Andy Grove saying "truly sorry" for the Pentium 4's inability to reach 4GHz. I doubt someone at AMD has (or will have) the guts to say this.
You seem to forget that AMD is at least one process node behind, so they need a design that can reach higher frequencies over longer periods of time. They managed to do this with the short-pipeline K10; they will manage it with 15h, just a matter of time I guess. Also, IPC did decrease somewhat, but not always and not by a huge amount. The problem is thread bouncing and inefficient scheduling. AMD is also stubborn: they opted for maximum Turbo, with threads grouped on the same modules (CUs), over a more limited Turbo with threads scheduled onto individual modules (CUs) first.
Bottom line is that 15h obviously has a lot of room to grow; it's just the first one in a long family of CPUs. They will bring IPC up every year, the thing we didn't have with K10 (if you remember, we got 6% from 65nm->45nm after 2 years, and that was mostly L3 cache with only a few % of pure core improvements; Llano @ 32nm gets another 3-6% on average from pure core improvements, and that is after another 2 years!). So each year they expect to raise IPC (sub-10% or so) and increase clocks while maintaining or lowering power draw via node shrinks. They can add CUs easily and can make a next-gen Fusion-type 15h-based chip by coupling "Graphics Core Next" with the FP coprocessor. So the design is still in its infancy and has a lot of room to grow (unlike the P4, which didn't grow anywhere; it went into history).
As modular as their design is, why didn't they just include a K10.5 core to boost single thread performance? If M-Space lets you mix and match with GPU/CPU why not include it and have the best of both worlds?
Now you are drifting into denial mode again...
The P4 had improvements in each iteration: Northwood added a nice IPC gain over Willamette and also netted higher frequencies, but it still couldn't do anything against the upcoming A64 and struggled against the Athlon XP (at a much lower frequency). But with Prescott, Intel went into full retarded mode, making the pipeline even longer, and even more cache couldn't save it, because the cache had high latencies; it even lost to Northwood in some cases...
The thing I hope AMD doesn't do is try to get MHz at all costs, or we'll see the same thing Intel stumbled into...
Wait, what... you just said the P4 had no room to grow... but in fact it did, it just wasn't enough... and personally I see the same for BD and every other iteration of it. It has growth potential, but it will never be enough to match anything Intel offers as long as they stick to that.
Hell, per-core performance on Piledriver will be at the same level Deneb is at now... and Piledriver is still 6-9 months away...
And Intel is also talking about IPC improvements for Ivy Bridge, probably the same ~5-10% Piledriver will receive. GF's 32nm will eventually allow for much higher clocks, but I expect 22nm Tri-Gate to do the same for the competition, all at lower TDP than Sandy Bridge.
you are backing an opinion with facts. personally i doubt that nodes will permit faster clockspeeds over the next few years; if there is any trend, it's probably slower. bulldozer mainly has room for improvement because they messed it up. i would speculate that globalfoundries screwed up 32nm as well; how much is hard to know. i do agree that BD will likely improve, but that's a ways off. the world of technology lives in the urgent.
i'm no cpu designer/engineer
but write performance is horrible: the L1 data cache is write-through, L1 write bandwidth looks to be half of what it should be, L2 is half, and even L3 is half.
maybe it's the fact it's still 2-way for those 64 Kbytes of cache; maybe they should bump it to 4-way.
costs less than a gpu, lol jk.
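If anyone wants to poke at those write numbers themselves, here is a crude sketch (nothing like the careful tools the review sites use; the buffer size and build line are assumptions to adjust, and older glibc needs -lrt for clock_gettime):

/* wrbench.c - crude write-bandwidth probe for a cache-sized buffer.
   Pick SIZE to target L1D (Bulldozer: 16 KB per core), L2, L3, etc.,
   and compare the reported GB/s.
   Build: gcc -std=gnu99 -O2 wrbench.c -o wrbench */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define SIZE (16 * 1024)   /* fits Bulldozer's 16 KB L1D */
#define REPS 1000000UL

static char buf[SIZE];

int main(void)
{
    struct timespec t0, t1;
    volatile char sink;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long r = 0; r < REPS; r++)
        memset(buf, (int)r, SIZE);   /* varying value so the stores stay */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    sink = buf[SIZE - 1];            /* keep the buffer observable */
    (void)sink;

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.2f GB/s\n", (double)SIZE * REPS / sec / 1e9);
    return 0;
}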
why that's genius!
Do you know what Intel is planning for Haswell? Is Haswell an entirely new design or a Core evolution? I think Intel stated it's a clean-slate new design. So if, by some far stretch, Haswell fails to match IB per clock, will you call it a failure too?
Bulldozer 1 was probably made with some compromises in mind. They will improve the design; it's the logical thing to do. They have room now to increase both clock and IPC. Whether or not it will be enough to match what Intel offers, especially in the server space, remains to be seen. I think they will do just fine if GloFo doesn't stumble on the process side of things.
Yes, for me it will be a failure if it can't reach the same IPC as its predecessor, same as the P4 compared to the P3. Because every single speed-demon design for the consumer market has been a failure. ST performance is still of significant relevance for the consumer market.
Even if it achieves higher performance than 4C/6C IB via more cores/caches/ISA extensions and clockspeed?
BTW, the consumer market is changing ;), although the pace is rather slow. Give it a year or so and it will get better. More and more applications are being designed with multicore in mind; it's just a matter of time until we have more MT-aware apps than ST ones. Also, we will have GPU compute power put to good use. This will pave the way for hybrid CPU/GPU chips in 4-5 years, and they will be able to use that enormous compute power for client and server workloads.
yeah, ST is very important; i would need to see 20% more single-threaded perf (either through clocks or IPC) before i would be willing to purchase it.
multi-threaded perf isn't horrendous, but it's so spread out, and CMT didn't give us 180%, it was ~150%, and i wonder how much more potential is there.
fix those 2 things and it would be worth 315mm2.
I had a nice long post ready to go, but my freaking PC BSODed on me, so I'll keep it shorter this time. The problem with AMD can be summed up like this: their board of directors are :banana::banana::banana::banana:ing idiots. They run the company, they bankroll most of what goes on, and they pick the CEOs. The only way AMD is going to survive is if the board goes and/or the company is taken private. If you look at AMD's history, they are chasing a dream cooked up by the marketing folks there, who are convinced that the Pentium 4 is the reason they are failing. Not that they don't advertise worth a damn. Not that THEY failed to make a compelling argument based on the benefits of their option being better overall. NO no no no no, this was a MESSAGE issue, NOT the delivery; why, the sales and marketing people are infallible; THEY didn't screw this up; the ENGINEERS did.
You may think that's absurd, but look at it from their shoes:
When the A64 was successful, people bought P4s in droves. Why? Was it because they were being paid off to do so? Yes, but that was stopped. After it stopped, people STILL bought P4s, and I can just picture some pencil-neck asswipe in their marketing branch who thinks it's because of HT and because of the MHz. Because Intel had a higher frequency, and because it showed more than one thread in there, people believed it was better, and so for AMD to win they have to behave the same way.
So what is their response? Well, if we can't beat them on process we MUST beat them on speed, because people are retards and won't look up the fact sheet; all they see are "cores" (which I'm sure was focus-grouped) and GHz. That's it. They thought people might look into their products more, but have concluded people are worthless sheeple, as evidenced by their lack of marketshare gain in the A64 days.
So instead of focusing on their niche group who buys them religiously, they said :banana::banana::banana::banana: it; it's all or nothing, we're going all in.
And what did we end up with? The one thing they've been coveting for years: the Pentium 4. Now all their little marketing drones can herp derp about more cores and more speed, real performance be damned. Reality doesn't mean a damn thing as long as it looks good in a PowerPoint; just look at all the reality TV out there. What happened to AMD is a sad reflection of how ignorance has been celebrated in this country.
I will make a prediction. I guarantee you will see all kinds of reports saying that this is the highest-grossing processor they've ever brought to market. And sadly they may be right; if this chip is a success, god help us all.
The consumer market is changing, but nowhere near as fast, or in exactly the same direction, as many expected. We've been assuming the advance of GPGPU-based software for years, and it is still fairly limited. Development time of software is increasing as the technical details become more advanced and the complexity of operating systems and APIs increases. Ingenuity and creativity can abound, but then the application requires a surrounding context; new technology can breed new ideas, new "needs", and new directions.
AMD made a bet on where the future of computing is headed when designing the architecture (whether or not it turned out exactly as they hoped), and it is yet to be seen whether that will pay off. At some point, businesses look at their current needs and direction, with currently available software, to make their purchases. If the architecture and resulting power match the needs, then they'll buy it, else not. AMD's bet assumes that the future payoff will be greater, while the immediate payoff is not as overwhelming. Their job becomes harder if they have to push developers in new directions when they can still take an easier route (and sometimes that easier route is the best route! It's not mutually exclusive.).
The processor isn't nearly as bad as it seems based on some of the reviews (yes, I have first-hand experience), but it isn't going to meet the requirements for everyone either. If AMD wants their approach and architecture to do well, they are going to have to not only improve it and work out bugs, they are going to have to build a software and API ecosystem surrounding it (developers "enjoy" the iOS-type API approach; it's where Microsoft, Apple, Android, and even Ubuntu have been heading for a while). I'm not sure they are ready for the latter, to be honest...
More cores don't yield more ST performance; the same goes for ISA extensions, which don't yield more performance in current apps. It was the same issue that plagued all the P4s... SSE2 performance wasn't that bad, but hardly any apps used it; it took years until people adopted it. Today it's still the case; look how many apps use SSE4.x and how many could make use of it... The only thing that yields more performance when IPC goes down is clock speed, and the more IPC you lose, the more clock you need. Let's say Haswell loses 15% IPC compared to IB: now it needs at least 15% more clock just to reach the speed of IB, and then you also want a performance increase of ~10%... so you need 25% more clock... considering that IB will probably be close to the 4GHz mark, you'd need a 5GHz Haswell to beat a 4GHz IB... nope, same situation as we see now with BD... power consumption through the roof compared to its predecessor for only a marginal increase in performance.
People have been telling me for nearly a decade that the consumer market is changing... yet performance is still determined by ST. It's the same thing with GPU computing (just that that hasn't been around as long). It's a whole other picture in the professional market, but that's not what we're discussing right now.
You see, Matt... A lot of applications are starting to use 2 threads now, some use up to 4... The thing is, we have had 8 threads available (to consumers) since 2008... And there are next to no applications around that can utilise them all now, almost in 2012 (we are talking about an average Joe, not a hardcore cruncher or professional 3D artist who already sits there with a 12-core machine). A lot of tasks are extremely difficult to multi-thread. So for the vast majority of applications, single-threaded performance will stay extremely important, as long as you have the necessary number of threads available (and let's face it, 4-thread CPUs are dirt cheap these days).
As zalbard said... hell, iTunes, as much as I hate it, is a very popular app... and it's still single-threaded... the only thing that uses 4+ threads efficiently is video encoding; audio encoding is also mostly single- or dual-threaded (LAME), etc., etc....
What I don't understand is this: clearly BD is good at multi-threaded performance, which suits the server environment it was apparently designed for. So why release the current chips with this poor leakage and these clocks? Why not release the server parts first, then, after another respin, launch the desktop Zambezi later with better power/clocks? If you miss your desired clocks by 30%, why release early and have everyone talk so badly about it? Why not just be straightforward and say that the B2 silicon has issues and they are going to do a respin?
http://crazyworldofchips.blogspot.com/ solid write up on the state of bulldozer, and the issues at hand.
I asked myself the same. It looks strange, especially when there is a B3 revision coming out shortly, too.
The only explanation to me is money. Maybe AMD needs the cash flow, and/or they wanted to launch before Sandy Bridge-E, because the reviews would have been even worse, which would have forced them to reduce launch prices even more.
Anybody have another idea?
Sorry, I'm a bit lost here. Why are you focusing on one thread when the front end of Bulldozer is responsible for two threads, just like Sandy Bridge's?
I know it falls behind clock for clock, but don't you think that has more to do with other bottlenecks? Including, for integer code, the much-debated ALU resources on a single thread? What about the longer pipeline? Are you taking into account that there may still be a deficiency in branch prediction next to Intel? Pipeline bubbles (floating point) that get filled by a second thread?
What would be more interesting, I think, is comparing code that's exclusively floating point with a single thread, then two threads, both on the one module. This would remove the integer clusters from the equation completely. (I don't know if this is practical... programming knowledge is my deficiency, so help me out here!)
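Something like this might serve as a starting point; a minimal sketch of my own, assuming Linux numbers the two cores of module 0 as CPUs 0 and 1 (check /proc/cpuinfo; that mapping is an assumption), and note the loop control still runs on the integer cores, so it only approximates an FP-only comparison:

/* fpu_share.c - rough probe of how two threads on one Bulldozer module
   share the FPU: run a dependent FP chain on one thread, then on two
   threads pinned to the (assumed) sibling cores 0 and 1.
   Build: gcc -std=gnu99 -O2 -pthread fpu_share.c -o fpu_share
   (add -lrt on older glibc) */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>

#define REPS 200000000UL

static void *fp_kernel(void *arg)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(*(int *)arg, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    volatile double a = 1.0001, b = 0.9999;  /* volatile keeps the work alive */
    for (unsigned long i = 0; i < REPS; i++)
        a = a * b + 0.00001;                 /* dependent multiply-add chain */
    return NULL;
}

static double run(int nthreads)
{
    pthread_t t[2];
    int cpus[2] = { 0, 1 };                  /* assumed siblings of module 0 */
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, fp_kernel, &cpus[i]);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("1 thread              : %.2f s\n", run(1));
    printf("2 threads, same module: %.2f s\n", run(2));
    return 0;
}

Comparing the two times shows how much (or how little) the second thread costs when the shared FPU is the main contended resource.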
That's all good and well, but stating that overall performance is determined by ST perf is just plain bonkers. Sure there are plenty of applications out there that are still single threaded, and many tasks cannot easily take advantage of threading, but multi-threaded apps are clearly the direction in which things are headed for the future and it is evident today.
You've got to be kidding me on video encoding being the only thing that uses 4 threads efficiently. There are games that can take advantage of and benefit from 8 cores today. Mass audio encoding can use as many cores as you have available (dbpoweramp). Windows, with its various components and services, can also easily utilize many cores when a lot is going on. Sharing files, copying files, live transcoding for DLNA media sharing, recording multiple TV shows, and watching a movie at the same time can certainly utilize more than a quad core on its own every day of the week in a home environment. I'm assuming you've got multiple applications open now, likely more than one active. Provided you are, you have successfully taken advantage of more than one core. You're obviously not getting 4x the perf all the time from 4 cores (or more), but when the going gets tough and a lot is going on it can certainly help. The cpu manufacturers aren't adding cores and threads to processors for their health, they're doing it because software is able to take advantage of them and the user experience benefits from their presence.
I'm definitely not going out on a limb and calling BD good or great in ST apps (I'll settle for a decent first try at a new arch), but stating that multi threaded performance is somehow irrelevant and single threaded performance is the only (or even primary) meaningful yard stick is way off base.
--Matt
Well, they released the consumer chips first because they don't require the months and months of testing that server chips need. BD is very server-focused, but so is SNB-E; it will be very interesting to see them in the server environment, where multi-threaded applications dominate. I have seen a 6-core SNB-E; it was hot and heavy ("in cooling req", hehe). I don't think it will have a 2B transistor count though :P ("4B for a 16-core BD, my god, that's huge")
http://img546.imageshack.us/img546/9083/80883532.jpg
Are you from the future? AMD's "Excavator" is planned for 2014 :p:
I think it must be balanced a bit: single-thread performance and multi. Yes, we are more and more in the multitasking age, but we still need "average" single-thread performance. If Piledriver comes with Phenom II single-thread performance clock for clock, or better, it will be a good product.
To the others: BD is far away from the Pentium 4 design and its pipelines....
lol, no it won't.
Bulldozer was supposed to be better than Phenom II, if Piledriver only catches up to Phenom II it will be a failure again. They need to get closer to the IPC levels of Sandy Bridge, not match their several years old architecture...if they somehow beat PII, then it might be something different, but there are still issues with Windows core management and I don't see anyone from AMD talking about a possible fix in the works.
No, this is unreal, because if the IPC were near SB, Piledriver would totally destroy the whole CPU segment. And with respect, that is not possible now. Example: the FX now gets 6.02 points in R11.5, and in Photoshop it is near the 2500K, etc., etc. If Piledriver's single thread were near SB, Ivy Bridge would have no chance in multithread. And that is not realistic, from my point of view.
If Piledriver has clock-for-clock PII single-thread performance, then with Piledriver clocks of about 3700 MHz stock+turbo it will be better than Denebs at stock (maybe like a Core i7 930 or 950 at stock). In multithread it could score about 7.5 points in R11.5, and that is similar to Ivy Bridge-DT (I expect IB at about 7.2 with a 3600K).
IB won't really clock higher; it should theoretically just consume less power.
mAJORD is correct. Bulldozer should be able to clock higher, but it needs a mature 32nm process. The power draw/frequency they achieved with Zambezi B2 is entirely GloFo's fault.
Well, AMD surely aimed at 5+GHz clocks on paper. It would have been a fairly reasonable CPU if the FX-8150 were 5000+MHz by default and still inside its TDP.
I already wonder how many can do a 5GHz Prime95 run for hours; my heat output is already too high at 4.7, reaching over 90°C in a matter of minutes... (and no, my coolers are mounted fine :p)
I'd rather search for "GenuineIntel" in the exe file. For example at these positions (for the 64b binary):
006F6595, 006F65A4 & 006F65AE.
There are these commands:
cmp eax,0756E6547
cmp eax,049656E69
cmp eax,06C65746E
The hex numbers translated, from bottom to top:
"letn Ieni uneG"
Now read from right to left ... ;-)
What's the purpose of this?
I have to admit, however, that there is not much performance difference on an AMD K10. Just wonder what it is doing there ...
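For reference, those three compares are checking the CPUID vendor string. A minimal sketch of reading it the ordinary way (GCC's cpuid.h helper; nothing Cinebench-specific is claimed here):

/* vendor.c - read the CPUID vendor string those compares check against.
   Leaf 0 returns "Genu" in EBX, "ineI" in EDX, "ntel" in ECX on Intel
   parts ("Auth"/"enti"/"cAMD" on AMD). Build: gcc -O2 vendor.c */
#include <stdio.h>
#include <string.h>
#include <cpuid.h>   /* GCC's __get_cpuid helper */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 1;

    memcpy(vendor + 0, &ebx, 4);   /* "Genu" / "Auth" */
    memcpy(vendor + 4, &edx, 4);   /* "ineI" / "enti" */
    memcpy(vendor + 8, &ecx, 4);   /* "ntel" / "cAMD" */
    vendor[12] = '\0';

    printf("vendor: %s\n", vendor);
    return 0;
}

If the check were only about naming the chip, reading the registers like this and printing whatever comes back would suffice; comparing against a hardcoded "GenuineIntel" only matters if something branches on the result.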
You're missing the point that much or all of that activity is likely subject to bottlenecks elsewhere in the system. Most threads doing meaningful computational work on significant amounts of data will be waiting for disk (or Internet) IO requests most of the time, not actually computing, since even with SSDs disk IO is orders of magnitude slower than memory access. Unless you are running only truly multi-threaded apps like encoders, once you've gone past a number of cores (4? 6?) the difference is not likely to be user-noticeable since the core scheduler will be allowing busy threads to use the time the held-up threads don't need while waiting for the IO subsystem.
I love the idea that the problem with ST versus MT is all due to lazy programmers who haven't multi-threaded their software. How, exactly, would multi-threading my email software help? Will it get my email off the remote server faster? Will it display the single email I am looking at any faster? :rofl: Better ST performance might, though...
P4 took over because P3 couldn't go any further as a single core. P4 also caused K7 to lose marketshare.
But for a thorough look at this topic I might recommend taking a course covering microarchitectures. :) OTOH, I realized that car technology is much better understood by the public. First, on understanding cars: who knows that cars now usually contain hundreds of small processors? That they have many communication networks (e.g. CAN, FlexRay)? That to release an airbag, the controllers in some cars partly run algorithms belonging to the class of artificial intelligence (e.g. a neural network trained to detect a crash in progress during the first milliseconds and predict the maximum impact)? That physically relatively small engines reach higher hp and torque numbers than the larger engines of the past, thanks to lots of software and hardware improvements? It got really complicated nowadays. Yet we still talk about cars in terms of a few variables.
Now imagine Bulldozer being such a newly developed car. It has 8 cylinders, a different gearbox, and so on. Now put in an experienced driver who used to drive small cars. Would he drive as well as a driver trained on the new car? You surely guessed that the driver represents the software. It looks like we still have to wait for better drivers than the current ones. ;)
Or this explains it:
http://www.xbitlabs.com/news/cpu/dis...er_Fiasco.html
maybe yes, maybe not....
A car analogy... Now all that's missing is Hitler in this thread.
I think both Intel and AMD are already adopting automated design software to speed up design. But maybe the engineers still do manual tweaking/routing/fine-tuning in the critical parts.
If the engineers had already realized that the automation tools would bring a 20% larger die and 20% less efficiency, surely they would have done something about it. Unless there is no one left on the team who can do manual tweaking at all.
I still doubt the rumor. In my opinion, he mixed things up with Ontario. That chip is indeed the result of automated tools; you can clearly see it in the strange floorplan, which is totally irregular. Bulldozer's floorplan, on the contrary, is very modular/rectangular, even inside the cores. That's normally the result of handcrafted design. Just compare it with Ontario and the difference is evident.
Furthermore, Ontario is a success and has a very small die-size and a competitive power consumption. Actually I would even say that it is currently AMD's best chip. Small, cheap to produce and it should sell very well, the typical cash cow.
Why should we look at potential performance with software that doesn't exist instead of real software? It's about as useful as non-real-world benches.
3dmark and such can be fun as toys, but they do not represent reality, just like software specifically tweaked for a certain architecture does not represent the software used by consumers every day.
No one knew how a Northwood shrink on 90nm would have done compared to Prescott, but we DO have a 32nm K10, and it's called Llano. Llano should be slightly faster than 45nm K10 because it had some minor architectural improvements, but it is hard to compare them directly, because you would have to isolate the CPU part from the GPU for any serious comparison (maybe easily done with a discrete video card, but we don't know if the IMC servicing the CPU alone is as good as previous ones), and the feature set differs from all the others. You can't directly put it against Denebs or Thubans: you have twice the L2 cache, but no L3 cache, and you're limited to quad core. However, you could make interesting IPC-based comparisons if you pitted an Athlon II X2 Regor (which has 1 MB of L2 cache per core) against a Llano with two cores disabled.
We also don't know Llano's true headroom potential, because there is no way to overclock it without hitting a base-clock wall, as every other bus derives its frequency from it, so anything could be holding you back (including Llano's very own GPU). If the rumored model with the unlocked multiplier shows up, interesting comparisons of Llano's true CPU scaling could be made. If Deneb C3 was capable of 3.8 GHz (forget going beyond that, power consumption gets ugly), I don't see why Llano couldn't reach at least the same values, and maybe 200 MHz more, with better power consumption. Basically, Llano could put Bulldozer to even more shame.
It would be a hard pick, though. I don't see enthusiasts adopting Socket FM1 even if Llano has potential as a Bulldozer alternative. You would be losing Thuban's 2 extra cores and the L3 cache, and even if you don't miss Bulldozer, chances are you want to stick Piledriver into your current AM3+ motherboard.
Besides that, because the platform wasn't designed for overclocking, you can't currently isolate Llano's CPU potential without messing with everything else, and under the limited TDP you have both a CPU and a strong GPU; that's the reason for the conservative frequencies. And considering that AMD is segmenting FM1 as a value/mainstream platform and AM3+ as the enthusiast one, they don't have any real reason to crank up Llano's frequency or put it too close to Bulldozer.
I'm disappointed with Bulldozer as a whole. After hearing JF-AMD insist on Bulldozer having higher IPC (something we could have considered set in stone, coming directly from AMD), I was expecting something consistently superior to K10, and instead we got something noticeably slower. The only way IPC is higher is if they're comparing an entire Bulldozer module against a single K10 core. However, there are some interesting things: Bulldozer packs 2000 million transistors, reaches 4 GHz, and at nominal frequency boasts respectable power consumption. However, Bulldozer is pretty much at the top of the frequency/voltage curve, which is also why power consumption gets ridiculously crazy with just a moderate overclock. Not only that, the 2000M transistors are very densely packed:
Bulldozer 8C: 2000M? / 315 mm^2 = 6.35
Llano 4C: 1450M / 228 mm^2 = 6.36
Gulftown 6C: 1170M / 240 mm^2 = 4.88
Clarkdale 2C: 384M / 81 mm^2 = 4.74
Sandy Bridge 4C: 995M / 216 mm^2 = 4.61
Sandy Bridge 2C (GT2): 624M / 149 mm^2 = 4.19
Sandy Bridge 2C (GT1): 504M / 131 mm^2 = 3.85
(million transistors per mm^2)
How much does that transistor density potentially hurt yields or frequency scaling? I suppose such ~35% higher density compared to Sandy Bridge could be a pain for GlobalFoundries' fresh process to handle, and this applies to both Llano and Bulldozer.
Anyway, what disappoints me is that no matter how much faster it was supposed to be compared to a Core i5 or i7, the point is that Bulldozer can't consistently beat what it was meant to replace, and you don't need Intel competing when AMD's older processors put it to shame. Not only that, but I doubt many people have enough knowledge of Bulldozer's particularities to determine the sum of things it is lacking before it can bring truly competitive performance. Maybe a stepping or two, as with the Barcelona TLB bug? That could improve frequency headroom and maybe fix whatever is causing subpar performance in a subsystem, like cache performance. Maybe an architectural revision, requiring us to wait for Piledriver to do for Bulldozer what Stars did for Barcelona? Or is the design flawed and unworkable, and might we see a K10 variant at 22nm making a comeback, P6-style?
Just assume AMD really did it with automation tools. Before the processor could be launched, the chip was tested. When the performance turned out worse than Phenom II in some cases, do you think their design engineers did not know about it?
If they knew, what kept them from tuning it? (This is not stated in the report.)
I have doubts about the insider news too.
Or does he really think his colleagues are that dumb? lol
If I worked somewhere and stated in public a piece of data not protected under NDA (while knowing all the other details, which I could NOT disclose, needed to reach and sustain such a conclusion), you would indeed believe me much, much more than some random guy leaking results from an engineering sample. I don't think he could have stated something without knowing whether it was true or false, since he knew all the details we didn't; that's why I don't get why he would have insisted so firmly on the "IPC increases" thing if it wasn't the case. Maybe he was comparing a Bulldozer module against a single K10 core; that is the only way to make sense of it.
Also, he was quite accurate with the statement that Bulldozer wasn't compatible with the standard AM3 platform, and at that moment I think most believed it could have been a drop-in replacement.
i am really getting tired of that getting posted 1000 times and jf-amd getting blamed for terrace getting banned. terrace was banned because he couldn't keep his mouth shut plain and simple. and with him being banned we aren't even supposed to be mentioning his name. yet if we were to mention obr's name in a bd thread half the people would jump on you. but no its ok to talk about terrace all we want.
/rant
you copy and paste too much from the web ...
this is the code that tests the CPU in order to report and compare the CPU on the left of the application (yep ... there is actually a good reason to read the CPU in Cinebench) ... check how many times it runs if you actually test it yourself in Cinebench; you'll see it is only a very few times.
cheap, inaccurate propaganda, really ...
Francois
here is what you googled to try to make your point:
http://www.google.com/#sclient=psy-a...w=1920&bih=979
PS: not answering this anymore; it is not worth my time and energy when people only try to find problems, or invent them
Exactly; the area (cost) and power consumption were preventing both archs from being introduced as dual cores that early, since that at least required some optimization for energy efficiency (OTOH, it's possible to have 2 dies running at 0.7x frequency in the same power envelope) and/or a newer process (to solve area and power).
No, on the contrary, I did not search the web enough, because then I would have saved some time.
I don't speak Russian, but your second Google hit is from me; I also examined a BOINC binary, which inhibited the use of SSE2 on AMD chips. The "clean" binary ups the performance +10% on AMD CPUs, a nice plus.
Quote:
this is the code that tests the CPU in order to report and compare the CPU on the left of the application (yep ... there is actually a good reason to read the CPU in Cinebench) ... check how many times it runs if you actually test it yourself in Cinebench; you'll see it is only a very few times.
I believe you, because the performance differences for Cinebench are very small (none of the aforementioned 10%, as in the case of the other program).
But if you state that it is only for naming the chip correctly on the left side of Cinebench, then I wonder why they compare against a hardcoded Intel string instead of just reading out the CPUID with the cpuid instruction and copying the results from the respective return-value registers. Well, I conclude it's a weird programming style then.
Quote:
cheap, inaccurate propaganda, really ...
Hmmm, that sounds like a cheap insult to me now, really. Please show some respect, merci.
Quote:
PS: not answering this anymore; it is not worth my time and energy when people only try to find problems, or invent them
That's your wish; however, if you don't want to discuss, don't start posting on a discussion board ;-)
I thank you for your time then, take care and
Au revoir
Always cool to blame Intel rather than the lazy coders... why do they use a compiler from 2008 (or older) in 2011? Why don't they set the flags that would make SSE2 the minimum execution path? ... Why don't they respond to people pointing this out?
To me it seems many coders don't care at all about optimisations... Imho the best case was SETI: the best binaries came from the community, who built binaries for each architecture themselves..
Very good question; I contacted them and never got a reply. I guess the answer is easy: they get it for free and don't have to pay the power bills.
Quote:
To me it seems many coders don't care at all about optimisations...
Yes, but then why are they using ICC in the first place and not sticking to the standard MS compiler? Seems they care, somehow. Weird.
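On the "minimum execution path" point raised above, a minimal sketch of what a baseline-SSE2 build looks like; the GCC/MSVC flag spellings below are the usual ones, and the exact ICC equivalents are left as something to check in its docs:

/* baseline.c - toy FP loop built with SSE2 as the floor, so there is
   no slow x87 fallback path gated behind a vendor-string check.
   Typical builds:
     gcc -O2 -msse2 -mfpmath=sse baseline.c -o baseline   (GCC)
     cl /O2 /arch:SSE2 baseline.c                          (MSVC, 32-bit) */
#include <stdio.h>

int main(void)
{
    int i;
    double sum = 0.0;
    /* simple FP loop the compiler can emit directly as SSE2 code
       when the baseline allows it */
    for (i = 1; i <= 1000; i++)
        sum += 1.0 / i;
    printf("H(1000) = %f\n", sum);
    return 0;
}

Setting the floor like this doesn't add a vendor check by itself, which is the whole complaint about the dispatched binaries.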
its not the message, it's how you push the message. if i walk into every intel post and say that the next intel chips are going to be overpriced and useless for mainstream, i have every right to believe that, but i don't have the right to flamebait every thread with that idea.
If what you are saying is true and people are saying you are wrong, then you do have the right to argue your point. Remember, it always takes more than one person to argue; JF could have said "we will see" and let time dictate who was right. JF argued strongly that terrace was wrong, and terrace argued back.
Oh please :rolleyes:
Knock it off.
Now you're just making excuses for AMD.
I think it's more than likely that Jeff was simply fed inaccurate information by people higher up in the company... which leads me to believe one of two things: either management is completely clueless about the Bulldozer architecture and internal performance estimates, and out of incompetence would just tell Jeff what he wanted to hear to hurry his departure from the office, OR someone in management was FULLY aware of how Bulldozer would perform and perhaps didn't really like Jeff very much, thus giving him incorrect data AND putting office politics before the image of the company.
Neither is that far-fetched. I personally think the latter is most likely, considering I was once in that exact same situation a few years back.