I'm curious whether we will see the BE and FX series together, or "only" the FX series. Hopefully some guys here will have Interlagos as a home station :-)
ROG Power PCs - Intel and AMD
CPUs: i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350, 1x AMD FX-8320, 1x AMD FX-8300, 2x AMD FX-6300, 2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD X4 965 BE C2 and C3, AMD X4 970 BE, AMD X4 975 BE, AMD X4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K, AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K, 2x i7-4770K, Intel i7-3930K
AMD Cinebench R10 challenge | AMD Cinebench R15 thread | Intel Cinebench R15 thread
...talk about beating a dead horse.....wow.
[MOBO] Asus CrossHair Formula 5 AM3+
[GPU] ATI 6970 x2 Crossfire 2Gb
[RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
[CPU] AMD FX-8120 @ 4.8 GHz
[COOLER] XSPC Rasa 750 RS360 WaterCooling
[OS] Windows 8 x64 Enterprise
[HDD] OCZ Vertex 3 120GB SSD
[AUDIO] Logitech S-220 17 Watts 2.1
AMD released both Athlon MP and Athlon FX with dual sockets. Both failed, and that was at a time when these things were much more needed than today. Back then you had single cores, so a dual-socket solution made a huge difference, and still it failed. How many will bother when you can get 8 cores in one single socket?
AMD did the math.
I've already responded to this, but what the hell, one last time ;-).
Athlon MP was AMD's first attempt at multiprocessing and the workstation/server market. And it did OK: boards were relatively cheap, you could mod mobile Bartons to work on them, and the platform was available for some time.
It did OK, all things considered.
Dual FX failed simply because AMD was selling an ultra-expensive platform with two power-hungry CPUs that was still slower than a single Intel desktop quad-core CPU. Moreover, AMD and its partners pretty quickly abandoned the platform altogether.
Core 2 Quads bulldozed it.
So if Bulldozer IS NOT a failure, then this analogy doesn't fit.
Bulldozer has turbo, so it doesn't need OC even in a server environment.
Good luck buying a 4-socket mobo for your "graphics design" needs, rav666.
I work with it too, and I think a single Bulldozer covers the needs with turbo. There aren't many apps optimized for more cores.
I'd rather add more RAM, dual SSDs, more HDDs, triple 24"/30" IPS monitors, etc.
Getting a server for graphics needs is like mounting a 10-liter V10 engine on an old Toyota Starlet.
Last edited by Tomasis; 02-08-2011 at 06:03 AM.
Vishera 8320 @ 5 GHz | Gigabyte UD3 | 8 GB TridentX 2400 C10 | Powercolor 6850 | Thermalight Silver Arrow (bench Super KAZE 3k) | Samsung 830 128 GB x2 RAID 0 | Fractal case
I tend to believe this is a 16-core server Bulldozer LOL.
http://scarletwhore.com/?p=3277
Looks like they're against the law, right? Well, after months of begging, threats and attempted bribes, we finally persuaded our AMD-affiliated source to provide us with some baseline comparisons for the soon-to-be-released AMD Bulldozer CPU.
Last edited by undone; 02-08-2011 at 06:51 AM.
First, all the chips compared are desktop chips, so the Bulldozer in that chart should be an 8-core Zambezi part. Second, don't believe fairy tales from dubious sources... That "website" has already claimed that an 8C Bulldozer is almost 2x faster than a 980X in "3D rendering" (whatever they mean by that). So until we see some real data, leaked benchmarks from dubious sources are not to be trusted.
Highly dubious indeed. Considering that a module is slightly larger than a SB core, that would mean AMD is able to extract twice the performance from the same die area (roughly speaking). :/ I think not.
AMD actually claims they manage to get almost 2 cores' worth of performance from a monolithic, dual-thread-capable module (as they call it; another term is "optimized dual core"). This is achieved via clever sharing of resources. The thing is that with a shared front end, all the benefits of SMT (filling in the pipeline bubbles) are still present with AMD's approach. Only this time you don't share the same execution unit: you get a whole execution core ready for the thread. AMD describes this as "smoothing of inefficient/bursty usage". AMD invested heavily in both instruction and data prefetch, which are now an order of magnitude better than in the 10h family. So you are left with fully featured cores that have all the advantages of SMT and behave/perform nearly at the level of independent cores in a hypothetical non-shared module (as if they were made à la Athlon/C2D, with no shared parts except maybe the L2 cache).
What I'm trying to say is that with Bulldozer, we can't use the same comparison methods we used in the past, in which we compared the die area of a single core, say Athlon and Nehalem, and derived the die area investments both firms made in terms of logic and cache. Since Bulldozer is organized the way it is, we have no clue how much die area the now-shared parts would occupy in a hypothetical non-shared design. There are some numbers being thrown around (from 15% all the way up to a 30% bigger module), but only AMD knows the exact figures. Since sharing has some benefits, as previously mentioned, as well as some possible performance downsides, AMD invested in the areas that maximize the benefits and minimize the bad effects of sharing. How well they did this will make or break the module approach and the whole Bulldozer idea.
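To make the die-area point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the module's extra area over a single core falls in the 15-30% range mentioned above, and it uses a hypothetical 1.8x throughput as a stand-in for "almost 2 cores' worth of performance"; neither number is an AMD figure. The unshared baseline is simply two cores at 2x area for 2x throughput.

```python
def perf_per_area(throughput: float, area: float) -> float:
    """Throughput per unit area, both expressed relative to one unshared core."""
    return throughput / area


def module_vs_two_cores(module_overhead: float, module_throughput: float) -> dict:
    """Compare a shared module against two fully independent cores.

    module_overhead: extra area of the module over a single core (0.15 = 15%).
    module_throughput: module throughput relative to one core (hypothetical).
    """
    # Baseline: two independent cores give 2x throughput for 2x area.
    two_cores = perf_per_area(2.0, 2.0)
    # Module: one core's area plus the overhead of the second integer cluster.
    module = perf_per_area(module_throughput, 1.0 + module_overhead)
    return {"two_cores": two_cores, "module": module, "advantage": module / two_cores}


for overhead in (0.15, 0.30):
    result = module_vs_two_cores(overhead, module_throughput=1.8)
    print(f"overhead {overhead:.0%}: perf/area advantage {result['advantage']:.2f}x")
```

The spread between the two overhead values is the point: without the real figure, the same throughput can look like a modest or a large area win.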
That Zambezi can be 2x faster than a 980X in some cases, I don't doubt. In optimized FMA4/AVX applications the difference can be even higher than that. But in legacy code I'm skeptical. The fact is they don't need to be 2x faster, they just need to be faster. If they are, this means Zambezi's cores are much closer to the level of Nehalem, no matter how they achieved it (whether it's just IPC, just clock, or a combination of the two). Next year will bring an improved Bulldozer in the form of the 10-core Komodo part for desktop. That's 25% more cores, with possible minor IPC tweaks and 25% more TDP headroom for cores to boost clocks in poorly threaded workloads.
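Whether 25% more cores means 25% more performance depends on how well a workload scales. A quick way to see that is Amdahl's law (a standard textbook model, not something from the thread). The parallel fractions below are hypothetical, and the model deliberately ignores clock speed, which is exactly where the extra turbo headroom would matter for poorly threaded code.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core per Amdahl's law (same clock, no contention)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def gain_10_vs_8(parallel_fraction: float) -> float:
    """Relative gain of a 10-core part over an 8-core part at equal per-core speed."""
    return amdahl_speedup(parallel_fraction, 10) / amdahl_speedup(parallel_fraction, 8)


for p in (1.0, 0.9, 0.5):  # hypothetical parallel fractions
    print(f"parallel fraction {p:.0%}: 10 cores vs 8 -> {gain_10_vs_8(p):.3f}x")
```

Only a perfectly parallel load sees the full 1.25x; at a 50% parallel fraction the two extra cores add only about 2%.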
Do you have a source for that? Not that I find it impossible, but I like to work with verified info.
On SMT, I suspect Bulldozer doesn't really get all the benefits from it. SMT is used to justify creating a very broad execution engine, which you would want in order to maximise single-threaded performance. It might be (and of course we can't be sure) that AMD had to make concessions in this regard, e.g. for efficiency reasons: execution units do take a lot of energy if kept running, despite their small size, right? And it's a bit doubtful that AMD chose to implement gating for parts of a module. If that's the case, it may be 'smoother' and will rock for heavily threaded apps, but a full-on SMT approach might still have been better for single-threaded apps.
I do believe AMD made the right choice, by the way: the module is probably better suited for server workloads. And regarding the client space, I suspect that, especially in the future, most of the time the CPU is a bottleneck it will be in multithreaded situations.
And of course they needn't be twice as fast. Since it's probably about 40% larger than an SB quad, though, it would be nice if they could sell it for at least some 50% more (given lower yields et al.) to have similar margins. The "more cores == higher single-threaded frequency" thought is a nice one, though.
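The margin remark can be sketched with the simple Poisson yield model, Y = exp(-A * D0), a textbook simplification. The 1.4x area ratio is from the post; the 2.16 cm^2 baseline die area and the defect densities are assumed values for illustration only, and edge loss is ignored.

```python
import math


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of good dies under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


def cost_per_good_die(die_area_cm2: float, defects_per_cm2: float,
                      wafer_area_cm2: float = 706.9, wafer_cost: float = 1.0) -> float:
    """Wafer cost divided by the number of good dies (no edge loss, no scribe lines)."""
    dies_per_wafer = wafer_area_cm2 / die_area_cm2  # 706.9 cm^2 ~ a 300 mm wafer
    return wafer_cost / (dies_per_wafer * poisson_yield(die_area_cm2, defects_per_cm2))


def relative_cost(area_ratio: float, base_area_cm2: float, defects_per_cm2: float) -> float:
    """Cost per good die of a die `area_ratio` times larger, relative to the baseline."""
    big = cost_per_good_die(base_area_cm2 * area_ratio, defects_per_cm2)
    small = cost_per_good_die(base_area_cm2, defects_per_cm2)
    return big / small


for d0 in (0.0, 0.1, 0.2):  # assumed defect densities per cm^2
    print(f"D0={d0}: 1.4x die costs {relative_cost(1.4, 2.16, d0):.2f}x per good die")
```

Even with zero defects the cost scales with area, and any defect density pushes the 1.4x die above 1.4x cost (around 1.5x at D0 = 0.1 in this toy model), which is why similar margins would need a price premium somewhat beyond the area ratio.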
Anyway, I was just commenting on the graph. I'd find it fairly absurd that in billions upon billions of dollars worth of research, Intel missed just that one thing that could double their performance per mm^2. :P Just to express my skepticism about the graph - not so much the Bulldozer architecture.
Everything can be found here:
http://www.hotchips.org/conference-a.../hot-chips-22/
It's a video from HC22. Mike Butler presented the Bulldozer design, and there was a Q&A after the presentation. You can find all about the aggressiveness (he couldn't emphasize it more) in the presentation, and some of it in the Q&A.
SMT does zero for single-threaded workloads, so I don't know what you mean by "a full-on SMT approach might still have been better for single threaded apps". In Bulldozer, the shared front end does all the work of instruction dispatching, and the integer schedulers do the actual work according to the dispatcher's orders. The FP unit in the Bulldozer module is a full-on SMT approach due to the latency-tolerant nature of the instructions it deals with. The L2 is shared dynamically by the 2 running threads (so SMT in essence, as AMD describes it in one of the slides). Everything else in the module is of vertical MT organization (switching back and forth between the threads).
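To illustrate what "vertical MT organization (switching back and forth between the threads)" means, here is a toy round-robin model of a shared pipeline stage in Python. It is a sketch under simplifying assumptions, not Bulldozer's actual fetch/decode policy: one thread gets the shared stage per cycle, and a stalled thread (the stall pattern is hypothetical) simply hands its slot to the other one.

```python
from typing import List, Set


def vertical_mt_schedule(num_cycles: int, stalled: List[Set[int]]) -> List[str]:
    """Give a shared pipeline stage to one thread per cycle, round-robin.

    stalled[t] holds the cycles in which thread t cannot use the stage
    (a hypothetical stall pattern, e.g. waiting on an instruction-cache miss).
    Returns the owner of each cycle: "T0", "T1", ... or "idle".
    """
    n = len(stalled)
    last = n - 1  # start so that thread 0 gets the first turn
    owners = []
    for cycle in range(num_cycles):
        # Try threads in round-robin order, starting after the last owner.
        for offset in range(1, n + 1):
            t = (last + offset) % n
            if cycle not in stalled[t]:
                owners.append(f"T{t}")
                last = t
                break
        else:
            owners.append("idle")  # every thread was stalled this cycle
    return owners


# Thread 1 stalls in cycles 2 and 3, so thread 0 takes its slot in cycle 3.
print(vertical_mt_schedule(6, [set(), {2, 3}]))
```

With no stalls the threads simply alternate; when one stalls, the other picks up its slot, which is the "smoothing of bursty usage" idea in miniature.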
So while the integer cores do have a common L1 instruction cache and front end, they are fully independent. They can do opportunistic prefetch into the L2, and the data can then be used by either of the cores. The FP unit is dedicated or shared. When dedicated, it can assign a whole 256-bit FMA to one core; otherwise it is SMT-organized and shared by 2 cores as 2 x 128-bit FMA (being able to even do 2x ADD or 2x MUL, a feat not possible on any of today's x86 cores).
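The dedicated/shared behaviour of the FP unit described above can be mimicked with a toy issue model in Python. Everything here is an assumption for illustration rather than AMD's real scheduler: two 128-bit FMA pipes, a 256-bit op taking both pipes in one cycle, in-order issue per thread, no latencies or dependencies, and priority alternating between threads each cycle.

```python
from collections import deque
from typing import List

PIPES = 2        # assumed: two 128-bit FMA pipes shared by the module
PIPE_BITS = 128  # width of one pipe


def flexfp_cycles(streams: List[List[int]]) -> int:
    """Count cycles needed to issue every FP op on the shared unit.

    streams[t] is thread t's in-order list of op widths in bits (128 or 256).
    A 256-bit op needs both pipes in the same cycle; a 128-bit op needs one.
    """
    for stream in streams:
        for width in stream:
            if width not in (128, 256):
                raise ValueError(f"unsupported op width: {width}")
    queues = [deque(stream) for stream in streams]
    cycles = 0
    while any(queues):
        free = PIPES
        n = len(queues)
        # Alternate which thread gets first pick of the pipes each cycle.
        for k in range(n):
            q = queues[(cycles + k) % n]
            # Issue in order while the next op still fits in the free pipes.
            while q and q[0] // PIPE_BITS <= free:
                free -= q.popleft() // PIPE_BITS
        cycles += 1
    return cycles


print(flexfp_cycles([[256] * 4]))             # one thread using the full width
print(flexfp_cycles([[128] * 4, [128] * 4]))  # two threads sharing 2 x 128-bit
```

Both cases keep both pipes busy every cycle: the unit's full width goes either to one thread or is split between two.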
Intel is suffering from "not invented here" syndrome. They don't like using ideas developed by their competitors unless they really have to (think AMD64 ISA). They had some of the people behind the "CMT" approach working at Intel back in the day, i.e. Andy Glew, but they never really backed that idea. When Glew moved to AMD, he presented them with the same concept. Coincidentally, some of the AMD folks were looking in a similar direction, and they decided to pursue the CMT approach roughly in 2005. One year before that, Glew left AMD. Chuck Moore gave a presentation back in 2005 describing CMT as the best choice for the next generation of multithreaded MPUs from a perf./watt/mm^2 perspective. Fred Weber also hinted back in 2005 at where AMD was heading with their next gen of multithreaded CPUs.
That's very interesting. I've had a feeling that AMD's prefetchers played a large part in their lower IPC.
Does anyone know how large a performance impact greatly enhanced prefetchers might have?
Last edited by -Boris-; 02-09-2011 at 12:17 AM.
hehe, nice big fake
Great, thanks!
Regarding SMT and single-threaded improvements... SMT in itself indeed does nothing to improve single-threaded performance. You must look at the whole design process and the decisions made. In the Nehalem case, some strategy planning occurred first. If I recall correctly, the question was whether to do relatively narrow cores, which would be very suitable for server loads; to focus all of their attention on making a broad, single-thread-oriented core; or to do a broad one that utilises SMT to make the efficiency acceptable for servers. (Whether or not to use SMT was indeed a question; the chip designer admitted it's quite hard to pull off SMT, so it would require a lot of resources, both finances and time.) They chose the latter, so that their architecture would be suitable for both server and client. Servers have high margins: if it weren't for SMT, they couldn't have justified building a very broad core capable of handling single-threaded loads so well.
In the case of BD, AMD could have invested the transistor budget of the second INT cluster in making the design broader, and implemented SMT to make it more efficient in multithreaded environments. You could end up with a core/module of the same size, but better for single-threaded workloads, though with the downsides you already mentioned.
I'm not sure whether Intel suffers from that. The integrated memory controller, the HyperTransport-like bus, and even the three-level cache structure were all implemented by AMD first, and Intel later adopted them. One could say the Core 2 architecture was inspired by the thinking that led to Athlon too: a relatively short pipeline and a focus on IPC rather than frequency. In fact, all this led Tom's Hardware to title their Nehalem review "Architecture by AMD?" ( http://www.tomshardware.co.uk/Intel-...iew-31375.html ).
If there's a reason why Intel chose not to implement CMT, I'd guess it's because Intel never separated INT and FP the way AMD did and still does, so they can't do the FlexFP trick as easily. This would reduce the gains in area efficiency, I suppose.
I do dare to guess that we'll see some form of CMT going on with Haswell...
Last edited by ohnoitseddy; 02-10-2011 at 03:21 AM.
They assure this *fake* is true? WTF?
http://scarletwhore.com/?p=3277
ATI-Forum quotes: "This benchmark has produced a lot of interest and controversy. I can assure it is correct. Back in November I predicted that the soon to be released Panasonic GH2 camera was going to have image sensor problems and this would cause a delay in the shipping and availability of the camera. I was correct. Beyond the benchmarks I have made 2 other predictions: Apple is going to use the AMD Fusion platform in a big way and AMD most certainly is going to release a Dual-socket Bulldozer configuration."
http://news.ati-forum.de/index.php/n...ks-aufgetaucht
Last edited by undone; 02-11-2011 at 10:53 AM.
"This benchmark has produced a lot of interest and controversy. I can assure it is correct."
Remains to be seen.
"Back in November I predicted that the soon to be released Panasonic GH2 camera was going to have image sensor problems and this would cause a delay in the shipping and availability of the camera. I was correct."
Source?
"Apple is going to use the AMD Fusion platform in a big way"
Wouldn't be surprised.
"AMD most certainly is going to release a Dual-socket Bulldozer configuration."
Not exactly an unlikely prediction, unless he's talking about desktop. In which case that's pretty much definitely a no.
13 800 could be real for an 8-core Zambezi... But nothing is official yet; we just have to wait.
That score of 13.88 in C11.5 is 25% higher than the one posted on Donanimhaber. So either the sample for the DH slide had a 25% lower clock (not likely) or the score is not real. The DH slide suggests that an X8 at xxMhz has ~1.9x the performance of an 1100T in C11.5, which is around 10.9-11 pts. That's still a massive leap in performance. 13.89 is therefore hard to swallow (heck, even the DH slide is very fishy in many regards).
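A quick Python check of the numbers in the post. The 13.88 score, the 10.9-11 point range and the ~1.9x factor are taken from the post; treating Cinebench scores as scaling linearly with clock speed is a simplifying assumption.

```python
def pct_higher(a: float, b: float) -> float:
    """How much higher a is than b, in percent."""
    return (a / b - 1.0) * 100.0


def pct_lower(a: float, b: float) -> float:
    """How much lower b is than a, in percent (the clock cut needed if score ~ clock)."""
    return (1.0 - b / a) * 100.0


leaked = 13.88           # score from the leaked chart
dh_range = (10.9, 11.0)  # C11.5 range implied by the DH slide

for dh in dh_range:
    print(f"{leaked} vs {dh}: {pct_higher(leaked, dh):.1f}% higher, "
          f"needs a {pct_lower(leaked, dh):.1f}% lower clock if score scales with clock")

# 1100T score implied by "~1.9x the 1100T"
print(f"implied 1100T: {dh_range[0] / 1.9:.2f}-{dh_range[1] / 1.9:.2f} pts")
```

So "25% higher" is, if anything, a slight understatement (about 26-27%), and under linear clock scaling the DH sample would need roughly a 21% lower clock rather than 25%. Either way, the gap is too large to explain away, which supports the skepticism in the post.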
Uhhhhhggggghhhh, these types of posts... who invented what first. It is really irrelevant: companies design products to sell products, and they make technical decisions separately from one another, for different reasons, arriving at different conclusions at different times.
But to make a point.
Intel implemented an IMC back in the 386 days with the 386 SL ( http://fury-tech.com/en/tag/history-...croprocessors/ ). There was nothing really invented there; where you put the memory controller is a design choice for the platform.
The HyperTransport bus is based on EV6, which was DEC's, and Intel bought out DEC in 1998. Intel did nothing with the IP; AMD popularized the serial point-to-point link nicely in the K8 line.
A 3-level cache is nothing new: several non-x86 designs employed a 3-level cache hierarchy, and the first x86 CPU with a 3-level cache was produced by Intel ( http://www.dailytech.com/16MB+of+L3+...rticle2564.htm ). It didn't help much; it was still craptastic NetBurst.
Core 2 was a continuation of Banias, Dothan, and Yonah, all of which were P6 lineage. I doubt much of the design was inspired by K8... they are two completely different architectures.
Last edited by JumpingJack; 02-11-2011 at 10:55 PM.
One hundred years from now It won't matter
What kind of car I drove What kind of house I lived in
How much money I had in the bank Nor what my clothes looked like.... But The world may be a little better Because, I was important In the life of a child.
-- from "Within My Power" by Forest Witcraft