AMD Zambezi news, info, fans !

**kull** · 10-08-2011, 04:00 AM

Originally Posted by informal

Forget Sandy Bridge, it's on another level. This thing can't match Deneb X4, let alone X6 Thuban or SB.

EDIT: Ahh man, you tricked me. The thread is 2 months old and it's about the 8130P, which is an ES and doesn't exist in retail form... So all those results are irrelevant, even if they are close to retail (like I believe Linpack is).

That thread started yesterday. In Central and Eastern Europe the date is written differently: 7.10.2011 means the seventh day of October, not the tenth day of July.

It is related to this news item, http://www.techpowerup.com/153213/AM...ian-Store.html, regarding the FX8120 being in stock at the Ukrainian webstore Fixer. Those guys bought it there.
On this very page member Pt1t explained again that you need the latest CPU-Z version, 1.58.7, to display the FX8120 CPU string correctly; otherwise it will be marked as FX8130P.
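
The date mix-up described above can be reproduced in a few lines. This is only a sketch using Python's standard `datetime` module; the variable names are illustrative and not taken from the thread.

```python
from datetime import datetime

stamp = "7.10.2011"

# Day-first reading, as used in Central and Eastern Europe.
day_first = datetime.strptime(stamp, "%d.%m.%Y")

# Month-first reading, which produces the "months old" misreading.
month_first = datetime.strptime(stamp, "%m.%d.%Y")

print(day_first.date().isoformat())    # 2011-10-07
print(month_first.date().isoformat())  # 2011-07-10
```

Both format strings parse without error, which is why the ambiguity is easy to miss.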

**mattkosem** · 10-08-2011, 04:04 AM

Originally Posted by informal

Thanks raider!

So 4.5Ghz on 8150 and it scores 40.5Gflops in Linpack. For comparison this is what Deneb X4 @ 3.8Ghz gets : 34.7Gflops (or @ 4.5Ghz 41Gflops). So 8C Bulldozer is slower than 4C Deneb in pure FP benchmark that is used to measure HPC performance in Top500 supercomputer list... I have no idea what is going on with the FPU in Orochi.

LOL! Trying to compare linx gigaflop numbers with different problem sizes. Shame on you.

For laughs though...

Here's one from a Nehalem i7 with over 65.

Maybe another random Deneb DDR2 score over 47.

Maybe a Thuban with over 76Gflops?

You guys crack me up.

--Matt
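
For reference, the "@ 4.5Ghz 41Gflops" Deneb figure in the quoted post appears to be a plain linear extrapolation from the 3.8GHz result. The sketch below shows that arithmetic; the helper name `scale_linear` is made up for illustration, and perfectly linear clock scaling is an assumption. It does nothing to address the problem-size objection raised above, since it only rescales one run.

```python
def scale_linear(score: float, from_ghz: float, to_ghz: float) -> float:
    """Extrapolate a score to another clock, ASSUMING perfectly linear scaling."""
    return score * to_ghz / from_ghz

# Deneb X4: 34.7 Gflops measured at 3.8 GHz, extrapolated to 4.5 GHz.
deneb_at_4_5 = scale_linear(34.7, 3.8, 4.5)
print(f"{deneb_at_4_5:.1f} Gflops")
```

The result lands at roughly 41 Gflops, so the number only holds if both runs used the same LinX problem size and memory setup.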

**dess** · 10-08-2011, 04:08 AM

Originally Posted by informal

Zambezi core in FX4100 has exactly 25% lower SIMD/fp performance than Deneb core...

Exactly? Isn't it an average value? I think you should make a distinction between scalar and vector (SIMD) performance, and also take optimization into account. I think it's worse only with non-optimized scalar FP code (such as most if not all of the programs you've mentioned).

**informal** · 10-08-2011, 04:33 AM

@ dess
Why do you think Cinebench is using scalar SIMD?

**liberato87** · 10-08-2011, 04:37 AM

Originally Posted by informal

Thanks raider!

So 4.5Ghz on 8150 and it scores 40.5Gflops in Linpack. For comparison this is what Deneb X4 @ 3.8Ghz gets : 34.7Gflops (or @ 4.5Ghz 41Gflops). So 8C Bulldozer is slower than 4C Deneb in pure FP benchmark that is used to measure HPC performance in Top500 supercomputer list... I have no idea what is going on with the FPU in Orochi.

In LinX, with problem size 4000 (memory 127 MB),
a 1090T @ 4 GHz does 65 Gflops.

I can't believe it's possible that an FX 8120 8c/8t @ 4600 MHz (+600 MHz over my 1090T) does 40.5 Gflops vs 65 Gflops!

JF said that IPC increases, didn't he?!
So what?
We also have more frequency and more cores, and these are the results?

I can't believe this is the final CPU, because the performance is the same as what someone showed us 3-4 months ago, and I can't understand the delay if the CPUs are still performing badly...

**mAJORD** · 10-08-2011, 05:07 AM

Originally Posted by informal

Summary of FX4110 @ 4.2GHz vs Phenom II X4 980 @ 3.7GHz:
Super Pi: meh, I don't want to bother finding results in this "benchmark".
Fritz Chess: FX4110 gets 7332pts, X4 980 gets 9067pts. FX is 24% slower at stock (vs stock) and 40% slower per "core" at the same clock. Multithreaded benchmark.
3DMark Vantage CPU test: FX4110 gets 10660pts, X4 980 gets 12780pts. FX is 20% slower at stock and 36% slower per "core" at the same clock. Multithreaded benchmark.
wPrime 32M: FX4110 gets 17.9s, X4 980 gets 11.45s. FX is 56% slower at stock and 77% slower per "core" at the same clock. Multithreaded benchmark.
C11.5: FX4110 gets 3.42pts, X4 980 gets 4.34pts. FX is 27% slower at stock and 44% slower per "core" at the same clock. Multithreaded benchmark.

I have skipped over 7-Zip since I can't find comparable benchmarks. AIDA cache and memory shows somewhat better memory read/write and L2/L3 cache bandwidth for reads. The rest of the cache performance is on par with or slower than Deneb.

Conclusion: overall the FX4110 is 32% slower than Deneb X4 @ 3.7GHz stock vs stock and 49% slower when both are at 4.2GHz. Either all these results are caused by platform bugs (or something else) or Bulldozer is much slower than Deneb with the same "thread" count. All the above tests utilize the "world's first 256bit FPU" and it fails hard versus the "old 128bit" Deneb FPU, even in single-thread mode... Imagine the OC you have to reach just to match Deneb; it has to be sky high (think 5.5-6GHz on air to match a 4GHz Deneb). How AMD is going to charge $140 for this chip is beyond me.
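
The percentages in the quoted summary can be re-derived from the raw scores. The sketch below is one reading of how they were obtained, not a confirmed method: it takes the X4 980 result relative to the FX4110 result and assumes linear scaling with frequency for the same-clock figures. The dictionary layout and function name are illustrative only.

```python
FX_GHZ, X4_GHZ = 4.2, 3.7  # clocks from the quoted summary

# name: (FX4110 result, X4 980 result, higher_is_better)
results = {
    "fritz chess": (7332, 9067, True),
    "3dmark vantage cpu": (10660, 12780, True),
    "wprime 32m": (17.9, 11.45, False),  # seconds, lower is better
    "cinebench 11.5": (3.42, 4.34, True),
}

def deneb_advantage(fx: float, x4: float, higher_is_better: bool) -> float:
    """Return how much faster the X4 980 is than the FX4110, as a ratio."""
    return x4 / fx if higher_is_better else fx / x4

stock = {}
same_clock = {}
for name, (fx, x4, hib) in results.items():
    ratio = deneb_advantage(fx, x4, hib)
    stock[name] = (ratio - 1) * 100
    # ASSUMPTION: performance scales linearly with clock speed.
    same_clock[name] = (ratio * FX_GHZ / X4_GHZ - 1) * 100
    # The FX deficit, measured with the X4 980 as the baseline, is smaller.
    fx_deficit = (1 - 1 / ratio) * 100
    print(f"{name}: X4 +{stock[name]:.1f}% at stock, "
          f"+{same_clock[name]:.1f}% at same clock, FX -{fx_deficit:.1f}% at stock")

avg_stock = sum(stock.values()) / len(stock)
avg_same_clock = sum(same_clock.values()) / len(same_clock)
print(f"average: +{avg_stock:.1f}% at stock, +{avg_same_clock:.1f}% at same clock")
```

Under these assumptions the averages come out close to the 32% and 49% in the summary, which suggests the figures describe how much faster the X4 980 is rather than how much slower the FX is. In Fritz Chess, for example, the X4 980 is about 24% faster, which corresponds to the FX4110 being about 19% slower.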

For the record, and as another reference, going back to my Bobcat clock-for-clock comparisons.
Per core, compared to Bobcat, K10 is:

Super Pi: 5% faster
Fritz Chess: 20% faster
Cinebench 11.5: 49% faster

at the same clock speed.

Which means that, somehow, Bulldozer is at about the same performance level as Bobcat :S , even on some of these SSE/FPU-heavy benches.

Has anyone determined the pipeline length details for Bulldozer? I believe Bobcat is 15 stages vs 12 for K10.

It is likely a bit longer to enable these high clock speeds, but I'm still finding some of the results out of line. I know you didn't look up Super Pi, but according to these results it, for one, is slower than on Bobcat (as I've mentioned before). Given that the architectures seem to be similar, I find this quite odd.

Even if the pipeline is longer than Bobcat's, given the massive amounts of cache, the much larger buffers, the much wider and more capable performance-oriented FPU (Bobcat has a very trimmed-down FPU due to its target market) and the presumably more aggressive prefetching, it certainly doesn't make much sense at this stage.

I can understand IPC similar to Thuban given the higher frequency headroom and the trade-offs made to achieve high performance per watt (all valid design decisions), but these outlier results like Cinebench, wPrime and Fritz are quite baffling.

**dess** · 10-08-2011, 05:16 AM

@ informal: It's either scalar or vector (=SIMD). I can't tell for sure whether Cinebench is the former or the latter, but for a long time ray-tracers were scalar, so I think it's mostly scalar as well.

Also, Bulldozer performs much better (according to AMD's slides, at least) with Handbrake/x264, which is certainly SIMD-based.

Anyway, remember what I wrote regarding the FMACs... There are the same number of FADDs and FMULs per core as in K10.x, but those are coupled (1 FADD + 1 FMUL) in FMAC units and there is only one scheduler port per FMAC, so it can have an FADD and an FMUL started in the same cycle only if it's an FMA instruction. According to publicly known information, at least.
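
The scheduler-port argument above can be turned into a toy issue-slot count. This is a deliberately crude sketch of that description only: it assumes one FMAC port per core, fully independent operations, no latency effects, and that every add/mul pair can be fused when FMA is used. The function names are invented for illustration and nothing here is measured hardware behaviour.

```python
def k10_issue_cycles(n_add: int, n_mul: int) -> int:
    """Separate FADD and FMUL pipes: one add and one mul can start per cycle."""
    return max(n_add, n_mul)

def fmac_issue_cycles(n_add: int, n_mul: int, use_fma: bool) -> int:
    """One scheduler port per FMAC: only one instruction starts per cycle.

    An FMA instruction covers one add plus one mul in a single issue slot.
    """
    if use_fma:
        fused = min(n_add, n_mul)      # add/mul pairs expressed as FMA
        leftover = abs(n_add - n_mul)  # remaining plain adds or muls
        return fused + leftover
    return n_add + n_mul

# A balanced mix of 100 adds and 100 multiplies.
print(k10_issue_cycles(100, 100))                  # 100
print(fmac_issue_cycles(100, 100, use_fma=False))  # 200
print(fmac_issue_cycles(100, 100, use_fma=True))   # 100
```

In this model, code compiled without FMA needs twice as many issue slots on the FMAC as on K10's split pipes, while FMA-aware code closes the gap. That would be consistent with the suggestion that non-optimized code suffers most, within the limits of such a simple model.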

**tbone8ty** · 10-08-2011, 06:56 AM

Does the CPU come with that watercooling kit? Or is it in a separate box?

**Oese** · 10-08-2011, 07:04 AM

it might be a special edition, or only for some markets. I doubt it will be included with every boxed 8150..

**freeloader** · 10-08-2011, 07:23 AM

It doesn't even look like AMD can polish this turd.

**doompc** · 10-08-2011, 08:28 AM

Thanks for sharing some results, informal.

Any overclocking tests on the FX-4 ? (5GHz on air, maybe)

**MaddMutt** · 10-08-2011, 08:28 AM

The only reason that I, along with some other members, have made comments about a K10 shrink (45nm to 32nm) with 12MB cache was multithreaded scaling. The Thubans score better at 6c/6t (6.39 on my rig) than a 2600K at 4c/8t in Cinebench (5.45). The Thubans also score well in video encoding.

AMD is calling the 4110 a 4c/4t processor.
AMD is calling the 8150 an 8c/8t processor.

Going by this picture and other posts over the last 2-3 days, they need to do some DAMAGE CONTROL on this release before the ______ hits the fan.

As Chew said (THINK OF THIS AS A 4C/8T PROC)

[Attached image: 2iu2fed.jpg]

If AMD does a quick respin in the PR department, they might be able to salvage something from this mess.

This only looks good if this IS LABELED AS A 2C/4T CPU!!!!!!!!!!

Thank You
For Your Time.

**demonkevy666** · 10-08-2011, 09:03 AM

Originally Posted by Opteron146

You maybe mean the write (combining) buffer to the L2 which is needed because the L1 is write through?

I spent most of the day yesterday looking for it; it was the trace cache.
Even Sandy Bridge uses something similar.

**rog** · 10-08-2011, 09:13 AM

Originally Posted by MaddMutt

The only reason that I, along with some other members, have made comments about a K10 shrink (45nm to 32nm) with 12MB cache was multithreaded scaling. The Thubans score better at 6c/6t (6.39 on my rig) than a 2600K at 4c/8t in Cinebench (5.45).

2600k scores ~6,90 with Turbo enabled.

**dess** · 10-08-2011, 09:17 AM

Originally Posted by demonkevy666

I spent most of the day yesterday looking for it; it was the trace cache.
Even Sandy Bridge uses something similar.

Then keep looking, as it's called the write coalescing cache.

The other one is related to the instruction cache (instead of data cache).

**memmem** · 10-08-2011, 09:49 AM

What I can't understand from an engineering standpoint:

FX-8150: 294mm²
i7 2600K: 216mm² with IGP

Of course we don't know the transistor count of the FX-8150 or the density of GlobalFoundries' process, but it is still a major difference in die size for less overall performance (assuming the leaks are from a final platform).

Could XOP and FMA4 be responsible for this die size?

**MaddMutt** · 10-08-2011, 09:51 AM

Originally Posted by rog

2600k scores ~6,90 with Turbo enabled.

Thank you For the correction

I have a question?????

CPU-Z 1.58 shows a max TDP of 136W for the 4110, and 1.58.7 shows a max TDP of 124W for the 8120.

Should they not both have the same TDP???

Is the info wrong as the FX-4110 is from CPU-Z 1.58 and the FX-8120 is with the newer ver 1.58.7??????

Thank you for your time

**Brice MJ** · 10-08-2011, 09:56 AM

Originally Posted by memmem

What I can't understand from an engineering standpoint:

FX-8150: 294mm²
i7 2600K: 216mm² with IGP

http://www.hardwarebenchnews.com/wp-...r_Die_size.png

315mm²

**bamtan2** · 10-08-2011, 10:05 AM

Originally Posted by MaddMutt

This only looks good if this IS LABELED AS A 2C/4T CPU!!!!!!!!!!

I don't know about that. I do know that most of the early reports are bad, so either bulldozer sucks this much, or it doesn't and AMD has completely botched the early product marketing and launch.

judging from how messed up bulldozer has been for YEARS, I'm guessing it could be bad AND amd could be botching the launch. but hopefully it is just the second one.

**Canis-X** · 10-08-2011, 10:05 AM

@chew*.....are you having fun over there? LOL

AMD Demonstrates Bulldozer Technology w/ Liquid Nitrogen <-- Live feed

**MaddMutt** · 10-08-2011, 10:38 AM

Originally Posted by bamtan2

I don't know about that. I do know that most of the early reports are bad, so either bulldozer sucks this much, or it doesn't and AMD has completely botched the early product marketing and launch.

judging from how messed up bulldozer has been for YEARS, I'm guessing it could be bad AND amd could be botching the launch. but hopefully it is just the second one.

I hope so too

If not then It will be like when AMD FINALLY released the PHENOM I

Also I found in here...post #2914 page #39 DATED 09-19-2011....

COLD2010 links to a web site that has a FX-8120 with a B2 stepping.

Everyone at the time said it's fake?????

Was this Legit?????

Or are we still waiting on a B3 stepping on the 12th????

Thank you
For your time

**FlanK3r** · 10-08-2011, 10:42 AM

oh nice... 4 girls to one man, like me

**The Stilt** · 10-08-2011, 11:03 AM

Originally Posted by MaddMutt

Thank you For the correction

I have a question?????

CPU-Z 1.58 shows a max TDP of 136W for the 4110, and 1.58.7 shows a max TDP of 124W for the 8120.

Should they not both have the same TDP???

Is the info wrong as the FX-4110 is from CPU-Z 1.58 and the FX-8120 is with the newer ver 1.58.7??????

Thank you for your time

TDPs displayed by CPU-Z 1.58.7 and newer are correct.
On earlier versions just ignore the readings, they are completely off.

**Apokalipse** · 10-08-2011, 11:33 AM

Originally Posted by xdan

Quite interesting.
So when I said that BD is not an 8-core but a quad with 8 threads, many guys just wouldn't stop saying it's an 8-core with cores that share resources.

You can say a module is two cores. They're two cores that don't get the same scaling as two "normal" cores when they're both being used, but that take much less die area.
A module is 12% larger than a single core (not a Deneb core, but a hypothetical single BD core with a full 256-bit FPU and a single integer unit), while getting significantly more than 12% extra performance from two threads. Basically the point is to get more throughput per die area, much like hyperthreading does; it just works in a very different way.

So while you could call it two cores, it is analogous to a hyperthreaded core but with higher scaling (though still not as much as two "normal" cores).
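
The throughput-per-area argument can be put into numbers. In the sketch below only the 12% area figure comes from the post above; the 1.8x two-thread scaling is a placeholder assumption chosen for illustration, and all names are hypothetical.

```python
MODULE_AREA = 1.12  # relative to a hypothetical single BD core (12% larger, per the post)

# ASSUMPTION: two threads on one module reach 1.8x the throughput of one thread.
# This factor is a placeholder and does not come from the thread.
ASSUMED_MODULE_SCALING = 1.8

def throughput_per_area(throughput: float, area: float) -> float:
    """Relative throughput divided by relative die area."""
    return throughput / area

single_core = throughput_per_area(1.0, 1.0)
module = throughput_per_area(ASSUMED_MODULE_SCALING, MODULE_AREA)
two_full_cores = throughput_per_area(2.0, 2.0)

print(f"single core:    {single_core:.2f}")
print(f"module:         {module:.2f}")
print(f"two full cores: {two_full_cores:.2f}")
```

With these inputs the module comes out ahead per unit of area, and it stays ahead for any two-thread scaling above 1.12x. Whether the real chip reaches that depends on per-thread performance, which is exactly what the leaked numbers call into question.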

Originally Posted by xdan

But you say that we shouldn't compare 1 core/1 thread of Deneb with 1 "core"/1 thread of BD, but with a module?

I'd say:
If you're using two cores in a module, you can't directly compare them to two "normal" cores.
If you're using one core in the module, you can directly compare it to one "normal" core.
It comes down to thread scaling. It's similar to hyperthreading this way.

Originally Posted by xdan

Then what's the FX 4170, a quad or a dual?

I'd call it a quad; You just have to keep in mind that it won't scale as well as a "normal" quad core if using at least 3 threads.

Originally Posted by xdan

Final performance is what it is: bad.

Final performance is unknown. I don't think the current leaks are true representations of final performance; but what their true performance is like I don't know.

If single threaded performance is fast enough, then even with lower >=3 thread scaling of the FX-4xxx it could still be better than Deneb. The difference would just be lower with >=3 threads than two or one.

Originally Posted by informal

Summary of FX4110 @ 4.2GHz vs Phenom II X4 980 @ 3.7GHz:
Super Pi: meh, I don't want to bother finding results in this "benchmark".
Fritz Chess: FX4110 gets 7332pts, X4 980 gets 9067pts. FX is 24% slower at stock (vs stock) and 40% slower per "core" at the same clock. Multithreaded benchmark.
3DMark Vantage CPU test: FX4110 gets 10660pts, X4 980 gets 12780pts. FX is 20% slower at stock and 36% slower per "core" at the same clock. Multithreaded benchmark.
wPrime 32M: FX4110 gets 17.9s, X4 980 gets 11.45s. FX is 56% slower at stock and 77% slower per "core" at the same clock. Multithreaded benchmark.
C11.5: FX4110 gets 3.42pts, X4 980 gets 4.34pts. FX is 27% slower at stock and 44% slower per "core" at the same clock. Multithreaded benchmark.

I have skipped over 7-Zip since I can't find comparable benchmarks. AIDA cache and memory shows somewhat better memory read/write and L2/L3 cache bandwidth for reads. The rest of the cache performance is on par with or slower than Deneb.

Conclusion: overall the FX4110 is 32% slower than Deneb X4 @ 3.7GHz stock vs stock and 49% slower when both are at 4.2GHz. Either all these results are caused by platform bugs (or something else) or Bulldozer is much slower than Deneb with the same "thread" count. All the above tests utilize the "world's first 256bit FPU" and it fails hard versus the "old 128bit" Deneb FPU, even in single-thread mode... Imagine the OC you have to reach just to match Deneb; it has to be sky high (think 5.5-6GHz on air to match a 4GHz Deneb). How AMD is going to charge $140 for this chip is beyond me.

That's why I think this is not final performance. Because why would AMD release something that's worse than what they had before? They'd have to be bat sheet insane to do that.

**Leeghoofd** · 10-08-2011, 11:55 AM

Wait till the 12th, lads, and don't overcalculate. It already seems some things are crystal clear (cpc): there is only one way to get faster than the older-family CPUs, and that is to up the MHz....

FYI, all the RAM kits I have in my possession react in a similar way as with Llano: less voltage required, and they just reach higher rock-stable speeds...
