AMD Zambezi news, info, fans !

**radaja** · 09-11-2011, 02:58 PM

Originally Posted by informal

Well, if the Opteron 6220 results are true, as xsecret claims, then the C11.5 result should roughly follow the same route as SiSoft's MM benchmark. I say roughly since I have no idea what the ratio of memory to SIMD instructions is in these tests. If the ratio is similar, then instead of 5.24pts one should get 1.67x the result of the 1100T when running C11.5 on an FX8150 @ 3.6GHz. Or in numbers: 9.85pts.
9.85pts is dangerously reminiscent of this (early slide detailing Scorpius platform and Zambezi advantage over Thuban in 3 benchmarks;slide was from Dec 2010 and was pointing roughly at 10pts in C11.5 for 8C Zambezi @ unknown clock).

9.85 seems a little high, doesn't it?
i would expect around 7.50 to 8.00 tops (best case scenario)
and a more realistic score of 6.50 to 7.00?
then again i don't know this stuff too well

**informal** · 09-11-2011, 03:01 PM

Originally Posted by xsecret

You should be aware that Zambezi Turbo mode is not that simple. There are 2 "main" levels of Turbo. For example, a 3.6 GHz base CPU can reach 3.9 GHz with all cores used as long as TDP remains under a specified value AND can reach 4.2 GHz in single-core mode. So, depending on the usage in an 8-thread application, you can be at 3.6 GHz or 3.9 GHz.

I actually used 3.8GHz in the above post. So Turbo for the 8150 is figured in.
Also, I was under the impression that when you have 8 FP-heavy threads, like in the Multimedia benchmark from SiSoft or Cinebench, there won't be any turbo engaging and the chip will run at default (3.6GHz).
In any case, the Opteron SSE/AVX results completely disprove the FX8120 score of 5.24pts in C11.5. It doesn't make sense that in one FP-heavy benchmark Zambezi kicks Thuban's ass (like in the SiSoft one, where the 8150 @ 3.6GHz is 67% faster than the 1100T) while in another it is practically slower than the same chip or barely faster (6pts for the 8150 according to xsecret and the Chinese leaks vs 5.91pts for the 1100T).

**xsecret** · 09-11-2011, 03:07 PM

Originally Posted by informal

I actually used 3.8GHz in the above post. So Turbo for the 8150 is figured in.
Also, I was under the impression that when you have 8 FP-heavy threads, like in the Multimedia benchmark from SiSoft or Cinebench, there won't be any turbo engaging and the chip will run at default (3.6GHz).
In any case, the Opteron SSE/AVX results completely disprove the FX8120 score of 5.24pts in C11.5. It doesn't make sense that in one FP-heavy benchmark Zambezi kicks Thuban's ass (like in the SiSoft one, where the 8150 @ 3.6GHz is 67% faster than the 1100T) while in another it is practically slower than the same chip or barely faster (6pts for the 8150 according to xsecret and the Chinese leaks vs 5.91pts for the 1100T).

If Sandra uses AVX in the MM Benchmark, that's not strange. Thuban doesn't have AVX and all benchmarks using it are 10 times slower.

**radaja** · 09-11-2011, 03:12 PM

So going from 3.1Ghz to 3.6Ghz do these 2 CB scores seem to be inline with reality?

**drfedja** · 09-11-2011, 03:14 PM

Originally Posted by xsecret

If Sandra uses AVX in the MM Benchmark, that's not strange. Thuban doesn't have AVX and all benchmarks using it are 10 times slower.

Why do you think that AVX is so much more powerful than SSE? A Thuban core and a BD module can execute the same number of raw FLOPS. AVX and SSE are vectorised packed FP instructions. A BD module can execute one 256-bit AVX instruction, which contains 4 DP FP operations, the same as two 128-bit AVX or SSE instructions. In some cases 256-bit AVX can be faster, but by how much? Two times...

Originally Posted by radaja

So going from 3.1Ghz to 3.6Ghz do these 2 CB scores seem to be inline with reality?

CB scales perfectly with frequency. 3.6/3.1*5.24 = 6.08. Something is wrong with these results, or the reported CPU frequency isn't accurate. Actually, I think the real clock is much lower than CPU-Z's readings.
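
The linear-scaling estimate in the post above can be sketched as follows. This is only a rough sketch: it assumes the Cinebench score is directly proportional to core clock and that no turbo is in play, and the helper name `scale_score` is made up for illustration.

```python
def scale_score(score: float, base_ghz: float, target_ghz: float) -> float:
    """Project a benchmark score to another clock, assuming perfectly linear scaling."""
    return score * target_ghz / base_ghz


# 5.24 pts at 3.1 GHz (the leaked FX-8120 figure) projected to 3.6 GHz
projected = scale_score(5.24, 3.1, 3.6)
print(f"{projected:.2f} pts")
```

The projection comes out at roughly 6.09 pts, which is the 6.08 quoted above once truncated instead of rounded.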

**BeepBeep2** · 09-11-2011, 03:21 PM

@radaja
yes.

13% increase in performance over 12.5% increase in base clock speed. Factor complex turbo in and it seems logical to me.

**Opteron146** · 09-11-2011, 03:24 PM

Originally Posted by drfedja

Why do you think that AVX is so much more powerful than SSE? A Thuban core and a BD module can execute the same number of raw FLOPS. AVX and SSE are vectorised packed FP instructions. A BD module can execute one 256-bit AVX instruction, which contains 4 DP FP operations, the same as two 128-bit AVX or SSE instructions. In some cases 256-bit AVX can be faster, but by how much? Two times...

Yes, AVX would do nothing, but FMA could be the big difference. SiSoft normally always programs special code for each CPU, thus on Bd, it should use XOP&FMA.

**informal** · 09-11-2011, 03:29 PM

Originally Posted by xsecret

If Sandra uses AVX in the MM Benchmark, that's not strange. Thuban doesn't have AVX and all benchmarks using it are 10 times slower.

I don't know if you have followed the Bulldozer threads, but Bulldozer actually has the same throughput in all 3 modes: legacy SSE, AVX 128bit and AVX 256bit. This is because of the way AMD designed their FPU (or FlexFP as they call it). You have 8 of these FMACs in an 8 core chip. All of them are 128bit wide. 128bit AVX usually carries very little to no performance benefit over standard SSE (think 5-10%). This is even seen in the leaked Zambezi SiSoft numbers:
Attachment 119979
As you can see 11% faster in 256bit AVX mode than in legacy SSE (128bit) mode.
With Bulldozer, when you go to 256bit AVX you may even incur a small penalty, but this is not the norm (compiler patches state up to a 3% penalty, and AMD encourages devs to use AVX 128 instead of the 256bit one).
So the point is: AVX (both 128 and 256bit) brings nothing or close to nothing, since Bulldozer has the same peak flops in all 3 modes I listed.
The only difference is software recompiled for FMA, which can bring an additional 2x performance over AVX 128. At least this is what AMD listed in their HPC documents from last year. I can't find the pdf, but I can link to a recent presentation which included a slide on FlexFP. A picture is worth a thousand words:
Attachment 119978
As you can see, same peak flops in all 3 cases. I rest my case.

BTW the leak that I linked above showed that Zambezi @ 2.8GHz had 132mpix/s for the SSE score and 147 for AVX. I already showed that Opterons score better than this (10% higher than Zambezi). There is no Turbo in heavy FP/SIMD mode, mind you. If you use the 132 score as the base and not 147 (the AVX one), you get for 3.6GHz: 132x3.6/2.8=170mpix/s vs 115 for the 1100T. That is 48% better and based on the Zambezi leak (not the Opteron's score). 1.48x 5.91pts (Thuban score) = 8.74pts. This is still miles ahead of what you claim and the Chinese leaks show. Again, remember that these numbers are based on the SSE score I linked above (so the legacy SSE code that Cinebench uses too).
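
The chain of estimates in the post above can be written out step by step. This is a sketch under the post's own assumptions: the Sandra SSE score scales linearly with clock, and Cinebench would show the same advantage over the 1100T as Sandra does. The function name `project_from_sandra` is hypothetical, and all inputs are the leaked figures quoted in the thread.

```python
def project_from_sandra(sse_mpix: float, sample_ghz: float, target_ghz: float,
                        rival_mpix: float, rival_cb: float):
    """Scale a Sandra score by clock, then carry the resulting ratio over to Cinebench."""
    scaled = sse_mpix * target_ghz / sample_ghz   # linear clock scaling (assumption)
    ratio = scaled / rival_mpix                   # advantage over the rival chip
    return scaled, ratio, ratio * rival_cb        # same ratio applied to the CB score


# Zambezi sample: 132 mpix/s at 2.8 GHz; 1100T: 115 mpix/s and 5.91 pts
scaled, ratio, cb_estimate = project_from_sandra(132, 2.8, 3.6, 115, 5.91)
print(f"{scaled:.0f} mpix/s, {ratio:.2f}x, {cb_estimate:.2f} pts")
```

Without rounding the intermediate ratio to 1.48, the estimate lands near 8.7 pts, so the conclusion of the post does not depend on the rounding.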

**drfedja** · 09-11-2011, 03:30 PM

Originally Posted by Opteron146

Yes, AVX would do nothing, but FMA could be the big difference. SiSoft normally always programs special code for each CPU, thus on Bd, it should use XOP&FMA.

I agree, but we don't know how SiSoft works with FMA and XOP turned on and off. We will know when we get BD on the bench table.

Originally Posted by informal

As you can see, same peak flops in all 3 cases. I rest my case.

Yes, but what is the module count? For 64 DP FLOPS you must have 8 SB cores or 16 FlexFPs. That slide is BS, because there is no CPU with 32 BD cores, or 16 BD modules. Interlagos has 8 BD modules, i.e. 8 FlexFPs, which can execute up to 32 DP FLOPS or 64 SP FLOPS.
If you compare an 8-core Xeon and a 16-core Interlagos, that slide makes sense.
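
The per-cycle counts in this argument can be reproduced with a small calculation. It is a sketch with assumptions stated loudly: each FlexFP is taken to have two 128-bit FMAC pipes, each Sandy Bridge core one 256-bit add pipe and one 256-bit multiply pipe, and one operation is counted per lane per cycle (counting an FMA as two operations would double the Bulldozer figures). The function name is made up for illustration.

```python
def peak_flops_per_cycle(units: int, pipes_per_unit: int,
                         pipe_bits: int, element_bits: int) -> int:
    """Peak FP operations per cycle, counting one operation per lane per pipe."""
    lanes = pipe_bits // element_bits
    return units * pipes_per_unit * lanes


# Assumed layout: two 128-bit FMAC pipes per FlexFP (one FlexFP per module)
interlagos_dp = peak_flops_per_cycle(8, 2, 128, 64)    # 8 modules, double precision
interlagos_sp = peak_flops_per_cycle(8, 2, 128, 32)    # 8 modules, single precision
slide_bd_dp = peak_flops_per_cycle(16, 2, 128, 64)     # the 16 FlexFPs the slide would need

# Assumed layout: one 256-bit add pipe and one 256-bit multiply pipe per core
sandy_bridge_dp = peak_flops_per_cycle(8, 2, 256, 64)  # 8 cores, double precision

print(interlagos_dp, interlagos_sp, slide_bd_dp, sandy_bridge_dp)  # 32 64 64 64
```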

Originally Posted by BeepBeep2

@radaja
yes.

13% increase in performance over 12.5% increase in base clock speed. Factor complex turbo in and it seems logical to me.

No, there is a 16% increase in clock speed and a 13% increase in performance. The gap between the frequency increase and the performance increase is too big, i.e. the scaling is too bad.
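
The clock figure in this reply can be checked quickly. The sketch below takes the 3.1 GHz and 3.6 GHz base clocks and the 13% performance gain as given in the thread; `pct_gain` and the "scaling efficiency" ratio are illustrative names, not anything the posters defined.

```python
def pct_gain(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new / old - 1.0) * 100.0


clock_gain = pct_gain(3.1, 3.6)        # base clock, 3.1 GHz -> 3.6 GHz
perf_gain = 13.0                       # performance gain quoted in the thread
efficiency = perf_gain / clock_gain    # 1.0 would be perfectly linear scaling

print(f"clock +{clock_gain:.1f}%, perf +{perf_gain:.1f}%, efficiency {efficiency:.2f}")
```

With these inputs the clock gain is about 16.1%, so a 13% score gain corresponds to roughly 0.81 of linear scaling.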

**informal** · 09-11-2011, 03:36 PM

@drfedja
We already have Sisoft numbers for SSE and AVX/FMA. Sisoft uses AVX and doesn't use FMA since the speedup with AVX 256 versus SSE 128 is 11% (147/132.3).

**drfedja** · 09-11-2011, 03:41 PM

Originally Posted by informal

@drfedja
We already have Sisoft numbers for SSE and AVX/FMA. Sisoft uses AVX and doesn't use FMA since the speedup with AVX 256 versus SSE 128 is 11% (147/132.3).

Yes, if those numbers are correct.

**informal** · 09-11-2011, 03:47 PM

Well, they are correct in the sense that they show us what code path Zambezi runs (AVX and not FMA). Also they kinda align with both Opteron 6200 series SiSoft results. The 2P 6282SE gets 585 @ 2.5GHz, which equates (with perfect scaling of 4x) to about 146mpix/s, or 164mpix/s @ 2.8GHz (11% higher than the 8C Zambezi @ 2.8GHz). The 2P 6220 @ 3GHz gets 315mpix/s; with perfect scaling => 315/2=157.5mpix/s, or 147mpix/s @ 2.8GHz (exactly the same as that Zambezi @ 2.8GHz). So we can now say that the results of the Zambezi sample are true for SIMD and kinda off for the integer test.
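
The normalization used above can be sketched like this. It assumes perfect scaling across sockets and cores (the 4x and 2x divisors are the ones used in the post) and linear scaling with clock; `per_8core_at` is a hypothetical helper name.

```python
def per_8core_at(score: float, core_ratio: float, clock_ghz: float,
                 target_ghz: float = 2.8) -> float:
    """Reduce a multi-socket score to one 8-core chip, then rescale to the target clock."""
    return score / core_ratio * target_ghz / clock_ghz


# 2P Opteron 6282SE: 585 mpix/s at 2.5 GHz, treated as 4x an 8-core chip
opteron_6282se = per_8core_at(585, 4, 2.5)
# 2P Opteron 6220: 315 mpix/s at 3.0 GHz, treated as 2x an 8-core chip
opteron_6220 = per_8core_at(315, 2, 3.0)

print(f"{opteron_6282se:.1f} mpix/s, {opteron_6220:.1f} mpix/s")
```

Under those assumptions the two Opteron results come out at about 163.8 and 147.0 mpix/s, against 147 mpix/s for the leaked Zambezi AVX score at 2.8 GHz.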

**Formula350** · 09-11-2011, 03:51 PM

Originally Posted by informal

No.The link says 1.78x higher than Thuban in render. Check again.It's the middle bar (yellow) one. It represents Render performance . You can see it being roughly 1.78x higher than 1100T's.

It can't "clearly be" something and "roughly", all at the same time. Nor can it "say" something when nowhere are there words stating it. :P I get that you've taken the 1.5x bar and lined it up to get 1.78x, but it's a marketing slide. It's meant to look good, not be mathematically accurate lol

I know I'm sounding like a total dbag, which I apologize for, but I'm just trying to point out all the work you're doing for something that wasn't meant to be taken so literally (by dissecting and comparing) :\ I know where you're coming from though, with doing what you're doing being mentally stimulating, as I get that way with stuff.

**informal** · 09-11-2011, 03:57 PM

Well yes, it is a marketing slide (doh), but the bars are not drawn just for fun. There is clearly a ratio. 1.5x is for the last test. You can see the color for the individual test. The media benchmark (PCMark) shows the least advantage of the 3, and AMD didn't write "up to 15% in PC Mark TV and movies" for obvious reasons. Also note that it says "performance estimates and subject to change". This means they had no idea what clock speeds they would be hitting with retail chips when the time comes. Maybe they expected 4GHz stock and now we have "only" 3.6GHz.
But still my point stands. We had these performance projections from December last year. Rendering showed the greatest improvement. Now it(Zambezi) shows lower performance with the latest ES floating around.

PS You don't sound like a dbag at all. You just need to read up more. I said 1.78 since nobody knows exactly how long that bar is. It's longer than 1.7x and shorter than 2x. The last one is the only one listed with a solid number, even though everything was a projection back at that time.

**Nintendork** · 09-11-2011, 04:20 PM

All i see is crippled chips. Who knows, integrated chip to enable FX performance on a given day?

If a FX-8120 scores less than a 1090T, then what would be the point of the new chip?

Just release an ironed Phenom II and call it Phenom III or Phenom FX.

less latency/more L3 (8-10MB)
1MB L2 per core
DDR3 1866/2133 controller
add SSE4.2 / AVX / FMA / etc
Magically --> 20-25% IPC with more or less the same arch, a monster gaming/mt machine.

10points CB11.5 on a Phenom 8 core
Phenom III X4 3Ghz $149 ~ Phenom II X4 980 3.7Ghz

That should give SB a run for its money.

Really, what would be the point?

**m411b** · 09-11-2011, 04:31 PM

I know this is probably old news to most. But wanted to show my findings just to verify any speculations:

4. Entry Period: The Contest begins July 21, 2011 at 12:01am Eastern Time (“EDT”) and ends October 12, 2011 at 11:59 pm EDT (the “Entry Period”). Entries that are submitted before or after the Entry Period will be disqualified. Sponsor’s computer will be the official timekeeping device for the Contest.

Can be seen here on the AMD giveaway contest rules!

**informal** · 09-11-2011, 04:46 PM

Yeah, we discussed that a few days ago. They changed the date from Sept. 9 to October 12. This is in line with a Q4 launch or, as it was rumored, early October.

**~~Pestilence~~** · 09-11-2011, 04:55 PM

Has amd stated whats coming first? Server or Desktop? Opteron's 6200 is scheduled to arrive on 10-11-11 on BLT so we should assume the desktop chips a week or so later?

http://www.shopblt.com/cgi-bin/shop/...er_id=!ORDERID!

**radaja** · 09-11-2011, 05:36 PM

Originally Posted by Pestilence

Has amd stated whats coming first? Server or Desktop? Opteron's 6200 is scheduled to arrive on 10-11-11 on BLT so we should assume the desktop chips a week or so later?

http://www.shopblt.com/cgi-bin/shop/...er_id=!ORDERID!

all we know is server is shipping for revenue and will launch Q4,so said JF

he also said this in regards to BLT on ETA's.

Originally Posted by JF-AMD

When you see those things pop up on random web sites it is typically a bad data feed from their distributor. The disti turns on SKUs they shouldn't and the reseller just takes the whole feed.

The funny thing is that you can't even be sure that the data is real. Sometimes they load with dummy data as a placeholder. I am specifically not looking at the link because I don't want to have to start answering questions about details. But I would be a bit careful on these things.

**cal_guy** · 09-11-2011, 05:46 PM

The Sandra Processor Arithmetic benchmark is not a pure integer benchmark, but an aggregate score of the pure integer Dhrystone benchmark and the floating-point-focused Whetstone benchmark.

**freeloader** · 09-11-2011, 05:48 PM

Originally Posted by AKM

What if IPC will be lower in some cases and higher in others?

IPC should be higher in all cases on a new architecture. No excuses.

**Dimitriman** · 09-11-2011, 06:04 PM

Originally Posted by freeloader

IPC should be higher in all cases on a new architecture. No excuses.

It generally makes a lot of sense now that AMD delayed desktop and pulled in server chips. Because desktops depend heavily on IPC and single threaded workload, and if BD is very weak at both they need to tweak for maximum clocks they can to offset this. But for servers it is not as big of a problem so it became the new priority.

Had BD been a spectacular product it would be in our computers already. I doubt any delays were due to bugs, but rather due to attempting to get clocks higher to make up for poor IPC.

**PerryR** · 09-11-2011, 07:15 PM

Originally Posted by Dimitriman

Had BD been a spectacular product it would be in our computers already. I doubt any delays were due to bugs, but rather due to attempting to get clocks higher to make up for poor IPC.

I thought servers were more important than desktop? It makes perfect sense that they would get the product to a place that would, more than likely, produce the most revenue.

I'm not sure about bugs or higher clocks being the issue; I think GF didn't produce enough quantity. Hell, from what I understand, the demand for Llano has been overwhelming.

**xsecret** · 09-11-2011, 08:02 PM

Originally Posted by freeloader

IPC should be higher in all cases on a new architecture. No excuses.

When you're not able to increase the IPC on your current µarch, you must use faster clocks to increase the performance. In order to use faster clocks, you need a high-throughput engine and must remove all bottlenecks in your frontend. Sometimes you need to do some horrible things to achieve this, like putting your L1 in Write-Through while trying to amaze people with "ultra high bandwidth" FP/SIMD units... even if you're not able to feed them correctly with your decode/dispatch unit in all cases. Finally, you'll get a decent CPU, but only at very high frequency and with a LOT of power to dissipate. Worst of all: when your process is not able to give you high yields, you must launch it at low freq.

Say hello to Netburst....

...and Bulldozer ?

**Hondacity** · 09-11-2011, 08:36 PM

Originally Posted by xsecret

When you're not able to increase the IPC on your current µarch, you must use faster clocks to increase the performance. In order to use faster clocks, you need a high-throughput engine and must remove all bottlenecks in your frontend. Sometimes you need to do some horrible things to achieve this, like putting your L1 in Write-Through while trying to amaze people with "ultra high bandwidth" FP/SIMD units... even if you're not able to feed them correctly with your decode/dispatch unit in all cases. Finally, you'll get a decent CPU, but only at very high frequency and with a LOT of power to dissipate. Worst of all: when your process is not able to give you high yields, you must launch it at low freq.

Say hello to Netburst....

...and Bulldozer ?

best amd bd post ever
