_________________________________________________
............................ImAcOmPuTeRsPoNgE............................
[SIGPIC][/SIGPIC]
MY HEATWARE 76-0-0
I actually used 3.8GHz in the post above, so the 8150's turbo is already factored in.
Also, I was under the impression that with 8 FP-heavy threads, like in the SiSoft multimedia benchmark or Cinebench, turbo won't engage and the chip will run at its default clock (3.6GHz).
In any case, the Opteron SSE/AVX results completely disprove the FX-8120 score of 5.24pts in CB11.5. It doesn't make sense that in one FP-heavy benchmark Zambezi kicks Thuban's ass (like the SiSoft one, where the 8150 @ 3.6GHz is 67% faster than the 1100T) while in another it is practically slower than the same chip or barely faster (6pts for the 8150 according to xsecret and the Chinese leaks vs 5.91pts for the 1100T).
Why do you think AVX is so much more powerful than SSE? A Thuban core and a BD module can execute the same number of raw FLOPS. AVX and SSE are both vectorised packed-FP instructions. A BD module can execute one 256-bit AVX instruction, which contains 4 DP FP operations, the same as two 128-bit AVX or SSE instructions. In some cases 256-bit AVX can be faster, but by how much? Two times...
CB scales perfectly with frequency: 3.6/3.1 × 5.24 = 6.08. Either something is wrong with these results or the CPU frequency isn't accurate. Actually, I think the real clock is much lower than CPU-Z's reading.
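For reference, the 6.08 figure is a plain linear projection. A minimal Python sketch of that arithmetic, assuming the score scales perfectly with clock (no turbo, no memory or uncore limits); `scale_score` is a hypothetical helper name, not from Cinebench or any tool:

```python
def scale_score(score: float, base_ghz: float, target_ghz: float) -> float:
    """Project a benchmark score to another clock, assuming perfectly linear scaling."""
    return score * target_ghz / base_ghz


# FX-8120 CB11.5 score at 3.1 GHz, projected to the FX-8150's 3.6 GHz base clock.
projected = scale_score(5.24, 3.1, 3.6)
print(f"Projected CB11.5 score at 3.6 GHz: {projected:.3f} pts")
```

The unrounded result is about 6.085, which the post truncates to 6.08.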
Last edited by drfedja; 09-11-2011 at 03:22 PM.
"That which does not kill you only makes you stronger." ---Friedrich Nietzsche
PCAXE
@ rajada
yes.
A 13% increase in performance over a 12.5% increase in base clock speed. Factor in the more complex turbo behavior and it seems logical to me.
Smile
I don't know if you have followed the Bulldozer threads, but Bulldozer actually has the same throughput in all 3 modes: legacy SSE, 128-bit AVX and 256-bit AVX. This is because of the way AMD designed their FPU (or FlexFP, as they call it). You have 8 of these FMACs in an 8-core chip, and all of them are 128 bits wide. 128-bit AVX usually brings very little to no performance benefit over standard SSE (think 5-10%). This can even be seen in the leaked Zambezi SiSoft numbers:
Attachment 119979
As you can see, it is 11% faster in 256-bit AVX mode than in legacy SSE (128-bit) mode.
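The 11% comes straight from the two leaked scores (132 and 147 Mpix/s at 2.8GHz). A quick Python check of that ratio; `speedup_pct` is just an illustrative helper:

```python
def speedup_pct(new: float, old: float) -> float:
    """Percentage speedup of `new` over `old` (e.g. 110 vs 100 -> 10.0)."""
    return (new / old - 1.0) * 100.0


# Leaked SiSoft multimedia scores for the Zambezi sample at 2.8 GHz, in Mpix/s.
sse_128 = 132.0  # legacy SSE code path
avx_256 = 147.0  # 256-bit AVX code path

print(f"256-bit AVX over legacy SSE: {speedup_pct(avx_256, sse_128):.1f}%")
```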
With Bulldozer, going to 256-bit AVX may even incur a small penalty, but this is not the norm (compiler patches state up to a 3% penalty, and AMD encourages devs to use 128-bit AVX instead of the 256-bit one).
So the point is: AVX (both 128- and 256-bit) brings nothing or close to nothing, since Bulldozer has the same peak FLOPS in all 3 modes I listed.
The only difference is FMA-recompiled software, which can bring an additional 2x performance over 128-bit AVX. At least this is what AMD listed in their HPC documents from last year. I can't find the PDF, but I can link to a recent presentation that included a slide on FlexFP. A picture is worth a thousand words:
Attachment 119978
As you can see, the same peak FLOPS in all 3 cases. I rest my case.
BTW, the leak I linked above showed that Zambezi @ 2.8GHz scored 132 Mpix/s in SSE and 147 in AVX. I already showed that the Opterons score better than this (10% higher than Zambezi). There is no turbo in heavy FP/SIMD mode, mind you. If you use the 132 score as the base and not 147 (the AVX one), you get for 3.6GHz: 132 × 3.6/2.8 = 170 Mpix/s vs 115 for the 1100T. That is 48% better, and it is based on the Zambezi leak (not the Opterons' score). 1.48 × 5.91pts (Thuban score) ≈ 8.75pts. This is still miles ahead of what you claim and what the Chinese show. Again, remember that these numbers are based on the SSE score I linked above (so the same legacy SSE code that Cinebench uses).
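That projection chains three steps: scale the SSE score from 2.8 to 3.6GHz, divide by the 1100T's score, then apply that ratio to the 1100T's CB11.5 result. A minimal Python sketch of the chain, under the post's own assumptions (no turbo under heavy FP load, linear clock scaling, and Cinebench tracking the SiSoft SSE ratio); the variable names are illustrative:

```python
# Inputs quoted in the thread.
zambezi_sse_at_2_8 = 132.0   # Mpix/s, Zambezi ES, SiSoft multimedia, SSE path
thuban_1100t_sisoft = 115.0  # Mpix/s, Phenom II X6 1100T, same test
thuban_1100t_cb = 5.91       # CB11.5 pts, 1100T

# Step 1: linear clock scaling from 2.8 GHz to the 8150's 3.6 GHz base clock.
zambezi_sse_at_3_6 = zambezi_sse_at_2_8 * 3.6 / 2.8

# Step 2: ratio against Thuban in the same SIMD test.
ratio_vs_thuban = zambezi_sse_at_3_6 / thuban_1100t_sisoft

# Step 3: assume the CB11.5 gap matches the SiSoft SSE gap.
cb_estimate = ratio_vs_thuban * thuban_1100t_cb

print(f"SSE @ 3.6 GHz: {zambezi_sse_at_3_6:.1f} Mpix/s")
print(f"Ratio vs 1100T: {ratio_vs_thuban:.3f}")
print(f"Estimated CB11.5: {cb_estimate:.2f} pts")
```

Without rounding the ratio to 1.48 first, the estimate comes out at about 8.72pts; the post rounds first, so its figure is slightly higher. Either way it lands far above the ~6pts in the leaks.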
I agree, but we don't know how SiSoft behaves with FMA and XOP turned on and off. We will know when we get BD on the test bench.
Yes, but what is the module count? For 64 DP FLOPS per clock you need 8 SB cores, or 16 FlexFPs on the BD side. That slide is BS, because there is no CPU with 32 BD cores, i.e. 16 BD modules. Interlagos has 8 BD modules, i.e. 8 FlexFPs, which can execute up to 32 DP FLOPS or 64 SP FLOPS per clock.
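These FLOPS counts can be reproduced as units × FP pipes × vector lanes per clock. A Python sketch, assuming (as the post does) that each pipe retires one vector op per clock and an FMA is counted as a single op; the pipe layouts (one 256-bit add and one 256-bit mul pipe per SB core, two 128-bit FMACs per BD module) are the commonly published descriptions, and `peak_ops_per_clock` is a hypothetical helper:

```python
def peak_ops_per_clock(units: int, pipes: int, vector_bits: int, element_bits: int) -> int:
    """Peak FP ops per clock = units x pipes x lanes per vector.

    Assumes one vector op per pipe per clock and counts an FMA as ONE op,
    matching the counting used in the post.
    """
    lanes = vector_bits // element_bits
    return units * pipes * lanes


DP, SP = 64, 32  # element widths in bits

# Sandy Bridge core: one 256-bit FP add pipe + one 256-bit FP mul pipe.
sb_8_cores_dp = peak_ops_per_clock(8, 2, 256, DP)
# Bulldozer module (one FlexFP): two 128-bit FMAC pipes.
bd_8_modules_dp = peak_ops_per_clock(8, 2, 128, DP)
bd_8_modules_sp = peak_ops_per_clock(8, 2, 128, SP)
bd_16_modules_dp = peak_ops_per_clock(16, 2, 128, DP)

print(sb_8_cores_dp, bd_8_modules_dp, bd_8_modules_sp, bd_16_modules_dp)
```

Under this counting, 8 SB cores give 64 DP, 8 BD modules give 32 DP / 64 SP, and BD only reaches 64 DP with 16 modules.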
If you compare an 8-core Xeon with a 16-core Interlagos, that slide makes sense.
No, there is a 16% increase in clock speed (from 3.1 to 3.6GHz) and a 13% increase in performance. The gap between the frequency increase and the performance increase is too big, or the scaling is too poor.
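To put numbers on the scaling argument: going from 3.1 to 3.6GHz is about a 16% clock gain, so a 13% performance gain means roughly 80% scaling efficiency. A Python sketch of that check, assuming the 13% figure quoted in the thread and base clocks only (turbo ignored):

```python
base_clock_ghz = 3.1    # FX-8120 base clock, as used earlier in the thread
target_clock_ghz = 3.6  # FX-8150 base clock
perf_gain = 0.13        # performance gain quoted in the thread

clock_gain = target_clock_ghz / base_clock_ghz - 1.0
# 1.0 would mean performance tracks clock speed perfectly.
scaling_efficiency = perf_gain / clock_gain

print(f"Clock gain: {clock_gain:.1%}")
print(f"Scaling efficiency: {scaling_efficiency:.2f}")
```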
Last edited by drfedja; 09-11-2011 at 03:40 PM.
@drfedja
We already have SiSoft numbers for SSE and AVX/FMA. SiSoft uses AVX and doesn't use FMA, since the speedup of 256-bit AVX over 128-bit SSE is 11% (147/132.3).
Well, they are correct in the sense that they show us which code path Zambezi runs (AVX, not FMA). They also roughly align with both Opteron 6200-series SiSoft results. A 2P 6282SE gets 585 @ 2.5GHz, which equates (with perfect scaling of 4x) to about 147 Mpix/s, or 164 Mpix/s @ 2.8GHz (11% higher than the 8-core Zambezi @ 2.8GHz). A 2P 6220 @ 3GHz gets 315 Mpix/s; with perfect scaling that is 315/2 = 157.5 Mpix/s, or 147 Mpix/s @ 2.8GHz (exactly the same as the Zambezi @ 2.8GHz). So we can now say that the Zambezi sample's results are true for SIMD and somewhat off for the integer test.
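That normalization divides each 2P result down to 8 cores and then rescales the clock to 2.8GHz. A Python sketch under the post's assumption of perfect scaling with both core count and clock; the core counts (32 for a 2P 6282SE, 16 for a 2P 6220) are the ones implied by the post's 4x and 2x divisors, and `normalize` is a hypothetical helper:

```python
def normalize(score: float, cores: int, clock_ghz: float,
              target_cores: int = 8, target_clock_ghz: float = 2.8) -> float:
    """Rescale a SiSoft multimedia score (Mpix/s) to a given core count and clock,
    assuming perfectly linear scaling in both."""
    return score * (target_cores / cores) * (target_clock_ghz / clock_ghz)


# 2P systems quoted in the thread.
opteron_6282se_2p = normalize(585.0, cores=32, clock_ghz=2.5)  # 4x fewer cores
opteron_6220_2p = normalize(315.0, cores=16, clock_ghz=3.0)    # 2x fewer cores
zambezi_avx_2_8 = 147.0  # 8-core Zambezi ES @ 2.8 GHz, AVX path

for name, value in [("2P 6282SE", opteron_6282se_2p), ("2P 6220", opteron_6220_2p)]:
    print(f"{name}: {value:.1f} Mpix/s per 8 cores @ 2.8 GHz "
          f"({value / zambezi_avx_2_8 - 1:+.1%} vs Zambezi)")
```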
It can't "clearly be" something and "roughly" at the same time. Nor can it "say" something when nowhere are there words stating it. :P I get that you've taken the 1.5x bar and lined it up to get 1.78x, but it's a marketing slide. It's meant to look good, not be mathematically accurate lol
I know I'm sounding like a total dbag, which I apologize for, but I'm just trying to point out all the work you're doing on something that wasn't meant to be taken so literally (by dissecting and comparing) :\ I know where you're coming from though; doing what you're doing is mentally stimulating, and I get that way with stuff too.
Last edited by Formula350; 09-11-2011 at 03:58 PM. Reason: Typo
Well yes, it is a marketing slide (doh), but the bars are not drawn just for fun. There is clearly a ratio. 1.5x is for the last test; you can see the color for each individual test. The media benchmark (PCMark) shows the smallest advantage of the 3, and AMD didn't write "up to 15% in PC Mark TV and movies" for obvious reasons. Also note that it says "performance estimates and subject to change". This means they had no idea what clock speeds they would be hitting with retail chips when the time came. Maybe they expected 4GHz stock and now we have "only" 3.6GHz.
But my point still stands. We've had these performance projections since December last year, and rendering showed the greatest improvement. Now Zambezi shows lower performance with the latest ES floating around.
PS You don't sound like a dbag at all. You just need to read up more. I said 1.78 since nobody knows exactly how long that bar is. It's longer than 1.7x and shorter than 2x. The last one is the only one listed with a solid number, even though everything was a projection back then.
Last edited by informal; 09-11-2011 at 03:59 PM.
All I see is crippled chips. Who knows, maybe an integrated chip to enable FX performance on a given day?
If a FX-8120 scores less than a 1090T, then what would be the point of the new chip?
Just release an ironed Phenom II and call it Phenom III or Phenom FX.
less latency/more L3 (8-10MB)
1MB L2 per core
DDR3 1866/2133 controller
add SSE4.2 / AVX / FMA / etc
Magically --> 20-25% more IPC with more or less the same arch; a monster gaming/MT machine.
10 points in CB11.5 on an 8-core Phenom
Phenom III X4 3Ghz $149 ~ Phenom II X4 980 3.7Ghz
That should give SB a run for its money.
Really, what would be the point?
Athlon II X4 620 2.6Ghz @1.1125v | Foxconn A7DA-S (790GX) | 2x2GB OCZ Platinum DDR2 1066
| Gigabyte HD4770 | Seagate 7200.12 3x1TB | Samsung F4 HD204UI 2x2TB | LG H10N | OCZ StealthXStream 500w| Coolermaster Hyper 212+ | Compaq MV740 17"
Stock HSF: 18°C idle / 37°C load (15°C ambient)
Hyper 212+: 16°C idle / 29°C load (15°C ambient)
Why AMD Radeon rumors/leaks "are not always accurate"
Reality check
I know this is probably old news to most, but I wanted to show my findings just to verify the speculation.
It can be seen here in the AMD giveaway contest rules:
4. Entry Period: The Contest begins July 21, 2011 at 12:01am Eastern Time (“EDT”) and ends October 12, 2011 at 11:59 pm EDT (the “Entry Period”). Entries that are submitted before or after the Entry Period will be disqualified. Sponsor’s computer will be the official timekeeping device for the Contest.
Yeah, we discussed that a few days ago. They changed the date from Sept. 9 to October 12. This is in line with a Q4 launch or, as was rumored, early October.
Has AMD stated what's coming first, server or desktop? Opteron 6200 is scheduled to arrive on 10-11-11 at BLT, so should we assume the desktop chips a week or so later?
http://www.shopblt.com/cgi-bin/shop/...er_id=!ORDERID!
The Sandra Processor Arithmetic Benchmark is not a pure integer benchmark, but an aggregate score of the pure-integer Dhrystone benchmark and the floating-point-focused Whetstone benchmark.
As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"
It makes a lot of sense now that AMD has delayed desktop and pulled in the server chips. Desktops depend heavily on IPC and single-threaded workloads, and if BD is very weak at both, AMD needs to tweak it for the highest clocks they can get to offset this. For servers it is not as big a problem, so servers became the new priority.
Had BD been a spectacular product, it would be in our computers already. I doubt any delays were due to bugs; rather, they were due to attempts to get clocks higher to make up for poor IPC.
Gigabyte Z77X-UD5H
G-Skill Ripjaws X 16Gb - 2133Mhz
Thermalright Ultra-120 eXtreme
i7 2600k @ 4.4Ghz
Sapphire 7970 OC 1.2Ghz
Mushkin Chronos Deluxe 128Gb
I thought servers were more important than desktop? It makes perfect sense that they would get the product to the market that would, more than likely, produce the most revenue.
I'm not sure about bugs or higher clocks being the issue; I think GF didn't produce enough quantity. Hell, from what I understand, the demand for Llano has been overwhelming.
When you're not able to increase the IPC of your current µarch, you must use faster clocks to increase performance. To use faster clocks, you need a high-throughput engine and you must remove all the bottlenecks in your front end. Sometimes you need to do some horrible things to achieve this, like making your L1 write-through while trying to amaze people with "ultra high bandwidth" FP/SIMD units... even if you're not able to feed them correctly with your decode/dispatch unit in all cases. Finally, you'll get a decent CPU, but only at very high frequency and with a LOT of power to dissipate. Worst of all: when your process is not able to give you high yields, you must launch it at a low frequency.
Say hello to Netburst....
...and Bulldozer ?
Last edited by xsecret; 09-11-2011 at 08:07 PM.
Doc_TB @ CanardPC.Com (FR)