Page 7 of 181
Results 151 to 175 of 4519

Thread: AMD Zambezi news, info, fans !

  1. #151
    Xtreme Member
    Join Date
    Aug 2009
    Posts
    431
    Quote Originally Posted by Alex-Ro View Post
    Well i guess this is the big challenge
    Since when has being in the x86 CPU biz with Intel as the main competitor not been a challenge?

    if AMD fails to deliver some very good performance on the scene,then the end is not so far away...
    I think people have been saying that about AMD since they started out but they are still here! As long as AMD keeps delivering bang for the buck I don't see any end in sight for them.

    I srsly hope they will make a good generation...it's about time
    AMD has had many great processors over its lifetime, and at fair prices, which I expect BD to be too. It doesn't need to be the fastest, and I doubt it will be.

  2. #152
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    AMD can't vanish because this would mean Intel's monopoly... Even if they dragged their feet (and they don't), they would still be around.
    Donate to XS forums
    Quote Originally Posted by jayhall0315 View Post
    If you are really extreme, you never let informed facts or the scientific method hold you back from your journey to the wrong answer.

  3. #153
    I am Xtreme FlanK3r's Avatar
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    If a stock 3.5 GHz Zambezi performs like a 4 GHz SB 2600K, it will be great!
    ROG Power PCs - Intel and AMD
    CPUs: i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350, 1x AMD FX-8320, 1x AMD FX-8300, 2x AMD FX-6300, 2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K, AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K, 2x i7-4770K, Intel i7-3930K
    AMD Cinebench R10 challenge | AMD Cinebench R15 thread | Intel Cinebench R15 thread

  4. #154
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    OK guys, now that we have almost all the details of the design, with rumored clock speeds, IPC jumps, front-end penalties and Turbo clock information, I'm pretty sure the Donanimhaber slide is genuine. I have no idea if it was AMD who actually made the slide, but the speedups match almost exactly the numbers derived by various other sources on the net (rumorpedia's, scarletwh*re, server speedup slides for MT workloads).
    Summary by me, purely hypothetical of course (for a hypothetical 3.5Ghz Zambezi with 4.2Ghz integer turbo mode :) ):

    Single thread pure integer desktop workloads :
    -around 12 to 15% IPC improvement -too complicated to explain how I got to this particular range ,
    -around 13% effective clock improvement-4.2Ghz BD Turbo Vs 3.7Ghz Thuban Turbo,
    -cumulative speed up of 30% versus Thuban in this type of workload,on average(can be higher or lower ,depends on application and whether it's ALU bound or memory latency sensitive)

    Single thread fp workload (this covers, in my opinion, integer SSE too):
    - around 40%-50% better per clock performance than one Thuban core;this is one 128bit FMAC/IMAC Vs one Thuban FP part of the core
    -around 13% effective clock improvement-4.2Ghz BD Turbo Vs 3.7Ghz Thuban Turbo <-Turbo should kick in when single threaded workload is detected ,even if it is a floating point(power draw intensive)
    -cumulative speed up of ~60% versus Thuban in this type of workload,on average.

    Multi threaded integer workloads(SSE in MMX/IMAC pipelines):
    -50-80% faster than Thuban 1100T

    Multi threaded floating point workloads(fp ops in FMAC pipelines)
    -50-80% faster than Thuban 1100T

    All of this equates to a total of around 40-50% faster than Thuban 1100T @ 3.3Ghz. Or around 15-20% on average faster than Westmere 3.33Ghz 6C in client workloads.
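    For anyone who wants to check how the single-thread figures above compound, here is a minimal Python sketch. The function name `combined_speedup` and the dictionary labels are illustrative only, and every input is a speculative number from this post (per-clock gain ranges, 4.2Ghz vs 3.7Ghz Turbo), not a measurement.

```python
# Sketch: compound a per-clock (IPC) gain with a clock-speed ratio.
# All inputs are the speculative figures from the post, not measurements.

def combined_speedup(per_clock_gain: float, new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Overall speedup factor = (1 + per-clock gain) * clock ratio."""
    return (1.0 + per_clock_gain) * (new_clock_ghz / old_clock_ghz)

BD_TURBO_GHZ = 4.2      # hypothetical Zambezi Turbo clock from the post
THUBAN_TURBO_GHZ = 3.7  # Thuban Turbo clock from the post

estimates = {
    "ST integer, low": combined_speedup(0.12, BD_TURBO_GHZ, THUBAN_TURBO_GHZ),
    "ST integer, high": combined_speedup(0.15, BD_TURBO_GHZ, THUBAN_TURBO_GHZ),
    "ST fp, low": combined_speedup(0.40, BD_TURBO_GHZ, THUBAN_TURBO_GHZ),
    "ST fp, high": combined_speedup(0.50, BD_TURBO_GHZ, THUBAN_TURBO_GHZ),
}

for label, factor in estimates.items():
    print(f"{label}: {factor:.2f}x")
```

    With these inputs the integer case lands at roughly 1.27x to 1.31x, in line with the ~30% quoted, while the fp case lands at roughly 1.59x to 1.70x, so ~60% sits at the low end of that range.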
    I also think that the leaked AMD Terminator slide (FX, "I will be back") is not without merit.
    Last edited by informal; 02-25-2011 at 01:49 PM.

  5. #155
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Quote Originally Posted by informal View Post
    Single thread fp workload (this covers, in my opinion, integer SSE too):
    - around 40%-50% better per clock performance than one Thuban core;this is one 128bit FMAC/IMAC Vs one Thuban FP part of the core.
    This is an absolutely unreal single thread IPC improvement. I hope it's true, but I really doubt it...

  6. #156
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by zalbard View Post
    This is an absolutely unreal single thread IPC improvement. I hope it's true, but I really doubt it...
    it needs to be true

    right now SB is the king of gaming cpus cause of how high it clocks and how high the IPC is. i want to see a 5ghz BD that's almost 50% better in low threaded apps than thuban: (5.0ghz/4.0ghz) * 1.15 (IPC) = 1.4375x faster
    2500k @ 4900mhz - Asus Maxiums IV Gene Z - Swiftech Apogee LP
    GTX 680 @ +170 (1267mhz) / +300 (3305mhz) - EK 680 FC EN/Acteal
    Swiftech MCR320 Drive @ 1300rpms - 3x GT 1850s @ 1150rpms
    XS Build Log for: My Latest Custom Case

  7. #157
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    You're mentioning 15% IPC improvement. That's very different from 50% I quoted, LOL. 15-25% is not unreasonable to expect, IMO. 50% is just a crazy number.
    Last edited by zalbard; 02-25-2011 at 02:08 PM.

  8. #158
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by zalbard View Post
    This is an absolutely unreal single thread IPC improvement. I hope it's true, but I really doubt it...
    It may look that way, but Interlagos with a 10% higher clock should have an 82% higher fp rate score than MC. This equates to ~25% better per-core performance (1.82/1.33/1.1), or 1.25x, in a multithreaded workload where we have a penalty due to the shared front end (which can be as high as 1.25x, or 25%, based on the "80% of a CMP design" claim => 100/80 = 1.25). Now if we account for that penalty of 25% on top of the previous 1.25x over MC, we get 1.25 x 1.25 = 1.56x, or 56% better in a single-threaded fp workload. Then you have Turbo mode kicking in, my guess was 4-4.2Ghz, which gets you another 13% or 1.13x: 1.56 x 1.13 =~ 1.77x, or 77% faster.
    All of this is speculation on my part, but I'm almost 100% sure this is how it will end up performing. Even if I have a decent margin of error in my estimation, it will still trounce Thuban and be on par with the highest-end Westmere.
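    The chain of ratios above can be replayed step by step. This is only a sketch under the post's own assumptions: the constant names are made up for illustration, the core ratio is taken as 16/12 (the post rounds it to 1.33), and the Turbo factor is the 1.13x guess. None of these are measured values.

```python
# Sketch of the estimate chain; every input is a rumored or assumed figure.
FP_RATE_GAIN = 1.82    # Interlagos vs Magny-Cours fp rate, per the slide
CORE_RATIO = 16 / 12   # 16 cores vs 12 cores (rounded to 1.33 in the post)
CLOCK_RATIO = 1.10     # assumed 10% higher clock
CMP_SCALING = 0.80     # assumed: shared module delivers 80% of a CMP design
TURBO_RATIO = 1.13     # assumed Turbo uplift (about 4.2Ghz vs 3.7Ghz)

# Step 1: per-core, per-clock gain in the multithreaded case.
mt_per_core_per_clock = FP_RATE_GAIN / CORE_RATIO / CLOCK_RATIO

# Step 2: undo the worst-case sharing penalty to get the single-thread case.
sharing_penalty = 1.0 / CMP_SCALING
st_per_clock = mt_per_core_per_clock * sharing_penalty

# Step 3: add the Turbo clock uplift.
st_with_turbo = st_per_clock * TURBO_RATIO

print(f"MT per core per clock: {mt_per_core_per_clock:.2f}x")
print(f"ST per clock:          {st_per_clock:.2f}x")
print(f"ST with Turbo:         {st_with_turbo:.2f}x")
```

    Without rounding at each step the results come out near 1.24x, 1.55x and 1.75x, slightly below the 1.25x, 1.56x and 1.77x in the text, which rounds up along the way.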

    Or if you want an aggregate estimate of Zambezi 3.5Ghz model,all workloads included you can look at this chart:
    http://www.hardware.fr/articles/815-...dy-bridge.html

    Thuban has 164.1 points. 164.1 x 1.5 = ~246pts. A SB hexacore @ 3.4Ghz/3.8Ghz should have ~240-245pts and a SB 8C @ 3.33/3.8Ghz should have 262pts. I derived the SB 6C potential score from an IPC+clock improvement of 11% over the 3.33Ghz Westmere 6C, while the 8C score comes from a 10% aggregate improvement from 33% more cores vs the 6C SB score @ 3.4Ghz, with a correction for 3.33Ghz vs 3.4Ghz clock speed. The 10% aggregate improvement was derived from the improvement the 6C Westmere sees in client apps vs the 4C Nehalem at the same clock, based on the hothardware review of Thuban on the same page for average results (using the ratio equation 50% more cores : 16% improvement for Westmere = 33% more cores : x improvement, while the IPC improvement was already included in the SB 6C score).

    Zambezi @ 3.5Ghz, in that chart, will slide right in between the 8C 3.4Ghz SB and the 6C 3.4Ghz SB. Whether Intel will manage to make an 8C 3.4Ghz SB desktop chip is another question (I think they can).
    Summary of a potential hypothetical Zambezi X8 @ 3.5Ghz: a tiny bit faster than a 6C SB @ 3.4Ghz and about 6% slower than an 8C SB @ 3.33Ghz.
    Let's see how it turns out. You have my post here.
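    As a rough check of the chart arithmetic, here is a small Python sketch. The variable names are illustrative; the Thuban score is the one quoted from the hardware.fr chart, and the Sandy Bridge 6C/8C scores are this post's own projections, not published results.

```python
# Sketch: projected chart scores, using only figures quoted in the post.
THUBAN_SCORE = 164.1          # quoted from the hardware.fr chart
ZAMBEZI_SPEEDUP = 1.5         # assumed aggregate gain over Thuban
SB_6C_RANGE = (240.0, 245.0)  # projected SB hexacore score range
SB_8C_SCORE = 262.0           # projected SB 8C score

zambezi_score = THUBAN_SCORE * ZAMBEZI_SPEEDUP

# Proportion used for the 8C projection:
# 50% more cores gave 16% -> 33% more cores gives x.
extra_core_gain = 0.16 * 0.33 / 0.50

lead_over_sb_6c = tuple(zambezi_score / s - 1.0 for s in SB_6C_RANGE)
deficit_vs_sb_8c = 1.0 - zambezi_score / SB_8C_SCORE

print(f"Zambezi projected score: {zambezi_score:.1f}")
print(f"Gain from 33% more cores: {extra_core_gain:.1%}")
print(f"Lead over SB 6C: {lead_over_sb_6c[1]:.1%} to {lead_over_sb_6c[0]:.1%}")
print(f"Deficit vs SB 8C: {deficit_vs_sb_8c:.1%}")
```

    On these inputs the projected Zambezi score is about 246 points, which is 0.5% to 2.6% ahead of the projected 6C range and about 6% behind the projected 8C score.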
    Last edited by informal; 02-25-2011 at 02:27 PM.

  9. #159
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Wait, what are you saying? IPC is Instruction Per Clock (Cycle). It has nothing to do with CPU Frequency whatsoever. Or did you come up with "Instruction Per Core"?
    I'll read up your post again, give me a few mins.
    Last edited by zalbard; 02-25-2011 at 02:24 PM.

  10. #160
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Read it again, both posts. I tried to factor in IPC, clock, the penalty that hits MT workloads and the improvement when there is no penalty (single thread workload).
    IPC is instructions per cycle,or core logic level improvement vs older core.
    Last edited by informal; 02-25-2011 at 02:32 PM.

  11. #161
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    OK. In regards to my comment:
    Quote Originally Posted by zalbard View Post
    Wait, what are you saying? IPC is Instruction Per Clock (Cycle). It has nothing to do with CPU Frequency whatsoever. Or did you come up with "Instruction Per Core"?
    I posted this after you mentioned "~25% better per core performance" (when it should be per clock; per core would be 10% higher) and pulled Turbo mode in.
    Your logic is sound, I can give you that.
    A few issues, though. The biggest one is that you're taking the numbers from the slide with some relative rating which will not necessarily be absolutely true. The second one is that you're going for the maximum possible penalty, and that might not be a very good indication (especially when it comes to a marketing slide and we have no idea what benchmark was used).
    Let's hope you're correct, though!

    Regarding the IPC improvement and why I doubt it's going to be actually true...
    Single threaded Cinebench is a good indication of FP performance, right?



    If your 56% calculation is correct then BD @ 3.2GHz would score 3951*1.56=6164, which is [(6164*31/32)-5405]/5405 = 10% faster than SB clock per clock (single threaded FP workload). It wouldn't be just on par with Westmere, it would blow away SB by 10% (which is more than the difference between two generations of Core i7). Which is pretty crazy if you ask me.
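    The calculation above can be written out as a short Python sketch. The screenshots it refers to are not reproduced here, so the roles of the numbers are assumptions read off the formula: 3951 is taken as the Thuban single-thread score, 5405 as the Sandy Bridge score, and 31/32 as the factor that normalizes the two chips to the same clock. The variable names are illustrative.

```python
# Sketch of the clock-for-clock comparison; roles of the inputs are assumed.
THUBAN_ST_SCORE = 3951         # assumed: Thuban single-thread Cinebench score
SB_ST_SCORE = 5405             # assumed: Sandy Bridge single-thread score
ST_FP_GAIN = 1.56              # the 56% single-thread fp estimate
CLOCK_NORMALIZATION = 31 / 32  # assumed: scales BD to the SB result's clock

projected_bd_score = THUBAN_ST_SCORE * ST_FP_GAIN
normalized_bd_score = projected_bd_score * CLOCK_NORMALIZATION
lead_over_sb = (normalized_bd_score - SB_ST_SCORE) / SB_ST_SCORE

print(f"Projected BD score:   {projected_bd_score:.0f}")
print(f"Clock-normalized:     {normalized_bd_score:.0f}")
print(f"Lead over SB (clock for clock): {lead_over_sb:.1%}")
```

    This reproduces the roughly 10% clock-for-clock lead over SB that the 56% estimate would imply.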
    Last edited by zalbard; 02-25-2011 at 03:01 PM.

  12. #162
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Thanks for the feedback. I agree that I used the worst possible penalty, which in turn gives the best ST performance. This might be very wrong in the end. As for the slide, it represents estimated performance with unknown compiler settings. This may skew the results somewhat, but not by much IMO. We already know that the whole FP unit (2xFMAC) should be more powerful, by a good delta, than 2x K10 FP units. Add in 33% more cores and a higher clock of 10% or so for server parts (without Turbo, since the fp workload should not permit the clock jump) and you end up with a very big jump (~80%) in aggregate score for a whole 16C Interlagos chip vs a 12C MC chip. This aligns pretty well with Donanimhaber's slide, which shows Zambezi X8 at an unknown clock scoring almost 88% higher than Thuban 3.3Ghz in the CB11.5 benchmark *. 3.5Ghz for the top end is my estimate, judging by known info from ISSCC. So we have a massive uplift in fp score for the 8 core Zambezi in a commercial workload that does not see AVX or FMA extensions at all. Just pure SSE/SIMD.
    To conclude, I think the numbers will fall in this ballpark. I may be off by a huge margin, who knows. But everything I have seen so far points in the direction of a 50% better "aggregate" score, which takes into account all of the possible workload scenarios (the worst and the best).

    * It seems like server parts get the same core count advantage and a similar clock advantage over MC as Zambezi will get over Thuban: 33% more cores and ~6-10% more clock, not counting Turbo.
    Last edited by informal; 02-25-2011 at 03:00 PM.

  13. #163
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by zalbard View Post

    Regarding the IPC improvement and why I doubt it's going to be actually true...
    Single threaded Cinebench is a good indication of FP performance, right?



    If your 56% calculation is correct then BD @ 3.2GHz would score 3951*1.56=6164, which is [(6164*31/32)-5405]/5405=10% faster than SB clock per clock. It wouldn't be just on par with Westmere, it would blow away SB by 10% (which is more than the difference between two generations of Core i7). Which is pretty crazy if you ask me.

    Yeah, it may look crazy high, but the thing you're missing is that one BD core will have access to the full 2xFMAC unit. Each FMAC is 128b wide and each can do either add or mul, meaning consecutive adds or muls, something no x86 core can do today. If the single-threaded app is add or mul limited, then you have 2x the resources of a SB/Westmere/Thuban core in theory. Since Westmere/SB perform better than Thuban per clock, the difference comes either from ISA extension support or better scheduling. Bulldozer should, as I mentioned, have 2x the flexibility and 2x the load/store BW of a Thuban core when you run a single-thread fp workload. On top of all that, it will have a clock speed advantage. Does "56% higher than Thuban in ST fp workload" still sound like a lot, from this perspective?

  14. #164
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    56% single threaded IPC (FP or not) improvement looks scary in any case. This is a huge jump, especially for single threaded performance (which is very difficult to scale).
    The improvements you mentioned above will no doubt help single threaded performance. To what degree, I don't know.
    I think they're more targeted at multi-threaded workloads, though, being some sort of "hardware HT" implementation.
    I really don't want to speculate. I'll just wait for some benchmarks.

  15. #165
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    I somewhat agree, but it's sooo fun to speculate. Like I said, BD's improvements might look impossible from today's perspective, but the core is a radical departure from everything we have known thus far. It's like Sun's Niagara on crack, featuring OoO cores instead of in-order ones and a super powerful FP unit (with a ratio of 1:2 instead of 1:8 in Niagara's case). Add in very aggressive prefetch and you have an 8 core design that can kind of "morph" according to workload. It uses many different techniques: a shared front end to fill in bubbles (SMT's advantage!), full cores instead of shared cores (adding more performance over the traditional SMT approach), a huge L2 benefiting both hardware threads or even a single thread, a "fat" 256b FMAC FP unit that can "morph" into one or two units according to workload (ST or MT via SMT!), aggressive power gating features and a flip-flop design which in turn allow much more aggressive clock throttling and clock uplift (according to workload), overall improved integer cores that have a unified scheduler for mem/ALU ops instead of separate schedulers and shared pipelines, complete ISA support, etc.

    All in all, the design is a radical departure from anything we have seen thus far. I personally think it's a winner. Whether it is or not, we have to wait a few more months I guess.
    Last edited by informal; 02-25-2011 at 03:56 PM.

  16. #166
    Xtreme Addict
    Join Date
    Jan 2008
    Location
    milwaukee
    Posts
    1,683
    thanks for the breakdown informal, great info
    LEO!!!!
    amd phenom II x6 1100T | gigabyte 990fxa-ud3 . .
    2x2gb g.skill 2133c8 | 128gb g.skill falcon ssd
    sapphire ati 5850 | x-fi xtrememusic. . .
    samsung f4 2tb | samsung dvdrw . .
    corsair tx850w | windows 7 64-bit.
    ddc3.25 xspc restop | ek ltx | mc-tdx | BIP . .
    lycosa-g9-z2300 | 26" 1920x1200 lcd .

  17. #167
    Xtreme Mentor
    Join Date
    Jun 2008
    Location
    France - Bx
    Posts
    2,601
    Quote Originally Posted by informal View Post
    All in all,the design is a radical departure from anything we have seen thus far. I personally think it's a winner. Whether it is or not,we have to wait a few more months I guess .
    I truly believe 10h will spank Conroe/Penryn across the board in 1P configs (including int besides the fp domination) and all that while consuming less energy and working at lower clocks.
    http://www.xtremesystems.org/forums/...&postcount=210


  18. #168
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Nice to see you follow me Olivon. Actually, Barcelona was a fail only in the clock domain. On server workloads it did better, even on 1P, than FSB-based Conroe MCM chips. Intel did crank up the clocks on Penryn and in the end made up the difference. If you look at Shanghai vs Penryn, both on 45nm, there is no contest actually. Shanghai is a clear-cut winner in int and fp MT workloads.

    Opteron 2389( 2.9Ghz,2P,45nm)
    spec int rate 141
    spec fp rate 121

    Intel Xeon X5450( 3.00 GHz,2P,45nm)
    spec int rate 130
    spec fp rate 74.1

    2.9Ghz Shanghai is 8.4% faster than 3Ghz Penryn in spec int rate 2006 (12% faster per clock)
    2.9Ghz Shanghai is 63% faster than 3Ghz Penryn in spec fp rate 2006 (68% faster per clock)
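    The raw and per-clock percentages can be recomputed from the scores listed above. This is just a sketch: the helper names `raw_gain` and `per_clock_gain` are made up for illustration, and "per clock" here simply divides each score by the chip's clock in Ghz before comparing.

```python
# Sketch: raw vs per-clock advantage from the 2P SPEC rate scores in the post.

def raw_gain(score_a: float, score_b: float) -> float:
    """Fractional advantage of chip A over chip B on the raw score."""
    return score_a / score_b - 1.0

def per_clock_gain(score_a: float, clock_a: float, score_b: float, clock_b: float) -> float:
    """Fractional advantage of A over B after dividing each score by its clock."""
    return (score_a / clock_a) / (score_b / clock_b) - 1.0

OPTERON_GHZ, XEON_GHZ = 2.9, 3.0

int_raw = raw_gain(141, 130)
int_per_clock = per_clock_gain(141, OPTERON_GHZ, 130, XEON_GHZ)
fp_raw = raw_gain(121, 74.1)
fp_per_clock = per_clock_gain(121, OPTERON_GHZ, 74.1, XEON_GHZ)

print(f"int rate: {int_raw:.1%} raw, {int_per_clock:.1%} per clock")
print(f"fp rate:  {fp_raw:.1%} raw, {fp_per_clock:.1%} per clock")
```

    The same two helpers work for any other pair of scores and clocks, for example 1P results.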

    I rest my case
    Last edited by informal; 02-25-2011 at 05:06 PM.

  19. #169
    Xtreme Member
    Join Date
    Jul 2010
    Location
    Birmingham, United Kingdom
    Posts
    442
    Interesting read.

    To all the naysayers: look at what AMD did to Intel back in the P4 days.

    Now I know that it was Intel's decision to go down the NetBurst path, but I can see AMD doing the same thing again.

  20. #170
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    In case Olivon prefers 1P scores, I can provide those too:

    Opteron 2389( 2.9Ghz,1P,45nm)
    spec int rate 72.5
    spec fp rate 60.4
    Intel Xeon X5450( 3.00 GHz,1P,45nm)
    spec int rate 70.2
    spec fp rate 41.2

    2.9Ghz Shanghai is 3.2% faster than 3Ghz Penryn in spec int rate 2006 (6.8% faster per clock)
    2.9Ghz Shanghai is about 47% faster than 3Ghz Penryn in spec fp rate 2006 (about 52% faster per clock)

  21. #171
    Xtreme Mentor
    Join Date
    Jun 2008
    Location
    France - Bx
    Posts
    2,601
    Quote Originally Posted by informal View Post
    Nice to see you follow me Olivon. Actually, Barcelona was a fail only in the clock domain. On server workloads it did better, even on 1P, than FSB-based Conroe MCM chips. Intel did crank up the clocks on Penryn and in the end made up the difference. If you look at Shanghai vs Penryn, both on 45nm, there is no contest actually. Shanghai is a clear-cut winner in int and fp MT workloads.

    Opteron 2389( 2.9Ghz,2P,45nm)
    spec int rate 141
    spec fp rate 121

    Intel Xeon X5450( 3.00 GHz,2P,45nm)
    spec int rate 130
    spec fp rate 74.1

    2.9Ghz Shanghai is 8.4% faster than 3Ghz Penryn in spec int rate 2006 (12% faster per clock)
    2.9Ghz Shanghai is 63% faster than 3Ghz Penryn in spec fp rate 2006 (68% faster per clock)

    I rest my case
    Intel Xeon X5450 release date : Q4'07
    Opteron 2389 release date : Feb 23, 2009
    Intel Xeon X5550 (Gainestown) release date : Q1'09, Mar

    AMD does a great job in the server segment, but Intel seems really too fast in execution.
    And don't get me wrong, I really hope AMD will deliver too

    That was just a reminiscence from the past

  22. #172
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    I said 10h Vs Conroe/Penryn and I provided the results. I didn't say 10h VS Nehalem,did I ?

  23. #173
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264
    Well, Been playing around with pictures and numbers

    now i'm out of

    Hopefully to scale! (Bulldozer is interesting to scale given the quoted size. I don't think it is supposed to include the wasted space (and small amount of logic) around the L2 cache, which is what I've run with; not doing so leaves a large discrepancy in L2 cache density when you look at it next to Llano...)


  24. #174
    Xtreme Mentor
    Join Date
    Jun 2008
    Location
    France - Bx
    Posts
    2,601
    Quote Originally Posted by informal View Post
    I said 10h Vs Conroe/Penryn and I provided the results. I didn't say 10h VS Nehalem,did I ?
    Nope, but the battle was over by time limit

    Nice mAJORD, thanks !

  25. #175
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Thanks for the picture mAJORD!
