Page 6 of 48
Results 126 to 150 of 1198

Thread: AMD "Piledriver" refresh of Zambezi - info, speculations, test, fans

  1. #126
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    demonkevy666, actually it's much worse, at least in AIDA64.
    AMD Phenom II X4 980 BE vs FX-4100
    L1 cache
    read 118658 MB/s vs 121066 MB/s
    write 59416 MB/s vs 21230 MB/s
    copy 79171 MB/s vs 42420 MB/s

    http://pctuning.tyden.cz/ilustrace3/...0/cachemem.png
    http://pctuning.tyden.cz/ilustrace3/...4/cachemem.png
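
To put the L1 figures above in relative terms, here is a minimal Python sketch that expresses the FX-4100 results as a fraction of the Phenom II ones. The numbers are the ones quoted in the post; the dictionary layout and the `fx_relative` helper are made up for illustration.

```python
# L1 cache bandwidth figures quoted in the post (AIDA64).
# Tuple order: (Phenom II X4 980 BE, FX-4100).
l1_bandwidth = {
    "read": (118658, 121066),
    "write": (59416, 21230),
    "copy": (79171, 42420),
}


def fx_relative(phenom, fx):
    """Return the FX result as a fraction of the Phenom II result."""
    return fx / phenom


ratios = {op: fx_relative(*pair) for op, pair in l1_bandwidth.items()}

for op, ratio in ratios.items():
    print(f"L1 {op}: FX-4100 reaches {ratio:.0%} of the Phenom II X4 980 BE")
```

By this measure read is roughly on par, while write and copy land well below the Phenom II, which is the "much worse" part.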

  2. #127
    Registered User
    Join Date
    Sep 2007
    Posts
    58
    A half-clocked L2, if true, would explain the high L2 latencies in AIDA64. Even my 1.5 GHz Llano has lower L2 latency, at around 4.0 ns, while a 3.6 GHz FX gets above 5.0 ns.

    Is a, say, 3 GHz fully clocked FX slower than a 3.6 GHz FX with a half-clocked L2...?
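
Latency in nanoseconds hides the clock difference, so it may help to convert the two figures above into core cycles. This is only a rough sketch: it assumes the quoted clocks (1.5 GHz and 3.6 GHz) were the actual core clocks during the AIDA64 run, and `latency_cycles` is a helper name invented here.

```python
def latency_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds into core clock cycles."""
    return latency_ns * clock_ghz


# Figures from the post: ~4.0 ns on a 1.5 GHz Llano, "above 5.0 ns" on a 3.6 GHz FX.
llano_cycles = latency_cycles(4.0, 1.5)
fx_cycles = latency_cycles(5.0, 3.6)  # lower bound, since the post says "above 5.0 ns"

print(f"Llano L2 latency: about {llano_cycles:.0f} cycles")
print(f"FX L2 latency: at least {fx_cycles:.0f} cycles")
print(f"FX / Llano, in cycles: {fx_cycles / llano_cycles:.1f}x")
```

The two chips have different cache designs, so the ratio is indicative at best, but it shows why the gap looks larger in cycles than in nanoseconds.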

  3. #128
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    If the L2 were half-clocked, wouldn't that mean not just doubled latency but also halved throughput? What I see is that the L2 in BD has throughput comparable to Deneb's.

  4. #129
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    Some interesting slides from a Donanimhaber video; my thanks go to del42sa from the pctuning.cz forum
    http://img7.rajce.idnes.cz/d0703/5/5...es/Trinity.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...ty_3D_mark.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...ty_PC_Mark.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...ty_Compute.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...al_graphic.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...al_graphic.jpg
    Honestly, I don't know what to think if you look at the IGP performance in FLOPs:
    Llano 415 GFLOPs
    Trinity 715 GFLOPs
    Llano theoretical numbers:
    5 SIMDs * 16 ALUs * 5 SPs * 2 * 600 MHz = 480 GFLOPs
    415 GFLOPs / 480 GFLOPs = 86.5%
    And now Trinity:
    715 GFLOPs / 0.865 = 827 GFLOPs
    That would mean 10 VLIW4 SIMDs at 646 MHz.
    That's impossible: they wouldn't be able to feed it with data, about half of the TDP would go to the IGP alone, and the performance in 3DMark doesn't match either. It would be stronger not by just 35% but by 1.8-2x. Even the die would be bigger than Llano's, maybe ~270-280 mm^2.

    edit: my mistake, the FLOPs should be CPU+IGP, but even so it doesn't match by a long shot.

    P.S. Undone was faster
    Last edited by TESKATLIPOKA; 11-19-2011 at 02:10 AM.
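
The arithmetic in the post above can be reproduced with a short Python sketch. It follows the post's original GPU-only reading (before the edit about CPU+IGP), assumes 2 FLOPs per lane per clock as in the post's formula, and uses a helper name, `peak_gflops`, that is made up here.

```python
def peak_gflops(simds, alus_per_simd, lanes_per_alu, clock_mhz):
    """Theoretical peak: 2 FLOPs (one multiply-add) per lane per clock."""
    return simds * alus_per_simd * lanes_per_alu * 2 * clock_mhz / 1000


# Llano as described in the post: 5 SIMDs, 16 ALUs each, VLIW5, 600 MHz.
llano_peak = peak_gflops(5, 16, 5, 600)

# Ratio of the slide figure (415 GFLOPs) to the theoretical peak.
efficiency = 415 / llano_peak

# Peak Trinity would need if the same ratio applied to its 715 GFLOPs.
trinity_implied_peak = 715 / efficiency

# The configuration the post derives: 10 VLIW4 SIMDs at 646 MHz.
trinity_candidate = peak_gflops(10, 16, 4, 646)

print(f"Llano peak: {llano_peak:.0f} GFLOPs, efficiency {efficiency:.1%}")
print(f"Implied Trinity peak: {trinity_implied_peak:.0f} GFLOPs")
print(f"10 x VLIW4 @ 646 MHz: {trinity_candidate:.0f} GFLOPs")
```

The last two numbers agree to within rounding, which is how the post arrives at the 10-SIMD configuration it then calls impossible.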

  5. #130
    Xtreme Member
    Join Date
    Jan 2011
    Location
    145.21.4.???
    Posts
    319
    Quote Originally Posted by TESKATLIPOKA View Post
    Some interesting slides from domaninhaber video, my thanks goes to del42sa from pctuning.cz forum
    http://img7.rajce.idnes.cz/d0703/5/5...es/Trinity.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...ty_3D_mark.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...ty_PC_Mark.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...ty_Compute.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...al_graphic.jpg
    http://img7.rajce.idnes.cz/d0703/5/5...al_graphic.jpg
    Honestly I don't know what to think, if you look at the IGP performance in FLOPs
    Llano 415 GFLOPs
    Trinity 715 GFLOPs
    Llano theoretical numbers
    5SIMD * 16ALU*5SP*2*600MHz=480GFLOPs
    415GFLOPs/480GFLOPs = 86.5%
    and now Trinity
    715/0.865=827GFLOPs
    That should mean 10SIMDs VLIW4 at 646Mhz.
    Thats impossible, they wouldn't be able to feed it with data, ~half of the TDP would be just for IGP and the performance in 3D mark doesn't match either. It wouldn't be stronger by just 35% but rather 1.8-2x. Even die size would be bigger compared to Llano's, maybe ~270-280mm2.

    edit: my mistake, the FLOPs should be CPU+IGP but even so It doesn't match by a long shot.

    P.S. Undone was faster
    I think these scores are plausible, unless someone tells me these results are unofficial and come from some NDAers/testers.

  6. #131
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    undone, most of the GFLOPs should be coming from the IGP, and the increase doesn't match the +35% in 3DMark. If the CTP were ~550-590, I wouldn't have a problem.

  7. #132
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Scores are from an AMD document... Someone broke the NDA, again.
    They are plausible; the GPU part will rock. But we all knew this already. The question is how much of an upgrade PD will be on the desktop. The PCMark Vantage scores look good, but I have already found some reviews of the A8-3850 in which it gets 7500+ points in this benchmark, so there are many unknowns such as test setup, BIOS/drivers, etc. If retail Trinity gets a proportionally better score versus the 3850 in real reviews, then it will be good. IMO it needs to be faster than the FX-4xxx series (which is not that hard of a task).

  8. #133
    I am Xtreme FlanK3r's Avatar
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    The IGP looks very good; it will be great for most games up to 1680x1050 (high resolution) without AA
    ROG Power PCs - Intel and AMD
    CPUs:i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350,1x AMD FX-8320,1x AMD FX-8300, 2x AMD FX-6300,2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K,AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K,2x i7-4770K, Intel i7-3930KAMD Cinebench R10 challenge AMD Cinebench R15 thread Intel Cinebench R15 thread

  9. #134
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Oh, I forgot that someone asked about memory BW. Don't forget that BD has a radically improved memory controller versus anything K10 (30% due to pure microarchitectural improvements, and the rest is the DRAM clock increase). I expect they ported some Llano stuff to it in order to service both parts of the die (like Onion and Garlic).

  10. #135
    Xtreme Addict
    Join Date
    Dec 2008
    Location
    Sweden, Linköping
    Posts
    2,034
    The scores seem about right (especially the GFLOPS values). It has been known since the launch of Llano that the GFLOPS increase would be 50%, with the total above 800 for the whole APU (which includes the CPU cores). I was thinking more along the lines of 8 VLIW4 SIMDs @ 700 MHz + CPU to make up for it.

    Like informal said, the memory controller will be a new one, and Trinity will support memory clocks @ 2133 MHz.
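
As a rough check of the "8 VLIW4 SIMDs @ 700 MHz" guess, the same peak formula used earlier in the thread can be applied to it. This sketch assumes 16 ALUs per SIMD and 2 FLOPs per lane per clock, as in post #129; the CPU share is not given anywhere in the thread, so only the gap to the 800 GFLOPs total is computed. `peak_gflops` is an illustrative helper name.

```python
def peak_gflops(simds, alus_per_simd, lanes_per_alu, clock_mhz):
    """Theoretical peak: 2 FLOPs (one multiply-add) per lane per clock."""
    return simds * alus_per_simd * lanes_per_alu * 2 * clock_mhz / 1000


# Guessed Trinity IGP: 8 SIMDs, 16 ALUs each (assumed), VLIW4, 700 MHz.
igp_peak = peak_gflops(8, 16, 4, 700)

# What the CPU cores would have to add to reach the 800 GFLOPs total.
cpu_gap = 800 - igp_peak

print(f"IGP peak: {igp_peak:.1f} GFLOPs")
print(f"Left for the CPU cores to reach 800: {cpu_gap:.1f} GFLOPs")
```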
    SweClockers.com

    CPU: Phenom II X4 955BE
    Clock: 4200MHz 1.4375v
    Memory: Dominator GT 2x2GB 1600MHz 6-6-6-20 1.65v
    Motherboard: ASUS Crosshair IV Formula
    GPU: HD 5770

  11. #136
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    Smartidiot89, what we see is a 72% increase in combined FLOPs, and the thing is that such an IGP wouldn't give you just the 35% performance increase shown in 3DMark; it should give more than +60%.
    The 35% increase should mean the IGP is just
    6 VLIW4 SIMDs @ 700 MHz (+25% SIMDs and +17% frequency) at best.

    edit: If I think about it, both systems were using just 1866 MHz memory. If 2133 MHz memory gave an improvement equal to the difference in speed, that would be 15%, so in total it would mean +50%, and that is acceptable. But then the question is why only 1866 MHz memory was used in the Trinity test.
    Last edited by TESKATLIPOKA; 11-19-2011 at 05:40 AM.
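
The estimate in the edit above can be sketched in a few lines of Python. It rests on the post's own premise, which is an assumption rather than a measurement: IGP performance scales linearly with memory clock, and the two gains multiply.

```python
# Figures from the thread.
flops_ratio = 715 / 415      # combined GFLOPs, Trinity vs Llano
observed_gain = 1.35         # +35% in 3DMark with 1866 MHz memory on both systems

# Assumption from the post: performance scales linearly with memory clock.
memory_ratio = 2133 / 1866

# Assumption: the two gains compound multiplicatively.
projected_gain = observed_gain * memory_ratio

print(f"Combined FLOPs increase: +{flops_ratio - 1:.0%}")
print(f"Memory clock increase: +{memory_ratio - 1:.1%}")
print(f"Projected gain with 2133 MHz memory: +{projected_gain - 1:.0%}")
```

The memory step comes out slightly under the rounded 15% used in the post, so the projection lands a little above +50% either way.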

  12. #137
    Registered User
    Join Date
    Sep 2007
    Posts
    58
    Quote Originally Posted by TESKATLIPOKA View Post
    If L2 was half-clocked wouldn't it mean that not just the latency is doubled but also the memory throughput is halved? What I see is that the L2 in BD has comparable memory throughput to Deneb.
    L2 read is 2x slower, L2 write is 50% slower and L2 copy is faster for BD compared to K10.

  13. #138
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    L2 is almost on par with L3 cache in BD (on par being almost as slow as)... That's what aida reports at least. If it's true (half clocking) then they can try to remedy this if/when GloFo can reach a level of process maturity to allow this change. L2 cache is very very important for client workloads.

  14. #139
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    mrcmtl, it doesn't look that way; only write is somewhat slower
    http://pctuning.tyden.cz/ilustrace3/...DFX/aida64.png
    http://pctuning.tyden.cz/ilustrace3/...0/cachemem.png

    http://i55.tinypic.com/jjs221.jpg
    http://www.xtremesystems.org/forums/...3&d=1318067554

    informal, where is L2 on par with L3? It would be nice if L2 was half-clocked, but that's not the case.
    Last edited by TESKATLIPOKA; 11-19-2011 at 09:23 AM.

  15. #140
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    No, I didn't say it is half-clocked. It's just what dresdenboy speculated based on his research. It can be that L2 is just that slow by design, who knows.

  16. #141
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by informal View Post
    L2 is almost on par with L3 cache in BD (on par being almost as slow as)... That's what aida reports at least. If it's true (half clocking) then they can try to remedy this if/when GloFo can reach a level of process maturity to allow this change. L2 cache is very very important for client workloads.
    Then raising the HTT clock would show higher gains than raising the multiplier, which is somewhat true.
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  17. #142
    Registered User
    Join Date
    Sep 2007
    Posts
    58
    Quote Originally Posted by TESKATLIPOKA View Post
    mrcmtl doesn't look that way, only write is somewhat slower
    http://pctuning.tyden.cz/ilustrace3/...DFX/aida64.png
    http://pctuning.tyden.cz/ilustrace3/...0/cachemem.png

    http://i55.tinypic.com/jjs221.jpg
    http://www.xtremesystems.org/forums/...3&d=1318067554

    informal where is L2 on par with L3? It would be nice if L2 was half clocked but that's not the case.
    hmm, my bad then. My source had an incredibly low L2 read for some reason. But anyhow, the L2 latencies are really problematic...

    http://realworldtech.com/beta/forums...23488&roomid=2

  18. #143
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Hungary (EU)
    Posts
    1,376
    Quote Originally Posted by savantu View Post
    Umhh...you're looking at the end result ( CPU performance ) and blaming only one factor that contributes to that performance : process.
    Alongside process, performance and performance/watt is dependent on architecture and circuit design.

    I really doubt BD is perfect uarch and implementation wise and only the process is to blame. If we would have someone from GF here, I'm sure he'd say not so nice things about AMD competence in CPU design.
    Agree or not, the main problem is the process tech. Nobody said that BD is a perfect uarch anyway.

    If you don't believe it, just take a look at the Llano CPU cores. Despite the 32nm process, they can't reach the clocks of the previous, very similar uarch, even with overclocking.

    At this moment the scaling of 32nm is worse than that of the latest 45nm process.

  19. #144
    I am Xtreme FlanK3r's Avatar
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    so...problem of GF...

  20. #145
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Bulldozer didn't turn out as good as AMD had hoped AND GloFo's process was nowhere near as good as AMD had hoped. A "perfect storm" for worse-than-expected desktop performance (although server is certainly affected too, but not as much as desktop).

  21. #146
    Xtreme 3D Team
    Join Date
    Jan 2009
    Location
    Ohio
    Posts
    8,499
    With Llano at 2.7 and 2.9 GHz at VIDs up to 1.4 V, it's no surprise that GloFo's 32nm is trash. They could have produced the same chips on their 45nm process at 1.15-1.25 V, considering they were producing Deneb at up to 3.7 GHz under 1.4 V VID.
    BD for some reason is less efficient performance-wise, especially on the server side. Had they added 300 MHz to the 12-core Magny-Cours plus a few hundred MHz of turbo, they would have had a CPU that outperforms the 16-core by a significant margin.

    "Stars" is a very efficient core when it comes to mm^2 to performance, on the level of Lynnfield at 45nm process. BD was quite a step backward from that, not sure how much we can blame the manufacturing process for that.
    :)

  22. #147
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    The engineers at AMD really borked the design of anything Bulldozer related, server or desktop. It's a horrible chip. They have a chance to redeem themselves with Trinity, but that's not going to happen in their work environment at AMD.
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  23. #148
    Registered User
    Join Date
    May 2008
    Location
    USA
    Posts
    36
    Quote Originally Posted by freeloader View Post
    The engineers at AMD really borked the design of anything Bulldozer related, server or desktop. It's a horrible chip. They have a chance to redeem themselves with Trinity, but that's not going to happen in their work environment at AMD.
    Come on, man, let's keep this technical, instead of a bunch of rumors that showed up on a blog about how everything is automated and AMD is doomed because their chips use up 20% more space and lose 20% more performance. If you can show me the places where BD's die space can be reduced by 20% and where performance can be increased by 20%, please show me.

    There's plenty of solid proof that the GloFo 32nm process is borked, primarily that there's no clock or VID difference between 32nm BD and 45nm Stars, even though Stars was designed for efficiency and BD was designed for high clocks, and there's a tradeoff between the two.

    Compare TSMC's 55nm and 40nm processes with the 4870 and 4770 GPUs. http://www.gpureview.com/show_cards....=564&card2=612 The 4770 is at the same clock, yet uses almost half as much power. These are the kinds of things we should be seeing with GloFo 32nm, but we're not. I doubt BD is so messed up that it would use twice as much power as Stars. I'm not saying it should use half; they are different archs, while the 4770 and 4870 are the same. But for power consumption to go up pretty much shows that there are serious issues with GloFo 32nm.

    Because of this, I don't see 32nm Stars being much better than BD. Explain to me why a leaky and lossy manufacturing process is going to care about what architecture it's running on. From Trinity it looks like BD is good when it's not leaking and running away with power consumption at higher volts.

    FX 8350 @ 5.11ghz | Gigabyte 990FXA UD5 | 16GB Mushkin Blackline | 7970 @ 1.2ghz
    core i7 920 @ 4.05ghz | asus p6t deluxe | 6GB G. Skill @ ~1.6ghz | 7970 @ 1.2ghz - 6ghz - 1.2v
    Opteron 165 @ 2.7Ghz | 1gb G. Skill @ ~520mhz |4870 1GB | asus a8n32 sli

  24. #149
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Hungary (EU)
    Posts
    1,376
    Quote Originally Posted by freeloader View Post
    The engineers at AMD really borked the design of anything Bulldozer related, server or desktop. It's a horrible chip. They have a chance to redeem themselves with Trinity, but that's not going to happen in their work environment at AMD.
    Is that all you can say?

    A uarch can fail or win on the process tech. Just try to imagine if the top quad-core SB had a ~2.5 GHz base clock with a 130 W TDP.

  25. #150
    Xtreme Member
    Join Date
    Jan 2011
    Location
    145.21.4.???
    Posts
    319
    IMO, Llano should have reached up to 3.5 GHz if everything had gone well, but it didn't. If Trinity can reach up to 4.5 GHz (3.8-4.1 GHz for known ES) and gets some other fixes in mass-production silicon, I bet it could do well.
