Page 5 of 48 FirstFirst ... 234567815 ... LastLast
Results 101 to 125 of 1198

Thread: AMD "Piledriver" refresh of Zambezi - info, speculations, test, fans

  1. #101
    Xtreme Member
    Join Date
    Jan 2011
    Location
    145.21.4.???
    Posts
    319
    Quote Originally Posted by TESKATLIPOKA View Post
    undone, what does that have to do with my comment? I never said BD can't be tweaked.
    To me it looks like you're saying IPC lower than Deneb is a failure. But processor design is more complicated than we thought.
    Even if Zambezi doesn't have those problems I still doubt IPC would increase.

    I don't know why you think they planned for BD to have worse IPC than K10.
    If that were true, then why was JF always saying IPC would increase over K10? He meant BDv1 (Bulldozer), not BDv2 (Piledriver) or BDv3 (Steamroller), because that was what he heard from the engineers. Sadly it wasn't true; simply put, BD wasn't what they wanted.
    IIRC, the 'IPC increase' came from simple statements like '33% more cores but a 50% increase compared to Thuban'. But they never mentioned anything about the balance between frequency and IPC.

    OK, I want to end the discussion about BD; sorry if my comment led off-topic.
    Last edited by undone; 11-17-2011 at 12:05 PM.

  2. #102
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    undone
    But they never mentioned anything about the balance between frequency and IPC.
    JF said many times that IPC would increase, even if that wasn't the official statement from AMD.

    To me it looks like you're saying IPC lower than Deneb is a failure. But processor design is more complicated than we thought.
    I don't care whether the performance comes from an IPC increase or from frequency, but if JF said IPC would increase and it turns out it decreased by >10%, I think it's a failure compared to what it should have been.

  3. #103
    Xtreme Addict
    Join Date
    Dec 2008
    Location
    Sweden, Linköping
    Posts
    2,034
    A8-3850 (2.9 GHz): 3.45 pts
    FX-4100 (3.6 GHz): 2.94 pts

    Trinity (3.8 GHz?): 3.45 pts?
    I've been thinking, if true it doesn't sound all that bad IMO.

    FX-4100 3.6 GHz = 2.94 pts / 36 * 38 = ~3.1 pts

    3.1 pts compared to Trinity @ 3.8 GHz, which scores 3.45 pts, is an increase of about 10%. This is in FPU performance, which will be lacking in Trinity (considering it's the same/tweaked architecture as Bulldozer). We can't say for sure that this 10% increase carries over to the architecture's strong side, which is integer performance, but if it does, together with better power efficiency, it doesn't sound all that bad.
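    The clock normalization above can be reproduced with a short Python sketch. It assumes, as the post does, that the score scales linearly with core clock, which is only an approximation (memory and uncore speeds do not scale with it). The function names are hypothetical, and the Trinity score is the rumoured figure from the post.

```python
def scale_score(score: float, from_ghz: float, to_ghz: float) -> float:
    """Linearly rescale a benchmark score from one clock speed to another.

    Assumes performance is proportional to core clock (a simplification).
    """
    return score * to_ghz / from_ghz


def percent_gain(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# FX-4100 result from the post, rescaled from 3.6 GHz to Trinity's rumoured 3.8 GHz.
fx4100_at_38 = scale_score(2.94, 3.6, 3.8)
trinity_at_38 = 3.45  # rumoured Trinity score quoted in the post

print(f"FX-4100 @ 3.8 GHz: {fx4100_at_38:.2f} pts")
print(f"Trinity advantage: {percent_gain(trinity_at_38, fx4100_at_38):.1f}%")
```

    With these inputs it prints about 3.10 pts and an advantage of about 11%, consistent with the post's rough "about 10%".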
    SweClockers.com

    CPU: Phenom II X4 955BE
    Clock: 4200MHz 1.4375v
    Memory: Dominator GT 2x2GB 1600MHz 6-6-6-20 1.65v
    Motherboard: ASUS Crosshair IV Formula
    GPU: HD 5770

  4. #104
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    Smartidiot89 Maybe we will see some increase in integer performance too. Some probably still remember Charlie's claim of a 20% increase in ALU performance in the next stepping.

  5. #105
    Xtreme Addict
    Join Date
    Dec 2008
    Location
    Sweden, Linköping
    Posts
    2,034
    Quote Originally Posted by TESKATLIPOKA View Post
    Smartidiot89 Maybe we will see some increase in integer performance too. Some probably still remember Charlie's claim of a 20% increase in ALU performance in the next stepping.
    I find it hard to believe with a stepping; even with a new revision I'd raise an eyebrow. I'll take a 10-15% increase with Trinity, as that is what I've heard from AMD and what this guy from Chiphell shows in his results. 20% I don't think will happen with Piledriver, not IPC-wise, but IPC+clocks I find very possible, even very likely.

    AMD has already gone public that they aren't happy with GlobalFoundries' performance on 32nm, and the power consumption of Bulldozer (especially overclocked) points the finger at high leakage. I think there's a lot that can/will/has been done, on both the design and manufacturing side.

  6. #106
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    Smartidiot89 My mistake, I should have used the word revision and not stepping. And I am not saying Charlie is right; he can be wrong. I think what you said is quite possible, we will see.

  7. #107
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Hungary (EU)
    Posts
    1,376
    Quote Originally Posted by Smartidiot89 View Post
    I find it hard to believe with a stepping; even with a new revision I'd raise an eyebrow. I'll take a 10-15% increase with Trinity, as that is what I've heard from AMD and what this guy from Chiphell shows in his results. 20% I don't think will happen with Piledriver, not IPC-wise, but IPC+clocks I find very possible, even very likely.

    AMD has already gone public that they aren't happy with GlobalFoundries' performance on 32nm, and the power consumption of Bulldozer (especially overclocked) points the finger at high leakage. I think there's a lot that can/will/has been done, on both the design and manufacturing side.
    The biggest issue is GF's crappy 32nm manufacturing tech.

    Dresdenboy:

    Some more power hungry units _seem_ to run at half the clock. Integer MUL shows a 2 cycle granularity in latency and a throughput of one every 2 (32 bit) or 4 (64 bit) cycles.

    L2 cache latency is 18 (1 MB) or 20 (2 MB) cycles. It could be that it runs at half the clock too. Years ago an AMD designer (Jerry Moench) talked about a half-clocked L2 cache for "K9" at Stanford. Bobcat has a half-clocked L2.

    This is not about speed paths but about power consumption. Those 2 billion transistors cause a heck of a leakage.
    source
    -

  8. #108
    Xtreme Addict
    Join Date
    Dec 2008
    Location
    Sweden, Linköping
    Posts
    2,034
    Quote Originally Posted by Oliverda View Post
    The biggest issue is GF's crappy 32nm manufacturing tech.
    Yepp, and I honestly don't think AMD will bother to "fix" Bulldozer. They will most likely wait for Piledriver.

  9. #109
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Hungary (EU)
    Posts
    1,376
    Quote Originally Posted by Smartidiot89 View Post
    Yepp, and I honestly don't think AMD will bother to "fix" Bulldozer. They will most likely wait for Piledriver.
    First of all, GF should fix its manufacturing tech. That would solve some major problems with Bulldozer.

    90 nm SOI:
    65 nm SOI:
    45 nm SOI:
    32 nm SOI:

    so far...
    -

  10. #110
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Oliverda View Post
    First of all, GF should fix its manufacturing tech. That would solve some major problems with Bulldozer.

    90 nm SOI:
    65 nm SOI:
    45 nm SOI:
    32 nm SOI:

    so far...
    Umhh... you're looking at the end result (CPU performance) and blaming only one factor that contributes to it: the process.
    Alongside process, performance and performance/watt depend on architecture and circuit design.

    I really doubt BD is perfect uarch- and implementation-wise and that only the process is to blame. If we had someone from GF here, I'm sure he'd say some not-so-nice things about AMD's competence in CPU design.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  11. #111
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    savantu I can agree that it's not just GloFo's fault.
    BD and the module concept are good, but many things need to be tweaked, and I think the biggest problem is the cache subsystem; it needs a major change.
    I would like to see something like this:
    L1: L1I 64KB + 2* L1D 32-64KB.
    L2: 512-1024KB per module
    L3: 512-1024KB per module
    current BD 4module: 4*64KB+8*16KB+4*2048KB+8*1024KB=16768KB
    BD 4module version1: 4*64KB+8*64KB+4*2*512KB+8*512KB=8960KB
    BD 4module version2: 4*64KB+8*32KB+4*2*256KB+8*256KB=4608KB
    This way you save some power and die area, and performance doesn't need to drop by much. Actually, in BD I wouldn't be surprised if it increased because of a write-back L1 cache and smaller but faster L2 and L3 caches; finally, the L3 could run at full speed and not just half.
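    The cache totals above can be checked with a small Python sketch. It models a 4-module, 8-core chip and attributes an equal L3 slice to each module (8MB total today, i.e. 2048KB per module); the class and field names are hypothetical. Summed this way, the three configurations come to 16768KB, 8960KB and 4608KB.

```python
from dataclasses import dataclass

MODULES = 4           # 4-module (8-core) Bulldozer chip
CORES_PER_MODULE = 2  # two integer cores share one module


@dataclass
class CacheConfig:
    l1i_per_module_kb: int  # shared instruction cache, one per module
    l1d_per_core_kb: int    # private data cache, one per integer core
    l2_per_module_kb: int   # shared L2, one per module
    l3_per_module_kb: int   # share of the L3 attributed to each module

    def total_kb(self) -> int:
        cores = MODULES * CORES_PER_MODULE
        return (MODULES * self.l1i_per_module_kb
                + cores * self.l1d_per_core_kb
                + MODULES * self.l2_per_module_kb
                + MODULES * self.l3_per_module_kb)


configs = {
    "current BD": CacheConfig(64, 16, 2048, 2048),
    "version1": CacheConfig(64, 64, 1024, 1024),
    "version2": CacheConfig(64, 32, 512, 512),
}

for name, cfg in configs.items():
    print(f"{name}: {cfg.total_kb()} KB")
```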
    Last edited by TESKATLIPOKA; 11-18-2011 at 05:13 AM.

  12. #112
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Oliverda View Post

    Dresdenboy:



    source
    Thanks for that link! I really suspected before that AMD had to downclock many parts of the module, since they missed the performance target by a wide margin (especially projected FP performance vs K10). Half-clocking of the L2 and maybe even the FPU could explain this. The FPU is really one of the power hogs of the design, and the big L2 isn't helping either. If they (i.e. GloFo) manage to fix the power issues, then maybe PD can be a solid improvement in some weak areas.

  13. #113
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by undone View Post
    I never argued about IPC, even before Zambezi was announced.
    When you go over history you'll find something similar to today's situation. K6 had 10% higher IPC than K7, NetBurst was terrible, and now Bulldozer is the same. The reason they developed a lower-IPC design is that frequency is bottlenecked by the architecture. These designs always need not only tweaks but, more importantly, the right process node; lately CPUs have been bottlenecked below 4GHz, and now Bulldozer makes a breakthrough.
    Bulldozer = K7, and it will be another K8 when everything is OK, so don't be surprised to see a CPU that runs at 6GHz+ at stock.
    Wrong, K7 had much higher IPC than K6; the K6-III was far behind the P2 overall (not in integer), and the P3 was behind K7. And frequency is not bottlenecked by architecture; all architectures have a hard time above 4GHz. So it's stupid to sacrifice lots of IPC and die size to squeeze a few hundred MHz more from the chip at these frequencies. Every 100MHz above 4GHz has a high price, and AMD decided to pay up. I can't imagine Intel's fabs being capable of producing BD at competitive frequencies (5-7GHz with less power consumption). The problem is in the design: they decided to pay a high price to get some extra frequency, just the thing that killed Prescott.

    No architecture with lower IPC than its predecessor has been successful. All the really successful architectures have had large gains in IPC, like Core i7, Core 2, Athlon 64, K7 and Pentium Pro.
    Last edited by -Boris-; 11-18-2011 at 06:16 AM.

  14. #114
    Xtreme Enthusiast
    Join Date
    Nov 2009
    Posts
    526
    Quote Originally Posted by informal View Post
    Thanks for that link! I really suspected before that AMD had to downclock many parts of the module, since they missed the performance target by a wide margin (especially projected FP performance vs K10). Half-clocking of the L2 and maybe even the FPU could explain this. The FPU is really one of the power hogs of the design, and the big L2 isn't helping either. If they (i.e. GloFo) manage to fix the power issues, then maybe PD can be a solid improvement in some weak areas.
    Maybe removing L3 lets them have L2 at full clock (for trinity).

  15. #115
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    Mechanical Man I don't think so, because instead of L3 cache they have a big IGP.

    informal The FPU in BD should be running at full speed; after all, the FX-4100 and Llano A8-3850 have the same performance in Sandra.
    Last edited by TESKATLIPOKA; 11-18-2011 at 06:35 AM.

  16. #116
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by TESKATLIPOKA View Post
    Mechanical Man I don't think so, because instead of L3 cache they have a big IGP.

    informal The FPU in BD should be running at full speed; after all, the FX-4100 and Llano A8-3850 have the same performance in Sandra.
    Well, if you read what dresdenboy said, some units at least seem to be half-clocked. But we just don't know as of now, since AMD is not saying anything. Officially, according to them, all is fine... But we know all is not fine since, as JF-AMD said, back in 2010 before tape-out the original goal was higher IPC. The easy explanation is half-clocked units within the module (whether it's the L2, FPU etc.).

    As for the FX-4100, Sandra is a synthetic benchmark. In real-world FP-intensive workloads the FX-4100 with Turbo is usually behind or just on par with the 2.9GHz Llano.

    What I think would be the perfect PD scenario for AMD is ~5-10% higher IPC and a 4-4.2GHz base clock for the top-end model. This would put it in the 16-28% range over the 8150, a pretty good spot (over the 2600K/2700K on average and very close to the 980X/990X).
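    The 16-28% range follows from treating performance as IPC times clock. A minimal Python sketch, assuming that simple model and comparing against the FX-8150's 3.6GHz base clock (Turbo ignored); the function name is hypothetical:

```python
def speedup_range(ipc_gains, clocks_ghz, base_clock_ghz=3.6):
    """Return (min, max) speedup in percent, modelling performance as IPC x clock.

    ipc_gains are fractional IPC improvements over the baseline (0.05 = 5%).
    """
    results = [(1 + ipc) * clk / base_clock_ghz
               for ipc in ipc_gains for clk in clocks_ghz]
    return (min(results) - 1) * 100, (max(results) - 1) * 100


low, high = speedup_range([0.05, 0.10], [4.0, 4.2])
print(f"{low:.1f}% to {high:.1f}%")  # 16.7% to 28.3%
```

    The low end pairs the smaller IPC gain with the lower clock and the high end pairs the larger gain with the higher clock, which reproduces the post's figures.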
    Last edited by informal; 11-18-2011 at 06:52 AM.

  17. #117
    Registered User
    Join Date
    Jul 2008
    Posts
    73
    Quote Originally Posted by -Boris- View Post
    And frequency is not bottlenecked by architecture; all architectures have a hard time above 4GHz.
    Completely wrong. Frequencies are bottlenecked by both the process (i.e. transistor switching speed) and the architecture (i.e. the number of transistors placed in sequence on the critical path). The resulting processor frequency is roughly the first divided by the second. Bulldozer significantly shortens the critical path, so its frequency is much higher than 32nm Llano's within the same power budget. Put differently, you should expect a ~4.1 GHz base clock for an imaginary 6-core 45nm Bulldozer within the power budget of the 1100T.
    Last edited by sergiojr; 11-18-2011 at 07:29 AM.

  18. #118
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    informal
    As for the FX-4100, Sandra is a synthetic benchmark. In real-world FP-intensive workloads the FX-4100 with Turbo is usually behind or just on par with the 2.9GHz Llano.
    JF said Turbo doesn't work when the FPU is in use, and this is 2 FlexFP units vs 4 FPUs, so I think it's pretty good. If it's as you say, then I don't understand why they don't lower the clocks and voltage so the FPU can run at full speed. That would give a much bigger boost while staying within the TDP limit, and for integer work they could use a more aggressive Turbo.
    Something like this:
    FX8150v2 at 3GHz default, 4.2GHz Turbo for ALU, full-speed FPU and L2. It's easy to see this would be better than FX8150v1 at 3.6GHz default, 4.2GHz Turbo for ALU, half-speed FPU and L2.

  19. #119
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by informal View Post
    What I think would be perfect PD scenario for AMD is: ~5-10% higher IPC and 4-4.2Ghz base clock for top end model. This would put it in 16-28% range over 8150 ,a pretty good spot (over 2600/2700K on average and very close to 980/990x).
    That's flawed logic. Being 30% behind in a benchmark is not the same as needing only 30% more performance to catch up. BD needs much more than 28% more performance to match SB.
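    The point about percentages is the usual base asymmetry: if BD scores 30% lower than SB, BD is at 0.7 of SB, so SB is 1/0.7 ≈ 1.43 times as fast and BD needs about 43% more performance to match it. A minimal Python sketch (the function name is hypothetical):

```python
def behind_to_faster(behind_pct: float) -> float:
    """If A scores `behind_pct` percent lower than B, return how much
    faster B is than A, in percent (i.e. the gain A needs to catch up)."""
    ratio = 1 - behind_pct / 100  # A's score as a fraction of B's
    return (1 / ratio - 1) * 100


print(f"{behind_to_faster(30):.1f}%")  # 42.9%
```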


    Quote Originally Posted by sergiojr View Post
    Completely wrong. Frequencies are bottlenecked by both the process (i.e. transistor switching speed) and the architecture (i.e. the number of transistors placed in sequence on the critical path). The resulting processor frequency is roughly the first divided by the second. Bulldozer significantly shortens the critical path, so its frequency is much higher than 32nm Llano's within the same power budget. Put differently, you should expect a ~4.1 GHz base clock for an imaginary 6-core 45nm Bulldozer within the power budget of the 1100T.
    Of course architecture matters, but what you don't take into consideration is that frequency gains aren't linear. And above 4GHz the sacrifices you have to make to gain each MHz aren't worth it at this point. You can't say that Bulldozer is more efficient than K10 or Llano; Bulldozer is less power efficient than K10 on 45nm! Your comparison to Llano doesn't work since Llano has an integrated GPU; you don't know how much power the cores in Llano consume, and you don't know how the GPU affects the cores' power consumption. If Llano is made on a different kind of silicon to make the GPU work well enough, then that could cripple energy efficiency in the cores.

  20. #120
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Well, first of all, we don't know for sure if anything is half-clocked (but as dresdenboy found out, there are good indicators). Second, why do you think 4 FlexFP units can run at 3GHz within 125W (if they are indeed "half-clocked")?
    As for the L2 cache, if it's half-clocked then bringing it to full core speed could potentially increase performance quite significantly in some cases. L2 is what matters for desktop workloads, while L3 is what matters for server workloads (due to differences in data set size).
    We have to wait for Trinity and see what it can do without L3. If it is 10% or so faster than Bdver1 at the same clock, then Vishera (with L3) could be regarded almost as a real next-gen Bulldozer, since it would bring a larger speedup over Bdver1 than Bdver1 brought over Thuban (on desktop).

  21. #121
    Registered User
    Join Date
    Jul 2008
    Posts
    73
    Quote Originally Posted by -Boris- View Post
    Of course architecture matters, but what you don't take into consideration is that frequency gains aren't linear. And above 4GHz the sacrifices you have to make to gain each MHz aren't worth it at this point.
    It is not processor frequencies that are non-linear; it is transistor frequencies that are. But as I said before, processor frequency ~ transistor frequency / critical path. So if there is a hypothetical 4GHz barrier for K10 on 45nm, then this barrier will be around 5GHz for Bulldozer on the same 45nm process (I assume that Bulldozer's critical path is around 1.25 times shorter, based on Llano and Bulldozer 4-core frequencies and power consumption in CPU-dependent tasks).
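    The frequency argument in this exchange rests on the simplified model f_cpu ≈ f_transistor / critical_path_length: on the same process, a critical path 1.25 times shorter allows a clock 1.25 times higher. A minimal Python sketch of that model, using the poster's 1.25 estimate; the 3.3GHz input is the Phenom II X6 1100T's stock base clock, supplied here as an assumption to reproduce the earlier ~4.1GHz figure. The model ignores power limits and is not how real frequency targets are set.

```python
def scaled_frequency(ref_ghz: float, path_ratio: float) -> float:
    """Estimate the achievable clock on the same process when the critical
    path is `path_ratio` times shorter, using
    f_cpu ~ f_transistor / critical_path_length (power limits ignored)."""
    return ref_ghz * path_ratio


PATH_RATIO = 1.25  # poster's estimate: BD critical path ~1.25x shorter than K10

print(f"{scaled_frequency(4.0, PATH_RATIO):.1f} GHz")  # hypothetical K10 4 GHz wall
print(f"{scaled_frequency(3.3, PATH_RATIO):.1f} GHz")  # assumed 1100T base clock
```

    The first line gives the 5GHz barrier estimate and the second gives roughly 4.1GHz.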
    You can't say that Bulldozer is more efficient than K10 or Llano; Bulldozer is less power efficient than K10 on 45nm!
    That is because the 32nm process is currently worse than 45nm (est. 5-10%). Of course it saves costs, but performance-wise it's just worse. They will probably match it with the Q1 2012 Bulldozer update.
    Your comparison to Llano doesn't work since Llano has an integrated GPU; you don't know how much power the cores in Llano consume, and you don't know how the GPU affects the cores' power consumption. If Llano is made on a different kind of silicon to make the GPU work well enough, then that could cripple energy efficiency in the cores.
    There is a roughly 2.6GHz, 100W Llano without the GPU:
    http://products.amd.com/en-us/Deskto...False&f12=True
    Not a fair comparison, of course, but at least it is much more than you have to back up your claim.
    Last edited by sergiojr; 11-18-2011 at 09:29 AM.

  22. #122
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    informal
    Second, why do you think 4 flexfp can work at 3Ghz within 125W(if they are indeed "half clocked")?
    Because the voltage the chip needs to be stable is lower at 3GHz than at 3.6GHz, and lower voltage means lower power consumption, so what you save can be spent on raising the FPU frequency, which will in turn increase power consumption.

    P.S. Voltage increases power consumption much more than frequency alone; Anand or Xbit did such a test some time ago.

    edit: just by decreasing the voltage by 0.15V (1.26V -> 1.11V) they cut CPU power consumption by 1/4, and the chip was still running at 3.6GHz with Turbo off.
    http://translate.google.sk/translate...%2Findex26.php
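    The reason voltage matters more than frequency is the standard dynamic-power relation for CMOS, P_dyn ≈ C·V²·f: power scales with the square of voltage but only linearly with clock. A minimal Python sketch, assuming switched capacitance stays fixed and ignoring leakage (the function name is hypothetical):

```python
def relative_dynamic_power(v_new: float, v_old: float,
                           f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Ratio of new to old dynamic CMOS power, using P_dyn ~ C * V^2 * f.

    Capacitance C is assumed unchanged; leakage power is not modelled.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Undervolt from the linked test: 1.26 V -> 1.11 V at a fixed 3.6 GHz.
ratio = relative_dynamic_power(1.11, 1.26)
print(f"dynamic power: {ratio:.0%} of original ({1 - ratio:.0%} saved)")
```

    This prints 78% of original (22% saved) for the dynamic part alone. Leakage also falls as voltage drops, which may account for the measured saving being closer to 1/4.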
    Last edited by TESKATLIPOKA; 11-18-2011 at 08:34 AM.

  23. #123
    Xtreme Addict
    Join Date
    Dec 2008
    Location
    Sweden, Linköping
    Posts
    2,034
    Quote Originally Posted by Oliverda View Post
    First of all, GF should fix its manufacturing tech. That would solve some major problems with Bulldozer.

    90 nm SOI:
    65 nm SOI:
    45 nm SOI:
    32 nm SOI:

    so far...
    Fixing the manufacturing process is only half the story, as others have said. Bulldozer is tuned for the current state of 32nm; getting improvements will require a new stepping/revision. Apparently a new stepping is coming, so these "fixes" should be there as well.

  24. #124
    Xtreme Member
    Join Date
    Jan 2011
    Location
    145.21.4.???
    Posts
    319
    DH got some new slides about Trinity; a benchmark is included in the video.

    http://www.donanimhaber.com/islemci/...da-her-sey.htm

  25. #125
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by TESKATLIPOKA View Post
    savantu I can agree that it's not just GloFo's fault.
    BD and the module concept are good, but many things need to be tweaked, and I think the biggest problem is the cache subsystem; it needs a major change.
    I would like to see something like this:
    L1: L1I 64KB + 2* L1D 32-64KB.
    L2: 512-1024KB per module
    L3: 512-1024KB per module
    current BD 4module: 4*64KB+8*16KB+4*2048KB+8*1024KB=16768KB
    BD 4module version1: 4*64KB+8*64KB+4*2*512KB+8*512KB=8960KB
    BD 4module version2: 4*64KB+8*32KB+4*2*256KB+8*256KB=4608KB
    This way you save some power and die area, and performance doesn't need to drop by much. Actually, in BD I wouldn't be surprised if it increased because of a write-back L1 cache and smaller but faster L2 and L3 caches; finally, the L3 could run at full speed and not just half.
    I love how everyone skips over the fact that you've got a 1/4-size data cache (16KB) that is 75% as fast as a 64KB one at 100%.
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.
