Page 21 of 39
Results 501 to 525 of 954

Thread: AMD's Bobcat and Bulldozer

  1. #501
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by MAS View Post
    http://blogs.amd.com/work/2010/08/23...ge-4/#comments

    AMD guy promised bulldozer review next week (see comments)
    Quote Originally Posted by generics_user View Post
    i think you REALLY misunderstood his post

    Round 2 of the questions on Bulldozer are UNDER legal review to be posted; there will be no product review only more questions on Bulldozer
    Quote Originally Posted by superrugal View Post
    He is definitely saying the "question round 2" not bulldozer review.
    you guys should read who posted that, it's none other than me

    the BD 20 questions are all broken up sections, so we get the next set answered very soon it seems

  2. #502
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    Since we're comparing with Sandy Bridge here guys... is there even an 8-core counterpart for SB? You make it sound like it's easy to fit 8 cores. AMD just did, and even then they went the module route to reduce die area.

  3. #503
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Mitch Alsup says when he left AMD, Bulldozer was : performance *decrease* of 5% from the microarch-slimming, together with hoped-for 20-25% frequency increase from the pipeline-lengthening.

    Even assuming *perfect* perf scaling with clock, that's 15-20% increase over Ph-II.

    When I left, BD was supposed to be 20-25% faster frequency wise, and
    lose a little architectural figure (5%-ish) of merit due to the
    microarchitecture.
    http://groups.google.de/group/comp.a...14f6049?hl=de#

    So they are really counting on speed-racer to bring the performance increase.
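
A quick sanity check on the arithmetic above, as a sketch only. It assumes the per-clock loss and the clock gain combine multiplicatively and that performance scales perfectly with frequency, as the post itself assumes; the function name `net_speedup` is made up for illustration. Under those assumptions the net gain works out to roughly 14% to 19%, slightly under the additive 15-20% estimate.

```python
def net_speedup(per_clock_change, clock_change):
    """Combine a per-clock (IPC) change and a clock change multiplicatively.

    Both arguments are fractions, e.g. -0.05 for a 5% per-clock loss.
    Assumes perfect performance scaling with clock frequency.
    """
    return (1 + per_clock_change) * (1 + clock_change) - 1


# Figures quoted in the post: ~5% per-clock loss, 20-25% higher clocks.
low = net_speedup(-0.05, 0.20)
high = net_speedup(-0.05, 0.25)
print(f"net gain: {low:.2%} to {high:.2%}")
```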

  4. #504
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    My guess would be 17 stages. A speed racer on a modern process is arguably going to be more efficient than a brainiac, as long as you don't go over the top with pipelining. Increasing IPC has much, much worse diminishing returns, excluding multicore.

  5. #505
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Location
    Singapore
    Posts
    970
    Quote Originally Posted by terrace215 View Post
    Mitch Alsup says when he left AMD, Bulldozer was : performance *decrease* of 5% from the microarch-slimming, together with hoped-for 20-25% frequency increase from the pipeline-lengthening.

    Even assuming *perfect* perf scaling with clock, that's 15-20% increase over Ph-II.



    http://groups.google.de/group/comp.a...14f6049?hl=de#

    So they are really counting on speed-racer to bring the performance increase.
    When did Mitch Alsup leave AMD?
    Main Rig:
    Processor & Motherboard:AMD Ryzen5 1400 ' Gigabyte B450M-DS3H
    Random Access Memory Module:Adata XPG DDR4 3000 MHz 2x8GB
    Graphic Card:XFX RX 580 4GB
    Power Supply Unit:FSP AURUM 92+ Series PT-650M
    Storage Unit:Crucial MX 500 240GB SATA III SSD
    Processor Heatsink Fan:AMD Wraith Spire RGB
    Chasis:Thermaltake Level 10GTS Black

  6. #506
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    More interesting bits on the pipeline changes:

    Most of what got cut was cut to enable the 12-gate pipe (if indeed
    they did achieve that.) In Athlon/Opteron, one can forward a byte,
    word, double, or quad from any of the 5 results to any operand of any
    6 integer computation units {ALU, AGU}. BD can't (or couldn't when
    I left) forward anything to anywhere, and eats a little AFoM because
    of this. This probably saved 2 real gate delays. Lopping off the extra
    ALU, and a few other things saves another gate and we are then within
    spitting distance (1-gate) of the desired 12-gate pipe in the integer
    pipe. More lopping occurred in the L1 cache pipe to reach the cycle time
    goal.

  7. #507
    Xtreme Monster
    Join Date
    May 2006
    Location
    United Kingdom
    Posts
    2,182
    Bobcat? come on AMD, you had better names.

  8. #508
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Why is no one else wondering about Bulldozer's Decode details?
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  9. #509
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by nn_step View Post
    Why is no one else wondering about Bulldozer's Decode details?
    my understanding of cpu arch is still limited compared to most others here. why "should" we be concerned with the decode unit?
    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  10. #510
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    He left just as BD 1 was canceled (end of 2007) and BD 2, the one that is coming out 2 years later, was starting to take shape.

  11. #511
    Xtreme Member
    Join Date
    May 2009
    Location
    São Paulo, Brazil
    Posts
    317
    Quote Originally Posted by god_43 View Post
    my understanding of cpu arch is still limited compared to most others here. why "should" we be concerned with the decode unit?
    Because BD needs to feed all those integer cores somehow. We know prefetching is getting improvements, but decoding needs to improve as well to keep up with everything else.

  12. #512
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by god_43 View Post
    my understanding of cpu arch is still limited compared to most others here. why "should" we be concerned with the decode unit?
    The decode rate determines execution unit utilization
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  13. #513
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by danielkza View Post
    Because BD needs to feed all those integer cores somehow. We know prefetching is getting improvements, but decoding needs to improve as well to keep up with everything else.
    Quote Originally Posted by nn_step View Post
    The decode rate determines execution unit utilization
    thank you both, i understand now...that does seem worth "dissecting"!
    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  14. #514
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by nn_step View Post
    Why is no one else wondering about Bulldozer's Decode details?
    It has 4-wide decoding feeding 2 cores....
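
To put a number on what "4-wide decoding feeding 2 cores" could mean per core, here is a rough sketch. It assumes the decode slots are split evenly between the threads that are active in a module, which is purely an illustrative assumption; how the decoder is actually partitioned has not been disclosed, and `decode_slots_per_thread` is a hypothetical helper.

```python
def decode_slots_per_thread(decode_width, active_threads):
    """Average decode slots per clock available to each active thread.

    Assumes an even split of the shared decoder between active threads.
    This is an illustrative model, not a disclosed hardware detail.
    """
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    return decode_width / active_threads


# 4-wide decoder shared by the two integer cores of one module.
both_busy = decode_slots_per_thread(4, 2)   # both cores running a thread
one_busy = decode_slots_per_thread(4, 1)    # the other core is idle
print(both_busy, one_busy)
```

Under this even-split model each core averages 2 decode slots per clock when both are busy, so the open question is whether a lone thread really gets the full width.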

  15. #515
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by informal View Post
    He left just as BD 1 was canceled (end of 2007) and BD 2, the one that is coming out 2 years later, was starting to take shape.
    From design to launch, a modern cpu arch is about a 5 year cycle.

  16. #516
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by terrace215 View Post
    It has 4-wide decoding feeding 2 cores....
    But how is the 4-wide partitioned, and how many bytes does it read per clock?
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  17. #517
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by nn_step View Post
    But how is the 4-wide partitioned, and how many bytes does it read per clock?
    It's 32bits Fetch

  18. #518
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by terrace215 View Post
    From design to launch, a modern cpu arch is about a 5 year cycle.
    BD 2 was not new all around, since it's naturally based on the BD 1 version that was supposed to come out at 45nm. I suspect that, as in the case of Barcelona, they were power limited at 45nm and performance was not where they wanted it. So they went with an improved core, done on a smaller node, and delayed it 2 years (2009 -> 2011). This gives them more room for improvements at the core level and more clocks, all within the same power envelope. I expect 15-20% in core level improvement + 30% in clocks.

    Quote Originally Posted by nn_step View Post
    The decode rate determines execution unit utilization
    It's a 4+1 decoder (branch fusion supported) at the front end, with a so-called "accelerate mode" if certain conditions are met. AMD is not disclosing anything about this particular feature, but essentially it increases the decode rate by some unknown factor.
    Last edited by informal; 08-29-2010 at 03:08 AM.

  19. #519
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by danielkza View Post
    Because BD needs to feed all those integer cores somehow. We know prefetching is getting improvements, but decoding needs to improve as well to keep up with everything else.
    So, not to talk out of school, but I did ask one of our design engineers about the ability of the shared front end to keep two integer cores fed and he had absolutely no concern because of things that are done to improve the front end.

    Can't say any more beyond that because a.) it is not public info and b.) I don't really know enough about how those things work to accurately describe them.

    In my mind this is not a concern of the engineering team. After all it is a completely new design. If they had taken the front end off of an existing product it might be more of an issue, but as I understand it, that has not happened.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  20. #520
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by JF-AMD View Post
    So, not to talk out of school, but I did ask one of our design engineers about the ability of the shared front end to keep two integer cores fed and he had absolutely no concern because of things that are done to improve the front end.

    Can't say any more beyond that because a.) it is not public info and b.) I don't really know enough about how those things work to accurately describe them.

    In my mind this is not a concern of the engineering team. After all it is a completely new design. If they had taken the front end off of an existing product it might be more of an issue, but as I understand it, that has not happened.
    thanks JF!
    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  21. #521
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by madcho View Post
    It's 32bits Fetch
    Given that Phenom has a 32 BYTE pick buffer and a 408bit fetch, I see that as highly unlikely.

    Added to the fact that Bobcat has a 22 byte decode

    Quote Originally Posted by informal View Post
    BD 2 was not new all around, since it's naturally based on the BD 1 version that was supposed to come out at 45nm. I suspect that, as in the case of Barcelona, they were power limited at 45nm and performance was not where they wanted it. So they went with an improved core, done on a smaller node, and delayed it 2 years (2009 -> 2011). This gives them more room for improvements at the core level and more clocks, all within the same power envelope. I expect 15-20% in core level improvement + 30% in clocks.



    It's a 4+1 decoder (branch fusion supported) at the front end, with a so-called "accelerate mode" if certain conditions are met. AMD is not disclosing anything about this particular feature, but essentially it increases the decode rate by some unknown factor.
    But without more details, optimizing the decode rate is impossible.

    For example, can a single thread take up the entire decode unit for a couple clock cycles if the other thread is sleeping?

    Quote Originally Posted by JF-AMD View Post
    So, not to talk out of school, but I did ask one of our design engineers about the ability of the shared front end to keep two integer cores fed and he had absolutely no concern because of things that are done to improve the front end.

    Can't say any more beyond that because a.) it is not public info and b.) I don't really know enough about how those things work to accurately describe them.

    In my mind this is not a concern of the engineering team. After all it is a completely new design. If they had taken the front end off of an existing product it might be more of an issue, but as I understand it, that has not happened.
    Could you find out if each thread has its own pick buffer or if it is shared,
    and what size(s)?
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  22. #522
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Quote Originally Posted by JF-AMD View Post
    So, not to talk out of school, but I did ask one of our design engineers about the ability of the shared front end to keep two integer cores fed and he had absolutely no concern because of things that are done to improve the front end.

    Can't say any more beyond that because a.) it is not public info and b.) I don't really know enough about how those things work to accurately describe them.

    In my mind this is not a concern of the engineering team. After all it is a completely new design. If they had taken the front end off of an existing product it might be more of an issue, but as I understand it, that has not happened.
    This almost seems like a single module may possibly use both integer units along with the FPU when executing a single thread. If this is the case, single threaded performance on BD will not be a weak point at all. I remember that old marketing slide saying BD would have the highest single threaded performance ever. It better be true dammit.
    Core i7 2600K@4.6Ghz| 16GB G.Skill@2133Mhz 9-11-10-28-38 1.65v| ASUS P8Z77-V PRO | Corsair 750i PSU | ASUS GTX 980 OC | Xonar DSX | Samsung 840 Pro 128GB |A bunch of HDDs and terabytes | Oculus Rift w/ touch | ASUS 24" 144Hz G-sync monitor

    Quote Originally Posted by phelan1777 View Post
    Hail fellow warrior albeit a surat Mercenary. I Hail to you from the Clans, Ghost Bear that is (Yes freebirth we still do and shall always view mercenaries with great disdain!) I have long been an honorable warrior of the mighty Warden Clan Ghost Bear the honorable Bekker surname. I salute your tenacity to show your freebirth sibkin their ignorance!

  23. #523
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by nn_step View Post
    Given that Phenom has a 32 BYTE pick buffer and a 408bit fetch, I see that as highly unlikely.

    Added to the fact that Bobcat has a 22 byte decode



    But without more details, optimizing the decode rate is impossible.

    For example, can a single thread take up the entire decode unit for a couple clock cycles if the other thread is sleeping?



    Could you find out if each thread has its own pick buffer or if it is shared,
    and what size(s)?
    Damm I believed it was 32bits ... OMG BYTES !!! lol

  24. #524
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by nn_step View Post




    But without more details, optimizing the decode rate is impossible.
    For example, can a single thread take up the entire decode unit for a couple clock cycles if the other thread is sleeping?

    A single thread can occupy all the shared resources in the module. The decoder and the whole front end, with the extra beefed-up prefetch, are shared. The FPU is shared.
    Quote Originally Posted by Mechromancer View Post
    This almost seems like a single module may possibly use both integer units along with the FPU when executing a single thread. If this is the case, single threaded performance on BD will not be a weak point at all. I remember that old marketing slide saying BD would have the highest single threaded performance ever. It better be true dammit.
    Integer cores can't "combine" to work on a single integer thread, but one integer core can use the whole FPU to itself. Also, one FPU can be used a la SMT by 2 integer cores. What is shared in the module can be used by the integer core(s).
    Last edited by informal; 08-29-2010 at 08:10 AM.

  25. #525
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Quote Originally Posted by informal View Post
    .I expect 15-20% in core level improvement + 30% in clocks.
    And AMD's ex-Chief Architect expects 5% in core level perf/clock LOSSES + 20-25% in clocks.

    I wonder who will turn out to be closer?
