Thread: AMD's Answer to Conroe? AMD's upcoming Rev G processors

  1. #1
    Xtreme News
    Join Date
    Dec 2005
    Location
    California
    Posts
    1,594

    AMD's Answer to Conroe? AMD's upcoming Rev G processors

    http://129.15.202.185/athlon_rev_g/wtf_mates.html

    Athlon Rumor Mill has posted some exceptional information regarding AMD's upcoming Rev G processors. Somehow, the editor managed to obtain die shots of both a current Revision F processor as well as an upcoming Revision G CPU. An animated clip overlays the two images and reveals the fact that there are some significant architectural changes. The wild card here is the use of two different stains in the die shots, a fact which can make the same objects look radically different. Regardless, the animation does seem to hint at some major changes though we'll wait for official word from AMD before we dive further into this rumor.

  2. #2
    Xtreme Addict
    Join Date
    Oct 2004
    Location
    Boston, MA
    Posts
    1,448
    Quote Originally Posted by sladesurfer
    AMD's Answer to Conroe?
    Chapter 7 Bankruptcy

    File Server:
    Super Micro X8DTi
    2x E5620 2.4Ghz Westmere
    12GB DDR3 ECC Registered
    50GB OCZ Vertex 2
    RocketRaid 3520
    6x 1.5TB RAID5
    Zotac GT 220
    Zippy 600W

    3DMark05: 12308
    3DMark03: 25820

  3. #3
    -150c Club Member
    Join Date
    May 2005
    Location
    Northeast, USA
    Posts
    10,090
    Not different stains, just an inversion of color; sounds like it's a fake made with copy, paste and rotation.

    You mean Chapter 11?


    If you have a cooling question or concern feel free to contact me.

  4. #4
    Xtreme Member
    Join Date
    Jan 2005
    Location
    Sydney, Australia
    Posts
    192
    NICE FIND.

    Can anyone possibly 'guesstimate' the benefits of an added out-of-order L2 read/write buffer and an extra complex decoder?
    Q6600 @ 3.0 | 8GB G-Skill @ 800 Mhz 5-5-5-20 | ATI 3870 Stock 1 x 500 GB Seagate 7200.11
    1 x 1TB Seagate 7200.11 | 2 x 320GB Seagate 7200.10
    Seasonic S12-550 Energy +CM Stacker

  5. #5
    Love and Peace!
    Join Date
    Dec 2004
    Location
    hiding somewhere!
    Posts
    3,675
    Quote Originally Posted by n00b 0f l337
    Not different stains, just an inversion of color; sounds like it's a fake made with copy, paste and rotation.

    You mean Chapter 11?
    odd that they call it Rev G, but look at this. i pointed this out elsewhere about a week ago:



    does the "secret" die shot look familiar?
    Got a fan over those memory sticks? No? Well get to it before you kill them

  6. #6
    Xtreme Addict
    Join Date
    Apr 2003
    Posts
    1,092
    Well, I can. I won't reveal how or what regarding this pic; everybody knows it came from tweakers.net and which guys they discussed it with.

    The extra decoder means the pipeline gets filled much faster, giving an overall boost in pure ALU performance. In the 'older' cores, more complex instructions would even have to be broken up an extra time (they get broken up anyway), so we'll see SSE3 performance rise a lot.

    Also, they probably combine this fast feeder with more powerful ALUs comparable to what Conroe has. This means you get excellent ALU performance (clearly seen in synthetic ALU benches like Dhrystone, Whetstone, PiFast, SuperPi etc.). How fast the new decoders and ALUs will be is a total guess; we'll have to wait.

    The out-of-order L2 has been a heavily requested feature, and its absence is the part AMD has gotten a lot of bad press for (in the techy world, that is). It's kinda hard to explain how it really works without writing up a 3000-word essay with a lot of technobabble, so I'm not going to do that. I'll try to explain why they did it instead of what it does:

    It's well known that cache isn't as important for K8 as it is for NetBurst, for a couple of reasons:

    Shorter pipelines, so pipeline stalls have less of an impact (though relatively speaking they are still very bad)
    Reasonable prefetchers and branch predictors (not as good as the ones in Intel's NetBurst)
    Very high-speed memory interface (so pipeline stalls can be resolved much faster)

    There are a few problems when trying to increase the performance of the cache and the predictors/prefetchers. You can simply increase the cache size, which reduces cache misses and thus pipeline stalls. You can also improve the prefetchers and predictors.

    However, cache is expensive: the cells are simple memory circuits, but they need to operate at core speed. If you have a lot of cache, yields can go down. Compare it to TFT panels: even if yields are very good and only 1 in, say, 100,000,000 cache circuits goes wrong, the bigger the cache, the bigger the chance that it contains a fault.
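
    To put rough numbers on the yield argument: if each cache cell fails independently with probability p, a cache with n cells is fault-free with probability (1 - p)^n. A minimal Python sketch follows. The 1-in-100,000,000 defect rate is the figure from the post; the independence assumption, the one-cell-per-data-bit count and the function name are assumptions made for illustration. Real chips also carry redundant rows and columns to repair defects, which this ignores.

```python
def fault_free_probability(cache_bytes: int, defect_rate: float) -> float:
    """Chance that every bit cell works, assuming independent defects."""
    cells = cache_bytes * 8  # one cell per data bit; tags and ECC bits ignored
    return (1.0 - defect_rate) ** cells


DEFECT_RATE = 1 / 100_000_000  # "1 in 100,000,000" from the post

for megabytes in (1, 2, 4, 8):
    p_ok = fault_free_probability(megabytes * 1024 * 1024, DEFECT_RATE)
    print(f"{megabytes} MB cache: {p_ok:.1%} chance of zero faulty cells")
```

    Each doubling of the cache squares the fault-free probability, which is why large caches hurt yield unless defects can be repaired.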

    Cache also increases die size and uses up a lot of resources. The returns diminish as well: you need to increase the cache size exponentially to keep getting the same gain. So that's something you only do if you're desperate (like we've seen Intel do with Xeons).
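
    The diminishing returns can be sketched with the usual average-memory-access-time formula, AMAT = hit time + miss rate x miss penalty. The sketch below assumes the common rule of thumb that the miss rate falls with the square root of the cache size. That rule, the 3-cycle hit time, the 200-cycle miss penalty and the 10% miss rate at 512 KB are illustrative assumptions, not K8 measurements.

```python
import math

HIT_TIME = 3           # cycles, assumed
MISS_PENALTY = 200     # cycles to main memory, assumed
BASE_SIZE_KB = 512
BASE_MISS_RATE = 0.10  # assumed miss rate at BASE_SIZE_KB


def miss_rate(size_kb: float) -> float:
    """Square-root rule of thumb: doubling the cache cuts misses by 1/sqrt(2)."""
    return BASE_MISS_RATE * math.sqrt(BASE_SIZE_KB / size_kb)


def amat(size_kb: float) -> float:
    """Average memory access time in cycles."""
    return HIT_TIME + miss_rate(size_kb) * MISS_PENALTY


previous = None
for size_kb in (512, 1024, 2048, 4096, 8192):
    cycles = amat(size_kb)
    gain = "" if previous is None else f" (saves {previous - cycles:.1f} cycles)"
    print(f"{size_kb:>5} KB: {cycles:5.1f} cycles{gain}")
    previous = cycles
```

    Under these assumptions every doubling saves fewer cycles than the one before it, while the die area spent on cache doubles each time.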

    The other way is to improve the predictors and prefetchers. This is not only difficult (the circuits become very complex and hard to design and fabricate), it also requires a LOT of extra circuits to make even small improvements. We've all seen the Presler die pics and saw how much of the core was actually dedicated to the prefetchers. Intel has a lot of experience from NetBurst; AMD does not.

    The branch predictors and prefetchers are expensive to design, but aren't as expensive to produce. Although the die size increases, it isn't as bad as with cache, where the area has to keep doubling for the same gain. There is however a BIG downside: these circuits get hot because they need to switch a lot, so micro-wear becomes an issue as well.

    Because AMD has developed this new, denser cache technology we'll probably see an increase from 2MB to 4MB or maybe 8MB. This will improve performance; AMD has probably done a lot of research into where the 'cut-off' point lies between more expensive CPUs and improved performance, and probably found a good point. (I don't think the CPUs will become more expensive than they are now: they'll release single-core and Sempron versions with less cache, and the AM2 CPUs are expensive now but their prices will drop, making room for the new CPUs at the same prices as today.)

    The out-of-order L2 buffer helps the new branch predictors and prefetchers, so that's probably why they did it. On its own it has almost no impact on performance, but it is needed to get better branch predictors and prefetchers. It also takes away an argument from most techies who want to badmouth AMD.

    Rev G will improve performance, and with the die shrink to 65nm we could see clock speeds up to 4 GHz. This however isn't AMD's answer to Conroe. AMD and Intel are out of sync: the answer from one is put on the market about 6 months after the release of the other.

    It's also important to understand the way AMD runs its factories; it's completely different from Intel's fabs. AMD actually has a system where yields automatically get better and better. They have techniques by which faults in the production of the CPUs (and in the whole process before the actual CPUs are made) can be corrected. They can also implement improvements in the design on a weekly basis (where Intel requires as much as a month or two to implement changes).

    I've got a report written by Wouter Tinus in Dutch about AMD's fabs, it's actually amazing what they've got.

    I know my post doesn't always make stuff any clearer, but that's because it is a complex world, the world of computers and especially CPUs
    The world vs the USA: The whole world hates you!
    USA: Why?? Why does the whole world hate us?
    The world: Because the whole world hates you, and you don't even know why!

  7. #7
    Registered User
    Join Date
    Dec 2004
    Location
    Iceland
    Posts
    1
    I'll take your word for it.

  8. #8
    Xtreme Addict
    Join Date
    Apr 2003
    Posts
    1,092
    Wow, you actually registered Dec 2004 and this is your first post?

    Welcome

    (One might say it's very un-nn of you :P)

  9. #9
    Xtreme Addict
    Join Date
    Oct 2004
    Location
    Boston, MA
    Posts
    1,448
    Quote Originally Posted by Thorry
    Wow, you actually registered Dec 2004 and this is your first post?

    Welcome

    (One might say it's very un-nn of you :P)
    He's a man of few words

  10. #10
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    they forgot the SSE changes..
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  11. #11
    Xtreme X.I.P.
    Join Date
    Aug 2004
    Location
    Chile
    Posts
    4,151
    Quote Originally Posted by Thorry
    Well, I can. I won't reveal how or what regarding this pic; everybody knows it came from tweakers.net and which guys they discussed it with.
    ...
    I know my post doesn't always make stuff any clearer, but that's because it is a complex world, the world of computers and especially CPUs

    nice read.

  12. #12
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    ok, so here is a question: what part of a pipeline usually "stalls" ?

    If buffers/caches were placed along the pipe or a second one added to work in "SLI/Crossfire" when the first fails (And the "pipes" can be loaded to be executed one after the other in case of stalls) would that not prove exceedingly beneficial?

    The mini buffers/caches could even help to hold data, so that if the pipe stalls the work is dumped and their data is loaded in while new work is lined up to refill the pipe(s). This could make the single pipe look like two (or two become 3+) in effect.

    All along the watchtower the watchmen watch the eternal return.

  13. #13
    Xtreme Addict
    Join Date
    Mar 2005
    Location
    Dallas, TX USA
    Posts
    1,381
    yeah, i think i watched a documentary on Discovery/Science/etc. (one of those) on AMD, and they talked about how their fabs basically would learn to do things better and better... though i'm sure they meant they make it modular, so they can implement changes quickly

    edit: on topic, i hope rev. g turns out well...

    also, will k8l work on sAM2?
    if K8L allows for ddr3, will they fit in ddr2 slots?
    Last edited by VulgarHandle; 05-27-2006 at 11:21 PM.
    Athlon XP-M 2500+ 0343MPMW The King is Dead!
    Phenom II X6 1090T 1025GPMW Long Live the King!

    -------------------------------------------
    I'm from the church of the operating room

  14. #14
    Aint No Real Gangster
    Join Date
    Jun 2004
    Location
    Port Credit/GTA, Ontario, Canada
    Posts
    3,004
    if you look, there are also extra "boxes" by the data cache.
    Specs
    Asus 780i Striker II Formula
    Intel E8400 Wolfdale @ 4050Mhz
    2x2GB OCZ Platinum @ 1200Mhz 5-4-3-18
    MSI 5850 1000Mhz/5000Mhz
    Wester Digital Black 2TB
    Antec Quatro 850W

    Cooling
    Swiftech Apogee
    Swiftech MCP-600
    HardwareLabes Black Ice Extreme 2


    Audio Setup
    X-fi w/AD8066, Clock mod, & polymer caps > PPAV2 > Grado SR60 & Grado SR325i & Beyerdynamic DT770 Pro & Beyerdynamic DT990 & AKG K701 & Denon D2000

  15. #15
    Xtreme Addict
    Join Date
    Apr 2003
    Posts
    1,092
    Quote Originally Posted by STEvil
    ok, so here is a question: what part of a pipeline usually "stalls" ?

    If buffers/caches were placed along the pipe or a second one added to work in "SLI/Crossfire" when the first fails (And the "pipes" can be loaded to be executed one after the other in case of stalls) would that not prove exceedingly beneficial?

    The mini buffers/caches could even help to hold data, so that if the pipe stalls the work is dumped and their data is loaded in while new work is lined up to refill the pipe(s). This could make the single pipe look like two (or two become 3+) in effect.
    That's a very good question (shows you have been paying attention :P).

    A pipeline is a hard concept to understand, but fortunately there is one good website on the internet (actually one on my very short list of good sites on the internet), Ars Technica: http://arstechnica.com/

    They've got this great series about the CPU on a technical level. It's a bit hard to understand if you're not into the subject, but they provide some basic knowledge.

    There is actually a two-part guide (just to show how complex the concept of a pipeline is) about how a pipeline works, what pipeline stalls are, why they are a bad thing and why they are a fatal flaw in the NetBurst design.

    http://arstechnica.com/articles/paed...pelining-1.ars
    http://arstechnica.com/articles/paed...pelining-2.ars

    They don't really go into what happens when a cache miss or branch misprediction occurs, but if you read these two articles and read up on how cache works, why it's important etc., you can form a clear image in your mind of what actually happens to the pipeline in that case. (I can tell you, it isn't pretty.)

    The short answer for those of you that are too lazy to read all this stuff (or simply don't have the time, skills, brain capacity etc):

    The pipeline operates almost at the most basic level; any kind of higher intelligent behaviour at this level is almost impossible. The benefits would be nice, but the price is most certainly too high (if the current level of technology can even do it at all).
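
    For a feel of what a stall costs, here is a toy cycle count for an ideal in-order pipeline: the first instruction takes one cycle per stage, each later one completes a cycle after its predecessor, and every stall adds bubble cycles. The stage count, stall count and penalty are made-up illustration values, and real out-of-order cores hide part of this latency.

```python
def pipeline_cycles(instructions: int, stages: int,
                    stalls: int = 0, penalty: int = 0) -> int:
    """Cycles to retire `instructions` on an ideal scalar pipeline.

    The first instruction needs `stages` cycles to flow through, then one
    instruction completes per cycle; each stall inserts `penalty` bubbles.
    """
    if instructions <= 0:
        return 0
    return stages + (instructions - 1) + stalls * penalty


# 1000 instructions on a 12-stage pipe: no stalls vs. 20 cache misses
# that each cost 100 cycles (both numbers are assumptions).
ideal = pipeline_cycles(1000, stages=12)
stalled = pipeline_cycles(1000, stages=12, stalls=20, penalty=100)
print(f"ideal: {ideal} cycles, with stalls: {stalled} cycles "
      f"({stalled / ideal:.1f}x slower)")
```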

  16. #16
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    good ol'e arse

    yes, a multi-staged pipeline would be hard to make..

  17. #17
    Xtreme Enthusiast
    Join Date
    Dec 2003
    Location
    UK
    Posts
    567
    Quote Originally Posted by Thorry
    That's a very good question (shows you have been paying attention :P).
    ...
    The pipeline operates almost at the most basic level; any kind of higher intelligent behaviour at this level is almost impossible. The benefits would be nice, but the price is most certainly too high (if the current level of technology can even do it at all).
    Thanks for sharing that article, very informative

  18. #18
    Xtreme Enthusiast
    Join Date
    Nov 2005
    Posts
    817
    As far as I can see, the die shot isn't nearly as impressive as this guy made it out to be. It looks like a more detailed shot, and of course there are the extra "boxes", but apart from that we can't get any real information from it. The extra ALU performance looks like it's specifically designed to meet the new expectations (sub-30s 2M SuperPi, for example), but will that give any extra performance outside of those very specific tasks?
    <eMesreveR>Do "girls" ever appear outside the Internet? Can i randomly encounter them?
    <Aleph-One>I believe they are a fabrication. Most of the evidence would suggest that they were created in a studio during the Cold War to display our industrial superiority over the Soviet Union.

    -----
    "Microsoft is not the answer. Microsoft is the question. NO is the answer." - Erik Naggum

  19. #19
    Xtreme X.I.P. MaxxxRacer's Avatar
    Join Date
    Aug 2004
    Location
    Los Angeles, Ca USA
    Posts
    12,551
    Adding a complex decoder will really help things out in the fight against Conroe, and here is why. Conroe has 3 simple decoders and 1 complex decoder, whereas AMD has 3 complex decoders. AMD's approach may seem better, but for the most part complex instructions are NOT used, so most instructions can be handled by the simple decoders. Furthermore, Intel's Core architecture can break down some complex instructions into simple ones (usually 2, sometimes 3) and thus doesn't lose much performance. Now let's move to AMD's Rev G. With 4 COMPLEX decoders, there is not a snowball's chance in hell that Conroe could keep up clock for clock in decoding, even with its advanced system for breaking down complex instructions. One thing I should point out, though: Intel has what is called "Macro-Op Fusion", which essentially combines 2 x86 instructions into one. This lets the decoders do a theoretical 5 decodes per clock, assuming a Macro-Op Fusion is performed and all of the decoders are working at full tilt.
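
    The peak numbers in that comparison work out as below. This is only a back-of-the-envelope sketch: it counts x86 instructions entering the decoders per clock, assumes every decoder is busy every cycle, and allows at most one fused pair per cycle on Conroe. The four-decoder Rev G entry is the rumour from this thread, not a confirmed spec, and the function name is made up for the example.

```python
def peak_x86_per_clock(decoders: int, fused_pairs_per_clock: int = 0) -> int:
    """Peak x86 instructions consumed by the decoders per clock.

    Each decoder handles one instruction; a fused pair lets one decoder
    slot swallow two x86 instructions, adding one to the total.
    """
    return decoders + fused_pairs_per_clock


designs = {
    "K8 (3 complex)": peak_x86_per_clock(3),
    "Conroe (3 simple + 1 complex, 1 fused pair)": peak_x86_per_clock(4, 1),
    "Rumoured Rev G (4 complex)": peak_x86_per_clock(4),
}

for name, rate in designs.items():
    print(f"{name}: up to {rate} x86 instructions per clock")
```

    Peak decode width says nothing about sustained throughput, which also depends on the instruction mix and on how fast the rest of the core can drain the decoders.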

    As well, an improved out-of-order loader will help, which is something AMD's K8 architecture is in need of.

    nn also pointed out another direly important thing that AMD needs to upgrade: the SSE units. IIRC AMD is currently using 2 64-bit units whereas Intel is using 3 128-bit units which, obviously, have twice the bandwidth (per unit). Because of this Intel crushes AMD whenever any heavily SSE-optimized programs are used. Can somebody say SuperPi!
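
    The arithmetic behind that comparison, as a small sketch. The unit counts and widths are the ones recalled in the post (flagged there with "IIRC"), so treat them as unverified inputs; the function names are made up for the example.

```python
import math


def simd_bits_per_clock(units: int, width_bits: int) -> int:
    """Peak SIMD datapath width summed over all units."""
    return units * width_bits


def passes_for_128bit_op(width_bits: int) -> int:
    """How many passes one 128-bit SSE operation needs through a unit."""
    return math.ceil(128 / width_bits)


# Unit counts as recalled in the post above (unverified).
amd = {"units": 2, "width_bits": 64}
intel = {"units": 3, "width_bits": 128}

for name, cfg in (("AMD K8", amd), ("Intel Core", intel)):
    total = simd_bits_per_clock(cfg["units"], cfg["width_bits"])
    passes = passes_for_128bit_op(cfg["width_bits"])
    print(f"{name}: {total} bits/clock peak, "
          f"{passes} pass(es) per 128-bit operation")
```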

    For more info on this subject read this article from AnandTech:
    http://www.anandtech.com/cpuchipsets...oc.aspx?i=2748

    P.S. I hope this revision "G" is true!

  20. #20
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    We need to remember that if Rev G still has a 12/14-stage pipeline, it will be shorter than Conroe's 14-stage pipeline, so at the same clock it may be even faster! Of course, right now we have too little information about Rev G to make any firm judgement about performance.

    PS. If AMD plays the same game as Intel, they should show some pre-production Rev G samples on 23 July (Conroe market debut)
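
    One way to see how much pipeline length alone could matter at the same clock: a branch mispredict flushes the pipe, so the penalty is roughly the number of stages. The sketch below uses CPI = base CPI + branch fraction x mispredict rate x flush penalty. The 20% branch fraction, 5% mispredict rate and base CPI of 1.0 are assumptions for illustration, and treating the penalty as equal to the stage count is a simplification.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty: int) -> float:
    """Cycles per instruction once branch-mispredict flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


BASE_CPI = 1.0          # assumed
BRANCH_FRACTION = 0.20  # assumed share of instructions that are branches
MISPREDICT_RATE = 0.05  # assumed share of branches predicted wrong

for name, stages in (("12-stage", 12), ("14-stage", 14)):
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, stages)
    print(f"{name} pipeline: CPI ~ {cpi:.2f}")
```

    With these made-up rates the two extra stages cost roughly 2% per clock, so the shorter pipe helps, but the predictor quality and the rest of the core matter far more.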
    RiG1: Ryzen 7 1700 @4.0GHz 1.39V, Asus X370 Prime, G.Skill RipJaws 2x8GB 3200MHz CL14 Samsung B-die, TuL Vega 56 Stock, Samsung SS805 100GB SLC SDD (OS Drive) + 512GB Evo 850 SSD (2nd OS Drive) + 3TB Seagate + 1TB Seagate, BeQuiet PowerZone 1000W

    RiG2: HTPC AMD A10-7850K APU, 2x8GB Kingstone HyperX 2400C12, AsRock FM2A88M Extreme4+, 128GB SSD + 640GB Samsung 7200, LG Blu-ray Recorder, Thermaltake BACH, Hiper 4M880 880W PSU

    SmartPhone Samsung Galaxy S7 EDGE
    XBONE paired with 55'' Samsung LED 3D TV

  21. #21
    Xtreme Enthusiast
    Join Date
    Dec 2003
    Location
    UK
    Posts
    567
    Quote Originally Posted by Lightman
    We need to remember that if Rev G still has a 12/14-stage pipeline, it will be shorter than Conroe's 14-stage pipeline, so at the same clock it may be even faster! Of course, right now we have too little information about Rev G to make any firm judgement about performance.

    PS. If AMD plays the same game as Intel, they should show some pre-production Rev G samples on 23 July (Conroe market debut)
    That would be pretty cool, AMD sure keep their advances low-key and quiet!

  22. #22
    Xtreme Addict
    Join Date
    Apr 2005
    Posts
    2,208
    Even if AMD can match Conroe with a Rev G, can they match the price vs performance?

    $316 for a 6600 and $530 for a 6700 are rock bottom prices.

  23. #23
    Xtreme Cruncher
    Join Date
    Mar 2005
    Location
    venezuela caracas
    Posts
    6,460
    Quote Originally Posted by dogsx2
    Even if AMD can match Conroe with a Rev G, can they match the price vs performance?

    $316 for a 6600 and $530 for a 6700 are rock bottom prices.
    remember, right now AMD is running 90nm; if revision G is 65nm they will lower prices for sure
    Incoming new computer after 5 long years

    YOU want to FIGHT CANCER OR AIDS join us at WCG and help to have a better FUTURE

  24. #24
    Xtreme Addict
    Join Date
    Apr 2003
    Posts
    1,092
    Quote Originally Posted by Willis
    did anyone say 4ghz and 65nano?
    Yes, I did actually

  25. #25
    Xtreme Mentor
    Join Date
    Sep 2005
    Location
    Netherlands
    Posts
    2,693
    Quote Originally Posted by Thorry
    Well, I can. I won't reveal how or what regarding this pic; everybody knows it came from tweakers.net and which guys they discussed it with.
    ...
    GOOD STORY
    ...
    I know my post doesn't always make stuff any clearer, but that's because it is a complex world, the world of computers and especially CPUs
    good read.

    to add (dunno if it's fully true, as it's been a while since I've read it):
    The system AMD uses to get better and better yields is a piece of patented software.

    The software by itself improves the production process and corrects stuff (no idea how).
    This allows their fabs to correct stuff and improve things on the fly, where others (like Intel) have to shut down the machines to do this.
    Time flies like an arrow. Fruit flies like a banana.
    Groucho Marx



    i know my grammar sux so stop hitting me
