Page 5 of 10
Results 101 to 125 of 238

Thread: Deneb Samples are almost out

  1. #101
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by Lightman View Post
    No it's not and Particle is more or less right.
    The B3 fix for the TLB erratum was a bypass. It didn't restore the originally planned functionality. The fix brings a minimal performance penalty, because the TLB is flushed every time there could be dirty data in it. I think Anand did a more in-depth analysis.
    If you mean this?

    http://www.nordichardware.com/news,7189.html
    or this?
    http://www.xbitlabs.com/articles/cpu...m-x4-9850.html

    Unfortunately, AMD engineers didn’t really explain to us what was done specifically to fix the TLB bug in the new B3 processor stepping. However, some indirect data we have at our disposal gives us reason to believe that now, after the processor core changes the bit flags for page table entries stored in L2 cache, they are all evicted into L3 cache. This may be the reason for the latency to get a little bit higher.

  2. #102
    Registered User
    Join Date
    Nov 2007
    Location
    Stuttgart
    Posts
    57
    That's also what I remembered, so I looked it up @ anand:

    The hardware fix implemented in B3 Phenoms is that whenever a page table entry is modified, it's evicted out of L2 and placed in L3. There's a very minor performance penalty because of this but nowhere near as bad as the software/BIOS TLB fix mentioned above.
    link
    Last edited by malice85; 11-05-2008 at 09:17 AM.

  3. #103
    Banned
    Join Date
    Jul 2008
    Posts
    165
    If this fixes the TLB bug, it is a fix :P
    It might not be the best fix around (I don't really know), but it fixes the problem.

  4. #104
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Any chance that the erratum is now fixed properly, i.e. the source of the problem is corrected so that it no longer generates errors?

  5. #105
    Xtreme Addict
    Join Date
    Oct 2006
    Posts
    2,141
    Pardon my ignorance, but I thought the TLB bug was fixed in the B3 revision of the Phenom? Was that not a hardware fix?
    Rig 1:
    ASUS P8Z77-V
    Intel i5 3570K @ 4.75GHz
    16GB of Team Xtreme DDR-2666 RAM (11-13-13-35-2T)
    Nvidia GTX 670 4GB SLI

    Rig 2:
    Asus Sabertooth 990FX
    AMD FX-8350 @ 5.6GHz
    16GB of Mushkin DDR-1866 RAM (8-9-8-26-1T)
    AMD 6950 with 6970 bios flash

    Yamakasi Catleap 2B overclocked to 120Hz refresh rate
    Audio-GD FUN DAC unit w/ AD797BRZ opamps
    Sennheiser PC350 headset w/ hero mod

  6. #106
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Birmingham AL.
    Posts
    1,079
    Quote Originally Posted by EniGmA1987 View Post
    Pardon my ignorance, but I thought the TLB bug was fixed in the B3 revision of the Phenom? Was that not a hardware fix?
    Just read the last couple of pages for that answer; that's what we have been discussing.
    Particle's First Rule of Online Technical Discussion:
    As a thread about any computer related subject has its length approach infinity, the likelihood and inevitability of a poorly constructed AMD vs. Intel fight also exponentially increases.

    Rule 1A:
    Likewise, the frequency of a car pseudoanalogy to explain a technical concept increases with thread length. This will make many people chuckle, as computer people are rarely knowledgeable about vehicular mechanics.

    Rule 2:
    When confronted with a post that is contrary to what a poster likes, believes, or most often wants to be correct, the poster will pick out only minor details that are largely irrelevant in an attempt to shut out the conflicting idea. The core of the post will be left alone since it isn't easy to contradict what the person is actually saying.

    Rule 2A:
    When a poster cannot properly refute a post they do not like (as described above), the poster will most likely invent fictitious counter-points and/or begin to attack the other's credibility in feeble ways that are dramatic but irrelevant. Do not underestimate this tactic, as in the online world this will sway many observers. Do not forget: Correctness is decided only by what is said last, the most loudly, or with greatest repetition.

    Remember: When debating online, everyone else is ALWAYS wrong if they do not agree with you!

  7. #107
    Registered User
    Join Date
    Nov 2007
    Location
    Stuttgart
    Posts
    57
    As far as I understand, with B3 the erratum was fixed with a very small to no performance penalty, but it wasn't fixed to work the way it was initially planned to. So for me the question is whether they were able to achieve this with Shanghai/Deneb.

  8. #108
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Yes, I meant that
    RiG1: Ryzen 7 1700 @4.0GHz 1.39V, Asus X370 Prime, G.Skill RipJaws 2x8GB 3200MHz CL14 Samsung B-die, TuL Vega 56 Stock, Samsung SS805 100GB SLC SDD (OS Drive) + 512GB Evo 850 SSD (2nd OS Drive) + 3TB Seagate + 1TB Seagate, BeQuiet PowerZone 1000W

    RiG2: HTPC AMD A10-7850K APU, 2x8GB Kingstone HyperX 2400C12, AsRock FM2A88M Extreme4+, 128GB SSD + 640GB Samsung 7200, LG Blu-ray Recorder, Thermaltake BACH, Hiper 4M880 880W PSU

    SmartPhone Samsung Galaxy S7 EDGE
    XBONE paired with 55'' Samsung LED 3D TV

  9. #109
    Xtreme Addict
    Join Date
    Oct 2005
    Location
    EvE-Online, Tranquility
    Posts
    1,978
    Anyway, my prediction for Deneb vs Intel's products? I think that by the time Lynnfield is out, Deneb will have gone through a few revisions, eventually with high-k and metal gates, and will actually be a good competitor for Lynnfield and the socket 775 platform. Performance-wise I do see Deneb being able to beat or keep up with Yorkfield a lot better, at least in most daily apps. Versus Lynnfield, I'm not sure about performance, but AMD will be more competitive on price/performance than it is now with Agena vs Yorkfield or even Kentsfield.

    Anyway, time will tell and it's only what I think. It depends on more factors than just 'Deneb owns' or 'Lynnfield has only dual channel anyway'. For example, the price of Lynnfield's platform. I don't think it's going to be as overpriced as the current Bloomfield platform, due to less technology etc. But as said, time will tell.
    Last edited by Cooper; 11-05-2008 at 01:45 PM. Reason: flame removed
    Synaptic Overflow

    CPU:
    -Intel Core i7 920 3841A522
    --CPU: 4200Mhz| Vcore: +120mV| Uncore: 3200Mhz| VTT: +100mV| Turbo: On| HT: Off
    ---CPU block: EK Supreme Acetal| Radiator: TCF X-Changer 480mm
    Motherboard:
    -Foxconn Bloodrage P06
    --Blck: 200Mhz| QPI: 3600Mhz
    Graphics:
    -Sapphire Radeon HD 4870X2
    --GPU: 750Mhz| GDDR: 900Mhz
    RAM:
    -3x 2GB Mushkin XP3-12800
    --Mhz: 800Mhz| Vdimm: 1.65V| Timings: 7-8-7-20-1T
    Storage:
    -3Ware 9650SE-2LP RAID controller
    --2x Western Digital 74GB Raptor RAID 0
    PSU:
    -Enermax Revolution 85+ 1250W
    OS:
    -Windows Vista Business x64


    ORDERED: Sapphire HD 5970 OC
    LOOKING FOR: 2x G.Skill Falcon II 128GB SSD, Windows 7

  10. #110
    Xtreme X.I.P.
    Join Date
    Apr 2005
    Posts
    4,475
    Let's not bring the TLB errata back again please. I don't recall anyone experiencing it on B2 chips with the fix disabled. B3 performance is the same as B2 - don't know where you got the BS of B2 being faster w/o the fix applied.

  11. #111
    Banned
    Join Date
    Jul 2008
    Posts
    165
    Quote Originally Posted by Cooper View Post
    Let's not bring the TLB errata back again please. I don't recall anyone experiencing it on B2 chips with the fix disabled. B3 performance is the same as B2 - don't know where you got the BS of B2 being faster w/o the fix applied.
    B3 is a little bit faster in some benches, but it's within the margin of error.

    Xbitlabs just concluded that the latency in Everest went up a bit.

  12. #112
    Xtreme Addict
    Join Date
    Nov 2007
    Location
    Illinois
    Posts
    2,095
    Quote Originally Posted by Cooper View Post
    Let's not bring the TLB errata back again please. I don't recall anyone experiencing it on B2 chips with the fix disabled. B3 performance is the same as B2 - don't know where you got the BS of B2 being faster w/o the fix applied.
    Quoted for some massive truth.
    E7200 @ 3.4 ; 7870 GHz 2 GB
    Intel's atom is a terrible chip.

  13. #113
    Brilliant Idiot
    Join Date
    Jan 2005
    Location
    Hell on Earth
    Posts
    11,015
    Quote Originally Posted by Cooper View Post
    Let's not bring the TLB errata back again please. I don't recall anyone experiencing it on B2 chips with the fix disabled. B3 performance is the same as B2 - don't know where you got the BS of B2 being faster w/o the fix applied.
    Quite the contrary, Cooper. I only brought it up to support the idea that the 25% increase in performance might be feasible IF they were able to eradicate the TLB errata at the hardware level. IF that is the case, the IPC improvement could easily be 15%, and a hardware-level repair (a better fix than the current one) could easily add up to another 10% gain, totaling performance gains up to 25% in some apps.
    heatware chew*
    I've got no strings to hold me down.
    To make me fret, or make me frown.
    I had strings but now I'm free.
    There are no strings on me

  14. #114
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    City of Lights, The Netherlands
    Posts
    2,381
    Quote Originally Posted by chew* View Post
    Quite the contrary, Cooper. I only brought it up to support the idea that the 25% increase in performance might be feasible IF they were able to eradicate the TLB errata at the hardware level. IF that is the case, the IPC improvement could easily be 15%, and a hardware-level repair (a better fix than the current one) could easily add up to another 10% gain, totaling performance gains up to 25% in some apps.
    I'd like to get the same drug as you're using, where can I get it? *just kidding*

    I think you are making some major exaggerations. Although I'd like you to be right with your 25% IPC increase, I just don't see it happening from AMD anytime soon.
    "When in doubt, C-4!" -- Jamie Hyneman

    Silverstone TJ-09 Case | Seasonic X-750 PSU | Intel Core i5 750 CPU | ASUS P7P55D PRO Mobo | OCZ 4GB DDR3 RAM | ATI Radeon 5850 GPU | Intel X-25M 80GB SSD | WD 2TB HDD | Windows 7 x64 | NEC EA23WMi 23" Monitor |Auzentech X-Fi Forte Soundcard | Creative T3 2.1 Speakers | AudioTechnica AD900 Headphone |

  15. #115
    Brilliant Idiot
    Join Date
    Jan 2005
    Location
    Hell on Earth
    Posts
    11,015
    Quote Originally Posted by Helmore View Post
    I'd like to get the same drug as you're using, where can I get it? *just kidding*

    I think you are making some major exaggerations. Although I'd like you to be right with your 25% IPC increase, I just don't see it happening from AMD anytime soon.
    It's more wishful thinking, but I did say "totaling performance gains up to 25% in some apps"
    heatware chew*
    I've got no strings to hold me down.
    To make me fret, or make me frown.
    I had strings but now I'm free.
    There are no strings on me

  16. #116
    Xtreme Enthusiast
    Join Date
    Jun 2008
    Posts
    746
    They did say 15-20% IPC increase and 20% increase from clocks. So ~35% overall.
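
As a rough check on the arithmetic above: if the IPC gain and the clock gain are independent, they compound (multiply) rather than add, so the combined figure lands a bit above the simple sum. Here is a minimal Python sketch, assuming performance scales linearly with both IPC and frequency, which is optimistic for real workloads. The function name is made up for illustration; the percentages are the ones quoted in the post.

```python
def combined_speedup(ipc_gain, clock_gain):
    """Overall fractional speedup when an IPC gain and a clock gain are
    applied together, assuming performance ~ IPC * frequency."""
    return (1.0 + ipc_gain) * (1.0 + clock_gain) - 1.0


# The 15-20% IPC range quoted above, each combined with a 20% clock increase.
for ipc in (0.15, 0.20):
    total = combined_speedup(ipc, 0.20)
    print(f"{ipc:.0%} IPC + 20% clock -> {total:.0%} overall")
```

Under those assumptions the quoted numbers work out to roughly 38% to 44% rather than 35%, but only if both claimed gains actually materialize in the same workload.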

  17. #117
    Xtreme Enthusiast
    Join Date
    Oct 2006
    Posts
    658
    Quote Originally Posted by Caveman787 View Post
    They did say 15-20% IPC increase and 20% increase from clocks. So ~35% overall.
    Let me just say this: 15 - 20% IPC gains from a die shrink would be absolutely incredible and unheard of, but many here seem to think it's a realistic figure? 15 - 20% improvements are more akin to an architectural overhaul; for example, K8 -> K10 gained about that much.

    Just as a point of comparison, Penryn gained on average 5 - 6% per clock through a larger cache and several minor architectural improvements. AMD would have to work miracles to get 15 - 20% gains from Deneb. Frankly, I very much doubt it, but I would be more than happy to eat humble pie if proven wrong.

  18. #118
    Xtreme Member
    Join Date
    Apr 2008
    Posts
    463
    Quote Originally Posted by Epsilon84 View Post
    Let me just say this: 15 - 20% IPC gains from a die shrink would be absolutely incredible and unheard of, but many here seem to think it's a realistic figure? 15 - 20% improvements are more akin to an architectural overhaul; for example, K8 -> K10 gained about that much.

    Just as a point of comparison, Penryn gained on average 5 - 6% per clock through a larger cache and several minor architectural improvements. AMD would have to work miracles to get 15 - 20% gains from Deneb. Frankly, I very much doubt it, but I would be more than happy to eat humble pie if proven wrong.
    Isn't Deneb bringing HT 3.1 and 3x as much L3 cache?
    amd 720
    M4A78T-E
    i gig of crap
    visionek 4850
    seagate 320 GB 7200.10
    WD 640 GB
    swiftech MCR320 swiftech MCP355
    Apogee GTZ
    XPSC Restop


    Quote Originally Posted by road-runner View Post
    I can say one thing I learned out of all this, I am not buying any Intel SSDs thats for sure...

  19. #119
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by Epsilon84 View Post
    Let me just say this: 15 - 20% IPC gains from a die shrink would be absolutely incredible and unheard of, but many here seem to think it's a realistic figure? 15 - 20% improvements are more akin to an architectural overhaul; for example, K8 -> K10 gained about that much.

    Just as a point of comparison, Penryn gained on average 5 - 6% per clock through a larger cache and several minor architectural improvements. AMD would have to work miracles to get 15 - 20% gains from Deneb. Frankly, I very much doubt it, but I would be more than happy to eat humble pie if proven wrong.
    He added higher clocks to that.

    Phenom is 15% slower than Kentsfield, with 4 MB vs 8 MB of cache.

    That's 50% less cache.

    We're getting 50% more cache: 512 KB x4 plus 6144 KB of L3 cache.

    8 MB in total.
    Last edited by demonkevy666; 11-05-2008 at 07:20 PM.
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  20. #120
    Xtreme Enthusiast
    Join Date
    Oct 2006
    Posts
    658
    Quote Originally Posted by stangracin2 View Post
    Isn't Deneb bringing HT 3.1 and 3x as much L3 cache?
    Faster HT won't bring anything for desktop performance. A larger L3 cache alone won't account for a 15 - 20% IPC gain except in very cache bound apps. I guess L3 speeds may go up as well which helps, but I still think these figures are very optimistic.

  21. #121
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264
    I agree, Epsilon.

    I guess the factors that separate the two are:

    A. People are of the belief that K10 has never been 'all there', and I tend to agree in some areas. Memory performance is lacking and performance is a little below expectations, a lot below in some cases where one would expect much more. It's the areas where K10 barely outperforms K8 that make it lose to Core on average.

    Did the rush to get K10 out there leave some aspects of the design not functioning as they should?

    B. AMD need the performance boost.


    Factors working against any large IPC gains are:

    A. Just looking at the die shots, anything visible at the core level is identical, so any uarch enhancements there would have to be minor. That stands to reason anyway: no one does a meaningful overhaul of an arch when changing process node. Old news.

    B. Benchmarks we've seen so far show 7-15% max.

    Personally, a realistic guess would be 10% across the board, but next to nothing in some areas, and 15%+ in rare cases, like what we saw with POV-Ray, a benchmark that was a sore spot and will still lag behind Core.

    It's important that they squeezed what they could out of it for Deneb, especially at 3GHz plus. Let's not forget a 5% IPC boost is the equivalent of a whole 200MHz speed bin at these frequencies, and let's face it, no one's going over the high 3s (GHz) with these sorts of architectures any time soon.
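
To put a number on the "5% IPC is a speed bin" point above, here is a small Python sketch. It assumes performance scales linearly with clock speed, and the helper name is hypothetical.

```python
def equivalent_clock_mhz(base_mhz, ipc_gain):
    """Extra clock (in MHz) that would give the same speedup as ipc_gain,
    assuming performance scales linearly with frequency."""
    return base_mhz * ipc_gain


# A 5% IPC gain expressed as extra MHz at a few base clocks.
for base in (3000, 3500, 4000):
    extra = equivalent_clock_mhz(base, 0.05)
    print(f"{base} MHz base: 5% IPC is worth about {extra:.0f} MHz")
```

On that assumption, 5% is worth about 150 MHz at 3 GHz and reaches a full 200 MHz bin at 4 GHz, so the claim holds roughly at the top of the clock range being discussed.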

    Not that it really matters long term; Hyperthreading and any other means of increasing multi-threaded performance is all either CPU company will care about from now on. Let's face it, if Deneb was 15% slower clock-for-clock than i7 at single-threaded work (as it most likely will be) but had 4 extra cores, it would still be the winner.

    Deneb's lack of SMT technology is now more of an issue than lack of IPC. Now if only they had a large enough die size and power advantage to sneak on a couple of extra physical cores.

  22. #122
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by stangracin2 View Post
    Isn't Deneb bringing HT 3.1 and 3x as much L3 cache?
    As he said above ... HT communicates to the chipset NB/SB arrangement and is not the bottleneck in single socket implementations. Raising the HT speed will do nothing for observed IPC. Ironically, I just finished an FPS skew on Lost Planet with a Phenom (HT 3.0) and FPS doesn't begin to drop off until about 600-800 MHz (down from 2000 MHz)... I can post that data if you like.

    L3 cache will certainly help, but the rule of thumb is that for every doubling of the cache you can expect a factor of sqrt(2) improvement in cache miss rate ... mAJORD mentioned above that the memory performance was poor; that is sorta a relative statement ... it's still very good, just not quite hitting expectations. AMD's IMC approach decreases penalties for cache misses, so it is even more likely that making L3 3x larger will have less of a general impact.
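
The sqrt(2) rule of thumb above can be put into numbers with a short Python sketch. It treats the rule as "miss rate scales with 1/sqrt(cache size)", which is an empirical approximation and not a property of any particular CPU; the function name is hypothetical.

```python
import math


def relative_miss_rate(size_factor):
    """Miss rate relative to baseline under the rule of thumb that
    miss rate scales with 1 / sqrt(cache size)."""
    return 1.0 / math.sqrt(size_factor)


# Doubling the cache versus tripling it (the L3 increase discussed here).
for factor in (2, 3):
    ratio = relative_miss_rate(factor)
    print(f"{factor}x cache -> miss rate about {ratio:.2f} of baseline")
```

By that rule, tripling the L3 cuts L3 misses to a bit under 60% of what they were. How much of that shows up as IPC depends on how expensive a miss is, which is why a low-latency integrated memory controller reduces the payoff.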

    It would be nice though if AMD would disclose some of their other tweaks (as they most certainly made them) ... one area is the L3 latency: they use an asynchronous link between different clock domains by implementing a FIFO buffer between L3 and the cores to absorb the clock skew ... this adds latency, and looking at the overall results on Phenom it was a pretty significant hit ... my guess is they really improved this part of the cache structure, which will be a big help.

    I agree with mAJORD -- for desktop applications, ~10% IPC improvement is likely, with a few app-specific cases at 15% ...

    When AMD quoted 20% over Barcelona, you need to be careful to take that in the right context: they are comparing at the server level, with server-related benchmarks. Today's Barcelona Opterons are still on HT 2.0, I believe; going to HT 3.0, unlike on the desktop, will improve 2P server performance significantly ... as good as Barcelona is today on throughput, a faster socket-to-socket link will be even better. That 20% is not likely to translate to the desktop.

    That doesn't sound good, especially if you are a devoted AMD customer -- but 10% is a good, healthy gain IPC-wise for just a shrink -- (btw, the leaked Deneb benchmarks are pointing to this 10% number that mAJORD mentioned above) ... with this, and a healthy 45 nm process to get to a 3.0 GHz clock ... AMD will have a nice competitive CPU ... I plan on getting one when they launch, so they have already sold one

    Jack
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  23. #123
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Lansing, MI / London / Stinkaypore
    Posts
    1,788
    Hmm JJ, how does Nehalem achieve that then? No FIFOs, yet the "Core" and "Uncore" run at different speeds (except the 965 extreme)? Or is it related to per-core clocking instead?


    Deneb won't be too much of a different experience on desktop for multimedia and rendering (POV has proved me wrong though), but in games we could see a sizable gain if they prefetch in a more aggressive manner.
    Quote Originally Posted by radaja View Post
    so are they launching BD soon or a comic book?

  24. #124
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Macadamia View Post
    Hmm JJ, how does Nehalem achieve that then? No FIFOs, yet the "Core" and "Uncore" run at different speeds (except the 965 extreme)? Or is it related to per-core clocking instead?


    Deneb won't be too much of a different experience on desktop for multimedia and rendering (POV has proved me wrong though), but in games we could see a sizable gain if they prefetch in a more aggressive manner.
    This was a topic of discussion a few months back as I recall, so a google to 'Nehalem Synchronous' yielded some hits:

    http://techreport.com/discussions.x/14950
    The processor runs all of its internal components (the CPU cores, memory controller, and I/O) in a decoupled fashion, so one can tune their respective frequencies and voltages independently. This isn't a new idea, Kumar stressed, but Intel's implementation is new in that it uses a synchronous interface between those components. Most past implementations have asynchronous interfaces, he claimed, which result in both higher latency and indeterminism: "if you test five different systems, you will get five different results." Because of the synchronous approach, Nehalem's memory-to-cache latency is allegedly "drastically smaller" than that of the competition.
    How the heck they did it, I don't know -- the science of process technology I can read and understand, and I have been able to accumulate a great deal of understanding of architectural details (much with Kanter's help and reading a lot of Hennessy), and I am always eager to learn more, but circuit-level implementations -- frankly, I am clueless -- I can sketch out a 6T SRAM cell or some simple 4T inverters or a NOR gate circuit, but ask me to string it together or throw in a PLL or a power gate -- all you will get is a dumb look

    In terms of gaming -- I am not so sure, I don't think the cache/prefetching is a huge deal here (this is my opinion, and I could be wrong, there is no way to quantitatively ascertain anything) ... by its nature, gaming code is 'branchy' for lack of a better word; by this I mean the flow of the code has dependencies that simply require code paths to branch, for example ... shoot a gun -- does it hit a dude (yes/no) branch, does the dude die (yes/no) branch, do you issue the animation to lop off his head (yes/no) branch ... if AMD spent some time improving the branch prediction (direct or indirect, doesn't matter) then there will be a very nice gaming improvement; I think the cache will not be as important.
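
For anyone who wants to see what "branchy" means in the gun example above, here is a toy Python sketch. Everything in it (the function name, parameters, and the overkill rule) is invented for illustration and does not come from any real engine. The point is only that each decision depends on the outcome of the one before it, which is exactly the pattern a branch predictor has to guess.

```python
def resolve_shot(hit, target_health, damage):
    """Return the events triggered by one shot.

    Each check only runs if the previous one went a particular way,
    so the control flow is a chain of data-dependent branches.
    """
    events = []
    if hit:                                  # branch 1: did the shot connect?
        events.append("hit")
        if target_health - damage <= 0:      # branch 2: did the target die?
            events.append("kill")
            if damage >= 2 * target_health:  # branch 3: made-up overkill rule
                events.append("headshot_animation")
    else:
        events.append("miss")
    return events


print(resolve_shot(True, 50, 100))
```

Python itself says nothing about hardware branch prediction; the sketch only shows the shape of the control flow being described.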

    Jack
    Last edited by JumpingJack; 11-05-2008 at 10:05 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  25. #125
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Lansing, MI / London / Stinkaypore
    Posts
    1,788
    I know previous gaming code was branchy, but I don't think the current trend puts the emphasis there any more.
    Xenon and Cell for the consoles aren't too good at branching, as far as I remember, especially given their beefed-up SIMD units. There will always be branchy code, but does it still comprise the majority of the engine?

    AMD does need serious work on their predictors though - for general performance more than anything. Despite the improvements, Intel has had a really decisive lead here ever since Conroe.
    Quote Originally Posted by radaja View Post
    so are they launching BD soon or a comic book?
