
Thread: 5870 Bottleneck Investigation (CPU and/or Memory Bandwidth)

  1. #1
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535

    5870 Bottleneck Investigation (CPU and/or Memory Bandwidth)

    The 5870 is fast, that's for sure, but since it looks like an enhanced and doubled 4890 it should be much faster, and it definitely shouldn't be beaten by 4890CF across the board. The card has power, but obviously isn't able to use it very well. The two main suspects are immature drivers and a possible bandwidth bottleneck. While we have to wait and see on drivers, anyone with a 5870 in hand can easily test whether the memory system is at fault.

    This is a call to anyone who has a 5870 right now; if no one steps up to the plate, I will when mine comes in. A good methodology would be to drop the core and memory frequencies to half, then observe how performance changes across various programs (especially those the 5870 doesn't scale well in, like HAWX) as core and memory are individually adjusted. Come on guys, let's get to it!

  2. #2
    Xtreme Addict
    Join Date
    Aug 2006
    Location
    The Netherlands, Friesland
    Posts
    2,244
    Bs!
    >i5-3570K
    >Asrock Z77E-ITX Wifi
    >Asus GTX 670 Mini
    >Cooltek Coolcube Black
    >CM Silent Pro M700
    >Crucial M4 128Gb Msata
    >Cooler Master Seidon 120M
    Hell yes its a mini-ITX gaming rig!

  3. #3
    Xtreme Enthusiast
    Join Date
    Jun 2008
    Posts
    619
    I'd like to see a comparison between 4890 CF and a single 5870 in direct benchmarks on the same system. Didn't see any today in the reviews.
    ASRock 990FX Extreme4
    AMD FX 8350
    Kingston 16GB (4GBx4) DDR3 1333
    Gigabyte NVidia GTX 680 2GB
    Silverstone 1000W PSU

  4. #4
    Xtreme Enthusiast TheBlueChanell's Avatar
    Join Date
    Dec 2006
    Posts
    565
    I'd imagine drivers would also have something to do with it. The 5870 should pretty much always be faster than a 4870x2.
    Main: 900D - Prime 1000T - Asus Crosshair VI Extreme - R7 1700X @ 4.0ghz - RX Vega 64? - 32GB DDR4 3466 - 1TB 960 Pro -
    --- XSPC AX360 x3 - HK IV Pro - HK RX480 - HK 200 D5 - BP Compression ---
    HTPC: 250D - Prime 850T - Gigabyte G1 ITX - i7 6700K @ 4.5ghz - GTX 1080 Ti - 16GB 3200 - 1TB 960 Pro -
    --- ST30 x UT60 - Kyros HF - KryoGraphics 1080 - HK100 DDC - Monsoon Compression ---
    HV01: Define XL R2 - Prime 1200P - Asus Zenith Extreme - TR 1950X - RX580CF - 128GB DDR4 ECC - 512GB 960P - 4x 2TB RE
    HV02: Node 804 - Prime 850T - SuperMicro X1SSH - E3-1230 v6 - Vega FE - 64GB ECC - 512GB 960 Pro - 4x 6TB Gold -

  5. #5
    Registered User
    Join Date
    Jul 2006
    Location
    Guest in Thailand
    Posts
    74
    So it will take months again until the drivers catch up? *yawn*

    Nobody should touch the latest and greatest until they run like they should; it would speed up this process by a fair margin... Reds and greens alike

  6. #6
    Xtreme Addict
    Join Date
    Dec 2006
    Location
    Malaysia
    Posts
    1,383
    I don't think there will be a bottleneck from PCIe

    compared to the 4890, the 5870 is bottlenecked by its mem bandwidth... if only it was 512-bit or 7-8GHz..
    then in my opinion it might bottleneck PCIe 2.0

    but in high res games it practically kills the 4890..
    this I guess is where bandwidth doesn't play as much of a role as computing power

  7. #7
    Xtreme Mentor
    Join Date
    Feb 2007
    Location
    Oxford, England
    Posts
    3,433
    Quote Originally Posted by cstkl1 View Post
    I don't think there will be a bottleneck from PCIe

    compared to the 4890, the 5870 is bottlenecked by its mem bandwidth... if only it was 512-bit or 7-8GHz..
    then in my opinion it might bottleneck PCIe 2.0

    but in high res games it practically kills the 4890..
    this I guess is where bandwidth doesn't play as much of a role as computing power
    and min fps... it kills EVERYTHING at min fps
    "Cast off your fear. Look forward. Never stand still, retreat and you will age. Hesitate and you will die. SHOUT! My name is…"
    //James

  8. #8
    Xtreme Member
    Join Date
    Dec 2008
    Posts
    250
    compared to the 4890, the 5870 is bottlenecked by its mem bandwidth... if only it was 512-bit or 7-8GHz..
    then in my opinion it might bottleneck PCIe 2.0

    but in high res games it practically kills the 4890..
    this I guess is where bandwidth doesn't play as much of a role as computing power
    I think that's partly the case; however, from reading ~5 reviews so far, I think it's partly bandwidth and partly VRAM bound. Looking at the benches, the only cards that are able to outperform it are (single or multi GPU) cards that have 1GB++. If I had the money to buy a 5870, I would wait and get one (or two) with 2GB of memory, as I was assuming that would be the case.

    Plus I'm not sure how many places used the 9.10 beta drivers that AMD released; in the few test bits I scanned, they used 9.8 I think.
    TBA

  9. #9
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    Quote Originally Posted by mcmeat51 View Post
    looking at the benches, the only cards that are able to outperform it are (single or multi GPU) cards that have 1GB++. If I had the money to buy a 5870, I would wait and get one (or two ) that had 2GB mem as I was assuming that would be the case.
    In multigpu cards, that 2GB is mostly just the same texture memory mirrored for each one, so it behaves more like a 1GB card, although it's possible there are some optimizations to reduce memory usage.

    I've been told my card is shipping this morning. When I get it, if no one else has stepped up to the plate, I'll settle once and for all whether memory bandwidth is what's holding the card back.

  10. #10
    Xtreme Member
    Join Date
    Dec 2008
    Posts
    250
    In multigpu cards, that 2GB is mostly just the same texture memory mirrored for each one, so it behaves more like a 1GB card, although it's possible there are some optimizations to reduce memory usage.
    True, however that seems like a trend to me. What is the theoretical mem bus throughput of a 5870 and the GTX 295?
    TBA

  11. #11
    Xtreme Member
    Join Date
    Jul 2009
    Location
    Madrid (Spain)
    Posts
    352
    Quote Originally Posted by mcmeat51 View Post
    True, however that seems like a trend to me. What is the theoretical mem bus throughput of a 5870 and the GTX 295?
    Theoretical mem bus throughput on a dual GPU card (with AFR) is the same story as memory capacity. It has twice the bandwidth, because it has one bus for each GPU, but chances are that most of the second GPU's bandwidth is spent sending it the same data the first bus is sending to the first GPU, because both need the same data.

    With Alternate Frame Rendering, you have to think as if you have two completely different cards, each one rendering half of the frames (indeed, that's exactly what you have), so you can use the 2nd card to render the next frame while the 1st one is rendering the current one, provided the CPU has finished processing that next frame.

    You can't compare theoretical specs of dual and single GPU solutions so easily.
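
    A minimal sketch of the point above, assuming a simple AFR model where every GPU holds and reads its own full copy of the same data. The function name and the 1024MB / 100GB/s figures are made up for illustration and are not the specs of any real card.

```python
def afr_effective(per_gpu_vram_mb, per_gpu_bandwidth_gbs, gpus):
    """Rough AFR model: each GPU mirrors the same data and renders whole frames."""
    return {
        # What the box says: everything added up across GPUs.
        "advertised_vram_mb": per_gpu_vram_mb * gpus,
        "advertised_bandwidth_gbs": per_gpu_bandwidth_gbs * gpus,
        # What a single frame actually gets: one GPU's share.
        "usable_vram_mb": per_gpu_vram_mb,
        "bandwidth_per_frame_gbs": per_gpu_bandwidth_gbs,
    }


# Hypothetical dual-GPU card, illustrative numbers only.
card = afr_effective(per_gpu_vram_mb=1024, per_gpu_bandwidth_gbs=100.0, gpus=2)
for key, value in card.items():
    print(f"{key}: {value}")
```

    In this model the advertised totals double, but each frame still only sees one GPU's memory and one GPU's bus, which is why the spec sheets don't compare directly.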

  12. #12
    Registered User
    Join Date
    Dec 2008
    Posts
    26
    Quote Originally Posted by mcmeat51 View Post
    True, however that seems like a trend to me. What is the theoretical mem bus throughput of a 5870 and the GTX 295?
    AFAIK, theoretically AFR on a 295 means you effectively have a 275 with the same number of SPs, TMUs, ROPs and the same VRAM, but with 2x the core, SP and bandwidth clocks, plus 1 additional frame of input lag (or the same input lag as one GTX 275).

  13. #13
    Xtreme Addict
    Join Date
    Jan 2008
    Posts
    1,463
    Quote Originally Posted by hurleybird View Post
    I've been told my card is shipping this morning. When I get it, if no one else has stepped up to the plate, I'll settle once and for all whether memory bandwidth is what's holding the card back.
    take a couple of games and try them at three settings
    (1. medium quality, lower res, noAA)
    (2. high quality, 1080, 4xAA) and
    (3. Insane-o maximum quality, 1920 or 2560 or Eyefinity range res, 8xAA, 16xAF)

    Clock your memory from 800MHz to 1400+MHz in 100MHz or so increments, and record the data. Guru3D has a new GPU tool for overclocking and overvolting core/mem on 5800 cards. I would be interested to see your benchmarks. Another exciting bench coming from a poster at ocforums.com is going to be 5870 CrossFire tests run extensively on a P55 vs. an X58 to test 8x/8x vs. 16x/16x PCIe.
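
    If it helps keep the runs organized, here is a small sketch of that test matrix: three quality presets crossed with memory clocks from 800 to 1400MHz in 100MHz steps. The game names are placeholders and `build_runs` is a hypothetical helper; the `fps` field is left empty to be filled in by hand after each run.

```python
from itertools import product

# The three presets suggested in the post, written out as labels.
SETTINGS = [
    "medium quality, lower res, no AA",
    "high quality, 1080, 4xAA",
    "maximum quality, 1920/2560/Eyefinity res, 8xAA, 16xAF",
]

# 800, 900, ..., 1400 MHz
MEM_CLOCKS_MHZ = list(range(800, 1401, 100))


def build_runs(games):
    """Return one record per (game, preset, memory clock) combination."""
    return [
        {"game": game, "settings": preset, "mem_mhz": mem, "fps": None}
        for game, preset, mem in product(games, SETTINGS, MEM_CLOCKS_MHZ)
    ]


# Placeholder game names; substitute whatever you actually bench.
runs = build_runs(["game_a", "game_b"])
print(len(runs), "runs to record")
```

    Seven clock steps times three presets adds up quickly per game, so it is probably worth picking only a couple of titles.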

    I am really interested to see how your memory performance benchmarks go. good luck, and thanks!
    Bring... bring the amber lamps.

  14. #14
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    Actually, the plan is to clock both the memory and the core down to about half. That way the compute/bandwidth ratio stays the same and it's much easier to increase one or the other. Then I'll hold either the core or the memory constant while I increase the other in increments of maybe 150MHz. If the card reacts substantially more favorably to either core or memory frequency, we'll know there's an imbalance in the design.
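
    One way to put a number on "reacts more favorably" is scaling efficiency: the percentage fps gain divided by the percentage clock gain. This is only a sketch of that idea; the function names and all the fps figures below are hypothetical, not measurements.

```python
def scaling_efficiency(base_clock, base_fps, new_clock, new_fps):
    """Fraction of a clock increase that shows up as an fps increase.

    1.0 means perfect scaling, 0.0 means the extra clock did nothing.
    """
    clock_gain = new_clock / base_clock - 1.0
    fps_gain = new_fps / base_fps - 1.0
    return fps_gain / clock_gain


def likely_limiter(core_eff, mem_eff):
    """Whichever clock the card responds to more is the likelier bottleneck."""
    return "core" if core_eff > mem_eff else "memory"


# Hypothetical results starting from a 425/600 baseline at 30 fps.
core_eff = scaling_efficiency(425, 30.0, 575, 36.0)  # core +150MHz, memory fixed
mem_eff = scaling_efficiency(600, 30.0, 750, 33.0)   # memory +150MHz, core fixed

print(f"core efficiency:   {core_eff:.2f}")
print(f"memory efficiency: {mem_eff:.2f}")
print("likely limiter:", likely_limiter(core_eff, mem_eff))
```

    Comparing efficiencies rather than raw fps deltas matters because a 150MHz step is a bigger relative jump on the core than on the memory.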

    Hopefully I get mine in tomorrow (express shipping), but I do live somewhat north...

  15. #15
    Xtreme Addict
    Join Date
    Jan 2008
    Posts
    1,463
    That might work too, but I would try both ways just to replicate real world scenarios.
    Bring... bring the amber lamps.

  16. #16
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    We'll just have to see how far my card overclocks. If I can't get the memory high enough I won't get very much data the other way. Good compromise might be to lower both clocks to 3/4 as opposed to half and go from there.

  17. #17
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    lots of talk, but barely anybody has a 5870 lol...
    im really curious about this myself, 5870 SHOULD be much faster than a 4890 considering ati doubled everything except for mem bw, which is still improved notably over a 4890... so its really odd the perf is only 40% higher than a 4890 on average... it could be that the shader cores are less efficient now that dx11 is added?
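
    A rough back-of-envelope sketch of the "doubled everything except mem bw" point. The 5870 stock clocks (850/1200) follow from the half-frequency figures quoted elsewhere in this thread (425/600). The 4890 memory clock (975MHz), the 256-bit bus on both cards and the 4 transfers per clock for GDDR5 come from commonly published specs and should be treated as assumptions here.

```python
def gddr5_bandwidth_gbs(bus_width_bits, mem_clock_mhz):
    """Theoretical bandwidth in GB/s, assuming 4 transfers per memory clock."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second_m = mem_clock_mhz * 4  # millions of transfers per second
    return bytes_per_transfer * transfers_per_second_m / 1000


hd5870_bw = gddr5_bandwidth_gbs(256, 1200)  # assumed stock memory clock
hd4890_bw = gddr5_bandwidth_gbs(256, 975)   # assumed stock memory clock

bandwidth_ratio = hd5870_bw / hd4890_bw
compute_ratio = 2.0  # "doubled everything" at the same core clock, per the post

print(f"5870: {hd5870_bw:.1f} GB/s, 4890: {hd4890_bw:.1f} GB/s")
print(f"bandwidth ratio: {bandwidth_ratio:.2f}x, compute ratio: {compute_ratio:.1f}x")
print(f"bandwidth per unit of compute: {bandwidth_ratio / compute_ratio:.2f}x")
```

    Under those assumptions the 5870 has roughly 23% more bandwidth feeding twice the compute, so bandwidth per unit of compute drops to around 0.6x of the 4890's.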

    w1zzard disabled some sps on his 5870 to simulate a 5850, it would be really cool if we could disable half the sps and connected rops and ideally tmus, to basically have a 5870 cut down to the same specs as a 4890 and we can do a clock for clock compare...

  18. #18
    Registered User
    Join Date
    May 2006
    Location
    The Netherlands
    Posts
    22
    The overclockworld would look completely different without Oskar Wu. Take me back to the ABIT & DFI era !

  19. #19
    Xtreme Addict
    Join Date
    Jan 2008
    Posts
    1,463
    Quote Originally Posted by db87 View Post
    This review is testing PCI-e lane link width. Hurleybird is testing memory bandwidth on the board by under/over clocking the memory frequency.
    Quote Originally Posted by saaya View Post
    a 5870 cut down to the same specs as a 4890 and we can do a clock for clock compare...
    that would be sweet.
    Bring... bring the amber lamps.

  20. #20
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    What tests do you want guys?

    I can run them for you now

    BTW first and foremost bottleneck is triangle setup. Same speed per clock as in RV770!
    Let's look for more
    RiG1: Ryzen 7 1700 @4.0GHz 1.39V, Asus X370 Prime, G.Skill RipJaws 2x8GB 3200MHz CL14 Samsung B-die, TuL Vega 56 Stock, Samsung SS805 100GB SLC SDD (OS Drive) + 512GB Evo 850 SSD (2nd OS Drive) + 3TB Seagate + 1TB Seagate, BeQuiet PowerZone 1000W

    RiG2: HTPC AMD A10-7850K APU, 2x8GB Kingstone HyperX 2400C12, AsRock FM2A88M Extreme4+, 128GB SSD + 640GB Samsung 7200, LG Blu-ray Recorder, Thermaltake BACH, Hiper 4M880 880W PSU

    SmartPhone Samsung Galaxy S7 EDGE
    XBONE paired with 55'' Samsung LED 3D TV

  21. #21
    Registered User
    Join Date
    Dec 2008
    Posts
    26
    Quote Originally Posted by Lightman View Post
    BTW first and foremost bottleneck is triangle setup. Same speed per clock as in RV770!
    So G300 won't be much faster than GTX 285 too?

  22. #22
    Xtreme Addict
    Join Date
    Jan 2008
    Posts
    1,463
    Quote Originally Posted by Lightman View Post
    What tests do you want guys?
    I can run them for you now
    Bench a video game at different ram frequencies 800mhz - 1400mhz
    Bring... bring the amber lamps.

  23. #23
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Quote Originally Posted by jaredpace View Post
    Bench a video game at different ram frequencies 800mhz - 1400mhz
    I came close; CCC limits mem to 900-1300MHz

    All tests done @1920x1200













    Enjoy!

  24. #24
    Xtreme Legend
    Join Date
    Jan 2003
    Location
    Stuttgart, Germany
    Posts
    929
    dont forget to check for memory error correction (more details on overclocking page of my review)

  25. #25
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    Well, from the initial results it looks like the HD 5870 responds to more memory bandwidth, but a 100MHz mem OC doesn't tell us much. For all we know, 150MHz more on the memory would stop scaling. What we want to know is the point where overclocking the memory stops giving more performance, as well as the point where overclocking the core stops giving more performance. Here's a better methodology:

    1. Download AMD GPU clock tool for HD 5870

    2. Underclock both core and memory to half frequency (425/600) to keep the same Compute/bandwidth ratio.

    3. Test at that frequency

    4. While keeping the core @ 425MHz, start increasing the memory clock by some increment, say 50-100MHz, until it stops scaling or becomes unstable.

    5. While keeping the memory @ 600MHz, start increasing the core clock by some increment, say 50-100MHz*, until it stops scaling or becomes unstable.

    6. Repeat with as many programs as you care for.

    7. Analyze

    *Might be easier to have the same % increment as for memory. For example, if you use 100MHz increments on both, each increment is more substantial for compute resources because it starts at a lower frequency (425 vs. 600MHz). If you want to have proper proportions, a 100MHz increase on mem has the same significance as a ~71MHz (70.833) increase in compute. In other words, multiply whatever you choose as the memory increment by 0.70833 to get the amount you should increment core by.
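
    The steps above, including the proportional increment from the footnote, can be sketched as a clock schedule. The 425/600 baseline and the 100MHz memory step come from the post; the number of steps (6) is an arbitrary assumption, since in practice you keep going until scaling stops or the card becomes unstable.

```python
BASE_CORE_MHZ = 425
BASE_MEM_MHZ = 600
MEM_STEP_MHZ = 100
STEPS = 6  # assumption: stop wherever scaling or stability actually ends

# Scale the core step so each step is the same percentage as the memory step.
core_step_mhz = MEM_STEP_MHZ * BASE_CORE_MHZ / BASE_MEM_MHZ  # ~70.833

# Step 4: core held at 425MHz, memory swept upward.
mem_sweep = [
    (BASE_CORE_MHZ, BASE_MEM_MHZ + i * MEM_STEP_MHZ) for i in range(STEPS + 1)
]

# Step 5: memory held at 600MHz, core swept upward in proportional steps.
core_sweep = [
    (round(BASE_CORE_MHZ + i * core_step_mhz, 1), BASE_MEM_MHZ)
    for i in range(STEPS + 1)
]

print(f"core step: {core_step_mhz:.3f} MHz")
for core, mem in mem_sweep + core_sweep:
    print(f"core {core} MHz / mem {mem} MHz")
```

    With six steps both sweeps happen to end at the stock clock for the swept domain (1200MHz memory or 850MHz core), which makes a handy sanity check on the proportions.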
    Last edited by hurleybird; 09-25-2009 at 10:53 AM.
